Schloss Dagstuhl – Leibniz Center for Informatics
DROPS Dagstuhl Research Online Publication ServerNot a member yet
23028 research outputs found
Sort by
LIPIcs, Volume 328, ICDT 2025, Complete Volume
LIPIcs, Volume 328, ICDT 2025, Complete Volum
The Quest for Faster Join Algorithms (Invited Talk)
Joins are the cornerstone of relational databases. Surprisingly, even after several decades of research in the systems and theory database community, we still lack an understanding of how to design the fastest possible join algorithm. In this talk, we will present the exciting progress the database theory community has achieved in join algorithms over the last two decades. The talk will revolve around five key ideas fundamentally shaping this research area: tree decompositions, data partitioning, leveraging statistical information, enumeration, and algebraic techniques
FC-Datalog as a Framework for Efficient String Querying
Core spanners are a class of document spanners that capture the core functionality of IBM’s AQL. FC is a logic on strings built around word equations that when extended with constraints for regular languages can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures , it is also a tractable version of Datalog on strings. This presents an opportunity for optimization.
We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog
An FPRAS for Model Counting for Non-Deterministic Read-Once Branching Programs
Non-deterministic read-once branching programs, also known as non-deterministic free binary decision diagrams (nFBDD), are a fundamental data structure in computer science for representing Boolean functions. In this paper, we focus on #nFBDD, the problem of model counting for non-deterministic read-once branching programs. The #nFBDD problem is #P-hard, and it is known that there exists a quasi-polynomial randomized approximation scheme for #nFBDD. In this paper, we provide the first FPRAS for #nFBDD. Our result relies on the introduction of new analysis techniques that focus on bounding the dependence of samples
Talking Wikidata: Communication Patterns and Their Impact on Community Engagement in Collaborative Knowledge Graphs
We study collaboration patterns of Wikidata, one of the world's largest open source collaborative knowledge graph (KG) communities.
Collaborative KG communities, play a key role in structuring machine-readable knowledge to support AI systems like conversational agents. However, these communities face challenges related to long-term member engagement, as a small subset of contributors often is responsible for the majority of contributions and decision-making. While prior research has explored contributors' roles and lifespans, discussions within collaborative KG communities remain understudied. To fill this gap, we investigated the behavioural patterns of contributors and factors affecting their communication and participation. We analysed all the discussions on Wikidata using a mixed methods approach, including statistical tests, network analysis, and text and graph embedding representations. Our findings reveal that the interactions between Wikidata editors form a small world network, resilient to dropouts and inclusive, where both the network topology and discussion content influence the continuity of conversations. Furthermore, the account age of Wikidata members and their conversations are significant factors in their long-term engagement with the project.
Our observations and recommendations can benefit the Wikidata and semantic web communities, providing guidance on how to improve collaborative environments for sustainability, growth, and quality
Trustworthy Generative AI for Financial Services (Practitioner Track)
This work introduces GFT EnterpriseGPT, a regulatory-compliant, trustworthy generative AI (GenAI) platform tailored for the financial services sector. We discuss the unique challenges of applying GenAI in highly regulated environments. In the financial sector data privacy, ethical considerations, and regulatory compliance are paramount. Our solution addresses these challenges through multi-level safeguards, including robust guardrails, privacy-preserving techniques, and grounding mechanisms. Robust guardrails prevent unsafe inputs and outputs, and privacy-preserving techniques reduce the need for data transmission to third-party providers. In contrast, grounding mechanisms ensure the accuracy and reliability of artificial intelligence (AI) generated content. By incorporating these measures, we propose a path forward for safely harnessing the transformative potential of GenAI in finance, ensuring reliability, transparency, and adherence to ethical and regulatory standards. We demonstrate the practical application of GFT EnterpriseGPT within a large-scale financial institution, where it successfully improves operational efficiency and compliance
EAM Diagrams - A Framework to Systematically Describe AI Systems for Effective AI Risk Assessment (Academic Track)
Artificial Intelligence (AI) is a transformative technology that offers new opportunities across various applications. However, the capabilities of AI systems introduce new risks, which require the adaptation of established risk assessment procedures. A prerequisite for any effective risk assessment is a systematic description of the system under consideration, including its inner workings and application environment. Existing system description methodologies are only partially applicable to complex AI systems, as they either address only parts of the AI system, such as datasets or models, or do not consider AI-specific characteristics at all. In this paper, we present a novel framework called EAM Diagrams for the systematic description of AI systems, gathering all relevant information along the AI life cycle required to support a comprehensive risk assessment. The framework introduces diagrams on three levels, covering the AI system’s environment, functional inner workings, and the learning process of integrated Machine Learning (ML) models
Scaling of End-To-End Governance Risk Assessments for AI Systems (Practitioner Track)
Artificial Intelligence (AI) systems are embedded in a multifaceted environment characterized by intricate technical, legal, and organizational frameworks. To attain a comprehensive understanding of all AI-related risks, it is essential to evaluate both model-specific risks and those associated with the organizational and governance setups. We categorize these as "bottom-up risks" and "top-down risks," respectively. In this paper, we focus on the expansion and enhancement of a testing and auditing technology stack to identify and manage governance-related risks ("top-down"). These risks emerge from various dimensions, including internal development and decision-making processes, leadership structures, security setups, documentation practices, and more. For auditing governance related risk, we implement a traditional risk management framework and map it to the specifics of AI systems. Our end-to-end (from identification to monitoring) risk management kernel follows these implementation steps:
- Identify
- Collect
- Assess
- Comply
- Monitor We demonstrate that scaling of such a risk auditing tool requires fundamental aspects. Those aspects include for instance a role-based approach, covering different roles in the development of complex AI systems. Ensuring compliance and secure record-keeping through audit-proof capabilities is also paramount. This ensures that the auditing technology can withstand scrutiny and maintain the integrity of records over time. Another critical aspect is the integrability of the auditing tool within existing risk management and governance infrastructures. This integration is essential to reduce the barriers for companies to comply with current regulatory requirements, such as the EU AI Act [European Parliament and the Council of the EU, 2024], and established standards like ISO 42001:2023. Ultimately, we demonstrate that this approach provides a robust technology stack for ensuring that AI systems are developed, utilized and supervised in a manner that is both compliant with regulatory standards and aligned with best practices in risk management and governance
AI Assessment in Practice: Implementing a Certification Scheme for AI Trustworthiness (Academic Track)
The trustworthiness of artificial intelligence systems is crucial for their widespread adoption and for avoiding negative impacts on society and the environment. This paper focuses on implementing a comprehensive certification scheme developed through a collaborative academic-industry project. The scheme provides practical guidelines for assessing and certifying the trustworthiness of AI-based systems. The implementation of the scheme leverages aspects from Machine Learning Operations and the requirements management tool Jira to ensure continuous compliance and efficient lifecycle management. The integration of various high-level frameworks, scientific methods, and metrics supports the systematic evaluation of key aspects of trustworthiness, such as reliability, transparency, safety and security, and human oversight. These methods and metrics were tested and assessed on real-world use cases to dependably verify means of compliance with regulatory requirements and evaluate criteria and detailed objectives for each of these key aspects. Thus, this certification framework bridges the gap between ethical guidelines and practical application, ensuring the safe and effective deployment of AI technologies
First-Order Logic with Equicardinality in Random Graphs
We answer a question of Blass and Harary about the validity of the zero-one law in random graphs for extensions of first-order logic (FOL). For a given graph property P, the Lindström extension of FOL by P is defined as the minimal (regular) extension of FOL able to express P. For several graph properties P (e.g. Hamiltonicity), it is known that the Lindström extension by P is also able to interpret a segment of arithmetic, and thus strongly disobeys the zero-one law. Common to all these properties is the ability to express the Härtig quantifier, a natural extension of FOL testing if two definable sets are of the same size. We prove that the Härtig quantifier is sufficient for the interpretation of arithmetic, thus providing a general result which implies all known cases of Lindström extensions which are able to interpret a segment of arithmetic