1,721,043 research outputs found

    Foundations of ontology-based data access under bag semantics

    Full text link
    Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign to ontology predicates views over the data. The conventional semantics of OBDA is set-based—that is, the extension of the views defined by the mappings does not contain duplicate tuples. This treatment is, however, in disagreement with the standard semantics of database views and database management systems in general, which is based on bags and where duplicate tuples are retained by default. The distinction between set and bag semantics in databases is very significant in practice, and it influences the evaluation of aggregate queries. In this article, we propose and study a bag semantics for OBDA which provides a solid foundation for the future study of aggregate and analytic queries. Our semantics is compatible with both the bag semantics of database views and the set-based conventional semantics of OBDA. Furthermore, it is compatible with existing bag-based semantics for data exchange recently proposed in the literature. We show that adopting a bag semantics makes conjunctive query answering in OBDA CONP-hard in data complexity. To regain tractability of query answering, we consider suitable restrictions along three dimensions, namely, the query language, the ontology language, and the adoption of the unique name assumption. Our investigation shows a complete picture of the computational properties of query answering under bag semantics over ontologies in the DL-Lite family.</p

    The Bag Semantics of Ontology-Based Data Access

    Full text link
    Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show its rewritability to a generalisation of the relational calculus to bags.</jats:p

    Classification of Annotation Semirings over Query Containment

    Full text link
    We study the problem of query containment of conjunctive queries over annotated databases. Annotations are typically attached to tuples and represent metadata, such as probability, multiplicity, comments, or provenance. It is usually assumed that annotations are drawn from a commutative semiring. Such databases pose new challenges in query optimization, since many related fundamental tasks, such as query containment, have to be reconsidered in the presence of propagation of annotations. We axiomatize several classes of semirings for each of which containment of conjunctive queries is equivalent to existence of a particular type of homomorphism. For each of these types, we also specify all semirings for which existence of a corresponding homomorphism is a sufficient (or necessary) condition for the containment. We develop new decision procedures for containment for some semirings which are not in any of these classes. This generalizes and systematizes previous approaches

    Entity comparison in knowledge graphs

    Full text link
    Knowledge graphs (KGs) represent information in a simple, interlinked format and are used in a variety of applications. Besides the standard reasoning tasks such as query answering, there has been an increasing need for new types of analysis. One such fundamental analysis task is entity comparison, i.e., determining what two entities have in common and how they are different. The importance of such a task can be seen in various applications such as product comparisons, and explainable recommender systems. This thesis studies the problem of entity comparison over knowledge graphs and presents a novel framework that models comparisons via similarity and difference queries. It is the first declarative comparison framework for linked data that treats both similarities and differences as first-class citizens. We first define the syntax and semantics of various types of comparison queries, motivating each type with examples and establishing important relations between different query types. Next, we study the complexity of the two fundamental decision problems over queries, namely the existence and verification problems, for basic comparison queries and for most specific similarity queries (MSSQs). We consider two underlying fragments of SPARQL, with and without arithmetic comparisons, and we show how they affect the complexity of the above problems, and which types of data they suit most. We then study practical aspects of computing the MSSQ. In particular, we establish an important uniqueness result for MSSQs and propose an efficient approximation algorithm. We implement an algorithm for computing acyclic similarity queries and evaluate its performance on large-scale KGs; our empirical results demonstrate the practical feasibility of our algorithm

    On modal characterisation theorems over finite structures with applications to graph neural networks

    No full text
    Bisimulations capture a natural notion of equivalence for elements in relational structures, which was independently discovered in different areas of computer science. The relevance of bisimulations across fields naturally motivates the characterisation of the bisimulation-invariant fragments for various logics. Famously, a seminal theorem by van Benthem characterises the bisimulation-invariant fragment of first-order logic (FOL) in terms of modal logic. Similarly, Janin and Walukiewicz showed that the bisimulation-invariant fragment of monadic second-order logic (MSO) corresponds precisely to the (modal) mu-calculus. Notably, neither van Benthem’s nor Janin and Walukiewicz’s argument immediately carries over when only considering finite structures. In the case of FOL, Rosen provided an alternative proof, which also holds in the relativised setting. However, further modal characterisations for stronger notions of bisimulation and any logic more expressive than FOL over finite structures have remained long-standing open problems in finite model theory. In this thesis, we answer several of these open problems. In particular, as our main result, we prove that, over finite structures, every two-way bisimulation-invariant parameter-free monadic least fixpoint logic formula, a well-known logic in between FOL and MSO, is equivalent to a formula in the two-way mu-calculus and vice versa. Moreover, in a second line of work, we show that multiple notions of bisimulations considered in this thesis capture natural notions of invariance in graph learning. In particular, we show that node classifiers, functions mapping graphs to subsets of their nodes, realised by several classes of Graph Neural Networks (GNNs) can, in each case, be characterised by invariance under a variant of bisimulations. Finally, given our semantic characterisations of the considered GNN classes in terms of invariance under bisimulations, we show that we can obtain conditional logical characterisations for the node classifiers realised by the considered GNN classes through applications of our modal characterisation theorems

    Deep learning with knowledge graphs using graph neural networks

    Full text link
    Knowledge Graphs (KGs) are reshaping the paradigm of representing, organising, and utilising information about the world. They provide rich semantic information, and have emerged as a driving power of Artificial Intelligence (AI). Primarily, there are two types of important research directions for KGs: one focuses on constructing and improving the quality of KGs, and the other delves into the wide range of applications of KGs. Recent years have also witnessed the advancement of Graph Neural Networks (GNNs), which are a class of deep learning techniques applicable to the graph domain and have demonstrated promising performance in many tasks. While there have been research attempts of applying GNNs to the KG-related tasks, there still remain several open challenges with the models’ function design, scalability issues, transductive nature of being limited to predicting for entities observed during training, and the quality of the benchmarks. In this thesis, towards deep learning with KGs using GNNs, we consider the tasks of inductive KG completion and inductive knowledge-enhanced recommendation in the context of the two directions, and propose novel GNN-based approaches to address the challenges. Our extensive empirical evaluation shows that our approaches outperform the state-of-the-art approaches on a collection of baselines, and can achieve efficient training and testing in practice. We also take a further step into the KG completion problem by revisiting the benchmarks in the transductive settings. In particular, we propose a new approach to generate benchmarks that can help empirically assess models’ ability to capture inference patterns. Our findings highlight the gaps between theoretical and empirical results concerning such inference ability

    Beyond well-designed SPARQL

    Full text link
    SPARQL is the standard query language for RDF data. The distinctive feature of SPARQL is the OPTIONAL operator, which allows for partial answers when complete answers are not available due to lack of information. However, optional matching is computationally expensive - query answering is PSPACE-complete. The well-designed fragment of SPARQL achieves much better computational properties by restricting the use of optional matching - query answering becomes coNP-complete. However, well-designed SPARQL captures far from all real-life queries - in fact, only about half of the queries over DBpedia that use OPTIONAL are well-designed. In the present paper, we study queries outside of well-designed SPARQL. We introduce the class of weakly well-designed queries that subsumes well-designed queries and includes most common meaningful non-well-designed queries: our analysis shows that the new fragment captures about 99% of DBpedia queries with OPTIONAL. At the same time, query answering for weakly well-designed SPARQL remains coNP-complete, and our fragment is in a certain sense maximal for this complexity. We show that the fragment's expressive power is strictly in-between well-designed and full SPARQL. Finally, we provide an intuitive normal form for weakly well-designed queries and study the complexity of containment and equivalence

    Bagging the DL-Lite family further

    Full text link
    Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of an ontology, which is usually expressed in a description logic (DL) of DL-Lite family. The conventional semantics of OBDA and DLs is set-based—that is, duplicates are disregarded. This disagrees with the standard database bag (multiset) semantics, which is especially important for the correct evaluation of aggregate queries. In this article, we study two variants of the bag semantics for query answering over DL-LiteF , extending basic DL-Litecore with functional roles. For our first semantics, which follows the semantics of primary keys in SQL, conjunctive query (CQ) answering is coNP-hard in data complexity in general, but it is in TC0 for the restricted class of rooted CQs; such CQs are also rewritable to the bag relational algebra. For our second semantics, the results are the same except that TC0 membership and rewritability hold only for the restricted class of ontologies identified by a new notion of functional weak acyclicit

    Rule-based stream reasoning

    Full text link
    In recent years, there has been an increasing interest in extending stream processing engines with rule-based temporal reasoning capabilities. To ensure correctness, such systems must be able to output results over the partial data received so far as if the entire (infinite) stream had been available; furthermore, these results must be streamed out as soon as the relevant data is received, thus incurring the minimum possible latency; finally, due to memory limitations, systems can only keep a limited history of previous facts in memory to perform further computations. These requirements pose significant theoretical and practical challenges since temporal rules can derive new information and propagate it both towards past and future time points; as a result, streamed answers can depend on data that has not yet been received, as well as on data that arrived far in the past. Towards developing a solid foundation for practical rule-based stream reasoning, we propose and study in this thesis a suite of decision problems that can be exploited by stream reasoning algorithms to tackle the aforementioned challenges, and provide tight complexity bounds for a core temporal extension of Datalog. All of the problems we consider can be solved at design time (under reasonable assumptions), prior to the processing of any data. Solving these problems enables the use of reasoning algorithms that process the input streams incrementally using a sliding window, while at the same time supporting an expressive rule-based knowledge representation language and minimising both latency and memory consumption

    Evaluating conjunctive and graph queries over the EL profile of OWL 2

    Full text link
    OWL 2 EL is a popular ontology language that is based on the EL family of description logics and supports regular role inclusions,axioms that can capture compositional properties of roles such as role transitivity and reflexivity. In this thesis, we present several novel complexity results and algorithms for answering expressive queries over OWL 2 EL knowledge bases (KBs) with regular role inclusions. We first focus on the complexity of conjunctive query (CQ) answering in OWL 2 EL and show that the problem is PSpace-complete in combined complexity, the complexity measured in the total size of the input. All the previously known approaches encode the regular role inclusions using finite automata that can be worst-case exponential in size, and thus are not optimal. In our PSpace procedure, we address this problem by using a novel, succinct encoding of regular role inclusions based on pushdown automata with a bounded stack. Moreover, we strengthen the known PSpace lower complexity bound and show that the problem is PSpace-hard even if we consider only the regular role inclusions as part of the input and the query is acyclic; thus, our algorithm is optimal in knowledge base complexity, the complexity measured in the size of the KB, as well as for acyclic queries. We then study graph queries for OWL 2 EL and show that answering positive, converse- free conjunctive graph queries is PSpace-complete. Thus, from a theoretical perspective, we can add navigational features to CQs over OWL 2 EL without an increase in complexity. Finally, we present a practicable algorithm for answering CQs over OWL 2 EL KBs with only transitive and reflexive composite roles. None of the previously known approaches target transitive and reflexive roles specifically, and so they all run in PSpace and do not provide a tight upper complexity bound. In contrast, our algorithm is optimal: it runs in NP in combined complexity and in PTime in KB complexity. We also show that answering CQs is NP-hard in combined complexity if the query is acyclic and the KB contains one transitive role, one reflexive role, or nominals—concepts containing precisely one individual
    corecore