1,720,976 research outputs found

    Data Accuracy as Knowledge in Ontology Based Data Access (preliminary report)

    Full text link
    In the context of Ontology Based Data Access (OBDA), consistency of data ensures that the data sources are coherent with the rules of the domain of interest represented by the ontology.However, even when consistency holds, the data underlying an OBDA system canstill be in a state that users perceive of poor quality, according to some intuitiverequirements. In many of these cases, the mechanism currently used to specify an OBDA system seems to lack of the ability to express such requirements.In this work, we argue that those requirements are often not about the world that the ontology represents, but about the knowledge that the system possesses on the world. Thus, with the aim of formalizing data quality specifications in the OBDA context, we propose the usage of a language of modal constraints, and show how they can be used in practice to capture cases of poor data quality. For this novel class of assertions, and for OBDA systems where the ontology is expressed in DL-Lite, we present algorithms and complexity results for the problem of checking the accuracy of the knowledge that the system posses, i.e., whether the system respects the modal constraints in the specification.<br/

    Editorial: Special Issue on Quality Aspects of Data Preparation

    Full text link
    This Special Issue of the Journal of Data and Information Quality (JDIQ) contains novel theoretical and methodological contributions as well as state-of-the-art reviews and research perspectives on quality aspects of data preparation. In this editorial, we summarize the scope of the issue and briefly describe its content

    Fragments of bag relational algebra: Expressiveness and certain answers

    Full text link
    While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal is to provide a complete characterization of the complexity of query answering over such relations in fragments of bag relational algebra. The main challenges that we face are twofold. First, bag relational algebra has more operations than its set analog (e.g., additive union, max-union, min-intersection, duplicate elimination) and the relationship between various fragments is not fully known. Thus we first fill this gap. Second, we look at query answering over incomplete data, which again is more complex than in the set case: rather than certainty and possibility of answers, we now have numerical information about occurrences of tuples. We then fully classify the complexity of finding this information in all the fragments of bag relational algebra

    Model-theoretic characterizations of rule-based ontologies

    Full text link
    An ontology specifies an abstract model of a domain of interest via a formal language that is typically based on logic. Although description logics are popular formalisms for modeling ontologies, tuple-generating dependencies (tgds), originally introduced as a unifying framework for database integrity constraints, and later on used in data exchange and integration, are also well suited for modeling ontologies that are intended for data-intensive tasks. The reason is that, unlike description logics, tgds can easily handle higher-arity relations that naturally occur in relational databases. In recent years, there has been an extensive study of tgd-ontologies and of their applications to several different data-intensive tasks. However, the fundamental question of whether the expressive power of tgd-ontologies can be characterized in terms of model-theoretic properties remains largely unexplored. We establish several characterizations of tgd-ontologies, including characterizations of ontologies specified by such central classes of tgds as full, linear, guarded, and frontier-guarded tgds. Our characterizations use the well-known notions of critical instance and direct product, as well as a novel locality property for tgd-ontologies. We further use this locality property to decide whether an ontology expressed by frontier-guarded (respectively, guarded) tgds can be expressed by tgds in the weaker class of guarded (respectively, linear) tgds, and effectively construct such an equivalent ontology if one exists

    Propositional and predicate logics of incomplete information

    No full text
    One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene's logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene's logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it

    Reasoning about measures of unmeasurable sets

    Full text link
    In a variety of reasoning tasks, one estimates the likelihood of events by means of volumes of sets they define. Such sets need to be measurable, which is usually achieved by putting bounds, sometimes ad hoc, on them. We address the question how unbounded or unmeasurable sets can be measured nonetheless. Intuitively, we want to know how likely a randomly chosen point is to be in a given set, even in the absence of a uniform distribution over the entire space. To address this, we follow a recently proposed approach of taking intersection of a set with balls of increasing radius, and defining the measure by means of the asymptotic behavior of the proportion of such balls taken by the set. We show that this approach works for every set definable in first-order logic with the usual arithmetic over the reals (addition, multiplication, exponentiation, etc.), and every uniform measure over the space, of which the usual Lebesgue measure (area, volume, etc.) is an example. In fact we establish a correspondence between the good asymptotic behavior and the finiteness of the VC dimension of definable families of sets. Towards computing the measure thus defined, we show how to avoid the asymptotics and characterize it via a specific subset of the unit sphere. Using definability of this set, and known techniques for sampling from the unit sphere, we give two algorithms for estimating our measure of unbounded unmeasurable sets, with deterministic and probabilistic guarantees, the latter being more efficient. Finally we show that a discrete analog of this measure exists and is similarly well-behaved

    Do We Need Many-valued Logics for Incomplete Information?

    Full text link
    One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene's logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene's logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it

    Queries with Arithmetic on Incomplete Databases

    Full text link
    The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach

    On separability of ontological constraints

    No full text
    When data schemata are enriched with expressive constraints that aim at representing the domain of interest, in order to answer queries one needs to consider the logical theory consisting of both the data and the constraints. Query answering in such a context is called ontological query answering. Commonly adopted database constraints in this field are tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). It is well known that their interaction leads to intractability or undecidability of query answering even in the case of simple subclasses. Several conditions have been found to guarantee separability, that is lack of interaction, between TGDs and EGDs. Separability makes EGDs (mostly) irrelevant for query answering and therefore often guarantees tractability, as long as the theory is satisfiable. In this paper we review the two notions of separability found in the literature, as well as several syntactic conditions that are sufficient to prove them. We then shed light on the issue of satisfiability checking, showing that under a sufficient condition called deep separability it can be done by considering the TGDs only. We show that, fortunately, in the case of TGDs and EGDs, separability implies deep separability. This result generalizes several analogous ones, proved ad hoc for particular classes of constraints. Applications include the class of sticky TGDs and EGDs, for which we provide a syntactic separability condition which extends the analogous one for linear TGDs; preliminary experiments show the feasibility of query answering in this case

    On querying incomplete information in databases under bag semantics

    Full text link
    Querying incomplete data is an important task both in data management, and in many AI applications that use query rewriting to take advantage of relational database technology. Usually one looks for answers that are certain, i.e., true in every possible world represented by an incomplete database. For positive queries - expressed either in positive relational algebra or as unions of conjunctive queries - finding such answers can be done efficiently when databases and query answers are sets. Real-life databases however use bag, rather than set, semantics. For bags, instead of saying that a tuple is certainly in the answer, we have more detailed information: namely, the range of the numbers of occurrences of the tuple in query answers. We show that the behavior of positive queries is different under bag semantics: finding the minimum number of occurrences can still be done efficiently, but for maximum it becomes intractable. We use these results to investigate approximation schemes for computing certain answers to arbitrary first-order queries that have been proposed for set semantics. One of them cannot be adapted to bags, as it relies on the intractable maxima of occurrences, but another scheme only deals with minima, and we show how to adapt it to bag semantics without losing efficiency
    corecore