1,721,008 research outputs found

    Introduction

    No full text
    No abstract

    Fragments of bag relational algebra: Expressiveness and certain answers

    Full text link
    While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal is to provide a complete characterization of the complexity of query answering over such relations in fragments of bag relational algebra. The main challenges that we face are twofold. First, bag relational algebra has more operations than its set analog (e.g., additive union, max-union, min-intersection, duplicate elimination) and the relationship between various fragments is not fully known. Thus we first fill this gap. Second, we look at query answering over incomplete data, which again is more complex than in the set case: rather than certainty and possibility of answers, we now have numerical information about occurrences of tuples. We then fully classify the complexity of finding this information in all the fragments of bag relational algebra

    Reasoning about measures of unmeasurable sets

    Full text link
    In a variety of reasoning tasks, one estimates the likelihood of events by means of volumes of sets they define. Such sets need to be measurable, which is usually achieved by putting bounds, sometimes ad hoc, on them. We address the question how unbounded or unmeasurable sets can be measured nonetheless. Intuitively, we want to know how likely a randomly chosen point is to be in a given set, even in the absence of a uniform distribution over the entire space. To address this, we follow a recently proposed approach of taking intersection of a set with balls of increasing radius, and defining the measure by means of the asymptotic behavior of the proportion of such balls taken by the set. We show that this approach works for every set definable in first-order logic with the usual arithmetic over the reals (addition, multiplication, exponentiation, etc.), and every uniform measure over the space, of which the usual Lebesgue measure (area, volume, etc.) is an example. In fact we establish a correspondence between the good asymptotic behavior and the finiteness of the VC dimension of definable families of sets. Towards computing the measure thus defined, we show how to avoid the asymptotics and characterize it via a specific subset of the unit sphere. Using definability of this set, and known techniques for sampling from the unit sphere, we give two algorithms for estimating our measure of unbounded unmeasurable sets, with deterministic and probabilistic guarantees, the latter being more efficient. Finally we show that a discrete analog of this measure exists and is similarly well-behaved

    Approximations and refinements of certain answers via many-valued logics

    No full text
    Computing certain answers is the preferred way of answering queries in scenarios involving incomplete data. This, however, is computationally expensive, so practical systems use efficient techniques based on a particular three-valued logic, even though this often leads to incorrect results. Our goal is to provide a general manyvalued framework for correctly approximating certain answers. We do so by defining the semantics of manyvalued answers and queries, following the principle that additional knowledge about the input must translate into additional knowledge about the output. This framework lets us compare query outputs and evaluation procedures in terms of their informativeness. For each manyvalued logic with a knowledge ordering on its truth values, one can build a syntactic evaluation procedure for all first-order queries, that correctly approximates certain answers; additional truth values are used to refine information about certain answers. For concrete examples, we show that a recently proposed approach fixing some of the inconsistencies of SQL query evaluation is an immediate consequence of our framework, and we further refine it by adding a fourth truth value.We show that no evaluation procedure based on Boolean logic delivers correctness guarantees. Finally, we study the relative power of evaluation procedures based on the informativeness of the answers they produce

    On querying incomplete information in databases under bag semantics

    Full text link
    Querying incomplete data is an important task both in data management, and in many AI applications that use query rewriting to take advantage of relational database technology. Usually one looks for answers that are certain, i.e., true in every possible world represented by an incomplete database. For positive queries - expressed either in positive relational algebra or as unions of conjunctive queries - finding such answers can be done efficiently when databases and query answers are sets. Real-life databases however use bag, rather than set, semantics. For bags, instead of saying that a tuple is certainly in the answer, we have more detailed information: namely, the range of the numbers of occurrences of the tuple in query answers. We show that the behavior of positive queries is different under bag semantics: finding the minimum number of occurrences can still be done efficiently, but for maximum it becomes intractable. We use these results to investigate approximation schemes for computing certain answers to arbitrary first-order queries that have been proposed for set semantics. One of them cannot be adapted to bags, as it relies on the intractable maxima of occurrences, but another scheme only deals with minima, and we show how to adapt it to bag semantics without losing efficiency

    Propositional and predicate logics of incomplete information

    No full text
    One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene's logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene's logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it

    Introduction

    No full text
    No abstract

    Measuring the likelihood of numerical constraints

    Full text link
    Our goal is to measure the likelihood of the satisfaction of numerical constraints in the absence of prior information. We study expressive constraints, involving arithmetic and complex numerical functions, and even quantification over numbers. Such problems arise in processing incomplete data, or analyzing conditions in programs without a priori bounds on variables. We show that for constraints on n variables, the proper way to define such a measure is as the limit of the part of the n-dimensional ball that consists of points satisfying the constraints, when the radius increases. We prove that the existence of such a limit is closely related to the notion of o-minimality from model theory. Thus, for constraints definable with the usual arithmetic and exponentiation, the likelihood is well defined, but adding trigonometric functions is problematic. We look at computing and approximating such likelihoods for order and linear constraints, and prove an impossibility result for approximating with multiplicative error. However, as the likelihood is a number between 0 and 1, an approximation scheme with additive error is acceptable, and we give it for arbitrary linear constraints

    Queries with Arithmetic on Incomplete Databases

    Full text link
    The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach

    Do We Need Many-valued Logics for Incomplete Information?

    Full text link
    One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene's logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene's logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it
    corecore