    Helping Wine Lovers with Taxonomies

    We formally investigate the problem of retrieving the best results complying with multiple preferences expressed in a logic-based language when data are stored in relational tables with taxonomic domains. We introduce two operators that rewrite preferences to enforce transitivity, which guarantees soundness of the result, and specificity, which solves conflicts among preferences. We show that these two properties cannot be achieved together and identify the only two alternatives that ensure transitivity and minimize conflicts. Our approach proves effective when tested over both synthetic and real-world datasets.

    Preference queries over taxonomic domains

    When composing multiple preferences characterizing the most suitable results for a user, several issues may arise. Indeed, preferences can be partially contradictory, suffer from a mismatch with the level of detail of the actual data, and even lack natural properties such as transitivity. In this paper we formally investigate the problem of retrieving the best results complying with multiple preferences expressed in a logic-based language. Data are stored in relational tables with taxonomic domains, which allow preferences to be specified also over values that are more generic than those in the database. In this framework, we introduce two operators that rewrite preferences to enforce the important properties of transitivity, which guarantees soundness of the result, and specificity, which solves all conflicts among preferences. Although, as we show, these two properties cannot be fully achieved together, we use our operators to identify the only two alternatives that ensure transitivity and minimize the residual conflicts. Building on this finding, we devise a technique, based on an original heuristic, for selecting the best results according to the two possible alternatives. Finally, we show, with a number of experiments over both synthetic and real-world datasets, the effectiveness and practical feasibility of the overall approach.
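
    To make the setting concrete, here is a minimal Python sketch, assuming a toy wine taxonomy stored as a child-to-parent map (the names `TAXONOMY`, `lift`, `transitive_closure`, and `best` are hypothetical). It extends preferences stated on generic taxonomy nodes to the concrete values in the data, enforces transitivity by closure, and keeps the undominated values. It does not reproduce the authors' rewriting operators or their handling of specificity and conflicts.

```python
from itertools import product

# Hypothetical wine taxonomy: each value maps to its parent (None for the root).
TAXONOMY = {
    "Wine": None,
    "Red": "Wine",
    "White": "Wine",
    "Chianti": "Red",
    "Merlot": "Red",
    "Chardonnay": "White",
}


def ancestors_or_self(value):
    """Return the value together with all of its ancestors in the taxonomy."""
    chain = set()
    while value is not None:
        chain.add(value)
        value = TAXONOMY[value]
    return chain


def lift(preferences, values):
    """Extend (better, worse) preferences stated on taxonomy nodes to concrete
    values: x beats y if some ancestor-or-self of x is preferred to some
    ancestor-or-self of y."""
    pairs = set()
    for x, y in product(values, repeat=2):
        if x == y:
            continue
        ax, ay = ancestors_or_self(x), ancestors_or_self(y)
        if any(b in ax and w in ay for b, w in preferences):
            pairs.add((x, y))
    return pairs


def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until a fixpoint."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def best(values, relation):
    """Keep the values that are not beaten by any other value."""
    return {v for v in values if not any((u, v) in relation for u in values if u != v)}


# "Chianti over Merlot" is specific; "Merlot over any White" uses a generic node.
values = ["Chianti", "Merlot", "Chardonnay"]
lifted = lift([("Chianti", "Merlot"), ("Merlot", "White")], values)
closed = transitive_closure(lifted)
print(best(values, closed))  # {'Chianti'}
```

    Here the closure adds the pair (`Chianti`, `Chardonnay`), which neither stated preference yields on its own. With conflicting statements (say, `Red` over `White` together with `Chardonnay` over `Merlot`), the closure would make some values beat each other, which is the kind of conflict the specificity property described above is meant to resolve.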

    Information discovery in polystores: The augmented way

    Polystores provide a loosely coupled integration of heterogeneous data sources based on direct access to each storage engine through its native language, so that the distinctive features of each engine can be exploited. In this framework, given the absence of a middleware exposing a global schema, it is hard to know whether a query to one system can be satisfied by data stored elsewhere in the polystore. We address this problem by illustrating query augmentation, a data manipulation operator for polystores based on the automatic enrichment of the answer to a local query with related data in the rest of the polystore. Augmentation can be used to implement augmented search and augmented exploration: two effective methods for information discovery in polystores that avoid middleware layers, abstract query languages, and shared data models.
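
    The idea of augmentation can be illustrated with a toy Python sketch, assuming three in-memory stand-ins for a relational engine, a document store, and a key-value store, plus a hypothetical cross-store index `LINKS` that records which objects are related. A local query runs against one engine only; the hypothetical `augment` function then attaches related objects fetched from the other engines. How the paper actually discovers and maintains such relationships is not modeled here.

```python
# Hypothetical stand-ins for three engines of a polystore, each with native keys.
RELATIONAL = [  # rows of a table in a relational engine
    {"id": "w1", "name": "Chianti Classico", "price": 18},
    {"id": "w2", "name": "Barolo", "price": 45},
]
DOCUMENTS = {  # JSON-like documents in a document store, keyed by object id
    "r7": {"wine": "Chianti Classico", "stars": 4, "text": "Cherry and spice"},
    "r9": {"wine": "Barolo", "stars": 5, "text": "Tar and roses"},
}
KEY_VALUE = {"stock:w1": 120, "stock:w2": 8}  # a key-value store

# Hypothetical cross-store index: (store, key) -> related (store, key) pairs.
LINKS = {
    ("relational", "w1"): [("documents", "r7"), ("kv", "stock:w1")],
    ("relational", "w2"): [("documents", "r9"), ("kv", "stock:w2")],
}

# Native lookup for each non-relational engine.
FETCH = {"documents": DOCUMENTS.get, "kv": KEY_VALUE.get}


def local_query(max_price):
    """A query answered by the relational engine alone."""
    return [row for row in RELATIONAL if row["price"] <= max_price]


def augment(rows):
    """Attach to each local result the related objects stored in other engines."""
    augmented = []
    for row in rows:
        related = {}
        for store, key in LINKS.get(("relational", row["id"]), []):
            related[f"{store}:{key}"] = FETCH[store](key)
        augmented.append({**row, "related": related})
    return augmented


result = augment(local_query(max_price=20))
print(result[0]["related"]["kv:stock:w1"])  # 120
```

    The user writes only the local query; the review and the stock level come back without a global schema or an abstract query language sitting in between.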

    Conceptual Constraints for Data Quality in Data Lakes

    A data lake is a large-scale, loosely structured collection of data built for analysis purposes and initially fed with almost no data quality requirements. This approach aims to eliminate any effort before the actual exploitation of data, but the problem is only delayed, since robust and defensible data analysis can be performed only after very complex data preparation activities. In this paper, we address this problem by proposing a novel and general approach to data curation in data lakes based on: (i) the specification of integrity constraints over a conceptual representation of the data lake and (ii) the automatic translation and enforcement of such constraints over the actual data. We discuss the advantages of this idea and the challenges behind its implementation.
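
    A minimal Python sketch of the two-step idea, assuming two raw sources with different column names, a hypothetical mapping `MAPPING` from the attributes of a conceptual entity `Wine` to those columns, and two constraints stated only on the conceptual schema: prices must be positive, and the functional dependency `Wine.code -> Wine.price` must hold across the whole lake. The translation here is a naive row-by-row rewrite chosen for illustration, not the authors' technique.

```python
# Hypothetical raw sources in the lake, each with its own column names.
SOURCES = {
    "shop_a.csv": [{"sku": "W1", "cost": 18.0}, {"sku": "W2", "cost": -3.0}],
    "shop_b.json": [{"wine_code": "W1", "eur": 19.5}, {"wine_code": "W3", "eur": 30.0}],
}

# Hypothetical mapping from attributes of the conceptual entity Wine to columns.
MAPPING = {
    "shop_a.csv": {"code": "sku", "price": "cost"},
    "shop_b.json": {"code": "wine_code", "price": "eur"},
}

# Row-level constraints stated once, on the conceptual schema only.
ROW_CONSTRAINTS = {"Wine.price > 0": lambda wine: wine["price"] > 0}
FD_NAME = "Wine.code -> Wine.price"  # functional dependency across the lake


def to_conceptual(source, row):
    """Translate a raw row into the attributes of the conceptual entity."""
    return {attr: row[col] for attr, col in MAPPING[source].items()}


def check_lake():
    """Enforce the conceptual constraints on every source; return violations."""
    violations = []
    first_price = {}  # code -> price seen at its first occurrence
    for source, rows in SOURCES.items():
        for row in rows:
            wine = to_conceptual(source, row)
            for name, holds in ROW_CONSTRAINTS.items():
                if not holds(wine):
                    violations.append((name, source, wine["code"]))
            expected = first_price.setdefault(wine["code"], wine["price"])
            if expected != wine["price"]:
                violations.append((FD_NAME, source, wine["code"]))
    return violations


violations = check_lake()
for v in violations:
    print(v)
```

    The constraints never mention `sku` or `wine_code`; in this sketch, adding a source only requires a new mapping entry, while the constraints themselves stay unchanged.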

    From why-provenance to why+provenance: Towards addressing deep data explanations in Data-Centric AI

    In this position paper, we discuss the problem of exploiting data provenance to provide explanations in data-centric AI processes, where model development places the emphasis on the quality of data. In particular, we show how a classification of the main operators used in the data preparation phase provides an effective and powerful means of producing increasingly detailed explanations at the required level of data granularity.
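
    For readers unfamiliar with why-provenance, the following Python sketch shows its standard semantics on a few data preparation steps: every output row carries a set of witnesses, each witness being a set of source row ids that is enough to produce the row. Selection keeps witnesses, projection pools the witnesses of rows that collapse together, and join combines witnesses pairwise. The operator names and data are hypothetical, and the sketch does not implement the operator classification or the why+ extension proposed in the paper.

```python
from itertools import product

# An annotated table is a list of (row, witnesses) pairs, where witnesses is a
# frozenset of frozensets of source row ids (each witness alone yields the row).


def source(name, rows):
    """Annotate each input row with a single witness: its own id."""
    return [(row, frozenset({frozenset({f"{name}{i}"})})) for i, row in enumerate(rows)]


def select(table, pred):
    """Filtering keeps the witnesses of the surviving rows unchanged."""
    return [(row, w) for row, w in table if pred(row)]


def project(table, cols):
    """Rows that collapse to the same projected values pool their witnesses."""
    merged = {}
    for row, w in table:
        key = tuple((c, row[c]) for c in cols)
        merged[key] = merged.get(key, frozenset()) | w
    return [(dict(key), w) for key, w in merged.items()]


def join(left, right, on):
    """A joined row needs one witness from each side, combined pairwise."""
    out = []
    for (left_row, wl), (right_row, wr) in product(left, right):
        if left_row[on] == right_row[on]:
            witnesses = frozenset(a | b for a in wl for b in wr)
            out.append(({**left_row, **right_row}, witnesses))
    return out


wines = source("w", [{"code": "W1", "region": "Tuscany"}, {"code": "W2", "region": "Piedmont"}])
reviews = source("r", [{"code": "W1", "stars": 5}, {"code": "W1", "stars": 4}, {"code": "W2", "stars": 2}])

result = project(select(join(wines, reviews, "code"), lambda row: row["stars"] >= 4), ["region"])
for row, witnesses in result:
    print(row, sorted(sorted(w) for w in witnesses))
    # {'region': 'Tuscany'} [['r0', 'w0'], ['r1', 'w0']]
```

    Here the single output row (`Tuscany`) has two witnesses, one per qualifying review, which is the information an explanation of that row would point to.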