Schema Mapping and Data Exchange Tools: Time for the Golden Age

Authors: MECCA Giansalvatore; PAPOTTI P.
Publication date: 01/01/2012

In the last 10 years, schema mapping management has become an important research area in data transformation, exchange, and integration systems. The reasons for its success can be found in the declarative nature of its building blocks (which enables clean semantics and easy-to-use design tools), paired with efficiency and modularity in the deployment step. In this paper we sketch a line of evolution in schema mapping and data exchange systems, through what we identify as three main ages. We start by presenting the foundations of schema mapping and the first tools aimed at translating data from a source to a target schema in the first, heroic age. We then discuss the silver age, when schema mapping tools grew into complex systems and were turned into both commercial and open-source products. Finally, we show how recent results in schema mapping and data exchange research may be considered the starting point for a forthcoming golden age, with novel research opportunities and a new generation of systems capable of dealing with a significantly larger class of real-life applications.

Crossref

Archivio della Ricerca - Università della Basilicata

Evaluating Ambiguous Questions in Semantic Parsing

Authors: Cagliero L.; Papotti P.; Papicchio S.
Publication date: 01/01/2024

Tabular Representation Learning and Large Language Models have recently achieved promising results in solving the Semantic Parsing (SP) task. Given a question posed in natural language on a relational table, the goal is to return executable SQL queries to end-users. However, models struggle to produce the correct output when questions are ambiguously defined w.r.t. the table schema. Assessing robustness to data-ambiguity can be particularly time-consuming, as it entails seeking ambiguous patterns in a large number of queries of varying complexity. To automate this process, we propose Data-Ambiguity Tester, a pipeline for data-ambiguity testing tailored to SP. It first automatically generates non-ambiguous natural language questions and SQL queries of varying complexity. Then, it injects ambiguous patterns, extracted from a human-annotated set of relational tables, into the natural language questions. Finally, it quantifies the level of ambiguity using customized performance metrics. Results show the strengths and limitations of existing models in coping with ambiguity between questions and tabular data.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data

Authors: Cagliero L.; Papotti P.; Papicchio S.
Publication date: 01/01/2023

Table Representation Learning (TRL) models are commonly pre-trained on large open-domain datasets comprising millions of tables and then used to address downstream tasks. Choosing the right TRL model to use on proprietary data can be challenging, as the best results depend on the content domain, schema, and data quality. Our purpose is to support end-users in testing TRL models on proprietary data in two established SQL-centric tasks, i.e., Question Answering (QA) and Semantic Parsing (SP). We present QATCH (Query-Aided TRL Checklist), a toolbox to highlight TRL models' strengths and weaknesses on relational tables unseen at training time. For an input table, QATCH automatically generates a testing checklist tailored to QA and SP. Checklist generation is driven by a SQL query engine that crafts tests of different complexity. This design makes the checks portable, allowing them to be used by alternative models. We also introduce a set of cross-task performance metrics that evaluate a TRL model's performance based on its output. Finally, we show how QATCH automatically generates tests for proprietary datasets to evaluate various state-of-the-art models, including TAPAS, TAPEX, and ChatGPT.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Future locations prediction with uncertain data

Authors: Blanco L.; QIU DISHENG; Papotti P.
Publication date: 01/01/2013

Archivio della Ricerca - Università di Roma 3

On the Schema Exchange Problem

Authors: TORLONE Riccardo; PAPOTTI PAOLO
Publication date: 01/01/2007

Archivio della Ricerca - Università di Roma 3

Probabilistic Reconciliation of Records from Inaccurate Web Sources

Authors: PAPOTTI PAOLO; MERIALDO PAOLO; CRESCENZI VALTER; Blanco L
Publication date: 01/01/2010

Archivio della Ricerca - Università di Roma 3

Cleaning data with Llunatic

Authors: Mecca G.; Santoro D.; Geerts F.; Papotti P.
Publication date: 01/01/2020

Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists of making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy to repair conflicting values and are specialized toward specific classes of constraints. In this paper, we develop a general chase-based repairing framework, referred to as Llunatic, in which repairs can be obtained for a large class of constraints and by using different strategies to select preferred values. The framework is based on an elegant formalization in terms of labeled instances and partially ordered preference labels. In this context, we revisit concepts such as upgrades, repairs, and the chase. In Llunatic, various repairing strategies can be slotted in without changing the underlying implementation. Furthermore, Llunatic is the first data repairing system that is DBMS-based. We report experimental results that confirm its good scalability and show that various instantiations of the framework result in repairs of good quality.

Archivio della Ricerca - Università della Basilicata

An Approach to Heterogeneous Data Translation based on XML Conversion

Authors: TORLONE Riccardo; PAPOTTI PAOLO
Publication date: 01/01/2004

Archivio della Ricerca - Università di Roma 3

Attribute Ambiguity Discovery: A Deep Learning Approach via Unsupervised Learning

Authors: Veltri E.; Papotti P.; Badaro G.; Saeed M.
Publication date: 01/01/2023

Archivio Istituzionale della Ricerca- Università del Salento

Data Ambiguity Profiling for the Generation of Training Examples

Authors: Veltri E.; Papotti P.; Badaro G.; Saeed M.
Publication date: 01/01/2023

Several applications, such as text-to-SQL and computational fact checking, exploit the relationship between relational data and natural language text. However, state-of-the-art solutions fail to manage "data-ambiguity", i.e., the case when there are multiple interpretations of the relationship between text and data. Given the ambiguity in language, text can be mapped to different subsets of data, but existing training corpora only have examples in which every sentence/question is annotated precisely w.r.t. the relation. This unrealistic assumption leaves the target applications unable to handle ambiguous cases. To tackle this problem, we present an end-to-end solution that, given a table D, generates examples that consist of text, annotated with its data evidence, with factual ambiguities w.r.t. D. We formulate the problem of profiling relational tables to identify row and attribute data ambiguity. For the latter, we propose a deep learning method that identifies every pair of data-ambiguous attributes and a label that describes both columns. Such metadata is then used to generate examples with data ambiguities for any input table. To enable scalability, we finally introduce a SQL approach that can generate millions of examples in seconds. We show the high accuracy of our solution in profiling relational tables and report on how our automatically generated examples lead to drastic quality improvements in two fact-checking applications, including a website with thousands of users, and in a text-to-SQL system.

Archivio della Ricerca - Università della Basilicata

Archivio Istituzionale della Ricerca- Università del Salento
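
Illustrative note on attribute data ambiguity, as described in the abstract of "Data Ambiguity Profiling for the Generation of Training Examples": the following is a minimal sketch, not the authors' method. It assumes a hypothetical `players` table, a hand-picked pair of ambiguous attributes, and a shared label ("shooting"), all invented here, and shows how one question can map to several SQL interpretations with different answers.

```python
import sqlite3

# Hypothetical table contents; names and values are invented for illustration.
ROWS = [
    ("Ann", 48.0, 35.0),
    ("Bob", 45.0, 41.0),
    ("Cy", 51.0, 30.0),
]

# Assumed profiling output: two attributes that a single label can describe.
AMBIGUOUS_PAIR = ("field_goal_pct", "three_point_pct")
SHARED_LABEL = "shooting"


def build_table(conn):
    """Create and fill the toy table in the given SQLite connection."""
    conn.execute(
        "CREATE TABLE players (name TEXT, field_goal_pct REAL, three_point_pct REAL)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?)", ROWS)


def ambiguous_example(conn, pair, label):
    """Return a question plus one (SQL, answer) interpretation per attribute."""
    question = f"Which player has the best {label}?"
    interpretations = []
    for attr in pair:
        # attr comes from the fixed pair above, so string formatting is safe here.
        sql = f"SELECT name FROM players ORDER BY {attr} DESC LIMIT 1"
        answer = conn.execute(sql).fetchone()[0]
        interpretations.append((sql, answer))
    return question, interpretations


conn = sqlite3.connect(":memory:")
build_table(conn)
question, interpretations = ambiguous_example(conn, AMBIGUOUS_PAIR, SHARED_LABEL)
print(question)
for sql, answer in interpretations:
    print(answer, "<-", sql)
```

In this toy setup the two interpretations return different players, which is the situation the abstract calls data-ambiguity: the text alone does not determine which column is meant.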