1,721,147 research outputs found

Sopj: A scalable online provenance join for data integration

Author: Simonini
Zhu S.
Fiameni G.
Bergamaschi S.
Publication date: 01/01/2017

Data integration is a technique for combining different data sources to provide a unified view of them. MOMIS[1] is an open-source data integration framework developed by the DBGroup. The goal of our work is to enable MOMIS to scale out as the number of input data sources increases, without introducing a noticeable performance penalty. In particular, we present a full outer join method capable of efficiently integrating multiple sources at the same time by using data streams and provenance information. To evaluate the scalability of this approach, we developed a join engine that employs a distributed data processing framework. Our solution processes input data sources in the form of continuous streams, executes the join operation on the fly, and produces outputs as soon as they are generated. In this way, the join can return partial results before the input streams have been completely received or processed, optimizing the entire execution.
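
The idea of a streaming full outer join that tracks provenance can be illustrated with a minimal sketch. This is not the SOPJ implementation: the function name `streaming_outer_join`, the `(source, key, record)` event format, and the in-memory state are assumptions made for illustration, and the sketch runs on a single machine rather than on a distributed framework.

```python
from collections import defaultdict


def streaming_outer_join(events):
    """Consume (source, key, record) events and yield a merged row per event.

    Hypothetical sketch: each yielded row is (key, merged_record, provenance),
    where provenance is the sorted tuple of sources seen so far for that key.
    A key present in only one source is still emitted (full outer join).
    """
    state = defaultdict(dict)  # key -> {source name: latest record}
    for source, key, record in events:
        state[key][source] = record
        merged = {}
        # Merge in a fixed source order so the result is deterministic.
        for src in sorted(state[key]):
            merged.update(state[key][src])
        # Emit a partial result immediately, before the streams end.
        yield key, merged, tuple(sorted(state[key]))


events = [
    ("A", 1, {"name": "x"}),
    ("B", 1, {"price": 3}),
    ("B", 2, {"price": 5}),
]
rows = list(streaming_outer_join(events))
```

Because the function is a generator, a consumer sees the partial row for key 1 after the first event and an enriched row once source B contributes, which mirrors the "partial results before the streams are complete" behavior described in the abstract.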

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Enhancing entity resolution efficiency with loosely schema-aware techniques - Discussion paper

Author: Bergamaschi S.
Simonini G.
Publication date: 01/01/2016

Entity Resolution, the task of identifying records that refer to the same real-world entity, is a fundamental step in data integration. Blocking is a widely employed technique to avoid comparing all possible record pairs in a dataset (an inefficient approach). Forgoing schema information for blocking has been shown to limit the chance of missing matches (i.e., it guarantees high recall), at the cost of low precision. Meta-blocking alleviates this issue by restructuring a block collection, removing redundant and superfluous comparisons. Yet, existing meta-blocking techniques rely exclusively on schema-agnostic features. In this paper, we investigate how loose schema information, induced directly from the data, can be exploited in a holistic loosely schema-aware (meta-)blocking approach that outperforms state-of-the-art meta-blocking in terms of precision, without sacrificing high recall. We implemented our idea in a system called Blast and experimentally evaluated it on real-world datasets.
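
To make the notion of schema-agnostic blocking concrete, the following is a minimal sketch of token blocking, a common schema-agnostic scheme. It is not Blast and does not perform meta-blocking; the function names `token_blocking` and `candidate_pairs` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Group record ids by every token in any attribute value.

    Attribute names are ignored, which is what makes the scheme
    schema-agnostic. Blocks with a single record are dropped because
    they generate no comparisons.
    """
    blocks = defaultdict(set)
    for rid, attrs in records.items():
        for value in attrs.values():
            for token in str(value).lower().split():
                blocks[token].add(rid)
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Return the distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


records = {
    1: {"title": "Canon EOS 5D"},
    2: {"name": "canon eos 5d camera"},
    3: {"title": "Nikon D90"},
}
blocks = token_blocking(records)
pairs = candidate_pairs(blocks)
```

Records 1 and 2 share three blocks even though their attribute names differ, so the same pair is proposed repeatedly; this redundancy is the kind of overhead that meta-blocking is meant to prune.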

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

A model for visual building SPARQL queries

Author: Bergamaschi S.
Benedetti F.
Publication date: 01/01/2016

LODeX is a Semantic Web tool that, leveraging a summarized representation of the structure of a LOD source (i.e., the Schema Summary), helps users explore and query SPARQL endpoints by hiding the complexity of Semantic Web technologies. Using the Schema Summary of a LOD source, LODeX guides the user in composing visual queries that are automatically translated into correct SPARQL queries by a SPARQL compiler. In this work we examine how LODeX can deal with the high expressivity of SPARQL. In particular, we propose a formal model for defining queries over the Schema Summary (i.e., the Basic Query), and we analyze how this model can handle the different join patterns used in SPARQL queries. Finally, we examine how well LODeX can satisfy the needs of real-world users by analyzing the query logs contained in the LSQ dataset. We show that LODeX can generate 77.6% of the 5 million queries contained in the LSQ dataset.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Extraction of Informations From Highly Heterogeneous Source of Textual Data

Author: Bergamaschi S.
Publication date: 01/01/1997

Extracting information from multiple, highly heterogeneous sources of textual data and integrating it in order to provide true information is a challenging research topic in the database area. To illustrate problems and solutions, we present TSIMMIS, one of the most interesting projects addressing this problem. Furthermore, we introduce a Description Logics approach that provides interesting solutions for both data integration and data querying.

1 Introduction

The availability of large numbers of networked information sources (and the recent explosion of the Internet) makes it possible to access a very large number of information sources all over the world. As a consequence of the increased amount of available information, for a given query the set of potentially interesting sites is very large, but only very few sites are really relevant. Furthermore, information is highly heterogeneous both in its structure and in its origin. In particular, n..

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

The E/S knowledge representation system

Author: Sartori C.
Bergamaschi S.
Lodi S.
Publication date: 01/01/1994

This paper introduces the E/S knowledge representation model and describes a system based on that model. The model takes ideas from KL-ONE and ER, and its main strength is the direct representation of n-ary relationships. The system is classification-based, and therefore organizes its knowledge in hierarchies of structured intensional objects and offers a set of services to reason about intensional objects, to store extensional objects and to make inferences on the stored knowledge.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Towards declarative imperative data-parallel systems ?

Author: Bergamaschi S.
Interlandi M.
Simonini G.
Publication date: 01/01/2014

Motivated by recent developments in the fields of declarative networking and data-parallel computation, we propose a first investigation of a declarative imperative parallel programming model that tries to combine the two worlds. We identify a set of requirements that the model should satisfy and introduce a conceptual sketch of a system implementing the envisioned model.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

A semantic multi-lingual method for publishing linked open data

Author: Bergamaschi S.
Fusari E.
Sorrentino S.
Publication date: 01/01/2013

Recently, there has been an increase in open data initiatives promoting the free publication of data produced by public administrations (such as data on public spending, health care, education, etc.). However, the great majority of these data are published in an unstructured format (such as spreadsheets or CSV) and are typically accessed only by closed communities. To address this problem, we propose a semiautomatic, multi-lingual, semantic method that helps resource providers publish public data in the Linked Open Data (LOD) cloud and helps consumers (companies and citizens) access and query them efficiently. The method has been applied to a real case involving a set of data provided in Italian.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Entity resolution on camera records without machine learning

Author: Bergamaschi S.
Zecchini L.
Simonini G.
Publication date: 01/01/2020

This paper reports the runner-up solution to the ACM SIGMOD 2020 programming contest, whose target was to identify the specifications (i.e., records) collected across 24 e-commerce data sources that refer to the same real-world entities. First, we investigate the machine learning (ML) approach, but surprisingly find that existing state-of-the-art ML-based methods fall short in such a context, not reaching a 0.49 F-score. Then, we propose an efficient solution that exploits annotated lists and regular expressions generated by humans and reaches a 0.99 F-score. In our experience, our approach was no more expensive, in terms of human effort, than the labeling of match/non-match pairs required by ML-based methods.
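
The general flavor of rule-based matching with hand-written lists and regular expressions can be sketched as follows. This is not the contest solution: the brand list `BRANDS`, the pattern `MODEL_RE`, the helper names, and the sample titles are all hypothetical and far simpler than what a real dataset would require.

```python
import re
from collections import defaultdict

# Hypothetical hand-curated brand list (an assumption for illustration).
BRANDS = ("canon", "nikon", "sony")
# Hypothetical model pattern: 1-4 letters, optional hyphen, 2-4 digits,
# optional trailing letter (e.g. "d90", "d-90", "w800").
MODEL_RE = re.compile(r"\b([a-z]{1,4}-?\d{2,4}[a-z]?)\b")


def record_key(title):
    """Return a normalized (brand, model) key, or None if either is missing."""
    text = title.lower()
    brand = next((b for b in BRANDS if b in text), None)
    model = MODEL_RE.search(text)
    if brand is None or model is None:
        return None
    return brand, model.group(1).replace("-", "")


def group_records(titles):
    """Group title indices by key; records with equal keys are treated as matches."""
    groups = defaultdict(list)
    for idx, title in enumerate(titles):
        key = record_key(title)
        if key is not None:
            groups[key].append(idx)
    return dict(groups)


titles = ["Nikon D90 digital SLR", "nikon d-90 body only", "Sony DSC-W800"]
groups = group_records(titles)
```

Records that map to the same key are declared matches without any pairwise classifier, so the human effort goes into curating the lists and patterns rather than into labeling training pairs.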

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

DEXA 2008: Second international workshop on Semantic Web Architectures for Enterprises - SWAE'08

Author: Bergamaschi S.
Velegrakis Y.
Guerra F.
Publication date: 01/01/2008

The aim of the second edition of the workshop on Semantic Web Architectures for Enterprises (SWAE) is to evaluate whether, and to what extent, the Semantic Web vision has delivered on its promises with respect to business and market needs. On the basis of our research experience within the Italian basic research project NeP4B (http://www.dbgroup.unimo.it/nep4b/it/index.htm) and the European projects SEWASIE (www.sewasie.org), STASIS (http://www.dbgroup.unimo.it/stasis/), OKKAM (www.okkam.org) and Papyrus (www.ict-papyrus.eu), we focus on the adoption of Semantic Web technologies in industrial and real-world applications.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Lecture Notes in Artificial Intelligence: Preface

Author: Bergamaschi S.
Moro G.
Aberer K.
Publication date: 01/01/2005

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia