Accounting for quality in data integration systems: a completeness-aware integration approach

Author: Di Leo Simone
Scannapieco Monica
Daraio Cinzia
Publication date: 01/01/2022

Ensuring the quality of integrated data is undoubtedly one of the main problems of integrated data systems. When focusing on multi-national and historical data integration systems, where the “space” and “time” dimensions play a relevant role, it is very important to build the integration layer so that the final user accesses a layer that is “by design” as complete as possible. In this paper, we propose a method for accessing data in multipurpose data infrastructures, such as data integration systems, which has the properties of (i) relieving the final user from the need to access single data sources and, at the same time, (ii) maximizing the amount of information available to the user at the integration layer. Our approach is based on completeness-aware integration, which makes readily available to the user the maximum information that can be obtained from the integrated data system, without having to carry out a preliminary data quality analysis on each of the databases included in the system. By providing data quality information at the integrated level, our proposal thus extends the functions of the individual data sources, opening the data infrastructure to additional uses. This may be a first step in moving from data infrastructures towards knowledge infrastructures. A case study on the research infrastructure for science and innovation studies shows the usefulness of the proposed approach.

Archivio della ricerca- Università di Roma La Sapienza
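The abstract does not spell out how completeness is computed. As a minimal sketch, assuming completeness is measured as the share of non-missing values (a common definition in the data quality literature, not necessarily the one used in the paper), the following Python shows how an integration layer could merge several sources and expose per-attribute completeness to the final user. The record layout (`Records`) and the source-priority merge rule in `integrate` are hypothetical.

```python
from typing import Any, Optional

# Hypothetical shape of one source: entity id -> {attribute: value}; None marks a missing value.
Records = dict[str, dict[str, Optional[Any]]]


def completeness(records: Records, attributes: list[str]) -> float:
    """Share of non-missing cells over all (entity, attribute) pairs."""
    cells = len(records) * len(attributes)
    if cells == 0:
        return 0.0
    filled = sum(
        1
        for row in records.values()
        for attr in attributes
        if row.get(attr) is not None
    )
    return filled / cells


def attribute_completeness(records: Records, attributes: list[str]) -> dict[str, float]:
    """Per-attribute completeness: quality metadata exposed at the integrated level."""
    n = len(records)
    return {
        attr: (sum(1 for row in records.values() if row.get(attr) is not None) / n) if n else 0.0
        for attr in attributes
    }


def integrate(sources: list[Records], attributes: list[str]) -> Records:
    """Merge sources entity by entity, taking for each attribute the first
    non-missing value in the given source order (hypothetical priority rule)."""
    entities = sorted({entity for src in sources for entity in src})
    merged: Records = {}
    for entity in entities:
        merged[entity] = {
            attr: next(
                (
                    src[entity][attr]
                    for src in sources
                    if entity in src and src[entity].get(attr) is not None
                ),
                None,
            )
            for attr in attributes
        }
    return merged
```

The first-non-missing rule raises completeness at the integration layer, but it silently resolves conflicting values in favour of the earlier source; a real system would also need an explicit conflict-resolution policy.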

On the Meaningfulness of “Big Data Quality” (Invited Paper)

Author: Mecella Massimo
Scannapieco Monica
Batini Carlo
Firmani Donatella
Publication date: 01/01/2015

In this paper, we discuss the application of the concept of data quality to big data, highlighting how complex it is to define in a general way. Data quality is already a multidimensional concept, difficult to characterize with precise definitions even in the case of well-structured data. Big data add two further dimensions of complexity: (i) being “very” source specific, for which we adopt the UNECE classification, and (ii) being highly unstructured and schema-less, often without gold standards to refer to, or with gold standards that are very difficult to access. After providing a tutorial on data quality in traditional contexts, we analyze big data by providing insights into the UNECE classification; then, for each type of data source, we choose a specific instance of that type (namely deep Web data, sensor-generated data, and tweets/short texts) and discuss how quality dimensions can be defined in these cases. The overall aim of the paper is therefore to identify further research directions in the area of big data quality, while at the same time providing an up-to-date state of the art on data quality. © 2015, The Author(s)

Crossref

Springer - Publisher Connector

Archivio della Ricerca - Università di Roma 3

Archivio della ricerca- Università di Roma La Sapienza

Groupware mail messages analysis for mining collaborative processes (poster paper)

Author: Mecella Massimo
Scannapieco Monica
Zardetto Diego
Di Ciccio Claudio
Publication date: 01/01/2011

Most research on workflows has so far considered the management of formal business processes. There has been some discussion of informal processes, often under names such as “artful business processes”: informal processes are typically carried out by people whose work is mental rather than physical (managers, professors, researchers, etc.), the so-called “knowledge workers”. With their skills, experience and knowledge, they are used to performing difficult tasks that require complex, rapid decisions among multiple possible strategies in order to fulfill specific goals. In contrast to business processes, which are formal and standardized, informal processes are often not even written down, let alone defined formally, and can vary from person to person even when those involved are pursuing the same objective. Knowledge workers create informal processes “on the fly” to cope with many of the situations that arise in their daily work. Although informal processes are frequently repeated, they are not written down, so they are not exactly reproducible, even by their originators, nor can they be easily shared. Their outputs and information exchanges are very often handled through e-mail conversations, which are a fast, reliable, and permanent way of keeping track of the activities they carry out. The objective of the research proposed in this position document is to automatically build, on top of a collection of e-mails, a set of workflow models that represent the artful processes which lie behind knowledge workers' activities.

Archivio della ricerca- Università di Roma La Sapienza
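The abstract does not say how workflow models are derived from e-mails. One standard starting point in process mining is the directly-follows relation between activities. As a minimal sketch, assuming each e-mail has already been mapped to a thread identifier and an activity label (the `Email` type and its fields are hypothetical; extracting activities from free text is the hard part and is not shown), the following Python turns threads into time-ordered traces and counts directly-follows pairs.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Email:
    thread_id: str
    timestamp: int  # e.g. seconds since epoch
    activity: str   # hypothetical activity label, assumed already extracted from the message


def build_traces(emails: list[Email]) -> list[list[str]]:
    """Group e-mails by conversation thread and order each thread by time,
    yielding one trace (sequence of activities) per thread."""
    threads: dict[str, list[Email]] = defaultdict(list)
    for mail in emails:
        threads[mail.thread_id].append(mail)
    return [
        [mail.activity for mail in sorted(threads[tid], key=lambda m: m.timestamp)]
        for tid in sorted(threads)
    ]


def directly_follows(traces: list[list[str]]) -> Counter:
    """Count how often activity a is immediately followed by activity b."""
    dfg: Counter = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg
```

The resulting counts form a directly-follows graph; discovery algorithms such as the Alpha Miner or the Heuristics Miner build on this relation, but whether the proposed research uses either of them is not stated.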

Data Quality in Cooperative Information Systems

Author: VIRGILLITO A.
MARCHETTI C
SCANNAPIECO Monica
MECELLA Massimo
Publication date: 01/01/2005

Archivio della ricerca- Università di Roma La Sapienza

Integrating microdata on Higher Education Institutions (HEIs) with bibliometric and contextual variables: A data quality approach

Author: GENTILI ANGELO
Scannapieco Monica
DARAIO CINZIA
Publication date: 01/01/2015

[No abstract available]

Archivio della ricerca- Università di Roma La Sapienza

OntoPIM: how to rely on a personal ontology for Personal Information Management

Author: KATIFORI V
POGGI Antonella
IOANNIDIS Y.
CATARCI Tiziana
SCANNAPIECO Monica
Publication date: 01/01/2005

Archivio della ricerca- Università di Roma La Sapienza

Enabling Data Quality Notification in Cooperative Information Systems through a Web-Service Based Architecture

Author: MARCHETTI C
SCANNAPIECO Monica
MECELLA Massimo
VIRGILLITO V.
Publication date: 01/01/2003

Archivio della ricerca- Università di Roma La Sapienza

My (Fair) Big Data

Author: Demetrescu Camil
Catarci Tiziana
Scannapieco Monica
Console Marco
Publication date: 01/01/2017

Policy making strictly requires quantitative, high-quality information. This paper addresses the data quality issue for policy making by showing how to deal with Big Data quality in the different steps of a processing pipeline, with a focus on the integration of Big Data sources with traditional sources. In this respect, a relevant role is played by metadata, and in particular by ontologies. Indeed, integration systems relying on ontologies enable a formal evaluation of the inaccuracy, inconsistency, and incompleteness of integrated data. Finally, the paper describes data confidentiality as a Big Data quality dimension, showing the main issues to be faced in assuring it.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Satellite-Net: Automatic Extraction of Land Cover Indicators from Satellite Imagery by Deep Learning

Author: Pugliese Francesco
Bernasconi Eleonora
Scannapieco Monica
Zardetto Diego
Publication date: 01/01/2019

In this paper we address the challenge of land cover classification of satellite images via Deep Learning (DL). Land cover classification aims to detect the physical characteristics of the territory and to estimate the percentage of land occupied by a given category of entities: vegetation, residential buildings, industrial areas, forest areas, rivers, lakes, etc. DL is a new paradigm for Big Data analytics, and in particular for Computer Vision. The application of DL to image classification for land cover purposes has great potential owing to its high degree of automation and computing performance. In particular, the invention of Convolutional Neural Networks (CNNs) was fundamental to the advancements in this field. In [1], the Satellite Task Team of the UN Global Working Group describes the results achieved so far in the use of earth observation for Official Statistics. However, that study did not yet explore CNNs for the automatic classification of imagery. This work investigates the use of CNNs for estimating land cover indicators, providing evidence of the first promising results. In particular, the paper proposes a customized model, called Satellite-Net, which reaches an accuracy of up to 98% on test sets.

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza
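Once each image patch has a predicted class, the land cover indicator described in the Satellite-Net abstract (the percentage of land occupied by a category) and the reported test-set accuracy reduce to simple counting. A minimal sketch, assuming the image is split into equal-area tiles and each tile receives exactly one class label from some classifier; the tiling and the label names are hypothetical, and Satellite-Net's actual architecture is not described here.

```python
from collections import Counter


def land_cover_shares(tile_labels: list[list[str]]) -> dict[str, float]:
    """Percentage of tiles assigned to each land cover class.

    Assumes equal-area tiles, each already labelled by a classifier
    (e.g. a CNN); the label names are hypothetical.
    """
    flat = [label for row in tile_labels for label in row]
    if not flat:
        return {}
    counts = Counter(flat)
    return {label: 100.0 * n / len(flat) for label, n in counts.items()}


def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of test tiles whose predicted class matches the reference class."""
    if not predicted or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(predicted)
```

Because every tile is assumed to cover the same area, tile shares translate directly into land shares; with tiles of varying area (e.g. near the poles in some projections) each count would need to be weighted by tile area.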

Measuring Information Quality on the Internet - a User Perspective

Author: Myrach Thomas
Haupt Patrizia Salomé
Blattmann Olivier
Kaltenrieder Patrick
Publication date: 17/11/2012

Research into information quality on the internet, in particular on websites, has become increasingly important in recent years. This paper describes a research project in which a measurement instrument was developed that enables the information quality of websites to be determined and analyzed from the customer perspective. The measurement instrument was developed in several stages, on the basis of a methodical-theoretical approach. In the first step, previous research results and measurement instruments were systematically analyzed. In the second step, these results were adjusted and supplemented on the basis of a qualitative study. A quantitative test of the measurement instrument is planned.

BORIS Portal Bern Open Repository and Information System