Search CORE

1,721,018 research outputs found

A unifying framework for l0-sampling algorithms

Author: Cormode Graham
Firmani Donatella
Publication venue
Publication date: 01/01/2014
Field of study

The problem of building an l 0-sampler is to sample near-uniformly from the support set of a dynamic multiset. This problem has a variety of applications within data analysis, computational geometry and graph algorithms. In this paper, we abstract a set of steps for building an l 0-sampler, based on sampling, recovery and selection. We analyze the implementation of an l 0-sampler within this framework, and show how prior constructions of l 0-samplers can all be expressed in terms of these steps. Our experimental contribution is to provide a first detailed study of the accuracy and computational cost of l 0-samplers. © 2013 Springer Science+Business Media New York

Archivio della Ricerca - Università di Roma 3

Archivio della ricerca- Università di Roma La Sapienza

The (not so) critical nodes of criminal networks

Author: Firmani Donatella
Italiano Giuseppe F.
Publication venue
Publication date: 01/01/2015
Field of study

One of the most basic question in the analysis of social networks is to find nodes that are of particular relevance in the network. The answer that emerged in the recent literature is that the importance, or centrality, of a node x is proportional to the number of nodes that get disconnected from the network when node x is removed. We show that while in social networks such important nodes lie in their cores (i.e., maximal subgraphs in which all nodes have degree higher than a certain value), this is not necessarily the case in criminal networks. This shows that nodes whose removal affects large portions of the criminal network prefer to operate from network peripheries, thus confirming the intuition of Baker and Faulkner [4]. Our results also highlight structural differences between criminal networks and other social networks, suggesting that classical definitions of importance (or centrality) in a network fail to capture the concept of key players in criminal networks

Archivio della Ricerca - Università di Roma 3

Online entity resolution using an oracle

Author: Saha Barna
Firmani Donatella
Srivastava Divesh
Publication venue
Publication date: 01/01/2016
Field of study

Entity resolution (ER) is the task of identifying all records in adatabase that refer to the same underlying entity. This is an expensivetask, and can take a significant amount of money and time; theend-user may want to take decisions during the process, rather thanwaiting for the task to be completed. We formalize an online versionof the entity resolution task, and use an oracle which correctlylabels matching and non-matching pairs through queries. In thissetting, we design algorithms that seek to maximize progressive recall,and develop a novel analysis framework for prior proposalson entity resolution with an oracle, beyond their worst case guarantees.Finally, we provide both theoretical and experimental analysisof the proposed algorithms

Archivio della Ricerca - Università di Roma 3

Archivio della ricerca- Università di Roma La Sapienza

On the Meaningfulness of “Big Data Quality” (Invited Paper)

Author: Mecella Massimo
Scannapieco Monica
Batini Carlo
Firmani Donatella
FIRMANI DONATELLA
Carlo Batini
SCANNAPIECO Monica
Donatella Firmani
MECELLA Massimo
BATINI Carlo
Monica Scannapieco
Massimo Mecella
Publication venue
Publication date: 01/01/2015
Field of study

In this paper, we discuss the application of concept of data quality to big data by highlighting how much complex is to define it in a general way. Already data quality is a multidimensional concept, difficult to characterize in precise definitions even in the case of well-structured data. Big data add two further dimensions of complexity: (i) being “very” source specific, and for this we adopt the interesting UNECE classification, and (ii) being highly unstructured and schema-less, often without golden standards to refer to or very difficult to access. After providing a tutorial on data quality in traditional contexts, we analyze big data by providing insights into the UNECE classification, and then, for each type of data source, we choose a specific instance of such a type (notably deep Web data, sensor-generated data, and Twitters/short texts) and discuss how quality dimensions can be defined in these cases. The overall aim of the paper is therefore to identify further research directions in the area of big data quality, by providing at the same time an up-to-date state of the art on data quality. © 2015, The Author(s)

Crossref

Springer - Publisher Connector

Archivio della Ricerca - Università di Roma 3

Archivio della ricerca- Università di Roma La Sapienza

Cliques are Too Strict for Representing Communities: Finding Large k-plexes in Real Networks

Author: Conte Alessio
Mordente Caterina
Patrignani Maurizio
Firmani Donatella
Torlone Riccardo
Publication venue
Publication date: 01/01/2018
Field of study

k-plexes are a formal yet flexible way of defining communities in networks. They generalize the notion of cliques and are more appropriate in most real cases: while a node of a clique C is connected to all other nodes of C, a node of a k-plex may miss up to k connections. Unfortunately, computing all maximal k-plexes is a gruesome task and state-of-the-art algorithms can only process small-size networks. In this paper we propose a new approach for enumerating large k-plexes in networks that speeds up the search by several orders of magnitude, leveraging on efficient techniques for the computation of maximal cliques

Archivio della Ricerca - Università di Pisa

In Codice Ratio: Machine Transcription of Medieval Manuscripts

Author: Firmani Donatella
Nieddu Elena
Merialdo Paolo
Maiorino Marco
Publication venue
Publication date: 01/01/2019
Field of study

Our project, In Codice Ratio, is an interdisciplinary research initiative for analyzing content of historical documents conserved in the Vatican Secret Archives (VSA). As most of such documents are digitized as images, Machine Transcription is both an enabler to the application of Knowledge Discovery techniques, as well as a useful tool to the paleographer for speeding up the transcription process. Our approach involves a convolutional neural network to recognize characters, statistical language models to compose and rank word transcriptions, and crowdsourcing for scalable training data collection. We have conducted experiments on pages from the medieval manuscript collection known as the Vatican Registers. Our results show that almost all the considered words can be transcribed without significant spelling errors

Archivio della Ricerca - Università di Roma 3

Computer Science Foundations for Digital Libraries: Algorithms, Systems, and Applications

Author: Silvello Gianmaria
Mizzaro Stefano
Firmani Donatella
Tonelli Sara
Portelli Beatrice
Publication venue
Publication date: 01/01/2025
Field of study

Digital libraries face challenges in quality, accessibility, and usage of resources. This issue presents seven papers offering computational and technical solutions to these problems: data quality through validation and monitoring, AI evaluation of information systems, and enhanced content discoverability. Research also covers knowledge representation with new provenance models, deep learning for bibliographic control, metadata-driven access to underrepresented languages, and computational methods for restoring historical documents. These papers showcase how modern techniques like machine learning, semantic web technologies, knowledge graphs, and image processing tackle digital library challenges, improving resource quality and accessibility. These papers were selected from the 21st Italian Research Conference on Digital Libraries (IRCDL 2025), held in Udine, Italy, on 20–21 February 2025, which has served since 2005 as a key annual forum bringing together researchers from academia, government, and industry to address topics spanning computer science, digital humanities, information science, librarianship, archival science, museum studies, and cultural heritage

Archivio della ricerca - Fondazione Bruno Kessler

Biconnectivity of social and web graphs

Author: Ferraro-Petrillo Umberto
Firmani Donatella
Laura Luigi
Chiapparo Giuseppe
Publication venue
Publication date: 01/01/2016
Field of study

Network connectivity is a basic notion for big data analysis, with many practical applications. In particular, the analysis of the largest strongly connected component is at the heart of the famous bow-tie structure of the Web. In this paper, we go beyond connected components and characterize the properties of the higher-connectivity components that lie inside real-world networks. We combine and extend recently developed techniques for scaling up to real-world datasets with up to 23.1 billion edges. Our experiments point out a striking difference between structural properties in social and web graphs, and highlight deep properties of social networks which may be helpful in devising theoretical models that can better capture their structure

Archivio della ricerca- Università di Roma La Sapienza

On unifying the space of l0-sampling algorithms

Author: Cormode Graham
Graham Cormode
Firmani Donatella
Donatella Firmani
Publication venue
Publication date: 01/01/2013
Field of study

The problem of building an l0-sampler is to sample nearuniformly from the support set of a dynamic multiset. This problem has a variety of applications within data analysis, computational geometry and graph algorithms. In this paper, we abstract a set of steps for building an l0-sampler, based on sampling, recovery and selection. We analyze the implementation of an l0-sampler within this framework, and show how prior constructions of l0-samplers can all be expressed in terms of these steps. Our experimental contribution is to provide a first detailed study of the accuracy and computational cost of l0-samplers

Crossref

Archivio della Ricerca - Università di Roma 3

Real-time anomalies detection and analysis of network structure, with application to the Autonomous System network

Author: Laura Luigi
Ausiello Giorgio
Firmani Donatella
Publication venue
Publication date: 01/01/2011
Field of study

The structural analysis is the very basic tool for understanding the properties of a network. In this paper we present a (customizable) tool, able to compute in real-time the most important connectivity properties of a network, modeled as an undirected graph: connected and biconnected components, articulation points and bridges. The algorithm underlying the tool has been theoretically analyzed in the (semi-)streaming model, and has been tested with graphs up to hundreds of millions nodes and billions edges. The tool, therefore, can be employed to monitor traffic flows in medium and large networks, at real-time, and detect possible anomalies. As an application, we provide results about the structural properties of ten years of samples of the Autonomous System network, obtained from the Univ. of Oregon Route Views project [1], that (once again) shows the ubiquitous presence of power-law distribution. © 2011 IEEE

Archivio della Ricerca - Università di Roma 3