Search CORE

Efficient Stream Join Processing: Novel Approaches and Challenges

Authors: Aslam A., Simonini G.
Publication date: 01/01/2024

Stream join is a fundamental data operator for processing real-time data, but it faces computational challenges in stream inequality joins (theta join operators) because of frequent updates to indexing data structures. To tackle this problem, we identify three key insights: 1) detecting skewed data distributions in real time and implementing dedicated indexing structures for skewed keys to reduce index update costs; 2) leveraging optimized data structures, combining insert-efficient mutable and search-efficient immutable structures, to optimize the stream join search process; and 3) adopting learned indexes instead of conventional ones, which can provide up to 4x better performance. In this Ph.D. work, we propose novel solutions for distributed and multi-core stream join processing, including an indexing solution that uses a space-efficient dedicated filter and a two-stage data structure that effectively holds and processes sliding-window items (bounded streaming contents). We are also exploring the adoption and benefits of learned indexes for real-time stream join processing. Despite non-trivial challenges such as state management for distributed processing, processing guarantees, and efficient concurrency mechanisms, experiments on distributed stream processing systems show superior performance compared to state-of-the-art solutions.
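The second insight, pairing an insert-efficient mutable structure with a search-efficient immutable one, can be illustrated with the minimal Python sketch below. It is not the authors' implementation: the class name `TwoStageWindowIndex`, the `merge_threshold` parameter, the time-based window with lazy expiration, and the single theta condition `key < value` are all assumptions made for illustration.

```python
import bisect


class TwoStageWindowIndex:
    """Hypothetical two-stage index for one side of a stream inequality join.

    Stage 1: an append-only buffer (cheap inserts, linear scan on probe).
    Stage 2: sorted arrays rebuilt on merge (binary search on probe).
    A tuple (key, ts) is live while ts > now - window; expired tuples are
    skipped at probe time and physically dropped at the next merge.
    """

    def __init__(self, window, merge_threshold=4):
        self.window = window
        self.merge_threshold = merge_threshold
        self.buffer = []  # mutable stage: (key, ts) in arrival order
        self.keys = []    # immutable stage: sorted keys
        self.stamps = []  # timestamps aligned with self.keys

    def insert(self, key, ts):
        self.buffer.append((key, ts))
        if len(self.buffer) >= self.merge_threshold:
            self._merge(now=ts)

    def _merge(self, now):
        # Rebuild the search-efficient stage instead of updating it in place.
        cutoff = now - self.window
        live = [(k, t) for k, t in zip(self.keys, self.stamps) if t > cutoff]
        live += [(k, t) for k, t in self.buffer if t > cutoff]
        live.sort()
        self.keys = [k for k, _ in live]
        self.stamps = [t for _, t in live]
        self.buffer = []

    def probe_less_than(self, value, now):
        """Return live keys k satisfying the theta condition k < value."""
        cutoff = now - self.window
        end = bisect.bisect_left(self.keys, value)  # all keys[:end] are < value
        hits = [k for k, t in zip(self.keys[:end], self.stamps[:end]) if t > cutoff]
        hits += [k for k, t in self.buffer if k < value and t > cutoff]
        return sorted(hits)


# Usage: index stream R, then probe it with an arriving S tuple whose key is 6.
idx = TwoStageWindowIndex(window=10, merge_threshold=3)
for ts, key in enumerate([5, 1, 9, 3, 7]):
    idx.insert(key, ts)
print(idx.probe_less_than(6, now=4))  # → [1, 3, 5]
```

Rebuilding the sorted stage only once every `merge_threshold` inserts amortizes the cost of keeping a searchable structure, which is the trade-off the abstract describes; the paper's dedicated filter, skew handling, and concurrency control are not modeled here.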

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Enhancing entity resolution efficiency with loosely schema-aware techniques - Discussion paper

Authors: Bergamaschi S., Simonini G.
Publication date: 01/01/2016

Entity Resolution, the task of identifying records that refer to the same real-world entity, is a fundamental step in data integration. Blocking is a widely employed technique for avoiding the comparison of all possible record pairs in a dataset (an inefficient approach). Forgoing schema information in blocking has been shown to limit the chance of missing matches (i.e., it guarantees high recall), at the cost of low precision. Meta-blocking alleviates this issue by restructuring a block collection, removing redundant and superfluous comparisons. Yet existing meta-blocking techniques rely exclusively on schema-agnostic features. In this paper, we investigate how loose schema information, induced directly from the data, can be exploited in a holistic, loosely schema-aware (meta-)blocking approach that outperforms state-of-the-art meta-blocking in terms of precision without sacrificing a high level of recall. We implemented our idea in a system called Blast and evaluated it experimentally on real-world datasets.
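To make the blocking and meta-blocking steps concrete, here is a minimal Python sketch of the schema-agnostic baseline that the abstract says Blast improves on; it is not Blast itself. The weighting (number of shared blocks per pair) and the pruning rule (keep pairs weighing at least the mean) are common textbook choices assumed for illustration, and the records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Schema-agnostic token blocking: one block per token, attribute names ignored."""
    blocks = defaultdict(set)
    for rid, attrs in records.items():
        for value in attrs.values():
            for token in value.lower().split():
                blocks[token].add(rid)
    # A block with a single record generates no comparison.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def meta_blocking(blocks):
    """Weight each candidate pair by its number of shared blocks, then keep
    pairs whose weight is at least the mean weight (edge pruning)."""
    weights = defaultdict(int)
    for ids in blocks.values():
        for pair in combinations(sorted(ids), 2):
            weights[pair] += 1
    if not weights:
        return set()
    threshold = sum(weights.values()) / len(weights)
    return {pair for pair, w in weights.items() if w >= threshold}


# Hypothetical records with heterogeneous attribute names.
records = {
    "r1": {"name": "canon eos 80d", "type": "dslr camera"},
    "r2": {"title": "Canon EOS 80D body", "category": "camera"},
    "r3": {"model": "nikon d750", "kind": "dslr camera"},
}
print(meta_blocking(token_blocking(records)))  # → {('r1', 'r2')}
```

Token blocking alone produces all three pairs (the shared token "camera" links everything), which illustrates the high-recall, low-precision behavior; pruning keeps only the pair sharing four blocks. A loosely schema-aware approach would additionally exploit which attributes the shared tokens come from, which this sketch does not attempt.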

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

No evidence yet to change American Heart Association recommendations for poststreptococcal reactive arthritis: comment on the article by van Bemmel et al.

Authors: Cimaz R., Taddio A., Simonini G.
Publication date: 01/01/2009

Archivio istituzionale della ricerca - Università di Trieste

Towards declarative imperative data-parallel systems?

Authors: Bergamaschi S., Interlandi M., Simonini G.
Publication date: 01/01/2014

Motivated by recent developments in declarative networking and data-parallel computation, we present a first investigation of a declarative imperative parallel programming model that tries to combine the two worlds. We identify a set of requirements that the model should satisfy and introduce a conceptual sketch of a system implementing the envisioned model.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

The burden of extracutaneous manifestations in juvenile localized scleroderma: A literature review

Authors: Liguoro I., Martini G., Simonini G.
Publication date: 01/01/2025

Objectives: Juvenile Localized Scleroderma (JLS) is an autoimmune disease affecting children that leads to fibrosis of the skin and subcutaneous tissues and is characterized by extracutaneous manifestations (ECM) in about 20% of patients. JLS and ECM can cause severe disabilities, potentially impacting patients' quality of life (QoL). We aimed to systematically review studies reporting ECM in young patients with JLS. Methods: The PubMed, Cochrane and Scopus databases were searched to identify studies evaluating ECM in children with JLS. Selected papers focusing on QoL and on the multidisciplinary approach were analysed separately. Results: At the end of the selection process, 15 papers (encompassing 3604 children) describing ECM were included. Overall, ECM were reported in 958/3604 (26.5%) children, and the three most frequent were musculoskeletal (24%), neurological (10.3%) and odontostomatological (7.6%). Six papers (435 patients) focusing on QoL in children with JLS yielded comparable results. Three studies focusing on the role of a multidisciplinary team in managing children and adolescents with JLS and ECM were also selected (216 children). Conclusions: Almost one-third of patients with JLS may present several clinical problems other than skin lesions, which should be managed by a multidisciplinary team. However, evidence on the efficacy of multispecialty management is still lacking. The impact of ECM on the QoL of these patients may be underestimated, as no specifically developed assessment tool has been applied so far; recently proposed overall disease-severity and disease-specific patient-reported outcome measures may improve the evaluation of this important clinical aspect.

Archivio istituzionale della ricerca - Università degli Studi di Udine

Entity resolution on camera records without machine learning

Authors: Bergamaschi S., Zecchini L., Simonini G.
Publication date: 01/01/2020

This paper reports the runner-up solution to the ACM SIGMOD 2020 programming contest, whose goal was to identify the specifications (i.e., records), collected across 24 e-commerce data sources, that refer to the same real-world entities. First, we investigate the machine learning (ML) approach, but surprisingly find that existing state-of-the-art ML-based methods fall short in this context, not reaching an F-score of 0.49. Then, we propose an efficient solution that exploits human-generated annotated lists and regular expressions and reaches an F-score of 0.99. In our experience, in terms of human effort, our approach was no more expensive than labeling the match/non-match pairs required by ML-based methods.
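The idea of matching records with human-written lists and regular expressions instead of a learned model can be sketched in Python as follows. Everything specific here is hypothetical: the brand alias list, the model patterns, and the sample titles are invented for illustration and are not the authors' lists or the contest data.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical human-curated brand list, mapping aliases/misspellings to a brand.
BRAND_ALIASES = {"canon": "canon", "nikon": "nikon", "nikkon": "nikon"}

# Hypothetical hand-written model patterns, one per brand.
MODEL_PATTERNS = {
    "canon": re.compile(r"\beos[\s-]*(\d{1,4}d)\b"),
    "nikon": re.compile(r"\b(d\d{3,4})\b"),
}


def entity_key(title):
    """Return a (brand, model) key for a specification title, or None."""
    text = title.lower()
    for alias, brand in BRAND_ALIASES.items():
        if re.search(rf"\b{re.escape(alias)}\b", text):
            match = MODEL_PATTERNS[brand].search(text)
            if match:
                return brand, match.group(1)
    return None


def matching_pairs(specs):
    """Group specifications by key; every pair inside a group is a match."""
    groups = defaultdict(list)
    for spec_id, title in specs.items():
        key = entity_key(title)
        if key is not None:
            groups[key].append(spec_id)
    return {pair for ids in groups.values() for pair in combinations(sorted(ids), 2)}


specs = {
    "s1": "Canon EOS 80D DSLR Body Only",
    "s2": "canon eos-80d 24.2MP camera",
    "s3": "Nikon D750 FX body",
    "s4": "NIKKON d750 kit",
}
print(sorted(matching_pairs(specs)))  # → [('s1', 's2'), ('s3', 's4')]
```

Grouping by a normalized key means no pairwise comparison is needed at all; the human effort moves from labeling pairs to writing and refining the lists and patterns.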

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced by this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field under study than the one produced by traditional first-author co-citation counting. Reasons for these effects are discussed.
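The difference between the two counting schemes can be shown with a small Python sketch. It assumes that an author pair is counted at most once per citing document and that co-authors of the same cited work also count as co-cited; the study may define these details differently. The data are hypothetical authors labelled A to D.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_docs, max_authors=1):
    """Count how many citing documents co-cite each pair of authors.

    citing_docs: one reference list per citing document; each reference is
    the ordered author list of a cited work.
    max_authors=1 -> traditional first-author counting;
    max_authors=5 -> counting the first five authors of each cited work.
    Assumptions: a pair counts at most once per citing document, and
    co-authors of the same cited work also count as co-cited.
    """
    counts = Counter()
    for references in citing_docs:
        authors = set()
        for author_list in references:
            authors.update(author_list[:max_authors])
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing documents, authors labelled A-D.
docs = [
    [["A", "B"], ["C"]],
    [["A"], ["D", "B"]],
]
first = cocitation_counts(docs, max_authors=1)
top5 = cocitation_counts(docs, max_authors=5)
print(first[("A", "B")], top5[("A", "B")])  # → 0 2
```

Author B never appears as a first author, so first-author counting misses the A-B link entirely, while five-author counting records it in both citing documents.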

E-LIS

Schema-agnostic progressive entity resolution

Authors: Bergamaschi S., Simonini G., Palpanas T., Papadakis G.
Publication date: 01/01/2019

Entity Resolution (ER) is the task of finding entity profiles that correspond to the same real-world entity. Progressive ER aims to efficiently resolve large datasets when limited time and/or computational resources are available. In practice, its goal is to provide the best possible partial solution by approximating the optimal comparison order of the entity profiles. So far, Progressive ER has only been examined in the context of structured (relational) data sources, as existing methods rely on schema knowledge to save unnecessary comparisons: they restrict their search space to similar entities with the help of schema-based blocking keys (i.e., signatures that represent the entity profiles). As a result, these solutions are not applicable to Big Data integration applications, which involve large and heterogeneous datasets such as relational and RDF databases, JSON files, Web corpora, etc. To cover this gap, we propose a family of schema-agnostic Progressive ER methods, which do not require schema information and thus apply to heterogeneous data sources of any schema variety. First, we introduce two naïve schema-agnostic methods, showing that straightforward solutions perform poorly and do not scale well to large volumes of data. Then, we propose four different advanced methods. Through an extensive experimental evaluation over 7 real-world, established datasets, we show that all the advanced methods significantly outperform both the naïve methods and the state-of-the-art schema-based ones. We also investigate the relative performance of the advanced methods, providing guidelines on method selection.
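The core idea of progressive, schema-agnostic ER, emitting the most promising comparisons first so that stopping early still yields a good partial result, can be sketched in Python as below. This is not one of the paper's four advanced methods: ranking pairs by their number of shared tokens is a simple ordering assumed for illustration, and the profiles are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations, islice


def progressive_pairs(profiles):
    """Yield (pair, weight) in decreasing number of shared tokens.

    Attribute names are ignored, so profiles with different schemas can be
    compared. Consuming only the first k pairs gives a partial result.
    """
    blocks = defaultdict(set)
    for pid, attrs in profiles.items():
        for value in attrs.values():
            for token in str(value).lower().split():
                blocks[token].add(pid)
    weights = defaultdict(int)
    for ids in blocks.values():
        for pair in combinations(sorted(ids), 2):
            weights[pair] += 1
    # heapify is O(n); each pop is O(log n), so stopping early avoids a full sort.
    heap = [(-w, pair) for pair, w in weights.items()]
    heapq.heapify(heap)
    while heap:
        neg_w, pair = heapq.heappop(heap)
        yield pair, -neg_w


profiles = {
    "p1": {"name": "Jane Doe", "city": "Modena"},
    "p2": {"fullName": "Doe Jane", "location": "Modena Italy"},
    "p3": {"label": "John Roe", "addr": "Modena"},
}
# Budget of two comparisons: the most promising pairs come first.
print(list(islice(progressive_pairs(profiles), 2)))
# → [(('p1', 'p2'), 3), (('p1', 'p3'), 1)]
```

Because the generator is lazy, a caller with a fixed time or comparison budget simply stops consuming it; ties are broken by pair identifier only to keep the order deterministic.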

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Experience with Legionella pneumophila monitoring in a hospital in La Spezia, and prevention measures

Authors: Landi A., Grillo C., Simonini G., Carducci A.
Publication date: 01/01/1999

Archivio della Ricerca - Università di Pisa