A Dependency-Aware Utterances Permutation Strategy to Improve Conversational Evaluation

Authors: Tonellotto Nicola; Ferrante Marco; Perego Raffaele; Faggioli Guglielmo; Ferro Nicola
Publication date: 01/01/2022

Archivio istituzionale della ricerca - Università di Padova

Efficient query recommendations in the long tail via center-piece subgraphs

Authors: Venturini Rossano; Silvestri Fabrizio; Bonchi Francesco; Perego Raffaele; Vahabi Hossein
Publication date: 01/01/2012

Archivio della Ricerca - Università di Pisa

Recommendations for the long tail by term-query graph

Authors: Venturini Rossano; Silvestri Fabrizio; Bonchi Francesco; Perego Raffaele; Vahabi Hossein
Publication date: 01/01/2011

We define a new approach to the query recommendation problem. In particular, our main goal is to design a model enabling the generation of query suggestions also for rare and previously unseen queries. In other words we are targeting queries in the long tail. The model is based on a graph having two sets of nodes: Term nodes, and Query nodes. The graph induces a Markov chain on which a generic random walker starts from a subset of Term nodes, moves along Query nodes, and restarts (with a given probability) only from the same initial subset of Term nodes. Computing the stationary distribution of such a Markov chain is equivalent to extracting the so-called Center-piece Subgraph from the graph associated with the Markov chain itself. Given a query, we extract its terms and we set the restart subset to this term set. Therefore, we do not require a query to have been previously observed for the recommending model to be able to generate suggestions
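
The random walk with restart described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the uniform edge weights, the `t:`/`q:` node prefixes, the restart probability of 0.15, and the helper names `random_walk_with_restart` and `suggest` are all assumptions made for illustration. The sketch only approximates the stationary distribution by power iteration and ranks Query nodes by score; it does not extract a center-piece subgraph.

```python
def random_walk_with_restart(edges, restart_nodes, restart_prob=0.15, iterations=100):
    """Approximate the stationary distribution of a walk that restarts
    (with probability restart_prob) uniformly from restart_nodes.

    edges: dict mapping node -> {neighbor: weight} for outgoing edges.
    """
    restart_nodes = sorted(set(restart_nodes))
    nodes = set(edges) | set(restart_nodes)
    for targets in edges.values():
        nodes.update(targets)

    # Restart vector: uniform over the initial subset of Term nodes.
    restart = {n: 0.0 for n in nodes}
    for n in restart_nodes:
        restart[n] = 1.0 / len(restart_nodes)

    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: restart_prob * restart[n] for n in nodes}
        for node, mass in scores.items():
            targets = edges.get(node, {})
            total = sum(targets.values())
            if total == 0:
                # Dangling node: send its mass back to the restart set.
                for r in restart_nodes:
                    nxt[r] += (1 - restart_prob) * mass / len(restart_nodes)
                continue
            for target, weight in targets.items():
                nxt[target] += (1 - restart_prob) * mass * weight / total
        scores = nxt
    return scores


def suggest(query, edges, k=3):
    """Rank Query nodes for a possibly unseen query using only its terms."""
    terms = ["t:" + word for word in query.lower().split()]
    terms = [t for t in terms if t in edges]  # keep only known Term nodes
    if not terms:
        return []
    scores = random_walk_with_restart(edges, terms)
    ranked = sorted((n for n in scores if n.startswith("q:")),
                    key=lambda n: -scores[n])
    return [n[2:] for n in ranked[:k]]


# Hypothetical toy graph: Term nodes point to queries containing them,
# Query nodes point to related queries.
toy_graph = {
    "t:cheap": {"q:cheap flights": 1.0, "q:cheap hotels": 1.0},
    "t:flights": {"q:cheap flights": 1.0, "q:flights to rome": 1.0},
    "t:hotels": {"q:cheap hotels": 1.0},
    "q:cheap flights": {"q:flights to rome": 1.0},
    "q:flights to rome": {"q:cheap flights": 1.0},
    "q:cheap hotels": {"q:cheap flights": 1.0},
}

# "cheap flights paris" is not a Query node, yet its known terms seed the walk.
suggestions = suggest("cheap flights paris", toy_graph)
print(suggestions)
```

Because the restart set is built from the query's terms rather than from the query itself, the sketch returns suggestions for a query that never appears as a node, which is the property the abstract emphasizes.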

Archivio della Ricerca - Università di Pisa

SE-PQA: Personalized Community Question Answering

Authors: Kasela Pranav; Braga Marco; Perego Raffaele; Pasi Gabriella
Publication date: 01/01/2024

Personalization in Information Retrieval is a topic studied for a long time. Nevertheless, there is still a lack of high-quality, real-world datasets to conduct large-scale experiments and evaluate models for personalized search. This paper contributes to filling this gap by introducing SE-PQA (StackExchange - Personalized Question Answering), a new curated resource to design and evaluate personalized models related to the task of community Question Answering (cQA). The contributed dataset includes more than 1 million queries and 2 million answers, annotated with a rich set of features modeling the social interactions among the users of a popular cQA platform. We describe the characteristics of SE-PQA and detail the features associated with questions and answers. We also provide reproducible baseline methods for the cQA task based on the resource, including deep learning models and personalization approaches. The results of the preliminary experiments conducted show the appropriateness of SE-PQA to train effective cQA models; they also show that personalization remarkably improves the effectiveness of all the methods tested. Furthermore, we show the benefits in terms of robustness and generalization of combining data from multiple communities for personalization purposes

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Electoral Predictions with Twitter: A Machine-Learning approach

Authors: Lucchese Claudio; Perego Raffaele; Orlando Salvatore; Coletto Mauro
Publication date: 01/01/2015

Several studies have shown how to approximately predict public opinion, such as in political elections, by analyzing user activities in blogging platforms and on-line social networks. The task is challenging for several reasons. Sample bias and automatic understanding of textual content are two of several non trivial issues. In this work we study how Twitter can provide some interesting insights concerning the primary elections of an Italian political party. State-of-the-art approaches rely on indicators based on tweet and user volumes, often including sentiment analysis. We investigate how to exploit and improve those indicators in order to reduce the bias of the Twitter users sample. We propose novel indicators and a novel content-based method. Furthermore, we study how a machine learning approach can learn correction factors for those indicators. Experimental results on Twitter data support the validity of the proposed methods and their improvement over the state of the art

ARCA (Univ. Ca'Foscari)

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

IMT Institutional Repository

Preface

Authors: Di Noia Tommaso; Perego Raffaele; Crestani Fabio
Publication date: 01/01/2017

Politecnico di Bari - Catalogo di prodotti della Ricerca

Query Performance Prediction Using Dimension Importance Estimators

Authors: Tonellotto Nicola; Perego Raffaele; Faggioli Guglielmo; Ferro Nicola
Publication date: 01/01/2025

Query Performance Prediction (QPP) tends to fall short when predicting the performance of dense Information Retrieval (IR) systems. Therefore, the research community is investigating QPP approaches designed to synergize with this class of state-of-the-art IR models. At the same time, recent advances concerning dense IR have shown that we can improve the retrieval performance by projecting embeddings in a (query-wise) optimal linear subspace of the dense representation space. The Dimension IMportance Estimation (DIME) framework was proposed to identify such optimal subspaces on a query-by-query basis. In this paper, we illustrate how to design QPP models that rely on measuring the alignment between the query and document representations and the optimal DIME dimensions, based on the hypothesis that good alignment indicates better retrieval performance. We experimentally evaluate the proposed QPPs, showing that our approach outperforms the state-of-the-art when predicting the performance of two commonly used dense encoders, Contriever and TAS-B, on two popular TREC collections, Deep Learning 2019 and 2020

Archivio istituzionale della ricerca - Università di Padova

Evaluating Top-K Approximate Patterns via Text Clustering

Authors: Lucchese Claudio; Perego Raffaele; Orlando Salvatore
Publication date: 01/01/2016

This work investigates how approximate binary patterns can be objectively evaluated by using as a proxy measure the quality achieved by a text clustering algorithm, where the document features are derived from such patterns. Specifically, we exploit approximate patterns within the well-known FIHC (Frequent Itemset-based Hierarchical Clustering) algorithm, which was originally designed to employ exact frequent itemsets to achieve a concise and informative representation of text data. We analyze different state-of-the-art algorithms for approximate pattern mining, in particular we measure their ability in extracting patterns that well characterize the document topics in terms of the quality of clustering obtained by FIHC. Extensive and reproducible experiments, conducted on publicly available text corpora, show that approximate itemsets provide a better representation than exact ones

ARCA (Univ. Ca'Foscari)

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Supervised Evaluation of Top-k Itemset Mining Algorithms

Authors: Lucchese Claudio; Perego Raffaele; Orlando Salvatore
Publication date: 01/01/2015

A major mining task for binary matrices is the extraction of approximate top-k patterns that are able to concisely describe the input data. The top-k pattern discovery problem is commonly stated as an optimization one, where the goal is to minimize a given cost function, e.g., the accuracy of the data description. In this work, we review several greedy state-of-the-art algorithms, namely Asso, Hyper+, and PaNDa+, and propose a methodology to compare the patterns extracted. In evaluating the set of mined patterns, we aim at overcoming the usual assessment methodology, which only measures the given cost function to minimize. Thus, we evaluate how good the extracted models/patterns are at unveiling supervised knowledge on the data. To this end, we test algorithms and diverse cost functions on several datasets from the UCI repository. As a contribution, we show that PaNDa+ performs best in the majority of the cases, since the classifiers built over the mined patterns used as dataset features are the most accurate

ARCA (Univ. Ca'Foscari)

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Gossip Communities: Collaborative Filtering Through Peer-to-Peer Overlays

Authors: Ricci Laura; Baraglia Ranieri; Mordacchini Matteo; Dazzi Patrizio; Perego Raffaele
Publication date: 01/01/2010

Gossip-based Peer-to-Peer protocols proved to be very efficient for supporting dynamic and complex information exchange among distributed peers. They are useful for building and maintaining the network topology itself as well as to support a pervasive diffusion of the information injected into the network. In this paper, we propose the general architecture of a system that tries to exploit the collaborative exchange of information between peers in order to build a system able to gather similar users and spread useful suggestions among them

Archivio della Ricerca - Università di Pisa