1,720,990 research outputs found
Demographically-Inspired Query Variants Using an LLM
This study proposes a method to diversify queries in existing test collections to reflect some of the diversity of search engine users, aligning with an earlier vision of an 'ideal' test collection. A Large Language Model (LLM) is used to create query variants: alternative queries that have the same meaning as the original. These variants represent user profiles characterised by different properties, such as language and domain proficiency, which are known in the Information Retrieval (IR) literature to influence query formulation. The LLM's ability to generate query variants that align with user profiles is empirically validated, and the variants' utility is further explored for IR system evaluation. Results demonstrate that the variants impact how systems are ranked and show that user profiles experience significantly different levels of system effectiveness. This method enables an alternative perspective on system evaluation where we can observe both the impact of user profiles on system rankings and how system performance varies across users
Self-labeling methods for unsupervised transfer ranking
A lack of reliable relevance labels for training ranking functions is a significant problem for many search applications. Transfer ranking is a technique aiming to transfer knowledge from an existing machine learning ranking task to a new ranking task. Unsupervised transfer ranking is a special case of transfer ranking where there aren't any relevance labels available for the new task, only queries and retrieved documents. One approach to tackling this problem is to impute relevance labels for (document-query) instances in the target collection. This is done by using knowledge from the source collection. We propose three self-labeling methods for unsupervised transfer ranking: an expectation-maximization based method (RankPairwiseEM) for estimating pairwise preferences across documents, a hard-assignment expectation-maximization based algorithm (RankHardLabelEM), which directly assigns imputed relevance labels to documents, and a self-learning algorithm (RankSelfTrain), which gradually increases the number of imputed labels. We have compared the three algorithms on three large public test collections using LambdaMART as the base ranker and found that (i) all the proposed algorithms show improvements over the original source ranker in different transferring scenarios; (ii) RankPairwiseEM and RankSelfTrain significantly outperform the source rankers across all environments. We have also found that they are not significantly worse than the model directly trained on the target collection; and (iii) self-labeling methods are significantly better than previous instance-weighting based solutions on a variety of collections
On the effect of relevance scales in crowdsourcing relevance assessments for Information Retrieval evaluation
Relevance is a key concept in information retrieval and widely used for the evaluation of search systems using test collections. We present a comprehensive study of the effect of the choice of relevance scales on the evaluation of information retrieval systems. Our work analyzes and compares four crowdsourced scales (2-levels, 4-levels, and 100-levels ordinal scales, and a magnitude estimation scale) and two expert-labeled datasets (on 2- and 4-levels ordinal scales). We compare the scales considering internal and external agreement, the effect on IR evaluation both in terms of system effectiveness and topic ease, and we discuss the effect of such scales and datasets on the perception of relevance levels by assessors. Our analyses show that: crowdsourced judgment distributions are consistent across scales, both overall and at the per-topic level; on all scales crowdsourced judgments agree with the expert judgments, and overall the crowd assessors are able to express reliable relevance judgments; all scales lead to a similar level of external agreement with the ground truth, while the internal agreement among crowd workers is higher for fine-grained scales; more fine-grained scales consistently lead to higher correlation values for both system ranking and topic ease; finally, we found that the considered scales lead to different perceived distances between relevance levels
Towards Nuanced System Evaluation Based on Implicit User Expectations
Information retrieval systems are often evaluated through the use of effectiveness metrics. In the past, the metrics used have corresponded to fixed models of user behavior, presuming, for example, that the user will view a pre-determined number of items in the search engine results page, or that they have a constant probability of advancing from one item in the result page to the next. Recently, a number of proposals for models of user behavior have emerged that are parameterized in terms of the number of relevant documents (or other material) a user expects to be required to address their information need. That recent work has demonstrated that T, the user’s a priori utility expectation, is correlated with the underlying nature of the information need; and hence that evaluation metrics should be sensitive to T. Here we examine the relationship between the query the user issues, and their anticipated T, seeking syntactic and other clues to guide the subsequent system evaluation. That is, we wish to develop mechanisms that, based on the query alone, can be used to adjust system evaluations so that the experience of the user of the system is better captured in the system’s effectiveness score, and hence can be used as a more refined way of comparing systems. This paper reports on a first round of experimentation, and describes the progress (albeit modest) that we have achieved towards that goal
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued
use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation
counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more
sophisticated methods
- …
