
    Using an inverted index synopsis for query latency and performance prediction

    Predicting the latency of a query on a search engine has important benefits, for instance, allowing the search engine to adjust its configuration to address long-running queries without unnecessarily sacrificing its effectiveness. However, for the dynamic pruning techniques that underlie many commercial search engines, achieving accurate predictions of query latencies is difficult. We propose the use of index synopses, which are stochastic samples of the full index, for attaining accurate timing predictions. We experiment using the TREC ClueWeb09 collection and a large set of real user queries, and find that small index synopses make it possible to very accurately estimate properties of the larger index, including sizes of posting list unions and intersections. Thereafter, we demonstrate that index synopses facilitate two key use cases: first, for query efficiency prediction, we show that predicting the query latencies on the full index and classifying long-running queries can be accurately achieved using index synopses; second, for query performance prediction, we show that the effectiveness of queries can be estimated more accurately using a synopsis index post-retrieval predictor than a pre-retrieval predictor. Overall, our experiments demonstrate the value of such a stochastic sample of a larger index at predicting the properties of the larger index.
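
    As a rough illustration of how a sampled index can estimate posting list union and intersection sizes, the sketch below keeps each document with a fixed probability and scales the counts observed on the sample back up. Document-level Bernoulli sampling, the toy index, and the names `build_synopsis` and `estimate_sizes` are assumptions made for this example; the abstract does not specify how the synopses are actually constructed.

```python
import random


def build_synopsis(index, num_docs, rate, seed=0):
    """Keep each document with probability `rate` (an assumed sampling scheme)
    and restrict every posting list to the kept documents."""
    rng = random.Random(seed)
    kept = {doc for doc in range(num_docs) if rng.random() < rate}
    return {term: sorted(kept.intersection(postings))
            for term, postings in index.items()}


def estimate_sizes(synopsis, terms, rate):
    """Estimate full-index intersection and union sizes for `terms`
    by scaling the sizes measured on the synopsis by 1 / rate."""
    sets = [set(synopsis.get(term, ())) for term in terms]
    if not sets:
        return 0.0, 0.0
    inter = set.intersection(*sets)
    union = set.union(*sets)
    return len(inter) / rate, len(union) / rate


# Toy index: even doc ids contain "a", multiples of three contain "b".
NUM_DOCS = 60_000
RATE = 0.1
index = {"a": range(0, NUM_DOCS, 2), "b": range(0, NUM_DOCS, 3)}

synopsis = build_synopsis(index, NUM_DOCS, RATE)
est_inter, est_union = estimate_sizes(synopsis, ["a", "b"], RATE)
# True sizes on the full toy index are 10000 (intersection) and 40000 (union).
print(f"intersection ~ {est_inter:.0f}, union ~ {est_union:.0f}")
```

    The estimates vary with the random sample; a larger sampling rate tightens them at the cost of a larger synopsis.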

    Streamlining Evaluation with ir-measures

    We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval. Rather than implementing its own measure calculations, ir-measures provides a common interface to a handful of evaluation tools. The necessary tools are automatically invoked (potentially multiple times) to calculate all the desired metrics, simplifying the evaluation process for the user. The tool also makes it easier for researchers to use recently-proposed measures (such as those from the C/W/L framework) alongside traditional measures, potentially encouraging their adoption.

    Reproducing Personalised Session Search over the AOL Query Log

    Despite its troubled past, the AOL Query Log continues to be an important resource to the research community, particularly for tasks like search personalisation. When using the query log for these ranking experiments, little attention is usually paid to the document corpus. Recent work typically uses a corpus containing versions of the documents collected long after the log was produced. Given that web documents are prone to change over time, we study the differences present between a version of the corpus containing documents as they appeared in 2017 (which has been used by several recent works) and a new version we construct that includes documents close to as they appeared at the time the query log was produced (2006). We demonstrate that this new version of the corpus has a far higher coverage of documents present in the original log (93%) than the 2017 version (55%). Among the overlapping documents, the content often differs substantially. Given these differences, we re-conduct session search experiments that originally used the 2017 corpus and find that when using our corpus for training or evaluation, system performance improves. We place the results in context by introducing recent ad hoc ranking baselines. We also confirm the navigational nature of the queries in the AOL corpus by showing that including the URL substantially improves performance across a variety of models. Our version of the corpus can be easily reconstructed by other researchers and is included in the ir-datasets package.

    Effective Rating Prediction Using an Attention-Based User Review Sentiment Model

    We propose a new sentiment information-based attention mechanism that helps to identify user reviews that are more likely to enhance the accuracy of a rating prediction model. We hypothesise that highly polarised reviews (strongly positive or negative) are better indicators of the users’ preferences and that this sentiment polarity information helps to identify the usefulness of reviews. Hence, we introduce a novel neural network rating prediction model, called SentiAttn, which includes both the proposed sentiment attention mechanism and a global attention mechanism that captures the importance of different parts of the reviews. We show how the concatenation of the positive and negative users’ and items’ reviews as input to SentiAttn results in different architectures with various channels. We investigate whether we can improve the performance of SentiAttn by fine-tuning different channel setups. We examine the performance of SentiAttn on two well-known datasets from Yelp and Amazon. Our results show that SentiAttn significantly outperforms a classical approach and four state-of-the-art rating prediction models. Moreover, we show the advantages of using the sentiment attention mechanism in the rating prediction task and its effectiveness in addressing the cold-start problem.

    Efficient & Effective Selective Query Rewriting with Efficiency Predictions

    To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which in turn hurts user satisfaction if the query is excessively slow. In this paper, we propose a novel framework for using the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which are used in establishing the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine.
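
    The per-query selection step described above can be sketched as follows: among the rewritings whose predicted execution time fits the allowed time, pick the one expected to be most effective. The `Rewriting` class, its `expected_gain` field, the fallback rule, and all numbers are hypothetical and only serve this illustration; the abstract does not describe the framework at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class Rewriting:
    name: str
    expected_gain: float  # hypothetical effectiveness estimate, higher is better
    predicted_ms: float   # predicted execution time from some efficiency predictor


def select_rewriting(candidates, budget_ms):
    """Return the most effective candidate predicted to finish within
    `budget_ms`; if none fits, fall back to the fastest candidate."""
    permissible = [c for c in candidates if c.predicted_ms <= budget_ms]
    if permissible:
        return max(permissible, key=lambda c: c.expected_gain)
    return min(candidates, key=lambda c: c.predicted_ms)


# Made-up candidates for a single query.
candidates = [
    Rewriting("original", expected_gain=0.00, predicted_ms=40.0),
    Rewriting("proximity", expected_gain=0.03, predicted_ms=120.0),
    Rewriting("expansion", expected_gain=0.05, predicted_ms=400.0),
]

chosen = select_rewriting(candidates, budget_ms=200.0)
print(chosen.name)
```

    With a 200 ms budget the expansion rewriting is ruled out by its predicted time, so the proximity rewriting is selected.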

    Efficient Inference of Sub-Item Id-based Sequential Recommendation Models with Millions of Items

    Transformer-based recommender systems, such as BERT4Rec or SASRec, achieve state-of-the-art results in sequential recommendation. However, it is challenging to use these models in production environments with catalogues of millions of items: scaling Transformers beyond a few thousand items is problematic for several reasons, including high model memory consumption and slow inference. In this respect, RecJPQ is a state-of-the-art method of reducing the models’ memory consumption; RecJPQ compresses item catalogues by decomposing item IDs into a small number of shared sub-item IDs. Despite reporting a reduction of memory consumption by a factor of up to 50×, the original RecJPQ paper did not report inference efficiency improvements over the baseline Transformer-based models. Upon analysing RecJPQ’s scoring algorithm, we find that its efficiency is limited by its use of score accumulators for each item, which prevents parallelisation. In contrast, LightRec (a non-sequential method that uses a similar idea of sub-ids) reported large inference efficiency improvements using an algorithm we call PQTopK. We show that it is also possible to improve RecJPQ-based models’ inference efficiency using the PQTopK algorithm. In particular, we speed up RecJPQ-enhanced SASRec by a factor of 4.5× compared to the original SASRec’s inference method and by a factor of 1.56× compared to the method implemented in RecJPQ code on a large-scale Gowalla dataset with more than a million items. Further, using simulated data, we show that PQTopK remains efficient with catalogues of up to tens of millions of items, removing one of the last obstacles to using Transformer-based models in production environments with large catalogues.
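
    The general idea of scoring items through shared sub-item IDs can be sketched as below: score every sub-item once per split, then obtain each item's score by summing the scores of its sub-items, and finally take the top K. This is a plain-Python reading of that idea under our own assumptions (dot-product scoring, the data layout, the names `sub_item_scores` and `pq_top_k`); it is not the PQTopK implementation from the paper, whose details the abstract does not give.

```python
import heapq


def sub_item_scores(query_parts, centroids):
    """scores[s][j] is the dot product between the s-th part of the sequence
    embedding and the j-th sub-item embedding of split s."""
    return [
        [sum(q * c for q, c in zip(part, centroid)) for centroid in split]
        for part, split in zip(query_parts, centroids)
    ]


def pq_top_k(query_parts, centroids, codes, k):
    """Score each item as the sum of its sub-item scores and return the
    k best (item, score) pairs. `codes[item][s]` is the sub-item id of
    `item` in split s."""
    scores = sub_item_scores(query_parts, centroids)
    item_scores = (
        (sum(scores[s][sub_id] for s, sub_id in enumerate(code)), item)
        for item, code in enumerate(codes)
    )
    return [(item, score) for score, item in heapq.nlargest(k, item_scores)]


# Toy setup: 2 splits, 2 sub-items per split, 4 items.
query_parts = [[1.0, 0.0], [0.0, 2.0]]
centroids = [
    [[1.0, 0.0], [0.0, 1.0]],  # split 0
    [[0.0, 1.0], [1.0, 0.0]],  # split 1
]
codes = [[0, 0], [0, 1], [1, 0], [1, 1]]

top = pq_top_k(query_parts, centroids, codes, k=2)
print(top)  # [(0, 3.0), (2, 2.0)]
```

    Each item's score here depends only on the precomputed sub-item scores and that item's codes, so the per-item sums are independent of one another; the loop above is sequential, but the same computation maps naturally onto a batched gather-and-sum on tensor hardware.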

    Intensification and complexity in teachers’ narrated worklives

    Reflecting on a previous study of teachers’ narratives, this epistolary conversation follows ideas of intensification and complexity that emerged in the authors’ return to the narrative accounts. Their conversation highlights representations of teaching as a struggle for recognition, personal happiness, and security, all within a system of accountability. Of central concern is the concept of complicity and how it is related to the seduction of consent through which teachers encounter a discourse of professionalism. By way of countering a misrecognized professionalism, the authors suggest that teachers’ narrative writings can be a means of forming a critical stance.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
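
    The difference between the two counting schemes can be made concrete with a small sketch that counts, for every pair of authors, the number of citing papers that cite both. The data layout, the `max_authors` parameter, and the toy citation data are assumptions for illustration. In particular, this sketch also counts co-authors of the same cited work as co-cited with each other, which the abstract neither confirms nor rules out.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: one reference list per citing paper; each reference is
        the ordered list of author names of a cited work.
    max_authors: how many leading authors of each cited work to count
        (1 corresponds to first-author counting).
    """
    counts = Counter()
    for references in citing_papers:
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of cited authors is counted once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Two made-up citing papers, each citing two works.
citing_papers = [
    [["A", "B"], ["C"]],
    [["A"], ["C", "D"]],
]

first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
print(dict(first_author))
print(dict(first_five))
```

    On this toy data, first-author counting sees only the pair (A, C), while counting up to five authors also brings B and D into the picture.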

    Knowledge Graph Cross-View Contrastive Learning for Recommendation

    Knowledge Graphs (KGs) are useful side information that help recommendation systems improve recommendation quality by providing rich semantic information about entities and items. Recently, models based on graph neural networks (GNNs) have adopted knowledge graphs to capture further high-order structural information, such as shared preferences between users and similarities between items. However, existing GNN-based methods suffer from two challenges: (1) sparse supervisory signals, where a large amount of information in the knowledge graph is irrelevant to recommendation and the training labels are insufficient, thereby limiting the recommendation performance of the trained model; (2) discarded valuable information, where the edge or node dropout strategies that existing models use to obtain augmented views during self-supervised learning can discard information that is valuable for recommendation. These two challenges limit the effective representation of users and items by existing methods. Inspired by self-supervised learning to mine supervision signals from data, in this paper, we focus on exploring contrastive learning based on knowledge graph enhancement, and propose a new model named Knowledge Graph Cross-view Contrastive Learning for Recommendation (KGCCL) to address the two challenges. Specifically, to address supervision sparseness, we perform contrastive learning between graph views at different levels and mine graph feature information in a self-supervised learning manner. In addition, we use noise augmentation to enhance the representation of users and items, while retaining all triplet information in the knowledge graph to address the challenge of valuable information being discarded. Experimental results on three public datasets show that our proposed KGCCL model outperforms existing state-of-the-art methods. In particular, our model outperforms the best baseline performance by 10.65% on the MIND dataset.