1,720,960 research outputs found

    Efficient Multi-vector Dense Retrieval with Bit Vectors

    No full text
    Dense retrieval techniques employ pre-trained large language models to build a high-dimensional representation of queries and passages. These representations compute the relevance of a passage w.r.t. to a query using efficient similarity measures. In this line, multi-vector representations show improved effectiveness at the expense of a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. Recently, PLAID has tackled these problems by introducing a centroid-based term representation to reduce the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, thus reducing the cost of the successive ranking stages. This paper proposes “Efficient Multi-Vector dense retrieval with Bit vectors” (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient pre-filtering step of passages using optimized bit vectors. Second, the computation of the centroid interaction happens column-wise, exploiting SIMD instructions, thus reducing its latency. Third, EMVB leverages Product Quantization (PQ) to reduce the memory footprint of storing vector representations while jointly allowing for fast late interaction. Fourth, we introduce a per-document term filtering method that further improves the efficiency of the last step. Experiments on MS MARCO and LoTTE show that EMVB is up to 2.8× faster while reducing the memory footprint by 1.8× with no loss in retrieval accuracy compared to PLAID

    Efficient and Effective Multi-Vector Dense Retrieval with EMVB

    No full text
    Dense retrieval techniques utilize large pre-trained language models to construct a high-dimensional representation of queries and passages. These representations assess the relevance of a passage concerning a query through efficient similarity measures. Multi-vector representations, while enhancing effectiveness, cause a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. The current state-of-the-art approach, namely PLAID, has introduced a centroid-based term representation to mitigate the memory impact of multi-vector systems. By employing a centroid interaction mechanism, PLAID filters out non-relevant documents, reducing the cost of subsequent ranking stages. This paper1 introduces "Efficient Multi-Vector dense retrieval with Bit vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. Firstly, EMVB utilizes an optimized bit vector pre-filtering step for passages, enhancing efficiency. Secondly, the computation of centroid interaction occurs column-wise, leveraging SIMD instructions to reduce latency. Thirdly, EMVB incorporates Product Quantization (PQ) to decrease the memory footprint of storing vector representations while facilitating fast late interaction. Lastly, a per-document term filtering method is introduced, further improving the efficiency of the final step. Experiments conducted on MS MARCO and LoTTE demonstrate that EMVB achieves up to a 2.8× speed improvement while reducing the memory footprint by 1.8×, without compromising retrieval accuracy compared to PLAID

    Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

    Full text link
    Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms tailored to the properties of learned sparse representations, including approximate retrieval systems. In fact, this task featured prominently in the latest BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on a large benchmark dataset by throughput and recall. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically-cohesive blocks, each equipped with a summary vector. During query processing, we quickly determine if a block must be evaluated using the summaries. As we show experimentally, single-threaded query processing using our method, Seismic, reaches sub-millisecond per-query latency on various sparse embeddings of the MS MARCO dataset while maintaining high recall. Our results indicate that Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and further outperforms the winning (graph-based) submissions to the BigANN Challenge by a significant margin

    Distilled Neural Networks for Efficient Learning to Rank: (Extended Abstract)

    No full text
    Recent studies in Learning to Rank (LtR) have shown the possibility of effectively distilling a neural network from an ensemble of regression trees. This fully enables the use of neural-based ranking models in query processors of modern Web search engines. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness on CPU. In this paper, we propose a framework to design and train neural networks outperforming ensembles of regression trees. After distilling the networks from tree-based models, we exploit an efficiency-oriented pruning technique that works by sparsifying the most computationally intensive layers of the model. Moreover, we develop inference time predictors, which help devise neural network architectures that match the desired efficiency requirements. Comprehensive experiments on two public learning-to-rank datasets show that the neural networks produced with our novel approach are competitive in terms of effectiveness-efficiency trade-off when compared with tree-based ensembles by providing up to 4x inference time speed-up without degradation of the ranking quality

    Distilled Neural Networks for Efficient Learning to Rank

    Full text link
    Recent studies in Learning to Rank have shown the possibility to effectively distill a neural network from an ensemble of regression trees. This result leads neural networks to become a natural competitor of tree-based ensembles on the ranking task. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness, particularly when scoring on CPU. In this paper, we propose an approach for speeding up neural scoring time by applying a combination of Distillation, Pruning and Fast Matrix multiplication. We employ knowledge distillation to learn shallow neural networks from an ensemble of regression trees. Then, we exploit an efficiency-oriented pruning technique that performs a sparsification of the most computationally-intensive layers of the neural network that is then scored with optimized sparse matrix multiplication. Moreover, by studying both dense and sparse high performance matrix multiplication, we develop a scoring time prediction model which helps in devising neural network architectures that match the desired efficiency requirements. Comprehensive experiments on two public learning-to-rank datasets show that neural networks produced with our novel approach are competitive at any point of the effectiveness-efficiency trade-off when compared with tree-based ensembles, providing up to 4x scoring time speed-up without affecting the ranking quality

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis

    Dispelling the Myths Behind First-author Citation Counts

    Full text link
    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods
    corecore