Intergenerational Mobility and Social Status in a Model with Human Capital Investments and Trait Inheritance
We study a model in which parents care about the economic and social status of their offspring. An individual's chances of achieving social status depend on innate traits passed on by their parents (that is, IQ, ability, social and cultural environment, and other price-insensitive endowments), on human capital investments, and on chance events. Parents can, through human capital investments, increase their offspring's probability of climbing the social ladder, although they cannot borrow against the children's prospective earnings. Consequently, income and trait heterogeneity are the determinants of unequal opportunities and of intergenerational mobility.
ReNeuIR: Reaching Efficiency in Neural Information Retrieval
Perhaps the applied nature of information retrieval research goes some way to explain the community's rich history of evaluating machine learning models holistically, understanding that efficacy matters but so does the computational cost incurred to achieve it. This is evidenced, for example, by more than a decade of research on efficient training and inference of large decision forest models in learning-to-rank. As the community adopts even more complex, neural network-based models in a wide range of applications, questions on efficiency have once again become relevant. We propose this workshop as a forum for a critical discussion of efficiency in the era of neural information retrieval, to encourage debate on the current state and future directions of research in this space, and to promote more sustainable research by identifying best practices in the development and evaluation of neural models for information retrieval
Efficient and Effective Multi-Vector Dense Retrieval with EMVB
Dense retrieval techniques utilize large pre-trained language models to construct a high-dimensional representation of queries and passages. These representations assess the relevance of a passage to a query through efficient similarity measures. Multi-vector representations, while enhancing effectiveness, cause a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. The current state-of-the-art approach, namely PLAID, has introduced a centroid-based term representation to mitigate the memory impact of multi-vector systems. By employing a centroid interaction mechanism, PLAID filters out non-relevant documents, reducing the cost of subsequent ranking stages. This paper introduces "Efficient Multi-Vector dense retrieval with Bit vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. Firstly, EMVB utilizes an optimized bit vector pre-filtering step for passages, enhancing efficiency. Secondly, the computation of centroid interaction occurs column-wise, leveraging SIMD instructions to reduce latency. Thirdly, EMVB incorporates Product Quantization (PQ) to decrease the memory footprint of storing vector representations while facilitating fast late interaction. Lastly, a per-document term filtering method is introduced, further improving the efficiency of the final step. Experiments conducted on MS MARCO and LoTTE demonstrate that EMVB achieves up to a 2.8× speed improvement while reducing the memory footprint by 1.8×, without compromising retrieval accuracy compared to PLAID.
Collaborative ranking of grid-enabled workflow service providers
Service Oriented Architecture (SOA) and Grid computing are currently very active research topics. While Grid computing is aimed at dynamically sharing heterogeneous resources, SOA is a meta-architectural style that enables business flexibility in an interoperable way. There is a growing consensus that SOA(s) and Grid(s) might be beneficial to each other. In Grid-based SOAs, a central role is played by tools for publishing and discovering services (resources). This work presents SPRanker (Service Provider Ranker): a service discovery tool that is able to retrieve providers from partially specified service descriptions. It ranks the providers found on the basis of an Information Retrieval-based score formula that takes into account judgments expressed collaboratively by past service users.
Efficient Multi-vector Dense Retrieval with Bit Vectors
Dense retrieval techniques employ pre-trained large language models to build a high-dimensional representation of queries and passages. These representations compute the relevance of a passage w.r.t. a query using efficient similarity measures. In this line, multi-vector representations show improved effectiveness at the expense of a one-order-of-magnitude increase in memory footprint and query latency by encoding queries and documents on a per-token level. Recently, PLAID has tackled these problems by introducing a centroid-based term representation to reduce the memory impact of multi-vector systems. By exploiting a centroid interaction mechanism, PLAID filters out non-relevant documents, thus reducing the cost of the successive ranking stages. This paper proposes "Efficient Multi-Vector dense retrieval with Bit vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval. First, EMVB employs a highly efficient pre-filtering step of passages using optimized bit vectors. Second, the computation of the centroid interaction happens column-wise, exploiting SIMD instructions, thus reducing its latency. Third, EMVB leverages Product Quantization (PQ) to reduce the memory footprint of storing vector representations while jointly allowing for fast late interaction. Fourth, we introduce a per-document term filtering method that further improves the efficiency of the last step. Experiments on MS MARCO and LoTTE show that EMVB is up to 2.8× faster while reducing the memory footprint by 1.8× with no loss in retrieval accuracy compared to PLAID.
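The abstract refers to "late interaction" without defining it. As a rough illustration only, the sketch below shows the MaxSim-style scoring commonly used by multi-vector retrievers: each query token vector is matched to its best document token vector, and the maxima are summed. It is not EMVB's implementation (no bit vectors, centroids, SIMD, or Product Quantization); the function names and toy vectors are assumptions made for this example.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """MaxSim-style score: for each query token, take the best-matching
    document token, then sum these maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy data (hypothetical): two query token vectors, two candidate passages.
query = [[1.0, 0.0], [0.0, 1.0]]
passages: Dict[str, List[Vector]] = {
    "doc_a": [[0.9, 0.1], [0.2, 0.8]],
    "doc_b": [[0.5, 0.5]],
}

scores = {name: late_interaction_score(query, vecs) for name, vecs in passages.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Because every query token is compared against every document token, the cost grows with the number of stored token vectors, which is why the pre-filtering and compression steps described in the abstract matter.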
A fast ligament model with scalable accuracy for multibody simulations
Multibody musculoskeletal models are important tools to perform kinematic, kinetostatic, and dynamic analyses of the whole human body. In these models, bones are regarded as rigid bodies, while different strategies are used to model structures such as muscles and ligaments. In this context, ligaments are often represented using a finite set of spring-like elements to compute the wrench applied to the bones (multibundle model). While this model is fast and easy to implement, it can suffer from inaccuracies due to the limited number of fibers and their positioning. In this study, a ligament model is proposed to overcome these limitations, representing the ligament as an infinite distribution of fibers from which the wrench on the bones can be obtained. The model takes advantage of thin-plate spline mapping to model the fiber structure of the ligament by defining a correspondence between the points of the two ligament insertions. The accuracy and the performance of the model are verified on a ligament and compared to the standard multibundle model. Results indicate that the model is faster and more accurate than the multibundle model. Moreover, accuracy can be adjusted according to the application in order to decrease the computational time.
Fast Filtering of Search Results Sorted by Attribute
Modern search services often provide multiple options to rank the search results, e.g., sort "by relevance", "by price" or "by discount" in e-commerce. While the traditional rank by relevance effectively places the relevant results in the top positions of the results list, the rank by attribute could place many marginally relevant results at the head of the results list, leading to poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks to select the subset of results maximizing the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, the high computational cost of the algorithm makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this article, we first prove the performance bounds of the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ε-Filtering, which, given an allowed approximation error ε, finds a (1-ε)-optimal filtering, i.e., the relevance of its solution is at least (1-ε) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a significant speedup of up to two orders of magnitude with respect to the existing optimal solution, while ε-Filtering further improves this result by trading effectiveness for efficiency. In particular, experiments show that ε-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
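To make the relevance-aware filtering problem concrete, here is a minimal brute-force sketch. It is neither OPT-Filtering nor ε-Filtering: it simply enumerates every order-preserving subset of at most k results from an attribute-sorted list and keeps the one with the highest quality. The use of DCG as the quality metric, the size cap k, and all names are assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations
from math import log2
from typing import List, Sequence, Tuple


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of a list, in its given order (assumed metric)."""
    return sum(r / log2(pos + 2) for pos, r in enumerate(relevances))


def brute_force_filter(relevances: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Exhaustively pick at most k results, keeping the attribute order,
    so that the DCG of the filtered list is maximal. Exponential cost:
    only usable on tiny lists, which is the motivation for faster algorithms."""
    best_score, best_idx = 0.0, []
    n = len(relevances)
    for size in range(1, min(k, n) + 1):
        for idx in combinations(range(n), size):  # index tuples are increasing
            score = dcg([relevances[i] for i in idx])
            if score > best_score:
                best_score, best_idx = score, list(idx)
    return best_score, best_idx


def is_eps_optimal(candidate: float, optimum: float, eps: float) -> bool:
    """(1-eps)-optimality condition from the abstract."""
    return candidate >= (1.0 - eps) * optimum


# Hypothetical relevance labels of results already sorted by price.
attribute_sorted = [0.1, 3.0, 0.2, 2.0, 0.0, 1.0]
score, kept = brute_force_filter(attribute_sorted, k=3)
print(kept, round(score, 4))
```

In this toy list the marginally relevant results at positions 0, 2, and 4 are dropped, because keeping them would push the relevant results further down the attribute-sorted list.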
Learning bivariate scoring functions for ranking
State-of-the-art Learning-to-Rank algorithms, e.g., λMART, rely on univariate scoring functions to score a list of items. Univariate scoring functions score each item independently, i.e., without considering the other available items in the list. Nevertheless, ranking deals with producing an effective ordering of the items, and comparisons between items are helpful to achieve this task. Bivariate scoring functions allow the model to exploit dependencies between the items in the list, as they work by scoring pairs of items. In this paper, we exploit item dependencies in a novel framework, which we call the Lambda Bivariate (LB) framework, that allows learning effective bivariate scoring functions for ranking using gradient boosting trees. We discuss the three main ingredients of LB: (i) the invariance to permutations property, (ii) the function aggregating the scores of all pairs into the per-item scores, and (iii) the optimization process to learn bivariate scoring functions for ranking using any differentiable loss function. We apply LB to the λRank loss and we show that it results in learning a bivariate version of λMART, which we call Bi-λMART, that significantly outperforms all neural-network-based and tree-based state-of-the-art algorithms for Learning-to-Rank. To show the generality of LB with respect to other loss functions, we also discuss its application to the Softmax loss.
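The idea of aggregating pair scores into per-item scores can be sketched as follows. This is only an illustration under stated assumptions: the pairwise scorer here is a hand-written sigmoid of a single toy feature difference, standing in for a learned model, and summation is assumed as the aggregation function; the abstract specifies neither. Summing over all other items makes each item's score independent of the order in which the list is presented.

```python
from math import exp
from typing import Callable, Dict, List, Tuple

Item = Tuple[str, float]  # (item id, single toy feature); hypothetical layout


def toy_pair_score(a: float, b: float) -> float:
    """Stand-in bivariate scorer: how strongly item a should precede item b."""
    return 1.0 / (1.0 + exp(-(a - b)))


def aggregate_item_scores(
    items: List[Item],
    pair_score: Callable[[float, float], float] = toy_pair_score,
) -> Dict[str, float]:
    """Per-item score = sum of pair scores against every other item in the list."""
    return {
        id_i: sum(pair_score(x_i, x_j) for id_j, x_j in items if id_j != id_i)
        for id_i, x_i in items
    }


items = [("a", 0.2), ("b", 1.5), ("c", -0.3)]
scores = aggregate_item_scores(items)
ranking = sorted(scores, key=scores.get, reverse=True)

# Presenting the same items in another order yields the same per-item scores.
shuffled_scores = aggregate_item_scores([items[2], items[0], items[1]])
print(ranking)
```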
