Improving Exposure Allocation in Rankings by Query Generation
Deploying methods that incorporate generated queries in their retrieval process, such as Doc2Query, has been shown to be effective for retrieving the most relevant documents for a user’s query. However, to the best of our knowledge, there has been no work yet on whether generated queries can also be used in the ranking process to achieve other objectives, such as ensuring a fair distribution of exposure in the ranking. Indeed, the amount of exposure that a document is likely to receive depends on the document’s position in the ranking, with lower-ranked documents having a lower probability of being examined by the user. While the utility to users remains the main objective of an Information Retrieval (IR) system, an unfair exposure allocation can lead to lost opportunities and unfair economic impacts for particular societal groups. Therefore, in this work, we conduct a first investigation into whether generating relevant queries can help to fairly distribute the exposure over groups of documents in a ranking. In our work, we build on the effective Doc2Query methods to selectively generate relevant queries for underrepresented groups of documents and use their predicted relevance to the original query in order to re-rank the underexposed documents. Our experiments on the TREC 2022 Fair Ranking Track collection show that using generated queries consistently leads to a fairer allocation of exposure compared to a standard ranking while still maintaining utility
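The dependence of exposure on rank position can be illustrated with a minimal sketch. It assumes a logarithmic position discount of 1/log2(1 + rank), which is a common browsing-model choice and not necessarily the one used in the paper; the function name `group_exposure` and the `group_of` mapping are hypothetical.

```python
import math
from collections import defaultdict


def group_exposure(ranking, group_of):
    """Return each group's share of the total exposure in a ranking.

    Assumption: a document at 1-based position `rank` receives exposure
    1 / log2(1 + rank). The abstract does not state the discount it uses.
    """
    exposure = defaultdict(float)
    for rank, doc in enumerate(ranking, start=1):
        exposure[group_of[doc]] += 1.0 / math.log2(1 + rank)
    total = sum(exposure.values())
    # Normalise so that the shares over all groups sum to 1.
    return {group: value / total for group, value in exposure.items()}


# Hypothetical toy data: two documents from group "A", one from group "B".
shares = group_exposure(["d1", "d2", "d3"], {"d1": "A", "d2": "B", "d3": "A"})
```

Comparing these shares against a target distribution (for example, each group's share of relevant documents) is one way to quantify how far a ranking is from a fair allocation.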
Query Exposure Prediction for Groups of Documents in Rankings
The main objective of an Information Retrieval (IR) system is to provide a user with the most relevant documents to the user’s query. To do this, modern IR systems typically deploy a re-ranking pipeline in which a set of documents is retrieved by a lightweight first-stage retrieval process and then re-ranked by a more effective but expensive model. However, the success of a re-ranking pipeline is heavily dependent on the performance of the first stage retrieval, since new documents are not usually identified during the re-ranking stage. Moreover, this can impact the amount of exposure that a particular group of documents, such as documents from a particular demographic group, can receive in the final ranking. For example, the fair allocation of exposure becomes more challenging or impossible if the first stage retrieval returns too few documents from certain groups, since the number of group documents in the ranking affects the exposure more than the documents’ positions. With this in mind, it is beneficial to predict the amount of exposure that a group of documents is likely to receive in the results of the first stage retrieval process, in order to ensure that there are a sufficient number of documents included from each of the groups. In this paper, we introduce the novel task of query exposure prediction (QEP). Specifically, we propose the first approach for predicting the distribution of exposure that groups of documents will receive for a given query. Our new approach, called GEP, uses lexical information from individual groups of documents to estimate the exposure the groups will receive in a ranking. Our experiments on the TREC 2021 and 2022 Fair Ranking Track test collections show that our proposed GEP approach results in exposure predictions that are up to ∼40% more accurate than the predictions of suitably adapted existing query performance prediction (QPP) and resource allocation approaches
DVM-CAR: A Large-Scale Automotive Dataset for Visual Marketing Research and Applications
There is a growing interest in product aesthetics analytics and design. However, the lack of available large-scale data that covers various variables and information is one of the biggest challenges faced by analysts and researchers. In this paper, we present our multidisciplinary initiative of developing a comprehensive automotive dataset from different online sources and formats. Specifically, the created dataset contains 1.4 million images from 899 car models and their corresponding model specifications and sales information over more than ten years in the UK market. Our work makes significant contributions to: (i) research and applications in the automotive industry; (ii) big data creation and sharing; (iii) database design; and (iv) data fusion. Apart from our motivation, technical details and data structure, we further present three simple examples to demonstrate how our data can be used in business research and applications
Efficient & Effective Selective Query Rewriting with Efficiency Predictions
To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity operators or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which in turn harms user satisfaction if the query is excessively slow. In this paper, we propose a novel framework that uses the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose the prediction of the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which is used to establish the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and a 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine.
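The per-query selection described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's framework: the tuple layout, the function name `select_rewriting`, and the fallback to the fastest alternative when nothing fits the budget are all hypothetical.

```python
def select_rewriting(candidates, time_budget_ms):
    """Pick the rewriting expected to be most effective within the time budget.

    candidates: list of (name, expected_effectiveness, predicted_time_ms).
    Assumption: if no candidate fits the budget, fall back to the one with
    the lowest predicted execution time.
    """
    feasible = [c for c in candidates if c[2] <= time_budget_ms]
    if not feasible:
        return min(candidates, key=lambda c: c[2])[0]
    return max(feasible, key=lambda c: c[1])[0]


# Hypothetical candidates for one query; the numbers are illustrative only.
candidates = [
    ("original", 0.30, 40.0),
    ("proximity", 0.35, 120.0),
    ("expansion", 0.38, 400.0),
]
choice = select_rewriting(candidates, time_budget_ms=150.0)
```

With a 150 ms budget the expansion rewriting is excluded by its predicted time, so the proximity rewriting is chosen.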
Formulating XML-IR Queries
XML information retrieval systems differ from traditional information retrieval systems by returning relevant portions of documents, rather than entire documents. Theoretically, this should better fulfil the information needs of users, especially in situations where their information need is very complex. However, if users are going to exploit this advantage then they need a query formation interface that is both sophisticated and intuitive. This paper outlines four potential query formation interfaces: keywords, formal language, natural language and query by templates. For each interface, it outlines the advantages and disadvantages, presents comparative results stemming from experiments and proposes several future research areas involving the four interfaces
Knowledge Graph Cross-View Contrastive Learning for Recommendation
Knowledge Graphs (KGs) are useful side information that help recommendation systems improve recommendation quality by providing rich semantic information about entities and items. Recently, models based on graph neural networks (GNNs) have adopted knowledge graphs to capture further high-order structural information, such as shared preferences between users and similarities between items. However, existing GNN-based methods suffer from two challenges: (1) Sparse supervisory signal, where a large amount of information in the knowledge graph is non-relevant to recommendation, and the training labels are insufficient, thereby limiting the recommendation performance of the trained model; (2) Valuable information is discarded whereby the use by the existing models of edge or node dropout strategies to obtain augmented views during self-supervised learning could lead to valuable information being discarded in recommendation. These two challenges limit the effective representation of users and items by existing methods. Inspired by self-supervised learning to mine supervision signals from data, in this paper, we focus on exploring contrastive learning based on knowledge graph enhancement, and propose a new model named Knowledge Graph Cross-view Contrastive Learning for Recommendation (KGCCL) to address the two challenges. Specifically, to address supervision sparseness, we perform contrastive learning between graph views at different levels and mine graph feature information in a self-supervised learning manner. In addition, we use noise augmentation to enhance the representation of users and items, while retaining all triplet information in the knowledge graph to address the challenge of valuable information being discarded. Experimental results on three public datasets show that our proposed KGCCL model outperforms existing state-of-the-art methods. In particular, our model outperforms the best baseline performance by 10.65% on the MIND dataset
GEO: A computational design framework for automotive exterior facelift
Exterior facelift has become an effective method for automakers to boost the consumers’ interest in an existing car model before it is redesigned. To support the automotive facelift design process, this study develops a novel computational framework – Generator, Evaluator, Optimiser (GEO), which comprises 3 components: a StyleGAN2-based design generator that creates different facelift designs; a convolutional neural network (CNN)-based evaluator that assesses designs from the aesthetics perspective; and a recurrent neural network (RNN)-based decision optimiser that selects designs to maximise the predicted profit of the targeted car model over time. We validate the GEO framework in experiments with real-world datasets and describe some resulting managerial implications for automotive facelift. Our study makes both methodological and application contributions. First, the generator’s mapping network and projection methods are carefully tailored to facelift where only minor changes are performed without affecting the family signature of the automobile brands. Second, two evaluation metrics are proposed to assess the generated designs. Third, profit maximisation is taken into account in the design selection. From a high-level perspective, our study contributes to the recent use of machine learning and data mining in marketing and design studies. To the best of our knowledge, this is the first study that uses deep generative models for automotive regional design upgrading and that provides an end-to-end decision-support solution for automakers and designers
Hybrid query scheduling for a replicated search engine
Search engines use replication and distribution of large indices across many query servers to achieve efficient retrieval. Under high query load, queries can be scheduled to replicas that are expected to be idle soonest, facilitated by the use of predicted query response times. However, the overhead of making response time predictions can hinder the usefulness of query scheduling under low query load. In this paper, we propose a hybrid scheduling approach that combines the scheduling methods appropriate for both low and high load conditions, and can adapt in response to changing conditions. We deploy a simulation framework, which is prepared with actual and predicted response times for real Web search queries for one full day. Our experiments using different numbers of shards and replicas of the 50 million document ClueWeb09 corpus show that hybrid scheduling can reduce the average waiting times of one day of queries by 68% under high load conditions and by 7% under low load conditions w.r.t. traditional scheduling methods
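The idea of switching scheduling methods with load can be sketched as below. The switching rule (a threshold on the number of pending queries), the shortest-queue policy for low load, and the names `choose_replica` and `predict_ms` are assumptions made for illustration; the abstract does not specify them.

```python
def choose_replica(queues, predict_ms, high_load_threshold):
    """Return the index of the replica that should receive the next query.

    queues: one list of pending queries per replica.
    predict_ms: callable estimating a query's response time in milliseconds.
    Assumption: load is "high" once the number of pending queries reaches
    `high_load_threshold`.
    """
    pending = sum(len(queue) for queue in queues)
    if pending >= high_load_threshold:
        # High load: pay the prediction overhead and pick the replica
        # expected to become idle soonest.
        backlog = [sum(predict_ms(q) for q in queue) for queue in queues]
        return min(range(len(queues)), key=backlog.__getitem__)
    # Low load: skip predictions and pick the shortest queue.
    return min(range(len(queues)), key=lambda r: len(queues[r]))


# Hypothetical predictor: 10 ms per query term.
def toy_predictor(query):
    return 10.0 * len(query.split())


queues = [["a", "b"], ["long query here"]]
replica = choose_replica(queues, toy_predictor, high_load_threshold=3)
```

In this toy case, replica 0 holds more queries but a smaller predicted backlog, so the two modes disagree: the prediction-based mode picks replica 0, while the shortest-queue mode would pick replica 1.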
Modelling Efficient Novelty-based Search Result Diversification in Metric Spaces
Novelty-based diversification provides a way to tackle ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n^2) document–document comparisons in order to diversify a ranking of n documents. In this article, we introduce a new approach for novelty-based search result diversification to reduce the overhead incurred by document–document comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three different approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster. In the last, a novel document is one that has a different signature (i.e., the document’s relative distance to a distinguished set of fixed objects called permutants) compared to previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency. Affiliation: Gil Costa, Graciela Verónica. Yahoo; Mexico. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico San Luis; Argentina. Affiliation: Santos, Rodrygo L. T. University of Glasgow; United Kingdom. Affiliation: Macdonald, Craig. University of Glasgow; United Kingdom. Affiliation: Ounis, Iadh. University of Glasgow; United Kingdom.
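For context, the greedy baseline that this abstract contrasts itself with can be sketched as follows. This is a generic relevance/novelty trade-off in the style of maximal marginal relevance, not the metric-space method proposed in the article; the function name, the `trade_off` parameter, and the default novelty of 1.0 for the first pick are assumptions.

```python
def greedy_diversify(docs, relevance, distance, k, trade_off=0.5):
    """Greedily select k documents, balancing relevance and novelty.

    Each round compares every remaining document with every selected one,
    which is the source of the quadratic number of comparisons.
    """
    selected = []
    remaining = list(docs)
    while remaining and len(selected) < k:
        def score(doc):
            # Novelty: distance to the closest already selected document
            # (assumed to be 1.0 when nothing has been selected yet).
            novelty = min((distance(doc, s) for s in selected), default=1.0)
            return (1 - trade_off) * relevance[doc] + trade_off * novelty

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: documents placed on a line, distance = |x - y|.
position = {"a": 0.0, "b": 0.1, "c": 1.0}
relevance = {"a": 1.0, "b": 0.9, "c": 0.5}
picked = greedy_diversify(
    ["a", "b", "c"], relevance, lambda x, y: abs(position[x] - position[y]), k=2
)
```

Here the second pick is `c` rather than the more relevant `b`, because `b` is nearly identical to the already selected `a`.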
