Recent advances in energy efficient query processing
Web search companies distribute their infrastructures and operations across several geographically distant data centers. This distributed architecture facilitates high-performance query processing, which is fundamental to the success of a Web search engine. At the same time, data centers require a huge amount of electricity to operate their computing resources. In this extended abstract, we briefly discuss our recent work on improving the energy efficiency of query processing systems. First, we introduce a novel query forwarding algorithm which exploits green energy sources to reduce the electricity expenditure and carbon footprint of Web search engines. Then, we propose to delegate CPU power management from a server's operating system directly to the query processing application, to reduce the energy consumption of a search engine's servers. Finally, we introduce PESOS, a scheduling algorithm which manages the CPU power consumption on a per-query basis while considering query latency constraints.
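The idea of managing CPU power per query under a latency constraint can be illustrated with a minimal sketch. This is not the PESOS algorithm itself: the function name `pick_frequency`, the cost model (processing time = work / frequency), and the toy numbers are all assumptions made for illustration.

```python
def pick_frequency(work_gcycles, deadline_s, frequencies_ghz):
    """Return the lowest CPU frequency (GHz) predicted to meet the deadline.

    Assumed cost model (hypothetical): time = work / frequency, with work
    expressed in giga-cycles. Falls back to the highest available frequency
    when no frequency is predicted to meet the deadline.
    """
    for freq in sorted(frequencies_ghz):
        predicted_time = work_gcycles / freq
        if predicted_time <= deadline_s:
            return freq
    return max(frequencies_ghz)


# A query needing 0.3 giga-cycles with a 0.2 s latency budget.
chosen = pick_frequency(0.3, 0.2, [1.0, 2.0, 3.0])
print(chosen)
```

Running at the lowest frequency that still meets the deadline is one simple way to trade unused latency slack for lower power draw; a real scheduler would also need to account for queued queries and prediction error.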
A study on query energy consumption in web search engines
Commercial web search engines are usually deployed on data centers, which leverage thousands of servers to efficiently answer queries on a large scale. Thanks to these distributed infrastructures, search engines can quickly serve high query volumes. However, the energy consumed by these many servers poses economic and environmental challenges for Web search engine companies. To tackle such challenges, we advocate the importance of quantifying the energy consumption of a search engine. Therefore, in this study we experimentally analyze energy consumption on a per-query basis. Our aim is to evaluate how much energy is consumed by a search server to answer a single query, i.e., its query energy consumption. To perform such measurements, experiments are conducted using the TREC ClueWeb09 collection and the MSN 2006 query log. Results suggest that solving a query requires an amount of energy directly proportional to the query processing time.
A Reproducibility Study of PLAID
The PLAID (Performance-optimized Late Interaction Driver) algorithm for ColBERTv2 uses clustered term representations to retrieve and progressively prune documents for final (exact) document scoring. In this paper, we reproduce and fill in missing gaps from the original work. By studying the parameters PLAID introduces, we find that its Pareto frontier is formed of a careful balance among its three parameters; deviations beyond the suggested settings can substantially increase latency without necessarily improving its effectiveness. We then compare PLAID with an important baseline missing from the paper: re-ranking a lexical system. We find that applying ColBERTv2 as a re-ranker atop an initial pool of BM25 results provides better efficiency-effectiveness trade-offs in low-latency settings. However, re-ranking cannot reach peak effectiveness at higher latency settings due to limitations in recall of lexical matching and provides a poor approximation of an exhaustive ColBERTv2 search. We find that recently proposed modifications to re-ranking that pull in the neighbors of top-scoring documents overcome this limitation, providing a Pareto frontier across all operational points for ColBERTv2 when evaluated using a well-annotated dataset. Curious about why re-ranking methods are highly competitive with PLAID, we analyze the token representation clusters PLAID uses for retrieval and find that most clusters are predominantly aligned with a single token and vice versa. Given the competitive trade-offs that re-ranking baselines exhibit, this work highlights the importance of carefully selecting pertinent baselines when evaluating the efficiency of retrieval engines. https://github.com/seanmacavaney/plaidrepr
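The re-ranking baseline described above can be sketched in a few lines. ColBERT-style late interaction scores a document as the sum, over query token vectors, of the maximum dot product with any document token vector (MaxSim). The sketch below assumes the candidate pool has already been retrieved by a lexical system such as BM25; the function names, the toy two-dimensional embeddings, and the document identifiers are hypothetical and are not taken from the paper's code.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: sum over query tokens of the best match."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(query_vecs, candidates, doc_embeddings, top_k):
    """Re-score a first-stage candidate pool and keep the top_k documents."""
    scored = [(doc_id, maxsim(query_vecs, doc_embeddings[doc_id]))
              for doc_id in candidates]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_k]


# Toy token embeddings (hypothetical); real ones come from the encoder.
query_vecs = [[1, 0], [0, 1]]
doc_embeddings = {
    "d1": [[1, 0], [0, 0.5]],
    "d2": [[0.5, 0.5]],
    "d3": [[1, 0], [0, 1]],
}
first_stage_pool = ["d1", "d2", "d3"]  # stand-in for BM25 results
print(rerank(query_vecs, first_stage_pool, doc_embeddings, top_k=2))
```

Because only documents in the first-stage pool are ever scored, the recall of the lexical retriever caps the effectiveness of this pipeline, which is the limitation the abstract attributes to plain re-ranking.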
Representing document lengths with identifiers
The length of each indexed document is needed by most common text retrieval scoring functions to rank it with respect to the current query. For efficiency purposes, information retrieval systems maintain this information in main memory. This paper proposes a novel strategy to encode the length of each document directly in the document identifier, thus reducing main memory demand. The technique is based on a simple document identifier assignment method and a function allowing the approximate length of each indexed document to be computed analytically.
How to run scientific applications over web services
Today, the task of running and coordinating a scientific application across several administrative domains is extremely complex. As an example, the most popular tool for scientific applications, MPI, is not designed to address firewall limitations or data heterogeneity, even if its extensions deal with some of these problems. In this paper, we design a new approach to run a scientific application in a distributed environment, when data and computing power are scattered across the Web: Web Services can be used to tunnel computation and data migration. We show that a very simple mapping exists between MPI primitives and the Web Service infrastructure. We are currently designing a framework, based on Web Services, which will implement the main MPI primitives: this way an MPI application could be run on any platform supporting Web Services. © 2005 IEEE
Faster Learned Sparse Retrieval with Block-Max Pruning
Learned sparse retrieval systems aim to combine the effectiveness of contextualized language models with the scalability of conventional data structures such as inverted indexes. Nevertheless, the indexes generated by these systems exhibit significant deviations from the ones that use traditional retrieval models, leading to a discrepancy in the performance of existing query optimizations that were specifically developed for traditional structures. These disparities arise from structural variations in query and document statistics, including sub-word tokenization, leading to longer queries, smaller vocabularies, and different score distributions within posting lists. This paper introduces Block-Max Pruning (BMP), an innovative dynamic pruning strategy tailored for indexes arising in learned sparse retrieval environments. BMP employs a block filtering mechanism to divide the document space into small, consecutive document ranges, which are then aggregated and sorted on the fly, and fully processed only as necessary, guided by a defined safe early termination criterion or based on approximate retrieval requirements. Through rigorous experimentation, we show that BMP substantially outperforms existing dynamic pruning strategies, offering unparalleled efficiency in safe retrieval contexts and improved trade-offs between precision and efficiency in approximate retrieval tasks
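The block filtering mechanism described above can be sketched as follows, assuming non-negative term scores and an in-memory toy index. The data layout (`postings` as a dictionary from term to a docid-to-score map), the function names, and the exhaustive per-block scoring loop are simplifications introduced for illustration, not the authors' implementation.

```python
import heapq


def build_block_max(postings, num_docs, block_size):
    """Index time: per-term maximum score inside each docid block."""
    num_blocks = (num_docs + block_size - 1) // block_size
    block_max = {}
    for term, plist in postings.items():
        maxes = [0.0] * num_blocks
        for doc_id, score in plist.items():
            block = doc_id // block_size
            maxes[block] = max(maxes[block], score)
        block_max[term] = maxes
    return block_max


def bmp_search(query, postings, block_max, num_docs, block_size, k):
    """Query time: score blocks in decreasing upper-bound order."""
    terms = [t for t in query if t in postings]
    num_blocks = (num_docs + block_size - 1) // block_size
    bounds = [sum(block_max[t][b] for t in terms) for b in range(num_blocks)]
    heap = []            # min-heap of (score, doc_id) holding the top k
    blocks_scored = 0
    for b in sorted(range(num_blocks), key=lambda blk: -bounds[blk]):
        # Safe early termination: no remaining block can beat the k-th score.
        if bounds[b] == 0 or (len(heap) == k and bounds[b] <= heap[0][0]):
            break
        blocks_scored += 1
        for doc_id in range(b * block_size, min((b + 1) * block_size, num_docs)):
            score = sum(postings[t].get(doc_id, 0.0) for t in terms)
            if score <= 0:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    ranked = sorted(heap, key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in ranked], blocks_scored


postings = {
    "a": {0: 1.0, 5: 3.0, 9: 0.5},
    "b": {2: 0.2, 5: 2.0, 6: 1.5},
}
block_max = build_block_max(postings, num_docs=10, block_size=2)
results, blocks_scored = bmp_search(["a", "b"], postings, block_max,
                                    num_docs=10, block_size=2, k=2)
print(results, blocks_scored)
```

In this toy run only two of the five blocks are fully scored before the termination test fires. An approximate variant could stop earlier, for example by scaling the threshold, at the cost of possibly missing some top-k documents.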
Drivers Stress Identification in Real-World Driving Tasks
In the past few years, cross-modal distillation has garnered a lot of interest due to the rapid growth of multi-modal data. In this paper, we study the recognition of drivers' stress in relation to the driving situation. Our method enables us to recognize stress from unlabeled videos. We perform cross-modal distillation based on wearable physiological sensors and videos from on-board cameras. In this cross-modal distillation, knowledge is transferred from the sensor modality to the vision modality.
A tool to execute ASSIST applications on globus-based grids
This article describes ASSISTCONF, a graphical user interface designed to execute ASSIST applications on Globus-based Grids. ASSIST is a new programming environment for the development of parallel and distributed high-performance applications. ASSISTCONF hides the structure of the Grid from the programmer and integrates the ASSIST Run Time System with the Globus middleware. The first version of ASSISTCONF was designed to manually configure an ASSIST application and to establish a mapping between the application components and the machines selected for its execution on the Grid. The new ASSISTCONF functionalities allow the semi-automatic execution of an ASSIST application in such an environment. They include authentication and execution authorization on the resources selected in the application mapping phase, and deployment on the selected resources of the ASSIST Run Time Support, the executable application components, and the application input data.
Latency-Energy Tradeoffs in Federated Learning on Resource Constrained Edge Computing Systems
Artificial intelligence and machine learning have become crucially important in many scientific and industrial fields, thanks to their ability to extract information, make predictions and identify patterns in data. To create increasingly accurate predictive models, these technologies rely on the collection and control of large amounts of data within controlled systems. Federated learning is a new framework that exploits the computational capabilities and local data of a set of multiple resource-constrained devices coordinated by a central server for the creation of a shared global predictive model, without any centralised data collection. In this work, we focus on assessing the performance of federated learning executed on resource-constrained Edge computing systems. We executed a set of experiments to assess the energy consumption and processing times on a set of heterogeneous GPU-enabled embedded systems. Our analysis shows that, by varying the amount of data that each system is in charge of processing, it is possible to identify a trade-off between the overall energy consumption of the devices and the processing time required to train an effective predictive model.