1,720,974 research outputs found
Distinguished Paper Award at OOPSLA 2011, the 26th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications.
Computing the Shapley Value in Allocation Problems: Approximations and Bounds, with an Application to the Italian VQR Research Assessment Program
In allocation problems, a given set of goods are assigned to agents in such a
way that the social welfare is maximised, that is, the largest possible global
worth is achieved. When goods are indivisible, it is possible to use money
compensation to perform a fair allocation taking into account the actual
contribution of all agents to the social welfare. Coalitional games provide a
formal mathematical framework to model such problems, in particular the Shapley
value is a solution concept widely used for assigning worths to agents in a
fair way. Unfortunately, computing this value is a -hard problem, so
that applying this good theoretical notion is often quite difficult in
real-world problems.
We describe useful properties that allow us to greatly simplify the instances
of allocation problems, without affecting the Shapley value of any player.
Moreover, we propose algorithms for computing lower bounds and upper bounds of
the Shapley value, which in some cases provide the exact result and that can be
combined with approximation algorithms.
The proposed techniques have been implemented and tested on a real-world
application of allocation problems, namely, the Italian research assessment
program, known as VQR. For the large university considered in the experiments,
the problem involves thousands of agents and goods (here, researchers and their
research products). The algorithms described in the paper are able to compute
the Shapley value for most of those agents, and to get a good approximation of
the Shapley value for all of them
Accuracy of Author Names in Bibliographic Data Sources: An Italian Case Study
We investigate the accuracy of how author names are reported in bibliographic records excerpted from four prominent sources: WoS, Scopus, PubMed, and CrossRef. We take as a case study 44,549 publications stored in the internal database of Sapienza University of Rome, one of the largest universities in Europe. While our results indicate generally good accuracy for all bibliographic data sources considered, we highlight a number of issues that undermine the accuracy for certain classes of author names, including compound names and names with diacritics, which are common features to Italian and other Western languages
Trading off space for passes in graph streaming problems
Data stream processing has recently received increasing attention as a computational paradigm for dealing with massive data sets. Surprisingly, no algorithm with both sublinear space and passes is known for natural graph problems in classical read-only streaming. Motivated by technological factors of modern storage systems, some authors have recently started to investigate the computational power of less restrictive models where writing streams is allowed. In this article, we show that the use of intermediate temporary streams is powerful enough to provide effective space-passes tradeoffs for natural graph problems. In particular, for any space restriction of s bits, we show that single-source shortest paths in directed graphs with small positive integer edge weights can be solved in O((n log3/2 n)/√s) passes. The result can be generalized to deal with multiple sources within the same bounds. This is the first known streaming algorithm for shortest paths in directed graphs. For undirected connectivity, we devise an O((n log n)/s) passes algorithm. Both problems require ω(n/s) passes under the restrictions we consider. We also show that the model where intermediate temporary streams are allowed can be strictly more powerful than classical streaming for some problems, while maintaining all of its hardness for others. © 2009 ACM
Trading off space for passes in graph streaming problems
Data stream processing has recently received increasing attention as a computational paradigm for dealing with massive data sets. While major progress has been achieved for several fundamental data sketching and statistics problems, there are many problems that seem to be hard in a streaming setting, including most classical graph problems. Relevant examples are graph connectivity and shortest paths, for which linear lower bounds on the "space x passes" product are known. Some recent papers have shown that several graph problems can be solved with one or few passes, if the working memory is large enough to contain all the vertices of the graph. A natural question is whether we can reduce the space usage at the price of increasing the number of passes. Surprisingly, no algorithm with both sublinear space and passes is known for natural graph problems in classical streaming models. Motivated by technological factors of modern storage systems, some authors have recently started to investigate the computational power of less restrictive streaming models. In a FOCS'04 paper, Aggarwal et al. have shown that the use of intermediate temporary streams, combined with the ability to reorder them at each pass for free, yields enough power to solve in polylogarithmic space and passes a variety of problems, including graph connectivity. They leave however as an open question whether problems such as shortest paths can be solved efficiently in this more powerful model. In this paper, we show that the "streaming with sorting" model by Aggarwal et al. can yield interesting results even without using sorting at all: by just using intermediate temporary streams, we provide the first effective spacepasses tradeoffs for natural graph problems. In particular, for any space restriction of s bits, we show that single-source shortest paths in directed graphs with small positive integer edge weights can be solved in O((n log(3/2) n)/root 7s) passes. This is the first known streaming algorithm for shortest paths in directed graphs. For undirected connectivity, we devise an O((n log n)/s) passes algorithm. Both problems require Omega(n/s) passes under the restrictions we consider. We also show that the model where intermediate temporary streams are allowed can be strictly more powerful than classical streaming for some problems, while maintaining all of its hardness for others
Reactive imperative programming with dataflow constraints
Dataflow languages provide natural support for specifying constraints between objects in dynamic applications, where programs need to react efficiently to changes of their environment. In this article we show that one-way dataflow constraints, largely explored in the context of interactive applications, can be seamlessly integrated in any imperative language and can be used as a general paradigm for writing performance-critical reactive applications that require efficient incremental computations. In our framework, programmers can define ordinary statements of the imperative host language that enforce constraints between objects stored in special memory locations designated as "reactive". Reactive objects can be of any legal type in the host language, including primitive data types, pointers, arrays, and structures. Statements defining constraints are automatically re-executed every time their input memory locations change, letting a program behave like a spreadsheet where the values of some variables depend upon the values of other variables. The constraint solving mechanism is handled transparently by altering the semantics of elementary operations of the host language for reading and modifying objects. We provide a formal semantics and describe a concrete embodiment of our technique into C/C++, showing how to implement it efficiently in conventional platforms using off-the-shelf compilers. We discuss common coding idioms and relevant applications to reactive scenarios, including incremental computation, observer design pattern, data structure repair, and software visualization. The performance of our implementation is compared to problem-specific change propagation algorithms, as well as to language-centric approaches such as self-adjusting computation and subject/observer communication mechanisms, showing that the proposed approach is efficient in practice
On bibliometrics in academic promotions: a case study in computer science and engineering in Italy
Due to its quantitative nature, bibliometrics is becoming increasingly popular among policy makers for academic hiring and career promotions. In this article, we quantitatively assess the impact that the granularity level in the classification of scientific areas would entail on research evaluation based on bibliometric indicators. We use as a case study the Italian national habilitation system (ASN), which classifies faculty members according to their academic discipline and relies on journal counts, citations, and h-indices as a basis for promoting tenure track researchers to associate professors and associate to full professors. The assessment checks whether the individual indicators of a researcher are above a certain threshold, e.g., the median over the population of researchers working in the same discipline.
Our investigation focuses on two related, rather broad disciplines: computer science and computer engineering. We show that the ASN practice of using the same thresholds for all members of a scientific discipline can favor certain sub-communities that are characterized by higher bibliometric indicators, and disfavor others. We report evidence that up to 30% of Italian faculty members of certain sub-communities would see their indicators drop below the threshold, thus becoming not eligible for promotion, if the ASN were conducted on a more accurate, fine-grained classification. Conversely, in the same scenario, up to 11% of faculty members, in different sub-communities, would see their indicators rise above the threshold, granting them eligibility. Our data set includes 1,685 authors, 89,185 distinct publications, and 262,286 author- publication pairs
Adapting parallel algorithms to the W-Stream model, with applications to graph problems
In this paper we show how parallel algorithms can be turned into efficient streaming algorithms for several classical combinatorial problems in the W-Stream model. In this model, at each pass one input stream is read, one output stream is written, and data items have to be processed using limited space; streams are pipelined in such a way that the output stream produced at pass i is given as input stream at pass i + 1. We first introduce a simulation technique that allows turning efficient PRAM algorithms into optimal W-Stream ones, for many classical combinatorial problems, including list ranking and Euler tour of a tree. For other problems, most notably graph problems, however, this technique leads to suboptimal algorithms. To overcome this difficulty we introduce the Relaxed PRAM (RPRAM) computational model, as an intermediate model between PRAM and W-Stream. RPRAM allows every processor to access a non-constant number of memory cells per parallel round, albeit with some restrictions. The RPRAM model, while being more powerful than the PRAM model, can be simulated in W-Stream within the same asymptotic bounds. The extra power provided by RPRAM allows us in many cases to substantially reduce the number of processors, while maintaining the same number of parallel rounds, leading to more efficient W-Stream simulations of parallel algorithms. Our RPRAM technique gives new insights on developing streaming algorithms and yields efficient algorithms for several classical problems in this model including sorting, connectivity, minimum spanning tree, biconnected components, and maximal independent set. In addition to allowing smooth space-passes tradeoffs, our algorithms are also shown, by proving almost-tight communication complexity-based lower bounds in W-Stream, to be optimal up to polylog factors. (C) 2010 Elsevier B.V. All rights reserved
Adapting parallel algorithms to the W-stream model, with applications to graph problems
In this paper we show how parallel algorithms can be turned into efficient streaming algorithms for several classical combinatorial problems in the W-Stream model. In this model, at each pass one input stream is read and one output stream is written; streams are pipelined in such a way that the output stream produced at pass i is given as input stream at pass i + 1. Our techniques give new insights on developing streaming algorithms and yield optimal algorithms (up to polylog factors) for several classical problems in this model including sorting, connectivity, minimum spanning tree, biconnected components, and maximal independent set. © Springer-Verlag Berlin Heidelberg 2007
Which Conference Is That? A Case Study in Computer Science
Conferences play a major role in some disciplines such as computer science and are often used in research quality evaluation exercises. Differently from journals and books, for which ISSN and ISBN codes provide unambiguous keys, recognizing the conference series in which a paper was published is a rather complex endeavor: there is no unique code assigned to conferences and the way their names are written may greatly vary across years and catalogs. In this article, we propose a technique for the entity resolution of conferences based on the analysis of different semantic parts of their names. We present the results of an investigation of our technique on a dataset of 42395 distinct computer science conference names excerpted from the DBLP computer science repository1, which we automatically link to different authority files. With suitable data cleaning, the precision of our record linkage algorithm can be as high as 94%. A comparison with results obtainable using state-of-the-art general-purpose record linkage algorithms rounds off the paper, showing that our ad hoc solution largely outperforms them in terms of the quality of the results
- …
