1,721,020 research outputs found
Instance-optimal information-based voting
The classical Condorcet Jury Theorem considers a voting scenario in which there exists a candidate whose election would be ideal for each voter; each voter, though, has only a limited understanding of the world and is thus unable to determine exactly who this candidate is. The main question in this scenario is whether the voters, acting individually, can cast their ballots so that the unknown optimal candidate wins the election, and the welfare of the group of voters is maximized.
In this setting, each candidate is represented by a known probability distribution over signals about the world that the voters can perceive, that is, over bits. One of these candidates is chosen (secretively, by an adversary) to be the ideal candidate. Afterwards, each voter samples this unknown candidate's distribution once and casts a ballot with the hope that the unknown ideal candidate wins the election.
In this paper, we consider the famous Condorcet voting system, as well as some of its variants. First, we give a positive answer to an open question of Chierichetti and Kleinberg [8], and show that, with Condorcet voting, there exists a uniform voting strategy that makes the group of voters succeed with probability
provided that
voters take part in the election — here,
is the minimum total variation distance between the distributions of two candidates.
We also give a uniform voting strategy for the Copeland voting system (a variant of Condorcet) that makes the group succeed with probability
with
voters, where
is the minimum Hellinger distance between the distributions. Our uniform Copeland strategy, then, is an instance-optimal hypothesis testing algorithm: constants aside, the strategy is as efficient as the optimal omniscient algorithm which determines the unknown candidate after having directly observed each of the signals perceived by the voters. Then, we “derandomize” our uniform Copeland strategy, and obtain a Condorcet strategy that achieves instance-optimality at the cost of losing uniformity; finally, we prove that this loss of uniformity is necessary: no uniform Condorcet strategy can achieve instance-optimality, in general.
Thus, the right voting strategies let these classical combinatorial voting systems attain the same efficiency of centralized, optimal, hypothesis testers
The local nature of list colorings for graphs of high girth
We consider list coloring problems for graphs G of girth larger than c logΔ-1 n, where n and Δ ≥ 3 are, respectively, the order and the maximum degree of G, and c is a suitable constant. First, we determine that the edge and total list chromatic numbers of these graphs are X′l(G) = Δ and X′′l(G) = Δ + 1. This proves that the general conjectures of Bollobás and Harris [Graphs Combin., 1 (1985), pp. 115-127], Behzad [The total chromatic number, in Combinatorial Mathematics and Its Applications (Proc. Conf., Oxford, 1969), Academic Press, London, 1971, pp. 1-8], Vizing [Diskret. Analiz., 3 (1964), pp. 25-30], and Juvan, Mohar, and Škrekovski [Combin. Probab. Comput.,7 (1998), pp. 181-188] hold for this particular class of graphs. Moreover, our proofs exhibit a certain degree of "locality," which we exploit to obtain an efficient distributed algorithm able to compute both kinds of optimal list colorings. Also, using an argument similar to one of Erdös, we show that our algorithm can compute k-list vertex colorings of graphs having girth larger than c logk-1 n. © 2010 Society for Industrial and Applied Mathematics
LSH-Preserving functions and their applications
Locality sensitive hashing (LSH) is a key algorithmic tool that is widely used both in theory and practice. An important goal in the study of LSH is to understand which similarity functions admit an LSH, that is, are LSHable. In this article, we focus on the class of transformations such that given any similarity that is LSHable, the transformed similarity will continue to be LSHable. We show a tight characterization of all such LSH-preserving transformations: they are precisely the probability generating functions, up to scaling. As a concrete application of this result, we study which set similarity measures are LSHable. We obtain a complete characterization of similarity measures between two sets A and B that are ratios of two linear functions of |A B|, |A δB|, |AB|: such a measure is LSHable if and only if its corresponding distance is a metric. This result generalizes the well-known LSH for the Jaccard set similarity, namely, the minwiseindependent permutations, and obtains LSHs for many set similarity measures that are used in practice. Using our main result, we obtain a similar characterization for set similarities involving radicals
Voting with Limited Information and Many Alternatives
The traditional axiomatic approach to voting is motivated by the problem of reconciling differences in subjective preferences. In contrast, a dominant line of work in the theory of voting over the past 15 years has considered a different kind of scenario, also fundamental to voting, in which there is a genuinely "best" outcome that voters would agree on if they only had enough information. This type of scenario has its roots in the classical Condorcet Jury Theorem; it includes cases such as jurors in a criminal trial who all want to reach the correct verdict but disagree in their inferences from the available evidence, or a corporate board of directors who all want to improve the company's revenue, but who have different information that favors different options. This style of voting leads to a natural set of questions: each voter has a private signal that provides probabilistic information about which option is best, and a central question is whether a simple plurality voting system, which tabulates votes for different options, can cause the group decision to arrive at the correct option. We show that plurality voting is powerful enough to achieve this: there is a way for voters to map their signals into votes for options in such a way that - with sufficiently many voters -the correct option receives the greatest number of votes with high probability. We show further, however, that any process for achieving this is inherently expensive in the number of voters it requires: succeeding in identifying the correct option with probability at least 1 - η requires Ω (n3ε -2 log η-1) voters, where n is the number of options and e is a distributional measure of the minimum difference between the options. Copyright © SIAM
Stochastic models for tabbed browsing
We present a model of tabbed browsing that represents a hybrid between a Markov process capturing the graph of hyperlinks, and a branching process capturing the birth and death of tabs. We present a mathematical criterion to characterize whether the process has a steady state independent of initial conditions, and we show how to characterize the limiting behavior in both cases. We perform a series of experiments to compare our tabbed browsing model with pagerank, and show that tabbed browsing is able to explain 15-25% of the deviation between actual measured browsing behavior and the behavior predicted by the simple pagerank model. We find this to be a surprising result, as the tabbed browsing model does not make use of any notion of site popularity, but simply captures deviations in user likelihood to open and close tabs from a particular node in the graph. © 2010 International World Wide Web Conference Committee (IW3C2)
Max-cover in map-reduce
The NP-hard Max-k-cover problem requires selecting k sets from a collection so as to maximize the size of the union. This classic problem occurs commonly in many settings in web search and advertising. For moderately-sized instances, a greedy algorithm gives an approximation of (1-1/e). However, the greedy algorithm requires updating scores of arbitrary elements after each step, and hence becomes intractable for large datasets. We give the first max cover algorithm designed for today's large-scale commodity clusters. Our algorithm has provably almost the same approximation as greedy, but runs much faster. Furthermore, it can be easily expressed in the MapReduce programming paradigm, and requires only polylogarithmically many passes over the data. Our experiments on five large problem instances show that our algorithm is practical and can achieve good speedups compared to the sequential greedy algorithm. © 2010 International World Wide Web Conference Committee (IW3C2)
Optimal probabilistic cache stampede prevention
When a frequently-accessed cache item expires, multiple requests to that item can trigger a cache miss and start regenerating that same item at the same time. This phenomenon,
known as cache stampede, severely limits the performance
of databases and web servers. A natural countermeasure to
this issue is to let the processes that perform such requests
to randomly ask for a regeneration before the expiration
time of the item. In this paper we give optimal algorithms
for performing such probabilistic early expirations. Our algorithms are theoretically optimal and have much better
performances than other solutions used in real-world applications
On the number of trials needed to distinguish similar alternatives
A/B testing is widely used to tune search and recommendation algorithms, to compare product variants as efficiently and effectively as possible, and even to study animal behavior. With ongoing investment, due to diminishing returns, the items produced by the new alternative B show smaller and smaller improvement in quality from the items produced by the current system A. By formalizing this observation, we develop closed-form analytical expressions for the sample efficiency of a number of widely used families of slate-based comparison tests. In empirical trials, these theoretical sample complexity results are shown to be predictive of real-world testing efficiency outcomes. These findings offer opportunities for both more cost-effective testing and a better analytical understanding of the problem
- …
