1,721,730 research outputs found

    LSH-Preserving functions and their applications

    No full text
    Locality sensitive hashing (LSH) is a key algorithmic tool that is widely used both in theory and practice. An important goal in the study of LSH is to understand which similarity functions admit an LSH, that is, are LSHable. In this article, we focus on the class of transformations such that given any similarity that is LSHable, the transformed similarity will continue to be LSHable. We show a tight characterization of all such LSH-preserving transformations: they are precisely the probability generating functions, up to scaling. As a concrete application of this result, we study which set similarity measures are LSHable. We obtain a complete characterization of similarity measures between two sets A and B that are ratios of two linear functions of |A B|, |A δB|, |AB|: such a measure is LSHable if and only if its corresponding distance is a metric. This result generalizes the well-known LSH for the Jaccard set similarity, namely, the minwiseindependent permutations, and obtains LSHs for many set similarity measures that are used in practice. Using our main result, we obtain a similar characterization for set similarities involving radicals

    On the power laws of language: word frequency distributions

    Full text link
    About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a loglog scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon. Theword frequency distributions produced by this model are shown to match the observations both analytically and empirically. © 2017 Copyright held by the owner/author(s)

    On the number of trials needed to distinguish similar alternatives

    No full text
    A/B testing is widely used to tune search and recommendation algorithms, to compare product variants as efficiently and effectively as possible, and even to study animal behavior. With ongoing investment, due to diminishing returns, the items produced by the new alternative B show smaller and smaller improvement in quality from the items produced by the current system A. By formalizing this observation, we develop closed-form analytical expressions for the sample efficiency of a number of widely used families of slate-based comparison tests. In empirical trials, these theoretical sample complexity results are shown to be predictive of real-world testing efficiency outcomes. These findings offer opportunities for both more cost-effective testing and a better analytical understanding of the problem

    Teoría de las relaciones internacionales estadocéntricas y la paradoja de la seguridad para el sur global (Asia y América Latina). Valoración crítica por los realistas subalterno-periférico Mohammed Ayoob y Carlos Escudé

    No full text
    Fil: Kumar, Ravi. Mahatma Gandhi International Hindi University. Center for Foreign Languages and International Studies; India.Fil: Nawaz, Rafida. Bahauddin Zakariya University; Pakistán.Este artículo apunta a revisitar las presunciones a priori del paradigma estadocéntrico, alternativamente resigni fi cadas como realpolitik , realismo o el sistema de equilibrio de poder. Aunque este abordaje provee las normas del comportamiento de los estados desde la creación del sistema estatal moderno, está sujeto a un criticismo severo por parte de sus teóricos clásicos, como E. H. Carr, Hans J. Morgenthau, entre otros, así como por parte de las relaciones internacionales críticas que se desarrollan en el norte global, que ven el dilema de seguridad como amenaza a la paz internacional. No obstante, las críticas de estos críticos del Norte son incapaces de proveer una alternativa a las políticas de poder

    Fair Clustering Through Fairlets

    Full text link
    We study the question of fair clustering under the disparate impact doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions - for instance a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically demonstrate the price of fairness by quantifying the value of fair clustering on real-world datasets with sensitive attributes

    Approximating a RUM from distributions on k-slates

    No full text
    In this work we consider the problem of fitting Random Utility Models (RUMs) to user choices. Given the winner distributions of the subsets of size kk of a universe, we obtain a polynomial-time algorithm that finds the RUM that best approximates the given distribution on average. Our algorithm is based on a linear program that we solve using the ellipsoid method. Given that its separation oracle problem is NP-hard, we devise an approximate separation oracle that can be viewed as a generalization of the weighted Feedback Arc Set problem to hypergraphs. Our theoretical result can also be made practical: we obtain a heuristic that scales to real-world datasets

    Discrete choice, permutations, and reconstruction

    Full text link
    In this paper we study the well-known family of Random Utility Models, developed over 50 years ago to codify rational user behavior in choosing one item from a finite set of options. In this setting each user draws i.i.d. from some distribution a utility function mapping each item in the universe to a real-valued utility. The user is then offered a subset of the items, and selects theone of maximum utility. A Max-Dist oracle for this choice model takes any subset of items and returns the probability (over the distribution of utility functions) that each will be selected. A discrete choice algorithm, given access to a Max-Dist oracle, must return a function that approximates the oracle. We show three primary results. First, we show that any algorithm exactly reproducing the oracle must make exponentially many queries. Second, we show an equivalent representation of the distribution over utility functions, based on permutations, and show that if this distribution has support size k, then it is possible to approximate the oracle using O(nk) queries. Finally, we consider settings in which the subset of items is always small. We give an algorithm that makes less than n(1=2)K queries, each to sets of size at most (1/2)K, in order to approximate the Max-Dist oracle on every set of size |T| K with statistical error at most. In contrast, we show that any algorithm that queries for subsets of size 2O( p log n) must make maximal statistical error on some large sets

    First record and DNA Barcoding of Oman cownose ray, Rhinoptera jayakari Boulenger, 1895 from Andaman Sea, India

    No full text
    Pradeep, Hosahalli Divakar, Swapnil, Shivdas Shirke, Nashad, Musaliyarakam, Venu, Sasidharan, Ranjan, Kumar Ravi, Sumitha, Gopalakrishnan, Devi, Sukham Monalisha, Farejiya, Mahesh Kumar (2018): First record and DNA Barcoding of Oman cownose ray, Rhinoptera jayakari Boulenger, 1895 from Andaman Sea, India. Zoosystema 40 (4): 67-74, DOI: 10.5252/zoosystema2018v40a

    FIG. 1 in First record and DNA Barcoding of Oman cownose ray, Rhinoptera jayakari Boulenger, 1895 from Andaman Sea, India

    No full text
    FIG. 1. — Junglighat Fishing Harbour (Sampling station), South Andaman.Published as part of Pradeep, Hosahalli Divakar, Swapnil, Shivdas Shirke, Nashad, Musaliyarakam, Venu, Sasidharan, Ranjan, Kumar Ravi, Sumitha, Gopalakrishnan, Devi, Sukham Monalisha & Farejiya, Mahesh Kumar, 2018, First record and DNA Barcoding of Oman cownose ray, Rhinoptera jayakari Boulenger, 1895 from Andaman Sea, India, pp. 67-74 in Zoosystema 40 (4) on page 69, DOI: 10.5252/zoosystema2018v40a4, http://zenodo.org/record/376414
    corecore