1,721,086 research outputs found

    The Input/Output Complexity of Triangle Enumeration

    Full text link
    We consider the well-known problem of enumerating all triangles of an undirected graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let EE be the number of edges, M0M 0. Our results are based on a new color coding technique, which may be of independent interest

    Near-optimal deterministic single-source distance sensitivity oracles

    Full text link
    Given a graph with a distinguished source vertex s, the Single Source Replacement Paths (SSRP) problem is to compute and output, for any target vertex t and edge e, the length d(s, t, e) of a shortest path from s to t that avoids a failing edge e. A Single-Source Distance Sensitivity Oracle (Single-Source DSO) is a compact data structure that answers queries of the form (t, e) by returning the distance d(s, t, e). We show how to deterministically compress the output of the SSRP problem on n-vertex, m-edge graphs with integer edge weights in the range [1, M] into a Single-Source DSO that has size O(M1/2n3/2) and query time Oe(1). We prove that the space requirement is optimal (up to the word size). Our techniques can also handle vertex failures within the same bounds. Chechik and Cohen [SODA 2019] presented a combinatorial, randomized Oe(m√n + n2) time SSRP algorithm for undirected and unweighted graphs. We derandomize their algorithm with the same asymptotic running time and apply our compression to obtain a deterministic Single-Source DSO with Oe(m√n +n2) preprocessing time, O(n3/2) space, and Oe(1) query time. Our combinatorial Single-Source DSO has near-optimal space, preprocessing and query time for unweighted graphs, improving the preprocessing time by a √n -factor compared to previous results with o(n2) space. Grandoni and Vassilevska Williams [FOCS 2012, TALG 2020] gave an algebraic, randomized Oe(Mnω) time SSRP algorithm for (undirected and directed) graphs with integer edge weights in the range [1, M], where ω < 2.373 is the matrix multiplication exponent. We derandomize it for undirected graphs and apply our compression to obtain an algebraic Single-Source DSO with Oe(Mnω) preprocessing time, O(M1/2 n3/2) space, and Oe(1) query time. This improves the preprocessing time of algebraic Single-Source DSOs by polynomial factors compared to previous o(n2)-space oracles. We also present further improvements of our Single-Source DSOs. We show that the query time can be reduced to a constant at the cost of increasing the size of the oracle to O(M1/3 n5/3) and that all our oracles can be made path-reporting. On sparse graphs with m = O(nM5/74/−4ε) edges, for any constant ε > 0, we reduce the preprocessing to randomized Oe(M7/8 m1/2 n11/8) = O(n2−ε/2) time. To the best of our knowledge, this is the first truly subquadratic time algorithm for building Single-Source DSOs on sparse graphs

    MapReduce Triangle Enumeration With Guarantees

    No full text
    We describe an optimal randomized MapReduce algorithm for the problem of triangle enumeration that requires \BO{E^{3/2}/(M\sqrt m)} rounds, where mm denotes the expected memory size of a reducer and MM the total available space. This generalizes the well-known vertex partitioning approach proposed in (Suri and Vassilvitskii, 2011) to multiple rounds, significantly increasing the size of the graphs that can be handled on a given system. We also give new theoretical (high probability) bounds on the work needed in each reducer, addressing the ``curse of the last reducer''. Indeed, our work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph. Our experimental evaluation shows the scalability of our approach, that it is competitive with existing methods improving the performance by a factor up to 2×2\times, and that it can significantly increase the size of datasets that can be processed

    A Measurement-driven Analysis of Information Propagation in the Flickr Social Network

    Full text link
    We thank Augustin Chaintreau, Anja Feldmann, Duncan Watts, Divesh Srivastava, Rasmus Pagh, Mikkel Thorup, Juan Antonio Navarro Pérez, Bryan Ford, Nuno Santos, Bimal Viswanath, Rose Hoberman, and anonymous reviewers, for their valuable comments

    LIPIcs, Volume 53, SWAT'16, Complete Volume

    No full text
    LIPIcs, Volume 53, SWAT'16, Complete Volum

    Front Matter, Table of Contents, Preface, Program Committee, Subreviewers

    No full text
    Front Matter, Table of Contents, Preface, Program Committee, Subreviewer

    Similarity Search and Applications - 13th International Conference, SISAP 2020, Copenhagen, Denmark, September 30 - October 2, 2020, Proceedings.

    No full text
    This book constitutes the refereed proceedings of the 13th International Conference on Similarity Search and Applications, SISAP 2020, held in Copenhagen, Denmark, in September/October 2020. The conference was held virtually due to the COVID-19 pandemic. The 19 full papers presented together with 12 short and 2 doctoral symposium papers were carefully reviewed and selected from 50 submissions. The papers are organized in topical sections named: scalable similarity search; similarity measures, search, and indexing; high-dimensional data and intrinsic dimensionality; clustering; artificial intelligence and similarity; demo and position papers; and doctoral symposium

    Large-Scale Similarity Joins With Guarantees (Invited Talk)

    No full text
    The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they are based on randomized techniques that have improved guarantees on computational cost but come with a probability of not returning the correct result. The aim of this paper is to give an overview of randomized techniques for high-dimensional similarity search, and discuss recent advances towards making these techniques more widely applicable by eliminating probability of error and improving the locality of data access
    corecore