Search CORE

1,721,150 research outputs found

CTTP

Author: Kang U
Pagh Rasmus
Park Ha Myung
SILVESTRI FRANCESCO
Publication venue
Publication date: 01/01/2014
Field of study

We describe an optimal randomized MapReduce algorithm for the problem of triangle enumeration. This generalizes the well-known vertex partitioning approach proposed in (Suri and Vassilvitskii, 2011) to multiple rounds, significantly increasing the size of the graphs that can be handled on a given system. We also give new theoretical (high probability) bounds on the work needed in each reducer, addressing the "curse of the last reducer". Indeed, our work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph

Archivio istituzionale della ricerca - Università di Padova

Confirmation sampling for exact nearest neighbor search

Author: Thorup Mikkel
Mikkel Thorup
Pagh Rasmus
Rasmus Pagh
Tobias Christiani
Christiani Tobias Lybecker
Christiani Tobias
Publication venue
Publication date: 01/01/2020
Field of study

Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings. We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures that each find the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability 1−δ , using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of Ω(L) trees with internal parameters specifically tuned to the query and data

Crossref

Copenhagen University Research Information System

The IT University of Copenhagen's Repository

NTNU Open (Norwegian University of Science and Technology)

NORA - Norwegian Open Research Archives

A Generic Framework for Engineering Graph Canonization Algorithms

Author: Andersen Jakob L.; id_orcid
Jakob L. Andersen
Merkle Daniel
Daniel Merkle
Publication venue
Publication date: 01/01/2018
Field of study

The state-of-the-art tools for practical graph canonization are all based on the individualization-refinement paradigm, and their di erence is primarily in the choice of heuristics they include and in the actual tool implementation. It is thus not possible to make a direct comparison of how individual algorithmic ideas a ect the performance on di erent graph classes. We present an algorithmic software framework that facilitates implementation of heuristics as independent extensions to a common core algorithm. It therefore becomes easy to perform a detailed comparison of the performance and behaviour of di erent algorithmic ideas. Implementations are provided of a range of algorithms for tree traversal, target cell selection, and node invariant, including choices from the literature and new variations. The framework readily supports extraction and visualization of detailed data from separate algorithm executions for subsequent analysis and development of new heuristics. Using collections of di erent graph classes we investigate the e ect of varying the selections of heuristics, often revealing exactly which individual algorithmic choice is responsible for particularly good or bad performance. On several benchmark collections, including a newly proposed class of di cult instances, we additionally find that our implementation performs better than the current state-of-the-art tools.</p

Crossref

Syddansk Universitets Forskerportal

Fast and Space-Efficient Perfect Hashing

Author: Lehmann Hans-Peter
Publication venue
Publication date: 19/11/2024
Field of study

From data analytics to machine learning, large amounts of input data become more and more important. The volume of collected data grows continuously. Compact data structures enable efficient access to this data and help with processing. Perfect hashing is a fundamental basis for many compact data structures and is used for example in databases, hash tables, retrieval data structures, and approximate membership data structures. A perfect hash function (PHF) maps a set

S

n

keys to the first

m

integers without collisions, and is called minimal perfect (MPHF) if

m=n

. It may map keys that are not in

S

to an arbitrary value. This makes it possible to store the function without representing the set

S

itself. In this dissertation, we present three MPHF construction algorithms. With SIMDRecSplit, we focus on space efficiency, scratching the space lower bound of representing MPHFs. We enhance the existing construction RecSplit through parallelism on many levels - bits, vectors, multicores, and the GPU. As a base case, RecSplit uses brute-force search - here we achieve impressive speedups by replacing several retries with simple bit operations or even table lookups. With SicHash, we aim for a good balance between query and construction performance. SicHash is based on sampling a random graph where the nodes are the output hash values. Each key, hashed with different ordinary hash functions, gives a hyperedge connecting candidate output values. The perfect hash function is given by an orientation of the edges such that the indegree of each node is at most one. SicHash works close to the orientability threshold - the factor

n/m

below which such an orientation likely exists. With ShockHash, we combine the ideas of SIMDRecSplit and SicHash. We try to orient a graph far above the orientability threshold, which requires retrying with many different graphs. Compared to plain brute-force, this reduces the number of retries massively, leading to more than

2^n

times faster construction asymptotically. At the same time, it still maintains the asymptotically optimal space consumption. Monotone minimal perfect hash functions (MMPHFs) retain the natural order of the input keys. After many years in which tree-based structures were predominant, our LeMonHash moves into a novel direction. It is based on learning regularities in the input data and is efficient to query because it uses a flat structure. Finally, as a relaxation of perfect hashing, we present PaCHash - a static external memory hash table for objects of variable size. PaCHash can fetch an object using a single contiguous access to the external memory. For objects of fixed size, this is similar to what is possible using

k

-perfect hashing, where at most

k

collisions are allowed. However, in some way, PaCHash breaks the space lower bounds of

k

-perfect hashing at the cost of sometimes fetching one external memory block too much. The approaches presented in this dissertation are a large step forward from the state of the art. Our approaches dominate the majority of the space-time trade-off, as shown by our detailed experimental evaluation, and inspire future research in the area of perfect hashing

KITopen

Repository KITopen

Linear time canonicalization and enumeration of non-Isomorphic 1-Face embeddings

Author: Knudsen AS
Kotrbčík M
Hellmuth Marc
Knudsen Anders S.
Hellmuth M
Marc Hellmuth
Michal Kotrbčík
Merkle D
Kotrbík Michal
Merkle Daniel
Daniel Merkle
Nøjgaard N
Anders S. Knudsen
Nikolai Nøjgaard
Nøjgaard Nikolai
Publication venue
Publication date: 01/01/2018
Field of study

Antiparallel strong traces (ASTs) are a type of walks in graphs which use every edge exactly twice. They correspond to 1-face embeddings in orientable surfaces and can be used to design self-assembling protein or DNA strands. Based on a novel canonical form invariant for ASTs, gap vector, we provide a linear-time isomorphism test for ASTs and thus, also for orientable 1-face embeddings of graphs. Using the canonical form, we develop an algorithm for enumerating all pairwise non-isomorphic 1-face embeddings of graphs. We compare our algorithm with an independent implementation of a recent algebraic approach (Baši? et al., MATCH Commun. Math. Comput. Chem. 78 (3), 2017) on large data sets. Our results yield the first large-scale enumeration of non-isomorphic embeddings and investigation of their properties

Crossref

Syddansk Universitets Forskerportal

White Rose Research Online

UQ eSpace (University of Queensland)

Optimal prefix-suffix queries with applications

Author: Pissis Solon P.; id_orcid
Pissis S. (Solon)
Publication venue
Publication date: 01/01/2025
Field of study

We revisit the classic border tree data structure [Gu, Farach, Beigel, SODA 1994] that answers the following prefix-suffix queries on a string T of length n over an integer alphabet Σ = [0, σ): for any i, j ∈ [0, n) return all occurrences of T in T[0.. i]T[j.. n−1]. The border tree of T can be constructed in O(n) time and answers prefix-suffix queries in O(log n+Occ) time, where Occ is the number of occurrences of T in T[0.. i]T[j.. n−1]. Our contribution here is the following. We present a completely different and remarkably simple data structure that can be constructed in the optimal O(n/logσ n) time and supports queries in the optimal O(1) time. Our result is based on a new structural lemma that lets us encode the output of any query in constant time and space. We also show a new direct application of our result in pattern matching on node-labeled graphs.</p

VU Research Portal

CWI's Institutional Repository

Author Instructions

Author: Instructions Author
Publication venue
Publication date: 04/11/2013
Field of study

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication venue
Publication date: 01/01/2005
Field of study

The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

E-LIS

Über die Analyse von zwei fundamentalen randomisierten Algorithmen

Author: Aumüller Martin
Publication venue
Publication date: 29/07/2015
Field of study

Im ersten Teil der vorliegenden Arbeit werden Multi-Pivot-Quicksort-Algorithmenbetrachtet. Die Idee mehr als ein Pivotelement im Quicksort-Algorithmus zunutzen, erschien über viele Jahre als unpraktikabel. Dies änderte sich, als imJahr 2009 ein Dual-Pivot-Algorithmus von V. Yaroslavskiy zumStandard-Sortierverfahren in Java 7 wurde. Die vorliegende Arbeit stellt eineStudie von Multi-Pivot-Quicksort-Algorithmen dar, also Quicksort-Varianten, diemit mehr als einem Pivotelement arbeiten. Sie beschreibt dieKonstruktionsprinzipien von 2-Pivot-Algorithmen in Bezug auf die bei derSortierung notwendigen Schlüsselvergleiche. Ein Ergebnis dieser Untersuchungsind zwei optimale und leicht zu implementierende 2-Pivot-Algorithmen. DieVerallgemeinerung auf >= 3 Pivotelemente benötigt nur kleine Anpassungen. DieseArbeit betrachtet außerdem die theoretische Analyse von Kostenmaßen, die esermöglichen, Multi-Pivot-Quicksort-Algorithmen hinsichtlich ihres Speicher- undCacheverhaltens zu vergleichen. Sie schließt mit einer Laufzeitstudie derbesprochenen Algorithmen. Der zweite Teil der Arbeit beschäftigt sich mit dem Einsatz von Hashfunktionenin Algorithmen und Datenstrukturen. Hashfunktionen bilden eine Kernkomponente,z.B. beim Aufbau einer Hashtabelle oder bei Lastbalancierung. Oft wird dabeieine unrealistische Annahme getätigt: Die Hashwerte seien voll zufällig. DieSpeicherplatzkomplexität einer solchen Funktion ist für den praktischen Einsatzfür unverhältnismäßig hoch. Das Ziel ist, einfache Konstruktionen zu finden,deren Zufallseigenschaften beweisbar gut sind. Diese Arbeit beschreibt einesolche einfache Konstruktion von Hashfunktionen, die in einer Vielzahl vonAnwendungen beweisbar gut ist. Zu diesen Anwendungen zählen Cuckoo Hashing miteinem sogenannten Stash, die Konstruktion einer perfekten Hashfunktion, dieSimulation einer uniformen Hashfunktion, verschiedene Algorithmen zurLastbalancierung und verallgemeinertes Cuckoo Hashing in einer leichtabgeschwächten Variante mit verschiedenen Einfügealgorithmen. Der zentraleBeitrag dieser Dissertation ist ein einheitliches Analysekonzept. Diesesermöglicht es, eine auf Hashfunktionen basierende Datenstruktur oder einen aufHashfunktionen basierenden Algorithmus nur mit Mitteln der Theorie vonZufallsgraphen zu analysieren, ohne Details der Hashfunktion offenzulegen.The first part of this thesis considers multi-pivot quicksort algorithms. Theidea of using more than one pivot element in quicksort was considered to beimpractical. This changed when a dual-pivot algorithm replaced thewell-engineered quicksort-based sorting algorithm in Java 7. The thesispresents a detailed study of multi-pivot quicksort algorithms. It explains theunderlying design choices for dual-pivot quicksort algorithms with respect tothe comparisons they make when sorting the input. Moreover, it describes twoeasy to implement optimal dual-pivot algorithms To analyze quicksort algorithmsusing more than two pivots, only slight modifications to the dual-pivot caseare necessary. This thesis also considers the theoretical analysis of thememory and cache behavior of multi-pivot quicksort algorithms. At the end, itreports on a large-scale study on the empirical running time of multi-pivotquicksort algorithms. The second part of this thesis considers the use of hash functions inalgorithms and data structures. Hash functions are applied in many differentsituations, e.g., when building a hash table or when distributing jobs amongmachines in load balancing. Traditionally, the analysis of a particularhashing-based algorithm or data structure assumes that a hash function mapskeys independently and uniformly at random to a range. Such functions areunrealistic, since their space complexity is huge. Consequently, the task is toconstruct explicit hash functions providing provable theoretical guarantees.The thesis describes such a construction. It will provide sufficient randomnessfor running many different applications, such as cuckoo hashing with a stash,the construction of a perfect hash function, the simulation of a uniform hashfunction, load balancing, and generalized cuckoo hashing in a sparse settingwith two alternative insertion algorithms. The main contribution of this partof the thesis is a unified framework based on the first moment method. Thisframework makes it possible to analyze a hashing-based algorithm or datastructure only using random graph theory, without exploiting details of thehash function. The hash functions are easy to implement and turn out to bepractical while providing strong randomness guarantees

Digitale Bibliothek Thüringen

Variations on the Author

Author: Sayad Cecilia
Publication venue
Publication date: 01/01/2016
Field of study

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

Crossref

Kent Academic Repository