1,720,997 research outputs found

Compressed String Dictionary Search with Edit Distance One

Author: Rossano Venturini
Djamal Belazzougui
Publication date: 01/01/2016

In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space. Given a pattern (Formula presented.), the index has to report all the strings in the dictionary having edit distance at most one with (Formula presented.). Our first solution is able to solve queries in (almost optimal) (Formula presented.) time, where (Formula presented.) is the number of strings in the dictionary having edit distance at most one with (Formula presented.). The space complexity of this solution is bounded in terms of the (Formula presented.)-th order entropy of the indexed dictionary. A second solution further improves this space complexity at the cost of increasing the query time. Finally, we propose randomized solutions (Monte Carlo and Las Vegas) which simultaneously achieve the time complexity of the first solution and the space complexity of the second one.
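To make the query semantics in this abstract concrete, here is a minimal uncompressed baseline in Python. It is not the compressed index the paper describes: it simply scans the dictionary and tests each string against the pattern in linear time. The names `within_edit_distance_one` and `query` are hypothetical and chosen for illustration only.

```python
def within_edit_distance_one(a: str, b: str) -> bool:
    """Return True if a and b differ by at most one insertion,
    deletion, or substitution of a single character."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    # Skip the longest common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Same length: allow one substitution at position i.
        return a[i + 1:] == b[i + 1:]
    # b is one character longer: allow one insertion at position i.
    return a[i:] == b[i + 1:]


def query(dictionary, pattern):
    """Naive scan: report every dictionary string within edit distance one."""
    return [s for s in dictionary if within_edit_distance_one(s, pattern)]


words = ["cat", "cut", "cart", "dog", "at", "cats", "ca"]
print(query(words, "cat"))
```

This scan takes time proportional to the total length of the dictionary for every query, which is the cost a dedicated index is meant to avoid.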

Crossref

Archivio della Ricerca - Università di Pisa

Compressed String Dictionary Look-up with Edit Distance One

Author: Djamal Belazzougui
Rossano Venturini
Publication date: 01/01/2012

Crossref

Archivio della Ricerca - Università di Pisa

Compressed static functions with applications

Author: Djamal Belazzougui
Rossano Venturini
Publication date: 01/01/2013

Crossref

Archivio della Ricerca - Università di Pisa

Relative FM-Indexes

Author: Djamal Belazzougui
Travis Gagie
Jouni Sirén
Simon Gog
Giovanni Manzini
Publication date: 01/01/2014

Intuitively, if two strings S_1 and S_2 are sufficiently similar and we already have an FM-index for S_1 then, by storing a little extra information, we should be able to reuse parts of that index in an FM-index for S_2. We formalize this intuition and show that it can lead to significant space savings in practice, as well as to some interesting theoretical problems.

Archivio della Ricerca - Università di Pisa

Composite repetition-aware data structures

Author: Fabio Cunial
Nicola Prezza
Mathieu Raffinot
Djamal Belazzougui
Travis Gagie
Publication date: 01/01/2015

In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size again depends on the number of extensions of maximal repeats, and which is powerful enough to support matching statistics and constant-space traversal.
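As a rough illustration of why the number of BWT runs is a useful size measure for repetitive text, the following Python sketch computes a BWT by naively sorting rotations and then run-length encodes it. This is a toy construction, not the data structure from the paper. It assumes the sentinel `$` does not occur in the text and sorts before every other character. The function names are hypothetical.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic, demo only).
    Assumes `sentinel` is absent from `text` and smaller than all its characters."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s: str):
    """Collapse each maximal run of equal characters into a (char, length) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


transformed = bwt("banana")
runs = run_length_encode(transformed)
print(transformed)  # annb$aa
print(runs)         # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
print(len(runs))    # 5
```

The run-length encoded form stores one pair per run, so its size tracks the number of runs rather than the length of the text.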

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Relative FM-Indexes

Author: Simon Gog
Jouni Sirén
Djamal Belazzougui
Giovanni Manzini
Travis Gagie
Publication date: 01/01/2014

Crossref

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Cache-oblivious peeling of random hypergraphs

Author: Paolo Boldi
Djamal Belazzougui
Rossano Venturini
Sebastiano Vigna
Giuseppe Ottaviano
Publication date: 01/01/2014

The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random r-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires O(sort(n)) I/Os and O(n log n) time to peel a random hypergraph with n edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds an MPHF of 7.6 billion keys in less than 21 hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.
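The "straightforward linear time algorithm" mentioned in this abstract can be sketched as follows. This is an in-memory Python illustration of the classical queue-based peeling, not the cache-oblivious algorithm the paper contributes. It assumes every edge lists distinct vertices, and it uses the common trick of storing, per vertex, the XOR of the indices of its remaining incident edges. The function name `peel` is hypothetical.

```python
from collections import deque


def peel(num_vertices, edges):
    """Return a peeling order as (edge_index, free_vertex) pairs, or None
    if some edges cannot be peeled (a non-empty 2-core remains).
    Assumes the vertices inside each edge are pairwise distinct."""
    degree = [0] * num_vertices
    xor_edges = [0] * num_vertices  # XOR of indices of incident unpeeled edges
    for e, vertices in enumerate(edges):
        for v in vertices:
            degree[v] += 1
            xor_edges[v] ^= e

    queue = deque(v for v in range(num_vertices) if degree[v] == 1)
    order = []
    while queue:
        v = queue.popleft()
        if degree[v] != 1:
            continue  # its last edge was already peeled through another vertex
        e = xor_edges[v]  # exactly one edge is left, so the XOR is its index
        order.append((e, v))
        for u in edges[e]:
            degree[u] -= 1
            xor_edges[u] ^= e
            if degree[u] == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None


chain = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print(peel(5, chain))                        # every edge gets a free vertex
print(peel(3, [(0, 1), (1, 2), (2, 0)]))     # a cycle cannot be peeled: None
```

Each edge and each vertex is touched a constant number of times, but the accesses to `degree` and `xor_edges` are effectively random, which is the I/O behaviour the abstract identifies as the bottleneck for inputs larger than internal memory.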

Crossref

AIR Universita degli studi di Milano

Archivio della Ricerca - Università di Pisa

Making a Network Orchard by Adding Leaves

Author: Esther Julien
Mark Jones
Yukihiro Murakami
Leo van Iersel
Publication date: 01/01/2023

Phylogenetic networks are used to represent the evolutionary history of species. Recently, the new class of orchard networks was introduced, which were later shown to be interpretable as trees with additional horizontal arcs. This makes the network class ideal for capturing evolutionary histories that involve horizontal gene transfers. Here, we study the minimum number of additional leaves needed to make a network orchard. We demonstrate that computing this proximity measure for a given network is NP-hard and describe a tight upper bound. We also give an equivalent measure based on vertex labellings to construct a mixed integer linear programming formulation. Our experimental results, which include both real-world and synthetic data, illustrate the efficiency of our implementation

TU Delft Repository

DROPS Dagstuhl Research Online Publication Server

Efficient Tree-Structured Categorical Retrieval

Author: Djamal Belazzougui
Gregory Kucherov
Publication date: 01/01/2020

We study a document retrieval problem in the new framework where D text documents are organized in a category tree with a pre-defined number h of categories. This situation occurs e.g. with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(log σ(1+o(1)) + log D + O(h)) + O(Δ) bits of space and O(|p| + t) query time, where n is the total length of the documents, σ the size of the alphabet used in the documents and Δ is the total number of nodes in the category tree. Another solution uses n(log σ(1+o(1)) + O(log D)) + O(Δ) + O(D log n) bits of space and O(|p| + t log D) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.

DROPS Dagstuhl Research Online Publication Server

HAL Portal de Univ. Gustave Eiffel

Hal-Diderot

HAL

HAL-Ecole des Ponts ParisTech

Efficient tree-structured categorical retrieval

Author: Djamal Belazzougui
Gregory Kucherov
Publication date: 09/12/2020

Full version of a paper accepted for presentation at the 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). We study a document retrieval problem in the new framework where $D$ text documents are organized in a category tree with a pre-defined number $h$ of categories. This situation occurs e.g. with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern $p$ and a category (level in the category tree), we wish to efficiently retrieve the $t$ categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses $n(\log\sigma(1+o(1))+\log D+O(h)) + O(\Delta)$ bits of space and $O(|p|+t)$ query time, where $n$ is the total length of the documents, $\sigma$ the size of the alphabet used in the documents and $\Delta$ is the total number of nodes in the category tree. Another solution uses $n(\log\sigma(1+o(1))+O(\log D))+O(\Delta)+O(D\log n)$ bits of space and $O(|p|+t\log D)$ query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.

HAL Portal de Univ. Gustave Eiffel