Compressed data structures for strings: on searching and extracting strings from compressed textual data

Author: Venturini Rossano
Publication date: 01/01/2014

A book on compressed data structures

Crossref

Archivio della Ricerca - Università di Pisa

CERN Document Server

Compressed cache-oblivious string B-tree

Author: VENTURINI ROSSANO
FERRAGINA PAOLO
Publication date: 01/01/2016

In this article, we study three variants of the well-known prefix-search problem for strings, and we design solutions for the cache-oblivious model which improve the best known results. Among these contributions, we close (asymptotically) the classic problem, which asks for the detection of the set of strings that share the longest common prefix with a queried pattern, by providing an I/O-optimal solution that matches the space lower bound for tries up to a constant multiplicative factor of the form (1 + ε), for ε > 0. Our solutions hinge upon a novel compressed storage scheme that adds the ability to decompress prefixes of the stored strings I/O-optimally to the elegant locality-preserving front coding (Bender et al. 2006) while still preserving its space bounds.
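
For readers unfamiliar with front coding, the following is a minimal sketch of the plain technique only, assuming a lexicographically sorted input list. It is not the locality-preserving variant of Bender et al., nor the compressed scheme of the article; the function names `front_encode` and `front_decode` are illustrative.

```python
def front_encode(sorted_strings):
    """Encode a sorted list of strings as (lcp_with_previous, remaining_suffix) pairs."""
    encoded = []
    prev = ""
    for s in sorted_strings:
        # Length of the longest common prefix between s and the previous string.
        lcp = 0
        limit = min(len(prev), len(s))
        while lcp < limit and prev[lcp] == s[lcp]:
            lcp += 1
        encoded.append((lcp, s[lcp:]))
        prev = s
    return encoded


def front_decode(encoded):
    """Rebuild the original strings by copying lcp characters from the previous string."""
    decoded = []
    prev = ""
    for lcp, suffix in encoded:
        s = prev[:lcp] + suffix
        decoded.append(s)
        prev = s
    return decoded


words = ["car", "card", "care", "cat"]
pairs = front_encode(words)
print(pairs)
print(front_decode(pairs) == words)
```

In this plain form, decoding a single string may require scanning back through many predecessors, which is the cost that locality-aware variants are designed to bound.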

Crossref

Archivio della Ricerca - Università di Pisa

On a simple storage scheme for strings achieving entropy bounds

Author: Ferragina Paolo
Venturini Rossano
Publication date: 27/06/2006

In this note we propose a storage scheme for a string S[1,n], drawn from an alphabet A, that requires space close to the k-th order empirical entropy of S and allows retrieving any L-long substring of S in optimal O(1 + L log |A| / log n) time. This matches the best known bounds using only binary encodings and tables.
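
As a rough illustration of how binary encodings and tables can support substring retrieval, the sketch below splits the string into fixed-size blocks, gives more frequent blocks shorter binary codewords, and records where each codeword starts. It is a simplified sketch under stated assumptions, not the scheme of the note: the block size `b` is a free parameter, codeword boundaries are kept in a plain Python list rather than a succinct structure, and the names `build` and `substring` are illustrative.

```python
from collections import Counter


def build(s, b):
    """Split s into blocks of b symbols; frequent blocks get shorter codewords."""
    blocks = [s[i:i + b] for i in range(0, len(s), b)]
    # Rank distinct blocks by decreasing frequency (ties broken lexicographically).
    ranked = sorted(Counter(blocks).items(), key=lambda kv: (-kv[1], kv[0]))
    # Rank r receives the r-th string of the sequence "", "0", "1", "00", "01", ...
    # obtained by dropping the leading 1 from the binary form of r + 1.
    code_of = {blk: bin(r + 1)[3:] for r, (blk, _) in enumerate(ranked)}
    block_of = {code: blk for blk, code in code_of.items()}
    bits, starts = [], [0]
    for blk in blocks:
        bits.append(code_of[blk])
        starts.append(starts[-1] + len(code_of[blk]))
    # The codewords are not prefix-free, so the start offsets are required to decode.
    return "".join(bits), starts, block_of


def substring(index, b, pos, length):
    """Return s[pos:pos+length] (length >= 1) by decoding only the covering blocks."""
    bits, starts, block_of = index
    first, last = pos // b, (pos + length - 1) // b
    decoded = "".join(
        block_of[bits[starts[j]:starts[j + 1]]] for j in range(first, last + 1)
    )
    offset = pos - first * b
    return decoded[offset:offset + length]


text = "abracadabra_abracadabra"
index = build(text, 2)
print(substring(index, 2, 4, 7) == text[4:11])
```

Only the blocks overlapping the requested range are decoded, which is the property that makes the retrieval time depend on L rather than on n.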

Elsevier - Publisher Connector

Archivio della Ricerca - Università di Pisa

UnipiEprints

Space-Efficient Substring Occurrence Estimation

Author: VENTURINI ROSSANO
Orlandi Alessio
Publication date: 01/01/2016

In this paper we study the problem of estimating the number of occurrences of substrings in textual data: a text T on some alphabet Σ=[σ] of length n is preprocessed and an index I is built. The index is used in lieu of the text to answer queries of the form Count≈(P), returning an approximated number of the occurrences of an arbitrary pattern P as a substring of T. The problem has its main application in selectivity estimation related to the LIKE predicate in textual databases. Our focus is on obtaining an algorithmic solution with guaranteed error rates and small footprint. To achieve that, we first enrich previous work in the area of compressed text-indexing providing an optimal data structure that, for a given additive error l ≥ 1, requires Θ((n/l) log σ) bits. We also approach the issue of guaranteeing exact answers for sufficiently frequent patterns, providing a data structure whose size scales with the amount of such patterns. Our theoretical findings are supported by experiments showing the practical impact of our data structures.

Archivio della Ricerca - Università di Pisa

System and method for string processing and searching using a compressed permuterm index

Author: VENTURINI ROSSANO
FERRAGINA PAOLO
Publication date: 01/01/2007

An improved system and method for string processing and searching using a compressed permuterm index is provided. To build a compressed permuterm index for a string dictionary, an index builder constructs a unique string from a collection of strings of a dictionary sorted in lexicographic order and then builds a compressed permuterm index to support queries over the unique string. A dictionary query engine supports several types of wild-card queries over the string dictionary by performing a backward search modified with a CyclicLF operation over the compressed permuterm index. These queries may be used to implement other queries including a membership query, a prefix query, a suffix query, a prefix-suffix query, a query for an exact or substring match, a rank query, a select query and so forth. String processing and searching tasks may accurately be performed for sophisticated queries in optimal time and compressed space.
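
To show why a permuterm index turns a prefix-suffix query into a single prefix lookup, here is a minimal sketch of the classic uncompressed permuterm idea: every rotation of `word + "$"` is stored in sorted order, and a query "prefix*suffix" becomes a search for rotations starting with `suffix + "$" + prefix`. This is not the compressed index or the CyclicLF-based backward search described above; the function names are illustrative, and the sketch assumes `$` never occurs in a dictionary word.

```python
import bisect


def build_permuterm(words):
    """Return the sorted list of (rotation, word) for every rotation of word + '$'."""
    rotations = []
    for w in words:
        t = w + "$"
        for i in range(len(t)):
            rotations.append((t[i:] + t[:i], w))
    rotations.sort()
    return rotations


def prefix_suffix_query(rotations, prefix, suffix):
    """Return the words that start with prefix and end with suffix, without overlap."""
    key = suffix + "$" + prefix
    # Rotations starting with key are contiguous in sorted order; find the first one.
    lo = bisect.bisect_left(rotations, (key,))
    matches = set()
    for rot, w in rotations[lo:]:
        if not rot.startswith(key):
            break
        matches.add(w)
    return sorted(matches)


index = build_permuterm(["hello", "help", "hollow", "yellow"])
print(prefix_suffix_query(index, "h", "o"))
print(prefix_suffix_query(index, "", "low"))
print(prefix_suffix_query(index, "hel", ""))
```

Prefix and suffix queries fall out as the special cases with an empty suffix or an empty prefix. The uncompressed list stores a number of rotations proportional to the total dictionary length, which is the space cost a compressed index aims to avoid.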

Archivio della Ricerca - Università di Pisa

On Compact Representations of All-Pairs-Shortest-Path-Distance Matrices

Author: VENTURINI ROSSANO
NITTO IGOR
Publication date: 01/01/2008

Archivio della Ricerca - Università di Pisa

VSEncoding: Efficient Coding and Fast Decoding of Integer Lists via Dynamic Programming

Author: VENTURINI ROSSANO
Silvestri Fabrizio
Publication date: 01/01/2010

Archivio della Ricerca - Università di Pisa

Compressed String Dictionary Search with Edit Distance One

Author: VENTURINI ROSSANO
Belazzougui Djamal
Publication date: 01/01/2016

In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space. Given a pattern (Formula presented.), the index has to report all the strings in the dictionary having edit distance at most one with (Formula presented.). Our first solution is able to solve queries in (almost optimal) (Formula presented.) time where (Formula presented.) is the number of strings in the dictionary having edit distance at most one with (Formula presented.). The space complexity of this solution is bounded in terms of the (Formula presented.)-th order entropy of the indexed dictionary. A second solution further improves this space complexity at the cost of increasing the query time. Finally, we propose randomized solutions (Monte Carlo and Las Vegas) which achieve simultaneously the time complexity of the first solution and the space complexity of the second one.

Crossref

Archivio della Ricerca - Università di Pisa

Space-efficient substring occurrence estimation

Author: Rossano Venturini
Alessio Orlandi
Publication date: 01/01/2011

Crossref

Archivio della Ricerca - Università di Pisa

Distribution-aware compressed full-text indexes

Author: VENTURINI ROSSANO
FERRAGINA PAOLO
J. Siren
Publication date: 01/01/2013

In this paper we address the problem of building a compressed self-index that, given a distribution for the pattern queries and a bound on the space occupancy, minimizes the expected query time within that index space bound. We solve this problem by exploiting a reduction to the problem of finding a minimum weight K-link path in a properly designed Directed Acyclic Graph. Interestingly enough, our solution can be used with any compressed index based on the Burrows-Wheeler transform. Our experiments compare this optimal strategy with several other known approaches, showing its effectiveness in practice.
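
The minimum weight K-link path problem mentioned above can be illustrated with a textbook dynamic program. The sketch below assumes a DAG on nodes 0..n with an edge (i, j) for every i < j and finds the cheapest path from 0 to n that uses exactly k edges. The abstract does not specify the graph or its weights, so the `weight` callback here is purely hypothetical, and this simple O(k n^2) recurrence is not claimed to be the algorithm used in the paper.

```python
import math


def min_weight_k_link_path(n, k, weight):
    """Return (cost, path) of a cheapest 0 -> n path with exactly k edges, or (inf, [])."""
    # best[links][j] = minimum cost of reaching node j from node 0 using `links` edges.
    best = [[math.inf] * (n + 1) for _ in range(k + 1)]
    parent = [[-1] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for links in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(j):
                if best[links - 1][i] == math.inf:
                    continue
                cand = best[links - 1][i] + weight(i, j)
                if cand < best[links][j]:
                    best[links][j] = cand
                    parent[links][j] = i
    if best[k][n] == math.inf:
        return math.inf, []
    # Walk the parent pointers back from node n to recover the path.
    path, node = [n], n
    for links in range(k, 0, -1):
        node = parent[links][node]
        path.append(node)
    path.reverse()
    return best[k][n], path


# Hypothetical weight: the squared length of the skipped interval.
cost, path = min_weight_k_link_path(6, 3, lambda i, j: (j - i) ** 2)
print(cost, path)
```

In a setting like the one described, the number of links would play the role of the space budget and the edge weights the role of expected query cost, though that mapping is an interpretation of the abstract rather than a detail it states.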

Crossref

Archivio della Ricerca - Università di Pisa