Search CORE

1,721,037 research outputs found

On balancing of a direct product

Author: Restivo A
ROSONE GIOVANNA
Publication venue
Publication date: 01/01/2009
Field of study

A direct product of two sequences is a naturally defined sequence on the alphabet of pairs of symbols. By taking inspiration from [Pavel Salimov. On uniform recurrence of a direct product. In AutoMathA, 2009], where the author investigates the case of uniformly recurrent words, here, we study when the product of two balanced sequences on binary alphabet is also balanced

Archivio della Ricerca - Università di Pisa

Space-efficient construction of compressed suffix trees

Author: Prezza Nicola
Rosone Giovanna
Publication venue
Publication date: 01/01/2021
Field of study

We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let

n

be the text length and

sigma

be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in

O(nlogsigma)

time using just

o(nlogsigma)

bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter 0 < epsilon leq 1 we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in

Oleft(n(logsigma + epsilon^{-1}cdot loglog n) ight)

time using at most

nlogsigma cdot(epsilon + o(1))

bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e.

o(nlogsigma)

bits) and

Thetaleft(nlogsigma + n(loglog n)^{1+delta} ight)

time, for any constant delta>0. This improves the previous most space-efficient algorithms, which worked in

O(n)

bits and

O(nlog n)

time. We also consider the problem of merging BWTs of string collections, and provide a solution running in

O(nlogsigma)

time and using just

o(nlogsigma)

bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as

n

bits on top of a packed representation of the input/output and process data as fast as

2.92

megabases per second

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Faster Online Computation of the Succinct Longest Previous Factor Array

Author: Nicola Prezza
Prezza Nicola
Giovanna Rosone
Rosone Giovanna
Publication venue
Publication date: 01/01/2020
Field of study

We consider the problem of computing online the Longest Previous Factor array LPF[1, n] of a text T of length n. For each, LPF[i] stores the length of the longest factor of T with at least two occurrences, one ending at i and the other at a previous position. We present an improvement over the previous solution by Okanohara and Sadakane (ESA 2008): our solution uses less space (compressed instead of succinct) and runs in time, thus being faster by a logarithmic factor. As a by-product, we also obtain the first online algorithm computing the Longest Common Suffix (LCS) array (that is, the LCP array of the reversed text) in time and compressed space. We also observe that the LPF array can be represented succinctly in 2n bits. Our online algorithm computes directly the succinct LPF and LCS arrays

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Burrows-Wheeler transform and palindromic richness

Author: Restivo A
RESTIVO Antonio
ROSONE Giovanna
Publication venue
Publication date: 01/01/2009
Field of study

The investigation of the extremal case of the Burrows-Wheeler transform leads to study the words

w

over an ordered alphabet

A=\{a_1,a_2,\ldots,a_k\}

, with

a_1 < a_2 < \ldots <a_k

, such that

bwt(w)

is of the form

a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_2^{n_2}a_1^{n_1}

, for some non-negative integers

n_1, n_2, \ldots, n_k

. A characterization of these words in the case

|A|=2

has been given in~\cite{MaReSc}, where it is proved that they correspond to the powers of conjugates of standard words. The case

|A|=3

has been settled in [Jamie Simpson, Simon J. Puglisi, Words with simple Burrows-Wheeler transforms, Electronic Journal of Combinatorics 15, (2008) article R83], which also contains some partial results for an arbitrary alphabet. In the present paper we show that such words can be described in terms of the notion of ``palindromic richness'', recently introduced in [Amy Glen, Jacques Justin, Steve Widmer, Luca Q. Zamboni, Palindromic richness, European Journal of Combinatorics 30 (2) (2009) 510-531]. Our main result indeed states that a word

w

such that

bwt(w)

has the form

a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_2^{n_2}a_1^{n_1}

is strongly rich, i.~e. the word

w^2

contains the maximum number of different palindromic factors

Elsevier - Publisher Connector

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Palermo

Balancing and clustering of words in the Burrows–Wheeler transform

Author: Restivo A
ROSONE GIOVANNA
Restivo Antonio
Antonio Restivo
Giovanna Rosone
Publication venue
Publication date: 01/01/2011
Field of study

Compression algorithms based on Burrows–Wheeler transform (BWT) take advantage of the fact that the word output of BWT shows a local similarity and then turns out to be highly compressible. The aim of the present paper is to study such “clustering effect” by using notions and methods from Combinatorics on Words. The notion of balance of a word plays a central role in our investigation. Empirical observations suggest that balance is actually the combinatorial property of input word that ensure optimal BWT compression. Moreover, it is reasonable to assume that the more balanced the input word is, the more local similarity we have after BWT (and therefore the better the compression is). This hypothesis is here corroborated by experiments on “real” text, by using local entropy as a measure of the degree of balance of a word. In the setting of Combinatorics on Words, a sound confirmation of previous hypothesis is given by a result of Mantaci et al. (2003) [27], which states that, in the case of a binary alphabet, there is an equivalence between circularly balanced words, words having a clusterized BWT, and the conjugates of standard words. In the case of alphabets of size greater than two, there is no more equivalence. The last section of the present paper is devoted to investigate the relationships between these notions, and other related ones (as, for instance, palindromic richness) in the case of a general alphabet

Elsevier - Publisher Connector

Crossref

Archivio della Ricerca - Università di Pisa

On the product of balanced sequences

Author: Restivo Antonio
RESTIVO A
Antonio Restivo
Giovanna Rosone
Rosone Giovanna
Publication venue
Publication date: 14/09/2011
Field of study

The product w = u ⊗ v of two sequences u and v is a naturally defined sequence on the alphabet of pairs of symbols. Here, we study when the product w of two balanced sequences u,v is balanced too. In the case u and v are binary sequences, we prove, as a main result, that, if such a product w is balanced and deg(w) = 4, then w is an ultimately periodic sequence of a very special form. The case of arbitrary alphabets is approached in the last section. The partial results obtained and the problems proposed show the interest of the notion of product in the study of balanced sequences

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Numérisation de Documents Anciens Mathématiques

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Palermo

Balanced Words Having Simple Burrows-Wheeler Transform

Author: RESTIVO Antonio
RESTIVO A
Antonio Restivo
Giovanna Rosone
ROSONE Giovanna
Publication venue
Publication date: 01/01/2009
Field of study

The investigation of the "clustering effect" of the Burrows-Wheeler transform (BWT) leads to study the words having simple BWT , i.e. words w over an ordered alphabet

A=\{a_1,a_2,\ldots,a_k\}

, with

a_1 < a_2 < \ldots <a_k

, such that

bwt(w)

is of the form

a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_1^{n_1}

, for some non-negative integers

n_1, n_2, \ldots, n_k

. We remark that, in the case of binary alphabets, there is an equivalence between words having simple BWT, the family of (circular) balanced words and the conjugates of standard words. In the case of alphabets of size greater than two, there is no more equivalence between these notions. As a main result of this paper we prove that, under assumption of balancing, the following three conditions on a word w are equivalent: i) w has simple BWT, ii) w is a circularly rich word, and iii) w is a conjugate of a finite epistandard word

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Palermo

Sorting suffixes of a text via its Lyndon Factorization

Author: Restivo A
Sciortino M.
ROSONE GIOVANNA
Mantaci S
Publication venue
Publication date: 01/01/2013
Field of study

The process of sorting the suffixes of a text plays a fundamental role in Text Algorithms. They are used for instance in the constructions of the Burrows-Wheeler transform and the suffix array, widely used in several fields of Computer Science. For this reason, several recent researches have been devoted to finding new strategies to obtain effective methods for such a sorting. In this paper we introduce a new methodology in which an important role is played by the Lyndon factorization, so that the local suffixes inside factors detected by this factorization keep their mutual order when extended to the suffixes of the whole word. This property suggests a versatile technique that easily can be adapted to different implementative scenarios

Archivio della Ricerca - Università di Pisa

Block Sorting-Based Transformations on Words: Beyond the Magic BWT

Author: Sciortino Marinella
Raffaele Giancarlo
Manzini Giovanni
Restivo Antonio
Rosone Giovanna
Publication venue
Publication date: 01/01/2018
Field of study

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the transformations in this class can be used as booster for memoryless compressors and we provide an upper bound on the number of equal-letter runs in their output. Moreover, we introduce the notion of rank-invertibility, a property related to the implementation of an efficient inversion procedure. We show that the BWT and the Alternating BWT are the only rank-invertible transformations in the class we have defined

Archivio istituzionale della ricerca - Università di Palermo

Methods and systems for data analysis using the Burrows Wheeler transform

Author: ROSONE GIOVANNA
Cox Anthony James
Evers Dirk
Bauer Markus J.
Publication venue
Publication date: 01/01/2011
Field of study

The present disclosure provides computer implemented methods and systems for analyzing datasets, such as large data sets output from nucleic acid sequencing technologies. In particular, the present disclosure provides for data analysis comprising computing the BWT of a collection of strings in an incremental, character by character, manner. The present disclosure also provides compression boosting strategies resulting in a BWT of a reordered collection of data that is more compressible by second stage compression methods compared to non-reordered computational analysis

Archivio della Ricerca - Università di Pisa