Search CORE

1,721,209 research outputs found

Generating a Gray code for prefix normal words in amortized polylogarithmic time per word

Author: Sawada J.
Raman R.
Liptak Z.
Fici G.
Burcsi P.
Publication venue
Publication date: 01/01/2020
Field of study

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. By proving that the set of prefix normal words is a bubble language, we can exhaustively list all prefix normal words of length n as a combinatorial Gray code, where successive strings differ by at most two swaps or bit flips. This Gray code can be generated in O(log2⁡n) amortized time per word, while the best generation algorithm hitherto has O(n) running time per word. We also present a membership tester for prefix normal words, as well as a novel characterization of bubble languages

Archivio istituzionale della ricerca - Università di Palermo

r-Indexing the eBWT

Author: Cenzato D.
Rossi M.
Sciortino M.
Boucher C.
Liptak Z.
Publication venue
Publication date: 01/01/2021
Field of study

The extended Burrows Wheeler Transform (eBWT ) was introduced by Mantaci et al. [TCS 2007] to extend the definition of the BWT to a collection of strings. In our prior work [SPIRE 2021], we give a linear-time algorithm for the eBWT that preserves the fundamental property of the original definition (i.e., the independence from the input order). The algorithm combines a modification of the Suffix Array Induced Sorting (SAIS) algorithm [IEEE Trans Comput 2011] with Prefix Free Parsing [AMB 2019; JCB 2020]. In this paper, we show how this construction algorithm leads to r-indexing the eBWT, i.e., run-length encoded eBWT and SA samples of Gagie et al. [SODA 2018] can be constructed efficiently from the components of the PFP. Moreover, we show that finding maximal exact matches (MEMs) between a query string and the r-index of the eBWT can be efficiently supported

Archivio istituzionale della ricerca - Università di Palermo

Computing the Original eBWT Faster, Simpler, and with Less Memory

Author: Cenzato D.
Rossi M.
Sciortino M.
Boucher C.
Liptak Z.
Publication venue
Publication date: 01/01/2021
Field of study

Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings. However, since this introduction, it has been used more generally to describe any BWT of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original eBWT, which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the BWT of a single string that uses neither an end-of-string symbol nor Lyndon rotations. We combine our new eBWT construction with a variation of prefix-free parsing to allow for scalable construction of the eBWT. We evaluate our algorithm (pfpebwt) on sets of human chromosomes 19, Salmonella, and SARS-CoV2 genomes, and demonstrate that it is the fastest method for all collections, with a maximum speedup of 7.6 × on the second best method. The peak memory is at most 2 × larger than the second best method. Comparing with methods that are also, as our algorithm, able to report suffix array samples, we obtain a 57.1 × improvement in peak memory. The source code is publicly available at https://github.com/davidecenzato/PFP-eBWT

Archivio istituzionale della ricerca - Università di Palermo

BWT for String Collections

Author: Cenzato D.
Sciortino M.
Pisanti N.
Rosone G.
Liptak Z.
Publication venue
Publication date: 01/01/2025
Field of study

We survey the different methods used for extending the BWT to collections of strings, following largely [Cenzato and Lipták, CPM 2022, Bioinformatics 2024]. We analyze the specific aspects and combinatorial properties of the resulting BWT variants and give a categorization of publicly available tools for computing the BWT of string collections. We show how the specific method used impacts on the resulting transform, including the number of runs, and on the dynamicity of the transform with respect to adding or removing strings from the collection. We then focus on the number of runs of these BWT variants and present the optimal BWT introduced in [Cenzato et al., DCC 2023], which implements an algorithm originally proposed by [Bentley et al., ESA 2020] to minimize the number of BWT-runs. We also discuss several recent heuristics and study their impact on the compression of biological sequences. We conclude with an overview of the applications and the impact of the BWT of string collections in bioinformatics

Archivio istituzionale della ricerca - Università di Palermo

Bit Catastrophes for the Burrows-Wheeler Transform

Author: Sciortino M.
Inenaga S.
Romana G.
Liptak Z.
Giuliani S.
Urbina C.
Publication venue
Publication date: 01/01/2023
Field of study

A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted r. We exhibit infinite families of strings in which insertion, deletion, resp. substitution of one character increases r from constant to Θ(log n), where n is the length of the string. These strings can be interpreted both as examples for an increase by a multiplicative or an additive Θ(log n) -factor. As regards multiplicative factor, they attain the upper bound given by Akagi, Funakoshi, and Inenaga [Inf & Comput. 2023] of O(log nlog r), since here r= O(1 ). We then give examples of strings in which insertion, deletion, resp. substitution of a character increases r by a Θ(n) additive factor. These strings significantly improve the best known lower bound for an additive factor of Ω(log n) [Giuliani et al., SOFSEM 2021]

Archivio istituzionale della ricerca - Università di Palermo

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

Author: Prezza N.
Sciortino M.
Inenaga S.
Toffanello A.
Liptak Z.
Giuliani S.
Publication venue
Publication date: 01/01/2021
Field of study

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as r) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the r parameter as measure of repetitiveness, especially to evaluate the performance in terms of both space and time of compressed indexing data structures. In this paper, we investigate ρ(v), the ratio of r and of the number of runs of the BWT of the reverse of v. Kempa and Kociumaka [FOCS 2020] gave the first non-trivial upper bound as ρ(v) = O(log 2(n) ), for any string v of length n. However, nothing is known about the tightness of this upper bound. We present infinite families of binary strings for which ρ(v) = Θ(log n) holds, thus giving the first non-trivial lower bound on ρ(n), the maximum over all strings of length n. Our results suggest that r is not an ideal measure of the repetitiveness of the string, since the number of repeated factors is invariant between the string and its reverse. We believe that there is a more intricate relationship between the number of runs of the BWT and the string’s combinatorial properties

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio istituzionale della ricerca - Università di Palermo

Author Instructions

Author: Instructions Author
Publication venue
Publication date: 04/11/2013
Field of study

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication venue
Publication date: 01/01/2005
Field of study

The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

E-LIS

Variations on the Author

Author: Sayad Cecilia
Publication venue
Publication date: 01/01/2016
Field of study

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

Crossref

Kent Academic Repository

Appropriate Similarity Measures for Author Cocitation Analysis

Author: Waltman L.R.
Eck N.J.P. van
Publication venue
Publication date
Field of study

We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authorsâ€™ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis

Research Papers in Economics