1,721,103 research outputs found
Developments in Language Theory, 9th International Conference, DLT 2005, Palermo, Italy, July 2005, Proceedings
From words to pictures: Row-column combinations and Chomsky-Schützenberger theorem
The row-column combination RCC maps two (word) languages over the same alphabet onto the set of rectangular arrays, i.e., pictures, such that each row/column is a word of the first/second language. The resulting array is thus a crossword of the component words. Depending on the family of the components, different picture (2D) language families are obtained: e.g., the well-known tiling-system recognizable languages are the alphabetic projection of the crossword of local (regular) languages. We investigate the effect of the RCC operation especially when the components are context-free, also with application of an alphabetic projection. The resulting 2D families are compared with others defined in the past. The classical characterization of context-free languages, known as Chomsky-Schützenberger theorem, is extended to the crosswords in this way: the projection of a context-free crossword is equivalent to the projection of the intersection of a 2D Dyck language and the crossword of strictly locally testable language. The definition of 2D Dyck language relies on a new more flexible so-called Cartesian RCC operation on Dyck languages. The proof involves the version of the Chomsky-Schützenberger theorem that is non-erasing and uses a grammar-independent alphabet
Word assembly through minimal forbidden words
We give a linear-time algorithm to reconstruct a finite word w over a finite alphabet A of constant size starting from a finite set of factors of w verifying a suitable hypothesis. We use combinatorics techniques based on the minimal forbidden words, which have been introduced in previous papers. This improves a previous algorithm which worked under the assumption of stronger hypothesis
Block Sorting-Based Transformations on Words: Beyond the Magic BWT
The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the transformations in this class can be used as booster for memoryless compressors and we provide an upper bound on the number of equal-letter runs in their output. Moreover, we introduce the notion of rank-invertibility, a property related to the implementation of an efficient inversion procedure. We show that the BWT and the Alternating BWT are the only rank-invertible transformations in the class we have defined
A combinatorial view on string attractors
The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word w=w_1w_2...w_n is a subset Γ of the positions {1,2,...,n}, such that all distinct factors of w have an occurrence crossing at least one of the elements of Γ.
In this paper we explore the notion of string attractor by focusing on its combinatorial properties. In particular, we show how the size of the smallest string attractor of a word varies when combinatorial operations are applied and we deduce that such a measure is not monotone. Moreover, we introduce a circular variant of the notion of string attractor to provide a characterization of the conjugacy classes of standard Sturmian words
- …
