1,721,201 research outputs found

    On balancing of a direct product

    No full text
    A direct product of two sequences is a naturally defined sequence on the alphabet of pairs of symbols. By taking inspiration from [Pavel Salimov. On uniform recurrence of a direct product. In AutoMathA, 2009], where the author investigates the case of uniformly recurrent words, here, we study when the product of two balanced sequences on binary alphabet is also balanced

    Periodicity

    No full text

    Completing Wheeler Automata

    Full text link
    We consider the problem of embedding a Wheeler Deterministic Finite Automaton (WDFA, in short) into an equivalent complete WDFA, preserving the order of states and the accepted language. In some cases, such a complete WDFA does not exist. We say that a WDFA is Wheeler-complete (W-complete, in short) if it cannot be properly embedded into an equivalent WDFA. We give an algorithm that, given as input a WDFA A, returns the smallest W-complete WDFA containing A: it is called the W-completion of A. We derive some interesting applications of this algorithm concerning the construction of a WDFA for the union and a WDFA for the complement of Wheeler languages

    Primitive sets of words

    Full text link
    Given a (finite or infinite) subset X of the free monoid A⁎ over a finite alphabet A, the rank of X is the minimal cardinality of a set F such that X⊆F⁎. We say that a submonoid M generated by k elements of A⁎ is k-maximal if there does not exist another submonoid generated by at most k words containing M. We call a set X⊆A⁎ primitive if it is the basis of a |X|-maximal submonoid. This definition encompasses the notion of primitive word — in fact, {w} is a primitive set if and only if w is a primitive word. By definition, for any set X, there exists a primitive set Y such that X⊆Y⁎. We therefore call Y a primitive root of X. As a main result, we prove that if a set has rank 2, then it has a unique primitive root. To obtain this result, we prove that the intersection of two 2-maximal submonoids is either the empty word or a submonoid generated by one single primitive word. For a single word w, we say that the set {x,y} is a bi-root of w if w can be written as a concatenation of copies of x and y and {x,y} is a primitive set. We prove that every primitive word w has at most one bi-root {x,y} such that |x|+|y|<|w|. That is, the bi-root of a word is unique provided the word is sufficiently long with respect to the size (sum of lengths) of the root. Our results are also compared to previous approaches that investigate pseudo-repetitions, where a morphic involutive function θ is defined on A⁎. In this setting, the notions of θ-power, θ-primitive and θ-root are defined, and it is shown that any word has a unique θ-primitive root. This result can be obtained with our approach by showing that a word w is θ-primitive if and only if {w,θ(w)} is a primitive set

    Burrows-Wheeler transform and palindromic richness

    No full text
    The investigation of the extremal case of the Burrows-Wheeler transform leads to study the words ww over an ordered alphabet A={a1,a2,,ak}A=\{a_1,a_2,\ldots,a_k\}, with a1<a2<<aka_1 < a_2 < \ldots <a_k, such that bwt(w)bwt(w) is of the form aknkak1nk1a2n2a1n1a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_2^{n_2}a_1^{n_1}, for some non-negative integers n1,n2,,nkn_1, n_2, \ldots, n_k. A characterization of these words in the case A=2|A|=2 has been given in~\cite{MaReSc}, where it is proved that they correspond to the powers of conjugates of standard words. The case A=3|A|=3 has been settled in [Jamie Simpson, Simon J. Puglisi, Words with simple Burrows-Wheeler transforms, Electronic Journal of Combinatorics 15, (2008) article R83], which also contains some partial results for an arbitrary alphabet. In the present paper we show that such words can be described in terms of the notion of ``palindromic richness'', recently introduced in [Amy Glen, Jacques Justin, Steve Widmer, Luca Q. Zamboni, Palindromic richness, European Journal of Combinatorics 30 (2) (2009) 510-531]. Our main result indeed states that a word ww such that bwt(w)bwt(w) has the form aknkak1nk1a2n2a1n1a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_2^{n_2}a_1^{n_1} is strongly rich, i.~e. the word w2w^2 contains the maximum number of different palindromic factors

    Some Investigations on Similarity Measures Based on Absent Words

    Full text link
    In this paper we investigate similarity measures based on minimal absent words, introduced by Chairungsee and Crochemore in [1]. They make use of a length-weighted index on a sample set corresponding to the symmetric difference M(x)ΔM(y) of the minimal absent words M(x) and M(y) of two sequences x and y, respectively. We first propose a variant of this measure by choosing as a sample set a proper subset (x, y) of M(x)ΔM(y), which appears to be more appropriate for distinguishing x and y. From the algebraic point of view, we prove that (x, y) is the base of the ideal generated by M(x)ΔM(y). We then remark that such measures are able to recognize whether the sequences x and y share a common structure, but they are not able to detect the difference on the number of occurrences of such a structure in the two sequences. In order to take into account such a multiplicity, we introduce the notion of multifactor, and define a new measure that uses both absent words and multifactors. Surprisingly, we prove that this similarity measure coincides with a distance on sequences introduced by Ehrenfeucht and Haussler in [2], in the context of block-moves strategies. In this way, our result creates a non trivial bridge between similarity measures based on absent words and those based on the block-moves approach

    Balancing and clustering of words in the Burrows–Wheeler transform

    No full text
    Compression algorithms based on Burrows–Wheeler transform (BWT) take advantage of the fact that the word output of BWT shows a local similarity and then turns out to be highly compressible. The aim of the present paper is to study such “clustering effect” by using notions and methods from Combinatorics on Words. The notion of balance of a word plays a central role in our investigation. Empirical observations suggest that balance is actually the combinatorial property of input word that ensure optimal BWT compression. Moreover, it is reasonable to assume that the more balanced the input word is, the more local similarity we have after BWT (and therefore the better the compression is). This hypothesis is here corroborated by experiments on “real” text, by using local entropy as a measure of the degree of balance of a word. In the setting of Combinatorics on Words, a sound confirmation of previous hypothesis is given by a result of Mantaci et al. (2003) [27], which states that, in the case of a binary alphabet, there is an equivalence between circularly balanced words, words having a clusterized BWT, and the conjugates of standard words. In the case of alphabets of size greater than two, there is no more equivalence. The last section of the present paper is devoted to investigate the relationships between these notions, and other related ones (as, for instance, palindromic richness) in the case of a general alphabet
    corecore