1,721,096 research outputs found

    On-line construction of a small automaton for a finite set of words

    No full text
    In this paper we describe a ``light'' algorithm for the on-line construction of a small automaton recognising a finite set of words. The algorithm runs in linear time. We carried out good experimental results on the suffixes of a text, showing how this automaton is small. For the suffixes of a text, we propose a modified construction that leads to an even smaller automaton

    A Basis of Tiling Motifs for Generating Repeated Patterns and its Complexity for Higher Quorum

    No full text
    We investigate the problem of determining the basis of motifs (a form of repeated patterns with don't cares) in an input string. We give new upper and lower bounds on the problem, introducing a new notion of basis that is provably smaller than (and contained in) previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all these bases grows exponentially with the quorum, the minimal number of times a motif must appear. We show that a polynomial-time algorithm exists only for fixed quorum

    The rightmost equal-cost position problem.

    No full text
    LZ77-based compression schemes compress the input text by replacing factors in the text with an encoded reference to a previous occurrence formed by the couple (length, offset). For a given factor, the smallest is the offset, the smallest is the resulting compression ratio. This is optimally achieved by using the rightmost occurrence of a factor in the previous text. Given a cost function, for instance the minimum number of bits used to represent an integer, we define the Rightmost Equal-Cost Position (REP) problem as the problem of finding one of the occurrences of a factor whose cost is equal to the cost of the rightmost one. We present the Multi-Layer Suffix Tree data structure that, for a text of length n, at any time i, it provides REP(LPF) in constant time, where LPF is the longest previous factor, i.e. the greedy phrase, a reference to the list of REP({set of prefixes of LPF}) in constant time and REP(p) in time Ο(|p| log log n) for any given pattern p. © 2013 IEEE

    The longest common substring problem

    No full text
    Given a set (Formula presented.) of q documents, the Longest Common Substring (LCS) problem asks, for any integer 2 ⩽ k ⩽ q, the longest substring that appears in k documents. LCS is a well-studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences alignments and analysis. This problem has been solved by Hui (2000 International Journal of Computer Science and Engineering 15 73–76) by using a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with the use of suffix trees. In this article, we present a simple method for solving the LCS problem by using suffix trees (STs) and classical union-find data structures. In turn, we show how this simple algorithm can be adapted in order to work with other space efficient data structures such as the enhanced suffix arrays (ESA) and the compressed suffix tree

    On the Longest Common Factor Problem

    No full text
    The Longest Common Factor (LCF) of a set of strings is a well studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences analysis. This problem has been solved by Hui (2000) who uses a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with use of suffix trees. A data structure for the LCA problem, although linear in space and construction time, introduces a multiplicative constant in both space and time that reduces the range of applications in many biological applications. In this article we present a new method for solving the LCF problem using the suffix tree structure with an auxiliary array that take space O(n). Our algorithm works in time O(nlog a), where n is the total input size and a is the multiplicative constant introduced by the alphabet. a is the size of the alphabet. We also consider a different version of our algorithm that applies to DAWGs. In this case, we prove that the algorithm works in both time and space proportional to data DAWG’s size
    corecore