Search CORE

1,721,012 research outputs found

Pattern Discovery in the Post-Genome

Author: GIANCARLO Raffaele
A. APOSTOLICO
Publication venue
Publication date: 01/01/2005
Field of study

Archivio istituzionale della ricerca - Università di Palermo

Monotone scoring of patterns with mismatches

Author: PIZZI CINZIA
A. APOSTOLICO
Publication venue
Publication date: 01/01/2004
Field of study

We study the problem of extracting, from given source x and error threshold k, substrings of x that occur unusually often in x within k substitutions or mismatches. Specifically, we assume that the input textstring x of n characters is produced by an i.i.d. source, and design efficient methods for computing the probability and expected number of occurrences for substrings of x with (either exactly or up to) k mismatches. Two related schemes are presented. In the first one, an O(nk) time preprocessing of x is developed that supports the following subsequent queries: for any substring w of x arbitrarily specified as input, the probability of occurrence of w in x within (either exactly or up to) k mismatches is reported in O(k(2)) time. In the second scheme, a length or length range is arbitrarily specified, and the above probabilities are computed for all substrings of x having length in that range, in overall O(nk) time. Further, monotonicity conditions are introduced and studied for probabilities and expected occurrences of a substring under unit increases in its length, allowed number of errors, or both. Over intervals of constant frequency count, these monotonicities translate to some of the scores in use, thereby reducing the size of tables at the outset and enhancing the process of discovery. These latter derivations extend to patterns with mismatches an analysis previously devoted to exact patterns

Crossref

Archivio istituzionale della ricerca - Università di Padova

Method and structure for lossy compression of continuous data with extensible motifs

Author: COMIN MATTEO
PARIDA LAXMI
A. APOSTOLICO
Publication venue
Publication date: 01/01/2007
Field of study

Archivio istituzionale della ricerca - Università di Padova

Methods and systems for conservative extraction of over-represented extensible motifs

Author: COMIN MATTEO
A. APOSTOLICO
L. PARIDA
Publication venue
Publication date: 01/01/2011
Field of study

Archivio istituzionale della ricerca - Università di Padova

On the Monotonicity of the String Correction Factor for Words with Mismatches

Author: Apostolico Alberto
PIZZI CINZIA
A. APOSTOLICO
Publication venue
Publication date: 01/01/2006
Field of study

The string correction factor is the term by which the probability of a word w needs to be multiplied in order to account for character changes or “errors” occurring in at most k arbitrary positions in that word. The behavior of this factor, as a function of k and of the word length, has implications on the number of candidates that need to be considered and weighted when looking for subwords of a sequence that present unusually recurrent replicas within some bounded number of mismatches. Specifically, it is seen that over intervals of mono- or bi- tonicity for the correction factor, only some of the candidates need be considered. This mitigates the computation and leads to tables of over- represented words that are more compact to represent and inspect. In recent work, expectation and score monotonicity has been established for a number of cases of interest, under i.i.d. probabilistic assumptions. The present paper reviews the cases of bi-tonic behavior for the correction factor, concentrating on the instance in which the question is still open

DROPS Dagstuhl Research Online Publication Server

Archivio istituzionale della ricerca - Università di Padova

Special Issue with selected papers of RECOMB, Tenth Annual International Conference on Researchin Computational Molecular Biology

Author: G. LANCIA
GUERRA CONCETTINA
A. APOSTOLICO
R. GIANCARLO
Publication venue
Publication date: 01/01/2007
Field of study

Archivio istituzionale della ricerca - Università di Padova

The set union problem with unlimited backtracking

Author: ITALIANO G
M. TALAMO
G. GAMBOSI
A. APOSTOLICO
Publication venue
Publication date: 01/01/1994
Field of study

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Optimal discovery of subword associations in strings

Author: PIZZI CINZIA
Cinzia Pizzi
Giorgio Satta
Alberto Apostolico
SATTA GIORGIO
A. APOSTOLICO
Publication venue
Publication date: 01/01/2004
Field of study

Given a textstring x of n symbols and an integer constant d, we consider the problem of finding, for any pair (y, z) of subwords of x the number of times that y and z occur in tandem (i.e., with no intermediate occurrence of either one of them) within a distance of d symbols of x. Although in principle there might be n(4) distinct subword pairs in x, we show that it suffices to consider a family of only n(2) such pairs, with the property that for any neglected pair (y', z'), there is a corresponding pair (y, z) contained in our family and such that: (i) y' is a prefix of y and z' is a prefix of z, and (ii) the tandem index of (y', z') equals that of (y, z). We show that an algorithm for the construction of the table of all such tandem indices can be built to run in optimal O(n(2)) time and space

Crossref

Archivio istituzionale della ricerca - Università di Padova

Alignment Free Sequence Similarity with Bounded Hamming Distance

Author: C. Guerra
PIZZI CINZIA
A. Apostolico
Cinzia Pizzi
Concettina Guerra
Alberto Apostolico
Publication venue
Publication date: 01/01/2014
Field of study

A growing number of measures of sequence similarity is being based on some underlying notion of relative compressibility. Within this paradigm, similar sequences are expected to share a large number of common substrings, or subsequences, or more complex patterns or motifs, and so on. The computational complexity of such measures varies, and it increases with the complexion of the patterns taken into account. At the low end of the spectrum, most measures based on the bags of shared substrings are typically afforded in linear time. This performance is no longer achievable as soon as some degree of distortion is accepted. In this paper, measures of sequence similarity are introduced and studied in which patterns in a pair are considered similar if they coincide up to a preset number of mismatches, that is, within a bounded Hamming distance. It is shown here that for some such measures bounds are achievable that are slightly better than O(n^2). Preliminary experiments demonstrate the potential applicability to phylogeny and classification of similarity measures that are rougher than previously adopted ones

Crossref

Archivio istituzionale della ricerca - Università di Padova

Author Instructions

Author: Instructions Author
Publication venue
Publication date: 04/11/2013
Field of study

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)