
    Multi-dimensional sparse time series: feature extraction

    Multimedia sequential data represent the behavior of multiple measurements of a process and can be analyzed as multi-dimensional time series using entropy and statistical linguistic techniques. We introduce three markers: influence area, consistency and diversification. The first two describe the quality of the dynamic change of the data over time; the third measures the variability of recurrent patterns. These markers are useful for classifying or clustering large databases, predicting future behavior and attributing new data. We show an application to different investment strategies for purchasing commercials in the advertising market.

    A compression-based approach for coding sequences identification in prokaryotic genomes

    Identifying coding regions in genomic sequences is the first step toward further analysis of the biological function carried out by the different functional elements in a genome. This paper presents a novel method for classifying coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The proposed approach has been applied to several complete prokaryotic genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, revealing that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (regions hosting non-functional fragments of copies of functional genes, regions hosting promoters or other protein-binding sequences, etc.).
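    The abstract does not define its compression index, so the following is only a minimal sketch of the general idea, assuming the index is the ratio of compressed size to raw size and using zlib as a stand-in compressor. The names compression_index and window_indices, and the window and step sizes, are hypothetical and do not come from the paper.

```python
import zlib


def compression_index(seq: str, level: int = 9) -> float:
    """Compressed size divided by raw size (assumed definition).

    Lower values mean the sequence is more regular, higher values
    mean it looks closer to random for this compressor.
    """
    raw = seq.upper().encode("ascii")
    if not raw:
        raise ValueError("sequence must not be empty")
    return len(zlib.compress(raw, level)) / len(raw)


def window_indices(seq: str, window: int = 300, step: int = 150):
    """Yield (start, index) for sliding windows over the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, compression_index(seq[start:start + window])
```

    A classifier along these lines would then compare each window's index against a threshold calibrated on annotated genomes; that calibration step is not described in the abstract and is left out here.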

    A taste of yeast mobilomics

    Mobilomics calls for detecting all the mobile elements in a genome so as to understand their dynamic behavior. We devise and apply a method that extends a pairwise strain comparison tool for inferring mobile genetic elements (MGEs), and perform experiments on a dataset of 39 complete genomes, one for each of 39 yeast (S. cerevisiae) strains. We first locate all the MGE regions that are annotated in the reference sequence at hand, and then map all the putative MGEs in all the other (non-annotated) strains. Interestingly, the evolutionary relationships among the strains inferred from the presence/absence of candidate MGEs turn out to be quite close to those inferred by classic phylogenetic methods based on SNP analysis.
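    The abstract does not say how presence/absence of candidate MGEs is turned into a relation among strains. As an illustration only, the sketch below assumes each strain is summarized by the set of candidate MGE identifiers found in it and uses the Jaccard distance between those sets; both the distance and the function names are assumptions, not the authors' method.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(presence: dict) -> dict:
    """Pairwise distances between strains.

    presence maps a strain name to the set of candidate MGE identifiers
    detected in that strain. Keys of the result are sorted name pairs.
    """
    return {
        (s, t): jaccard_distance(presence[s], presence[t])
        for s, t in combinations(sorted(presence), 2)
    }
```

    The resulting distances could be passed to any standard tree-building or clustering routine and the outcome compared with a SNP-based phylogeny.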

    Inferring mobile elements in S. cerevisiae strains

    We aim to find all the mobile elements in a genome and to understand their dynamic behavior. Comparative genomics of closely related organisms can provide the data for this kind of investigation. The comparison task requires a huge amount of computational resources, a cost that our approach reduces by exploiting the high similarity between homologous chromosomes of different strains of the same species. Our case study covers RefSeq and two other strains of S. cerevisiae. Our fast algorithm, called REGENDER, is driven by data analysis. We found that almost all the chromosomes are composed of resident genome (more than 90% is conserved). Most importantly, inspection of the non-conserved regions revealed that these are putative mobile elements, confirming that our method is useful for quickly finding mobile elements. The software tool REGENDER is available online.

    A top-down linguistic approach to the analysis of genomic sequences: The metabotropic Glutamate receptors 1 and 5 in Human and in Mouse as a case study.

    This paper presents a top-down strategy to detect features in genomic sequences. The core of the strategy is to exploit dictionary-based compression algorithms and analyse the content of the automatically generated dictionary. We classify the different over-represented segments and, in the case study, correlate them with experimentally identified or theoretically predicted biological features. A broad-spectrum analysis reveals that the only feature co-located with the a priori extracted segments is the torsional flexibility of DNA, while non-B DNA configurations are anti-localized and other features are mostly independent of the extracted sequences. This analysis unravels complex relationships between the linguistic structures investigated under our approach and some known biological features. © 2010 Elsevier Ltd. All rights reserved.
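    The abstract does not name the compressor used, so the following is a minimal sketch of the general idea, assuming a plain LZ78 parse as the dictionary-based compressor and raw overlapping occurrence counts as the measure of over-representation. The function names and the min_len and top parameters are hypothetical.

```python
def lz78_dictionary(seq: str) -> list:
    """Return the phrases created by an LZ78 parse, in creation order.

    A trailing phrase that is already in the dictionary is dropped.
    """
    seen = set()
    phrases = []
    current = ""
    for ch in seq:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of word in seq, allowing overlaps."""
    count = 0
    pos = seq.find(word)
    while pos != -1:
        count += 1
        pos = seq.find(word, pos + 1)
    return count


def frequent_phrases(seq: str, min_len: int = 3, top: int = 5) -> list:
    """Dictionary phrases of at least min_len, ranked by occurrence count."""
    words = [w for w in lz78_dictionary(seq) if len(w) >= min_len]
    counted = [(w, count_overlapping(seq, w)) for w in words]
    # Sort by descending count, then alphabetically for a stable order.
    counted.sort(key=lambda item: (-item[1], item[0]))
    return counted[:top]
```

    The positions of the top-ranked phrases could then be compared with annotated biological features, which is the correlation step the abstract describes.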

    Computable information content and boolean networks dynamics

    We propose applying information content to analyze the time and space evolution of some Boolean networks and to classify their state as solid, jelly, liquid, or gaseous, following the ideas of Kauffman. To this end, both individual nodes and the global network are studied by means of data compression. Some experiments are presented using the compression algorithm CASToRe, developed along the lines of Lempel-Ziv algorithms.
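    As a rough illustration of studying a Boolean network through data compression, the sketch below simulates a random Boolean network and estimates the information per symbol of a node's time series. CASToRe is not used here: zlib serves as a stand-in compressor, and the network construction, function names and parameters are all assumptions rather than the setup of the paper.

```python
import random
import zlib


def information_per_symbol(states) -> float:
    """Compressed bits per symbol of a 0/1 sequence (zlib stand-in)."""
    data = "".join("1" if s else "0" for s in states).encode("ascii")
    if not data:
        raise ValueError("sequence must not be empty")
    return 8 * len(zlib.compress(data, 9)) / len(data)


def random_boolean_network(n: int, k: int, rng: random.Random):
    """Each node gets k distinct input nodes and a random truth table."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously update every node from the current state."""
    new_state = []
    for ins, table in zip(inputs, tables):
        idx = 0
        for i in ins:
            idx = (idx << 1) | state[i]
        new_state.append(table[idx])
    return new_state


def simulate(n: int = 20, k: int = 2, steps: int = 500, seed: int = 0):
    """Return the list of network states, including the initial one."""
    rng = random.Random(seed)
    inputs, tables = random_boolean_network(n, k, rng)
    state = [rng.randint(0, 1) for _ in range(n)]
    history = [state]
    for _ in range(steps):
        state = step(state, inputs, tables)
        history.append(state)
    return history
```

    The time series of node i is [s[i] for s in history], and flattening history row by row gives one sequence for the whole network. Mapping the resulting values to the solid, jelly, liquid and gaseous classes would need thresholds that the abstract does not give.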