    Reorganization and merging of the EMBL and GenBank keyword indexes in a tree structure for more efficient retrieval of nucleic acid sequences.

    EMBL and GenBank keyword indexes have no hierarchical structure. In this paper we present a method for merging and reorganizing them in a tree structure whose primary roots are the keywords 'protein', 'DNA', 'RNA', and 'unclassified'. Synonymous keywords have been grouped together and erroneous keywords have been corrected. This taxonomic organization of keywords results in more extensive and efficient retrieval, which is further aided by "synonyms declaration". The tree has been produced using the computer programs GENPOINT and CREANET.
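    The idea of a keyword tree with declared synonyms can be illustrated with a minimal Python sketch. Only the four root keywords come from the abstract; the class and method names, the example keywords, and the accession strings are hypothetical, and this is not a reconstruction of GENPOINT or CREANET.

```python
class KeywordNode:
    """One keyword in the tree, with the entries indexed directly under it."""

    def __init__(self, name):
        self.name = name
        self.children = {}
        self.entries = set()


class KeywordTree:
    # The four primary roots named in the abstract.
    ROOTS = ("protein", "DNA", "RNA", "unclassified")

    def __init__(self):
        self.nodes = {root: KeywordNode(root) for root in self.ROOTS}
        self.synonyms = {}  # lower-cased alias -> preferred keyword

    def add_keyword(self, name, parent):
        """Attach a new keyword under an existing parent keyword."""
        node = KeywordNode(name)
        self.nodes[parent].children[name] = node
        self.nodes[name] = node
        return node

    def declare_synonym(self, alias, preferred):
        """Record that `alias` should be treated as `preferred`."""
        self.synonyms[alias.lower()] = preferred

    def resolve(self, keyword):
        """Map a keyword to its preferred form if a synonym was declared."""
        return self.synonyms.get(keyword.lower(), keyword)

    def index(self, keyword, accession):
        """File an entry (hypothetical accession string) under a keyword."""
        self.nodes[self.resolve(keyword)].entries.add(accession)

    def retrieve(self, keyword):
        """Return entries filed under the keyword or any of its descendants."""
        stack = [self.nodes[self.resolve(keyword)]]
        found = set()
        while stack:
            node = stack.pop()
            found |= node.entries
            stack.extend(node.children.values())
        return found
```

    With such a structure, a query on a broad keyword such as 'RNA' also returns entries filed under narrower keywords beneath it, and a query on a declared synonym is redirected to the preferred keyword, which is the sense in which retrieval becomes more extensive.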

    Primates and mouse NumtS in the UCSC Genome Browser

    BACKGROUND: NumtS (Nuclear MiTochondrial Sequences) are mitochondrial DNA sequences that, after stress events involving the mitochondrion, colonized the nuclear genome. Accurate mapping of NumtS avoids contamination during mtDNA PCR amplification, thus supplying reliable bases for detecting false heteroplasmies. In addition, since they commonly populate mammalian genomes (especially primates) and are polymorphic, in terms of presence/absence and content of SNPs, they may be used as evolutionary markers in intra- and inter-species population analyses. RESULTS: The need for an exhaustive NumtS annotation led us to produce the Reference Human NumtS compilation, followed, as reported in this paper, by those for chimpanzee, rhesus macaque and mouse. Identification of NumtS inside the UCSC Genome Browser and their inter-species comparison required the design and implementation of NumtS tracks, starting from the compilation data. NumtS retrieval through the UCSC Genome Browser, in the species examined, is now feasible at a glance. CONCLUSIONS: Analyses involving NumtS tracks, together with other genome element tracks publicly available at the UCSC Genome Browser, can provide deep insight into genome evolution and comparative genomics, thus improving studies dealing with the mechanisms that drove the generation of NumtS. In addition, the NumtS tracks constitute a useful tool in the design of mitochondrial DNA primers.

    Linguistic analysis of nucleotide sequences: Algorithms for pattern recognition and analysis of codon strategy

    The linguistic approach to the analysis of nucleotide sequences provides a powerful tool for a number of purposes, including the identification of sequence motifs having a functional role, the establishment of functional correlations between strings, and the study of phylogenetic relationships between genetic texts (i.e., evolutionary analyses). Linguistic approaches to the analysis of genetic material are numerous and differ according to the particular goal of the study. This chapter presents a short introduction to the commonest aspects and treatments of nucleotide sequences as a language and two algorithms of linguistic analysis. The algorithm WORDUP is aimed at the identification of statistically significant oligonucleotide motifs. Such a method is particularly suitable for the analysis of the huge number of sequences of unknown function produced by automatic sequencing procedures. The algorithm CODONTREE is aimed at the study of codon strategy in protein coding genes.

    Mining Information Extraction Models for HmtDB annotation

    Advances in genome sequencing techniques have given rise to an overwhelming increase in the literature on discovered genes, proteins and their role in biological processes. However, the biomedical literature remains a largely unexploited source of biological information. Information Extraction (IE) techniques are necessary to map this information into structured representations that allow facts relating domain-relevant entities to be automatically recognized. In this paper, we present a framework that supports biologists in the task of automatic extraction of information from texts. The framework integrates a data mining module that discovers extraction rules from a set of manually labelled texts. Extraction models are subsequently applied in an automatic mode on unseen texts. We report an application to a real-world dataset composed of publications selected to support biologists in the annotation of the HmtDB database.

    LINGUISTIC APPROACHES TO THE ANALYSIS OF SEQUENCE INFORMATION

    Biological macromolecules have many features that resemble modern languages. Thus, linguistic approaches to the analysis of sequence information are becoming powerful tools for deciphering genetic texts. The methodologies used to date to determine the global parameters of the genetic language and meaningful patterns within it are described.

    RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences

    Background: One of the most frequent uses of bioinformatics tools concerns functional characterization of a newly produced nucleotide sequence (a query sequence) by applying Blast or FASTA against a set of sequences (the subject sequences). However, in some specific contexts, it is useful to compare the query sequence against a cluster such as a MultiAlignment (MA). We present here the RegExpBlasting (REB) algorithm, which compares an unclassified sequence with a dataset of patterns defined by application of Regular Expression rules to an input MA dataset. The REB algorithm workflow consists of (i) the definition of a dataset of multialignments; (ii) the association of each MA to a pattern, defined by application of regular expression rules; (iii) automatic characterization of a submitted biosequence according to the function of the sequences described by the pattern best matching the query sequence. Results: An application of this algorithm is used in the "characterize your sequence" tool available in the PPNEMA resource. PPNEMA is a resource of Ribosomal Cistron sequences from various species, grouped according to nematode genera. It allows the retrieval of plant nematode multialigned sequences or the classification of new nematode rDNA sequences by applying REB. The same algorithm also supports automatic updating of the PPNEMA database. The present paper gives examples of the use of REB within PPNEMA. Conclusion: The use of REB in PPNEMA updating and in the PPNEMA "characterize your sequence" option clearly demonstrates the power of the method. REB can also be applied to other bioinformatics problems in which a new sequence must be added to a pre-existing cluster. The statistical tests carried out here show the flexibility of the method.
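    The step of associating each multialignment with a pattern, and then assigning a query to the best-matching pattern, can be illustrated with a minimal Python sketch. The rule used here (one character class per alignment column, with gapped columns made optional), the scoring by longest match, and the names `alignment_to_regex` and `best_matching_cluster` are assumptions for illustration; the abstract does not specify REB's actual regular expression rules or how "best matching" is measured.

```python
import re


def alignment_to_regex(aligned_seqs):
    """Build a regex string from equal-length aligned sequences ('-' is a gap)."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    parts = []
    for column in zip(*aligned_seqs):
        residues = sorted(set(column) - {"-"})
        if not residues:
            continue  # a column made only of gaps contributes nothing
        if len(residues) == 1:
            token = residues[0]
        else:
            token = "[" + "".join(residues) + "]"
        if "-" in column:
            token += "?"  # some sequences lack this position, so it is optional
        parts.append(token)
    return "".join(parts)


def best_matching_cluster(query, clusters):
    """Return the name of the cluster whose pattern gives the longest match.

    `clusters` maps a cluster name to its list of aligned sequences.
    Returns None when no pattern matches any non-empty part of the query.
    """
    best_name, best_len = None, 0
    for name, seqs in clusters.items():
        pattern = re.compile(alignment_to_regex(seqs))
        longest = max((len(m.group(0)) for m in pattern.finditer(query)), default=0)
        if longest > best_len:
            best_name, best_len = name, longest
    return best_name
```

    In this sketch the query inherits the label of the cluster whose pattern it matches best, mirroring step (iii) of the workflow; a production tool would need a more robust scoring scheme than match length alone.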

    A statistical method for detecting regions with different evolutionary dynamics in multialigned sequences.

    We describe a stochastic method for tracing the evolutionary pattern of multialigned sequences. This method allows us to detect gene regions with distinct evolutionary dynamics, e.g., regions that significantly deviate from the expected behavior. Accurate detection of hypervariable or hyperconstrained regions may provide useful information on the structure/function relationship of biosequences. This information can help localize functional constraints. In addition, the selection of distinct evolutionary dynamics may assist in the correct use of biosequences as reliable molecular clocks. (c) 1992 Academic Press, Inc.