Reorganization and merging of the EMBL and GenBank keyword indexes in a tree structure for more efficient retrieval of nucleic acid sequences.
EMBL and GenBank keyword indexes have no hierarchical structure. In this paper we present a method for merging and reorganizing them in a tree structure whose primary roots are the keywords 'protein', 'DNA', 'RNA', and 'unclassified'. Synonymous keywords have been grouped together and erroneous keywords have been corrected. This taxonomic organization of keywords results in more extensive and efficient retrieval, which is further aided by a "synonyms declaration". The tree was produced using the computer programs GENPOINT and CREANET.
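A minimal Python sketch of how a keyword tree plus a synonyms declaration can widen retrieval: a query keyword is first resolved to its canonical form, then expanded to all of its descendants before the index is searched. Only the four primary roots come from the abstract; the child keywords, synonym entries and dictionary-based index are hypothetical, and this is not the GENPOINT/CREANET implementation.

```python
from collections import defaultdict

# Hypothetical excerpt of a keyword tree (child -> parent). The four roots follow
# the abstract; every other keyword is an illustrative assumption.
PARENT = {
    "protein": None, "DNA": None, "RNA": None, "unclassified": None,
    "enzyme": "protein",
    "kinase": "enzyme",
    "promoter": "DNA",
    "tRNA": "RNA",
    "rRNA": "RNA",
}

# Hypothetical "synonyms declaration": variant or misspelt keyword -> canonical keyword.
SYNONYMS = {
    "transfer RNA": "tRNA",
    "ribosomal RNA": "rRNA",
    "promotor": "promoter",  # erroneous spelling mapped to the corrected keyword
}

def build_children(parent):
    """Invert the child -> parent map into parent -> list of children."""
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)
    return children

CHILDREN = build_children(PARENT)

def canonical(keyword):
    """Resolve a keyword through the synonyms declaration."""
    return SYNONYMS.get(keyword, keyword)

def expand(keyword):
    """Return the canonical keyword plus all of its descendants in the tree."""
    root = canonical(keyword)
    if root not in PARENT:
        return set()
    result, stack = set(), [root]
    while stack:
        node = stack.pop()
        result.add(node)
        stack.extend(CHILDREN.get(node, []))
    return result

def retrieve(keyword, index):
    """Collect entry IDs indexed under the keyword or any of its descendants."""
    hits = set()
    for kw in expand(keyword):
        hits |= index.get(kw, set())
    return hits
```

For example, with a hypothetical index `{"tRNA": {"ENTRY1"}, "rRNA": {"ENTRY2"}}`, `retrieve("RNA", index)` returns both entries, while `retrieve("transfer RNA", index)` returns only `{"ENTRY1"}` via the synonym map.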
Primates and mouse NumtS in the UCSC Genome Browser
Abstract
BACKGROUND:
NumtS (Nuclear MiTochondrial Sequences) are mitochondrial DNA sequences that colonized the nuclear genome after stress events involving the mitochondrion. Accurate mapping of NumtS helps avoid contamination during mtDNA PCR amplification, thus providing a reliable basis for detecting false heteroplasmies. In addition, since they commonly populate mammalian genomes (especially those of primates) and are polymorphic in terms of presence/absence and SNP content, they may be used as evolutionary markers in intra- and inter-species population analyses.
RESULTS:
The need for exhaustive NumtS annotation led us to produce the Reference Human NumtS compilation, followed, as reported in this paper, by compilations for chimpanzee, rhesus macaque and mouse. Identifying NumtS in the UCSC Genome Browser and comparing them across species required the design and implementation of NumtS tracks, starting from the compilation data. NumtS can now be retrieved at a glance through the UCSC Genome Browser for the species examined.
CONCLUSIONS:
Analyses involving NumtS tracks, together with the other genome element tracks publicly available at the UCSC Genome Browser, can provide deep insight into genome evolution and comparative genomics, thus improving studies of the mechanisms that drove the generation of NumtS. In addition, the NumtS tracks are a useful tool for designing mitochondrial DNA primers.
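Tracks like these can be loaded into the UCSC Genome Browser as custom tracks. The sketch below assumes a hypothetical record layout for the compilation data (the `NumtHit` fields are illustrative, not the compilation's real schema) and writes NumtS hits as a BED6 custom track; it shows the format, not the pipeline actually used to build the published tracks. BED coordinates are 0-based and half-open, so 1-based inclusive starts are shifted down by one.

```python
from dataclasses import dataclass

@dataclass
class NumtHit:
    """One NumtS on a nuclear chromosome, in 1-based inclusive coordinates (hypothetical layout)."""
    chrom: str
    start: int
    end: int
    name: str
    identity: float  # percent identity to the mtDNA reference, 0-100 (assumed field)
    strand: str      # "+" or "-"

def to_bed_track(hits, track_name, description):
    """Render NumtS hits as a UCSC custom track in BED6 format.

    Start is converted to 0-based by subtracting 1; end is unchanged because BED
    intervals are half-open. Percent identity is scaled to the 0-1000 BED score range.
    """
    lines = [f'track name="{track_name}" description="{description}" useScore=1']
    # Sorted by chromosome name (as a string), then by start position.
    for h in sorted(hits, key=lambda h: (h.chrom, h.start)):
        score = round(h.identity * 10)  # 100% identity -> 1000
        fields = [h.chrom, str(h.start - 1), str(h.end), h.name, str(score), h.strand]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"
```

The resulting text can be pasted or uploaded as a custom track; `useScore=1` asks the browser to shade each item by its score, so higher-identity NumtS appear darker.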
Linguistic analysis of nucleotide sequences: Algorithms for pattern recognition and analysis of codon strategy
The linguistic approach to the analysis of nucleotide sequences provides a powerful tool for a number of purposes, including the identification of sequence motifs with a functional role, the establishment of functional correlations between strings, and the study of phylogenetic relationships between genetic texts (i.e., evolutionary analyses). Linguistic approaches to the analysis of genetic material are numerous and differ according to the particular goal of the study. This chapter presents a short introduction to the most common aspects and treatments of nucleotide sequences as a language, together with two algorithms for linguistic analysis. The algorithm WORDUP is aimed at identifying statistically significant oligonucleotide motifs; the method is particularly suitable for the analysis of the large numbers of sequences of unknown function produced by automatic sequencing procedures. The algorithm CODONTREE is aimed at the study of codon strategy in protein-coding genes.
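To make "statistically significant oligonucleotide motifs" concrete, the sketch below counts overlapping k-mers and compares each count with its expectation under a zero-order (base-composition) model, using a Poisson-style z-score. This is a generic over-representation test chosen for illustration; it is not WORDUP's published statistic, and the default threshold `z_min` is an arbitrary assumption.

```python
import math
from collections import Counter

def kmer_counts(seqs, k):
    """Count overlapping k-mers across all sequences, skipping words with non-ACGT symbols."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts

def overrepresented_words(seqs, k, z_min=3.0):
    """Flag k-mers whose observed count exceeds a base-composition expectation.

    Expected count = N_k * product of the base frequencies in the word, where N_k
    is the number of k-mer positions counted; z = (obs - exp) / sqrt(exp).
    Returns (word, observed, expected, z) tuples sorted by decreasing z.
    """
    base = Counter(c for s in seqs for c in s.upper() if c in "ACGT")
    total = sum(base.values())
    freqs = {b: base[b] / total for b in "ACGT"}
    counts = kmer_counts(seqs, k)
    n_k = sum(counts.values())
    results = []
    for word, obs in counts.items():
        exp = n_k * math.prod(freqs[c] for c in word)
        z = (obs - exp) / math.sqrt(exp)
        if z >= z_min:
            results.append((word, obs, round(exp, 2), round(z, 2)))
    return sorted(results, key=lambda r: -r[3])
```

On the periodic sequence `"ACGT" * 25` with `k=4`, all four cyclic words (ACGT, CGTA, GTAC, TACG) are flagged, with ACGT ranked first because it occurs 25 times against 24 for each of the others.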
Mining Information Extraction Models for HmtDB annotation
Advances in genome sequencing techniques have given rise to an overwhelming increase in the literature on newly discovered genes, proteins and their roles in biological processes. However, the biomedical literature remains a largely unexploited source of biological information. Information Extraction (IE) techniques are needed to map this information into structured representations that allow facts relating domain-relevant entities to be recognized automatically. In this paper, we present a framework that supports biologists in the automatic extraction of information from texts. The framework integrates a data mining module that discovers extraction rules from a set of manually labelled texts. The extraction models are then applied automatically to unseen texts. We report an application to a real-world dataset composed of publications selected to support biologists in the annotation of the HmtDB database.
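The rule-mining step can be illustrated with a deliberately simple learner: from manually labelled sentences it collects single-word triggers that precede an annotated entity, keeps those meeting assumed support and precision thresholds, and applies them to unseen text. The data layout, thresholds and trigger-word rule form are all hypothetical; the framework described in the paper may learn much richer extraction models.

```python
from collections import Counter

def mine_rules(labelled, min_support=2, min_precision=0.8):
    """Mine 'trigger word -> next token is an entity' rules from labelled sentences.

    labelled: list of (tokens, entity_positions) pairs, where entity_positions is
    the set of token indices annotated as the target entity (e.g. a gene name).
    A trigger is kept if it precedes an entity at least min_support times and
    does so in at least min_precision of its occurrences.
    """
    hits, seen = Counter(), Counter()
    for tokens, ents in labelled:
        for i in range(len(tokens) - 1):
            trigger = tokens[i].lower()
            seen[trigger] += 1
            if i + 1 in ents:
                hits[trigger] += 1
    return {t for t, n in hits.items()
            if n >= min_support and n / seen[t] >= min_precision}

def apply_rules(rules, tokens):
    """Return the tokens that follow any mined trigger word in an unseen sentence."""
    return [tokens[i + 1] for i in range(len(tokens) - 1)
            if tokens[i].lower() in rules]
```

With labelled examples such as "mutations in MT-ND5 cause disease" (entity at index 2), the trigger "in" is learned, and `apply_rules` then extracts "MT-ND1" from "defects in MT-ND1 were observed".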
Structural elements highly preserved during the evolution of the D-loop-containing region in vertebrate mitochondrial DNA
Linguistic approaches to the analysis of sequence information
Biological macromolecules have many features that resemble those of modern languages. Thus, linguistic approaches to the analysis of sequence information are becoming powerful tools for deciphering genetic texts. The methodologies used to date to determine the global parameters of the genetic language, and the meaningful patterns within it, are described.
RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences
Background: One of the most frequent uses of bioinformatics tools
concerns the functional characterization of a newly produced nucleotide
sequence (a query sequence) by running Blast or FASTA against a set of
sequences (the subject sequences).
However, in some specific contexts it is useful to compare the query
sequence against a cluster, such as a MultiAlignment (MA). We present
here the RegExpBlasting (REB) algorithm, which compares an unclassified
sequence with a dataset of patterns defined by applying Regular
Expression rules to an input dataset of MAs.
The REB algorithm workflow consists of:
i. the definition of a dataset of multialignments;
ii. the association of each MA with a pattern, defined by applying
regular expression rules;
iii. the automatic characterization of a submitted biosequence according to
the function of the sequences described by the pattern that best matches the
query sequence.
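The three workflow steps can be sketched in Python as follows. Each alignment column becomes a character class (a column containing gaps becomes optional), the per-column tokens are concatenated into one regular expression per MA, and a query is assigned the label of the pattern that matches it over the longest span. The column-to-token rule and the longest-match scoring are assumptions made for this illustration, not the published REB rules; labels such as `genus_A` are hypothetical, and sequences are assumed to use uppercase nucleotide letters.

```python
import re

def column_token(column):
    """Turn one alignment column into a regex token; a '-' (gap) makes it optional."""
    residues = sorted(set(column) - {"-"})
    if not residues:
        return ""  # an all-gap column contributes nothing
    token = residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]"
    return token + "?" if "-" in column else token

def ma_to_pattern(alignment):
    """Step ii: build a regular expression from a multialignment of equal-length rows."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_token(col) for col in zip(*alignment))

def characterize(query, patterns):
    """Step iii: return the label whose pattern matches the query over the longest span.

    patterns maps a label (e.g. a genus) to its regex; returns None if nothing matches.
    """
    best_label, best_len = None, 0
    for label, pattern in patterns.items():
        m = re.search(pattern, query)
        if m and len(m.group(0)) > best_len:
            best_label, best_len = label, len(m.group(0))
    return best_label
```

For instance, the alignment `["ACGTA", "ACGTA", "AC-TA"]` yields the pattern `ACG?TA`, which matches the query `"GGACTAGG"`. Scoring by match length is only one possible tie-breaker when several patterns match.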
Results: This algorithm is applied in the "characterize
your sequence" tool available in the PPNEMA resource. PPNEMA is a
resource of Ribosomal Cistron sequences from various species, grouped
according to nematode genus. It allows the retrieval of multialigned plant
nematode sequences or the classification of new nematode rDNA
sequences by applying REB. The same algorithm also supports automatic
updating of the PPNEMA database. The present paper gives examples of the
use of REB within PPNEMA.
Conclusion: The use of REB in PPNEMA updating and in the PPNEMA "characterize
your sequence" option demonstrates the power of the method.
REB can also be applied rapidly to other bioinformatics problems in which
a new sequence must be added to a pre-existing cluster.
The statistical tests carried out here show the flexibility of
the method.
A statistical method for detecting regions with different evolutionary dynamics in multialigned sequences.
We describe a stochastic method for tracing the evolutionary pattern of multialigned sequences. This method allows us to detect gene regions with distinct evolutionary dynamics, e.g., regions that significantly deviate from the expected behavior. Accurate detection of hypervariable or hyperconstrained regions may provide useful information on the structure/function relationship of biosequences, helping to localize functional constraints. In addition, identifying regions with distinct evolutionary dynamics may assist in the correct use of biosequences as reliable molecular clocks. (c) 1992 Academic Press, Inc.
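One simple way to look for regions with distinct dynamics is a sliding-window test against an alignment-wide rate. The sketch below counts, per column, the residues that differ from the column majority, then flags windows whose mismatch count departs from a binomial expectation (normal approximation). This is a generic illustration under an independence assumption, not the stochastic method of the paper; `width` and `z_cut` are arbitrary parameters.

```python
import math
from collections import Counter

def column_variability(alignment):
    """Per column: (residues differing from the column majority, residues counted); gaps ignored."""
    scores = []
    for col in zip(*alignment):
        residues = [c for c in col if c != "-"]
        if not residues:
            scores.append((0, 0))
            continue
        majority = Counter(residues).most_common(1)[0][1]
        scores.append((len(residues) - majority, len(residues)))
    return scores

def deviant_windows(alignment, width=10, z_cut=2.5):
    """Flag windows whose mismatch rate departs from the alignment-wide rate.

    Null model (an assumption): each residue differs from its column majority
    independently with the global probability p, so a window's mismatch count is
    approximately Binomial(n, p). Returns (start, end, kind, z) tuples.
    """
    cols = column_variability(alignment)
    total_mis = sum(m for m, _ in cols)
    total_n = sum(n for _, n in cols)
    if total_n == 0:
        return []
    p = total_mis / total_n
    flagged = []
    for start in range(len(cols) - width + 1):
        window = cols[start:start + width]
        mis = sum(m for m, _ in window)
        n = sum(k for _, k in window)
        sd = math.sqrt(n * p * (1 - p))
        if sd == 0:
            continue  # no variance under the null (e.g. p == 0): nothing to test
        z = (mis - n * p) / sd
        if abs(z) >= z_cut:
            kind = "hypervariable" if z > 0 else "hyperconstrained"
            flagged.append((start, start + width, kind, round(z, 2)))
    return flagged
```

Because p is estimated from the same alignment, a long conserved stretch is reported as hyperconstrained only relative to the variability elsewhere in the alignment, not relative to any external substitution rate.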
