Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)
Background: Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description: Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion: We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein
Drawing the tree of eukaryotic life based on the analysis of 2,269 manually annotated myosins from 328 species
diArk – a resource for eukaryotic genome research
Background: The number of completed eukaryotic genome sequences and cDNA projects has increased exponentially in the past few years, although most of them have not been published yet. In addition, many microarray analyses yielded thousands of sequenced EST and cDNA clones. For the researcher interested in single gene analyses (from a phylogenetic, a structural biology or other perspective) it is therefore important to have up-to-date knowledge about the various resources providing primary data. Description: The database is built around 3 central tables: species, sequencing projects and publications. The species table contains commonly and alternatively used scientific names, common names and the complete taxonomic information. For projects the sequence type and links to species project web-sites and species homepages are stored. All publications are linked to projects. The web-interface provides comprehensive search modules with detailed options and three different views of the selected data. We have especially focused on developing an elaborate taxonomic tree search tool that allows the user to instantaneously identify e.g. the closest relative to the organism of interest. Conclusion: We have developed a database, called diArk, to store, organize, and present the most relevant information about completed genome projects and EST/cDNA data from eukaryotes. Currently, diArk provides information about 415 eukaryotes, 823 sequencing projects, and 248 publications.
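The three-table layout described in the diArk abstract can be sketched as a small relational schema. This is only an illustration under stated assumptions: every table name, column name and demo row below is hypothetical and is not taken from diArk itself. The abstract says only that species, sequencing projects and publications are the central tables and that publications are linked to projects.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not diArk's actual database layout.
SCHEMA = """
CREATE TABLE species (
    id              INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    common_name     TEXT,
    taxonomy        TEXT
);
CREATE TABLE projects (
    id            INTEGER PRIMARY KEY,
    species_id    INTEGER NOT NULL REFERENCES species(id),
    sequence_type TEXT NOT NULL,
    url           TEXT
);
CREATE TABLE publications (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    title      TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO species VALUES (1, 'Daphnia pulex', 'water flea', 'Arthropoda')"
    )
    conn.execute("INSERT INTO projects VALUES (1, 1, 'genome', NULL)")
    conn.execute("INSERT INTO projects VALUES (2, 1, 'EST', NULL)")
    conn.execute("INSERT INTO publications VALUES (1, 1, 'Placeholder title')")
    return conn


def projects_for_species(conn, scientific_name):
    """List (sequence_type, publication_count) for each project of a species."""
    query = """
        SELECT p.sequence_type, COUNT(pub.id)
        FROM projects p
        JOIN species s ON s.id = p.species_id
        LEFT JOIN publications pub ON pub.project_id = p.id
        WHERE s.scientific_name = ?
        GROUP BY p.id
        ORDER BY p.id
    """
    return conn.execute(query, (scientific_name,)).fetchall()
```

With the placeholder rows above, `projects_for_species(build_demo_db(), "Daphnia pulex")` returns one entry per project together with the number of linked publications. The `LEFT JOIN` keeps projects that have no publication yet, which matches the abstract's remark that most projects are unpublished.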
Reconstructing the phylogeny of 21 completely sequenced arthropod species based on their motor proteins
Background: Motor proteins have extensively been studied in the past and consist of large superfamilies. They are involved in diverse processes like cell division, cellular transport, neuronal transport processes, or muscle contraction, to name a few. Vertebrates contain up to 60 myosins and about the same number of kinesins that are spread over more than a dozen distinct classes. Results: Here, we present the comparative genomic analysis of the motor protein repertoire of 21 completely sequenced arthropod species using the owl limpet Lottia gigantea as outgroup. Arthropods contain up to 17 myosins grouped into 13 classes. The myosins are in almost all cases clear paralogs, and thus the evolution of the arthropod myosin inventory is mainly determined by gene losses. Arthropod species contain up to 29 kinesins spread over 13 classes. In contrast to the myosins, the evolution of the arthropod kinesin inventory is not only determined by gene losses but also by many subtaxon-specific and species-specific gene duplications. All arthropods contain each of the subunits of the cytoplasmic dynein/dynactin complex. Except for the dynein light chains and the p150 dynactin subunit they contain single gene copies of the other subunits. Especially the roadblock light chain repertoire is very species-specific. Conclusion: All 21 completely sequenced arthropods, including the twelve sequenced Drosophila species, contain a species-specific set of motor proteins. The phylogenetic analysis of all genes as well as the protein repertoire placed Daphnia pulex closest to the root of the Arthropoda. The louse Pediculus humanus corporis is the closest relative to Daphnia followed by the group of the honeybee Apis mellifera and the jewel wasp Nasonia vitripennis. After this group the rust-red flour beetle Tribolium castaneum and the silkworm Bombyx mori diverged very closely from the lineage leading to the Drosophila species.
Scipio: Using protein sequences to determine the precise exon/intron structures of genes and their orthologs in closely related species
Background: For many types of analyses, data about gene structure and locations of non-coding regions of genes are required. Although a vast amount of genomic sequence data is available, precise annotation of genes is lagging behind. Finding the corresponding gene of a given protein sequence by means of conventional tools is error-prone, and cannot be completed without manual inspection, which is time-consuming and requires considerable experience. Results: Scipio is a tool based on the alignment program BLAT to determine the precise gene structure given a protein sequence and a genome sequence. It identifies intron-exon borders and splice sites and is able to cope with sequencing errors and genes spanning several contigs in genomes that have not yet been assembled to supercontigs or chromosomes. Instead of producing a set of hits with varying confidence, Scipio gives the user a coherent summary of locations on the genome that code for the query protein. The output contains information about discrepancies that may result from sequencing errors. Scipio has also successfully been used to find homologous genes in closely related species. Scipio was tested with 979 protein queries against 16 arthropod genomes (intra-species search). For cross-species annotation, Scipio was used to annotate 40 genes from Homo sapiens in the primates Pongo pygmaeus abelii and Callithrix jacchus. The prediction quality of Scipio was tested in a comparative study against that of BLAT and the well-established program Exonerate. Conclusion: Scipio is able to precisely map a protein query onto a genome. Even in cases when there are many sequencing errors, or when incomplete genome assemblies lead to hits that stretch across multiple target sequences, it very often provides the user with the correct determination of intron-exon borders and splice sites, showing an improved prediction accuracy compared to BLAT and Exonerate. Apart from being able to find genes in the genome that encode the query protein, Scipio can also be used to annotate genes in closely related species.
WebScipio: An online tool for the determination of gene structures using protein sequences
Background: Obtaining the gene structure for a given protein-encoding gene is an important step in many analyses. Software suited for this task should be readily accessible, accurate, easy to handle and should provide the user with a coherent representation of the most probable gene structure. It should be rigorous enough to optimise features on the level of single bases and at the same time flexible enough to allow for cross-species searches. Results: WebScipio, a web interface to the Scipio software, allows a user to obtain the coding sequence structure corresponding to a given query protein sequence that belongs to an already assembled eukaryotic genome. The resulting gene structure is presented in various human-readable formats like a schematic representation, and a detailed alignment of the query and the target sequence highlighting any discrepancies. WebScipio can also be used to identify and characterise the gene structures of homologs in related organisms. In addition, it offers a web service for integration with other programs. Conclusion: WebScipio is a tool that allows users to get a high-quality gene structure prediction from a protein query. It offers more than 250 eukaryotic genomes that can be searched and produces predictions that are close to what can be achieved by manual annotation, for in-species and cross-species searches alike. WebScipio is freely accessible at http://www.webscipio.org.
GenePainter: a fast tool for aligning gene structures of eukaryotic protein families, visualizing the alignments and mapping gene structures onto protein structures
Background: All sequenced eukaryotic genomes have been shown to possess at least a few introns. This includes those unicellular organisms that were previously suspected to be intron-less. Therefore, gene splicing must have been present at least in the last common ancestor of the eukaryotes. To explain the evolution of introns, basically two mutually exclusive concepts have been developed. The introns-early hypothesis says that already the very first protein-coding genes contained introns while the introns-late concept asserts that eukaryotic genes gained introns only after the emergence of the eukaryotic lineage. A very important aspect in this respect is the conservation of intron positions within homologous genes of different taxa. Results: GenePainter is a standalone application for mapping gene structure information onto protein multiple sequence alignments. Based on the multiple sequence alignments the gene structures are aligned down to single nucleotides. GenePainter accounts for variable lengths in exons and introns, respects split codons at intron junctions and is able to handle sequencing and assembly errors, which are possible reasons for frame-shifts in exons and gaps in genome assemblies. Thus, even gene structures of considerably divergent proteins can properly be compared, as is needed in phylogenetic analyses. Conserved intron positions can also be mapped to user-provided protein structures. For their visualization GenePainter provides scripts for the molecular graphics system PyMOL
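The idea of mapping intron positions onto a protein multiple sequence alignment can be illustrated with a simplified sketch. This is not GenePainter's algorithm, and it ignores frame-shifts and assembly gaps. It assumes each intron is given as a hypothetical pair `(residue_index, phase)` in ungapped protein coordinates, where the phase (0 to 2) counts how many bases of the codon precede the intron. All function names and the toy alignment are assumptions made for illustration.

```python
def residue_to_column(aligned_seq, gap="-"):
    """Map each ungapped residue index to its column in the alignment."""
    return [col for col, ch in enumerate(aligned_seq) if ch != gap]


def map_introns(aligned_seq, introns):
    """Translate (residue_index, phase) pairs into (alignment_column, phase)."""
    columns = residue_to_column(aligned_seq)
    return {(columns[residue], phase) for residue, phase in introns}


def conserved_introns(alignment, introns_by_name):
    """Return intron positions shared by every sequence in the alignment.

    alignment:       dict of sequence name -> aligned sequence (with gaps)
    introns_by_name: dict of sequence name -> list of (residue_index, phase)
    """
    mapped = [
        map_introns(alignment[name], introns_by_name[name]) for name in alignment
    ]
    return set.intersection(*mapped)


# Toy example: the intron before "L" sits at residue 2 in sequence "a" and
# residue 3 in sequence "b", but both land in alignment column 3.
alignment = {"a": "MK-LV", "b": "MKALV"}
introns = {"a": [(2, 0)], "b": [(1, 2), (3, 0)]}
shared = conserved_introns(alignment, introns)
```

Comparing positions in alignment coordinates rather than raw residue indices is what lets introns in sequences of different length be recognised as lying at the same position. Keeping the phase in the key distinguishes introns that fall in the same codon but at different bases.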
Establishment and Characterisation of an in vitro Replication System with Human Cell Extracts
In the work presented, I was able to characterise several aspects of an in vitro DNA replication system with human cell extracts. I could confirm that plasmids without special sequence characteristics are replicated by the system and that the replication of each template takes place only once, resembling the way genomic DNA is replicated in vivo. The occurrence of different kinds of replication intermediates was shown by electron microscopy, and the fate of the template DNA during the reaction was clarified. I also demonstrated that only a specific form of DNA can serve as substrate and that the products of the reaction can be separated by differential digest in a reasonable manner. Furthermore, I was able to prove that the factors responsible for replication in vitro are the same ones driving the reaction in the cell. It could also be demonstrated that the efficiency of the reaction depends on the cell cycle stage of the cells the protein extracts were prepared from. Finally, the reaction could be inhibited by depletion of ORC proteins, although the inhibition could not be reversed by the addition of recombinantly expressed ORC complexes. In conclusion, this work is a contribution towards the complete characterisation of an in vitro replication assay as a model for the replication of the genome in human cells.
Genomics and phylogeny of motor proteins: tools and analyses
This work is a collection of several
projects that are mainly concerned with the genomics and the
phylogeny of myosins. Two database-driven web applications are
described. The first, CyMoBase (www.cymobase.org), contains
information about different aspects of motor and cytoskeletal
proteins. The second, diArk (www.diark.org), contains information
about several hundred species and genome sequencing projects and
provides a large number of eukaryotic genome sequences. In a
large-scale phylogenetic analysis using 2269 myosin protein sequences we
were able to draw the tree of life containing 328 species. This
analysis also provides a universal classification scheme for
myosins, currently encompassing 35 classes. In a study about
differential splicing in insect myosins we describe the diversity
of gene products and a possibly novel mechanism of splicing.
Additionally, using sequences from myosin, kinesin and dynein, we
were able to determine the phylogeny of 21 insect species in
detail. The work also describes software for gene structure
determination (Scipio and WebScipio, www.webscipio.org) and
prediction of solid-state NMR spectra of proteins.
