
    UniProt: the universal protein knowledgebase

    The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were causing excessive redundancy in UniProt. The initial run of this pipeline reduced the number of sequences in UniProt by 47 million. For our users interested in the accessory proteomes, we have made available sets of pan proteome sequences that cover the diversity of sequences for each species that is found in its strains and sub-strains. To help interpretation of genomic variants, we provide tracks of detailed protein information for the major genome browsers. We provide a SPARQL endpoint that allows complex queries of the more than 22 billion triples of data in UniProt (http://sparql.uniprot.org/). UniProt resources can be accessed via the website at http://www.uniprot.org/.
    University of Delaware. Department of Computer and Information Sciences
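    The SPARQL endpoint mentioned in this abstract could be queried roughly as follows. This is a minimal sketch, not the UniProt team's own client: it assumes the endpoint follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter), it uses the URL exactly as quoted in the abstract, and the helper name `build_sparql_request` is hypothetical. The request is only built here, not sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# URL as quoted in the abstract; the exact query path on the live
# service is an assumption and may differ.
ENDPOINT = "http://sparql.uniprot.org/"

# Illustrative query: list a handful of resources typed as proteins
# in the UniProt core vocabulary.
QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE { ?protein a up:Protein . }
LIMIT 5
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request.

    Hypothetical helper for illustration only.
    """
    url = endpoint + "?" + urlencode({"query": query.strip()})
    # Ask for the standard JSON serialisation of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

    To run the query, the request object can be passed to `urllib.request.urlopen`; whether the service answers at this exact path is not confirmed by the abstract.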

    Update on activities at the Universal Protein Resource (UniProt) in 2013.

    The mission of the Universal Protein Resource (UniProt) (http://www.uniprot.org) is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase. It integrates, interprets and standardizes data from numerous resources to achieve the most comprehensive catalogue of protein sequences and functional annotation. UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is updated and distributed every 4 weeks and can be accessed online for searches or downloads.

    Capturing provenance for a linkset of convenience

    Biological interactions such as those between genes and proteins are complex and require intricate OWL models. However, direct links between biological entities can support search and data integration. In this paper we introduce linksets of convenience that capture these direct links. We show the provenance statements required to track the derivation of such linksets, linking them back to the full biological justification.

    The Vaginal Microbiome: Disease, Genetics and the Environment

    The vagina is an interactive interface between the host and the environment. Its surface is covered by a protective epithelium colonized by bacteria and other microorganisms. The ectocervix is nonsterile, whereas the endocervix and the upper genital tract are assumed to be sterile in healthy women. Therefore, the cervix serves a pivotal role as a gatekeeper to protect the upper genital tract from microbial invasion and subsequent reproductive pathology. Microorganisms that cross this barrier can cause preterm labor, pelvic inflammatory disease, and other gynecologic and reproductive disorders. Homeostasis of the microbiome in the vagina and ectocervix plays a paramount role in reproductive health. Depending on its composition, the microbiome may protect the vagina from infectious or non-infectious diseases, or it may enhance its susceptibility to them. Because of the nature of this organ, and the fact that it is continuously colonized by bacteria from birth to death, it is virtually certain that this rich environment evolved in concert with its microbial flora. Specific interactions dictated by the genetics of both the host and microbes are likely responsible for maintaining both the environment and the microbiome. However, the genetic basis of these interactions in both the host and the bacterial colonizers is currently unknown. _Lactobacillus_ species are associated with vaginal health, but the role of these species in the maintenance of health is not yet well defined. Similarly, other species, including those representing minor components of the overall flora, undoubtedly influence the ability of potential pathogens to thrive and cause disease. Gross alterations in the vaginal microbiome are frequently observed in women with bacterial vaginosis, but the exact etiology of this disorder is still unknown. 
There are also implications for vaginal flora in non-infectious conditions such as pregnancy, pre-term labor and birth, and possibly fertility and other aspects of women’s health. Conversely, the role of environmental factors in the maintenance of a healthy vaginal microbiome is largely unknown. To explore these issues, we have proposed to address the following questions:

*1.	Do the genes of the host contribute to the composition of the vaginal microbiome?* We hypothesize that genes of both host and bacteria have important impacts on the vaginal microbiome. We are addressing this question by examining the vaginal microbiomes of mono- and dizygotic twin pairs selected from the over 170,000 twin pairs in the Mid-Atlantic Twin Registry (MATR). Subsequent studies, beyond the scope of the current project, may investigate which host genes impact the microbial flora and how they do so.
*2.	What changes in the microbiome are associated with common non-infectious pathological states of the host?* We hypothesize that altered physiological (e.g., pregnancy) and pathologic (e.g., immune suppression) conditions, or environmental exposures (e.g., antibiotics) predictably alter the vaginal microbiome. Conversely, certain vaginal microbiome characteristics are thought to contribute to a woman’s risk for outcomes such as preterm delivery. We are addressing this question by recruiting study participants from the ~40,000 annual clinical visits to women’s clinics of the VCU Health System.
*3.	What changes in the vaginal microbiome are associated with relevant infectious diseases and conditions?* We hypothesize that susceptibility to infectious disease (e.g. HPV, _Chlamydia_ infection, vaginitis, vaginosis, etc.) is impacted by the vaginal microbiome. In turn, these infectious conditions clearly can affect the ability of other bacteria to colonize and cause pathology. Again, we are exploring these issues by recruiting participants from visitors to women’s clinics in the VCU Health System.

Three kinds of sequence data are generated in this project: i) rDNA sequences from vaginal microbes; ii) whole metagenome shotgun sequences from vaginal samples; and iii) whole genome shotgun sequences of bacterial clones selected from vaginal samples. The study includes samples from three vaginal sites: mid-vaginal, cervical, and introital. The data sets also include buccal and perianal samples from all twin participants. Samples from these additional sites are used to test the hypothesis of a per continuum spread of bacteria in relation to vaginal health. An extended set of clinical metadata associated with these sequences is deposited with dbGAP. We have currently collected over 4,400 samples from ~100 twins and over 450 clinical participants. We have analyzed and deposited data for 480 rDNA samples, eight whole metagenome shotgun samples, and over 50 complete bacterial genomes. These data are available to accredited investigators according to NIH and Human Microbiome Project (HMP) guidelines. The bacterial clones are deposited in the Biodefense and Emerging Infections Research Resources Repository (http://www.beiresources.org/).

In addition to the extensive sequence data obtained in this study, we are collecting metadata associated with each of the study participants. Participants are asked to complete an extensive health history questionnaire at the time samples are collected. Selected clinical data associated with the visit are also obtained, and relevant information is collected from the medical records when available. These data are maintained securely in a HIPAA-compliant data system as required by VCU’s Institutional Review Board (IRB). The preponderance of these data (i.e., those judged appropriate by NIH staff and VCU’s IRB) are deposited at dbGAP (http://www.ncbi.nlm.nih.gov/gap). Selected fields of these data have been identified by NIH staff as ‘too sensitive’ and are not available in dbGAP. Individuals requiring access to these data fields are asked to contact the PI of this project or NIH Program Staff.

    Manual Curation of Vertebrate Proteins in the UniProt Knowledgebase.

    The UniProt Knowledgebase (UniProtKB) aims to provide the scientific community with a comprehensive, consistent and authoritative resource for protein sequence and functional information. Given the importance of human and vertebrate model data in biomedical research, a major focus is the high-quality manual curation of human proteins and their vertebrate orthologues. Manual curation involves (1) the extraction of experimental results from scientific literature to enrich protein records with a wide range of information including function, structure, interactions and subcellular location, (2) the manual verification of each sequence and clarification of discrepancies between sequence reports, and (3) the assessment of the output of a range of analysis programmes to ensure that sequence features are correctly reported. Manual curation also facilitates the standardization of experimental data, a step necessary for development of methods that enable the semi-automated transfer of manual annotation to uncharacterised or related proteins. Consequently, manual curation of vertebrate proteins plays a vital role in providing users with a complete overview of available data while ensuring its accuracy, reliability and accessibility. UniProtKB/Swiss-Prot currently contains the complete manually reviewed human proteome, comprising approximately 20’300 proteins, and an additional 61’000 reviewed entries from model vertebrates such as mouse, rat, apes, cow, chicken, zebrafish and Xenopus. Ongoing efforts continue to improve the quality of vertebrate sequences in collaboration with HAVANA, Ensembl, HGNC and RefSeq, to include new functional information as it becomes available, and to extend the coverage of curated proteins in vertebrate species. All data are freely available from http://www.uniprot.org.

    UniProt: the universal protein knowledgebase in 2021

    The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/
