Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases
Genome-wide association studies (GWAS) provide a powerful approach for uncovering disease-associated variants in humans, but fine-mapping the causal variants remains a challenge. This is partly remedied by prioritization of disease-associated variants that overlap GWAS-enriched epigenomic annotations. Here, we introduce a new Bayesian model, RiVIERA (Risk Variant Inference using Epigenomic Reference Annotations), for inference of driver variants from summary statistics across multiple traits using hundreds of epigenomic annotations. In simulation, RiVIERA showed promising power in detecting causal variants and causal annotations, and multi-trait joint inference further improved the detection power. We applied RiVIERA to model the existing GWAS summary statistics of 9 autoimmune diseases and schizophrenia by jointly harnessing the potential causal enrichments among 848 tissue-specific epigenomic annotations from the ENCODE/Roadmap consortium covering 127 cell/tissue types and 8 major epigenomic marks. RiVIERA identified meaningful tissue-specific enrichments for enhancer regions defined by H3K4me1 and H3K27ac for blood T cells specifically in the nine autoimmune diseases and brain-specific enhancer activities exclusively in schizophrenia. Moreover, the variants from the 95% credible sets exhibited high conservation and enrichments for GTEx whole-blood eQTLs located within transcription-factor binding sites and DNA-hypersensitive sites. Furthermore, jointly modeling the nine immune traits by simultaneously inferring and exploiting the underlying epigenomic correlation between traits further improved the functional enrichments compared to single-trait models. National Institutes of Health (U.S.) (Grants R01-HG004037, RC1-HG005334, and R01-HG008155)
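For readers unfamiliar with the "95% credible sets" mentioned in the RiVIERA abstract, the following is a minimal sketch of the generic construction: rank the variants in a locus by posterior probability and keep the smallest prefix whose cumulative mass reaches the target level. It is not RiVIERA's implementation; the function name `credible_set` and the variant IDs are hypothetical, and the posterior values are made up for illustration.

```python
def credible_set(posteriors, level=0.95):
    """Return the smallest list of variant IDs whose posterior mass reaches `level`.

    `posteriors` maps a variant ID to its (possibly unnormalized) posterior
    probability of being causal within one locus. Illustrative only.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior mass must be positive")

    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

    chosen, mass = [], 0.0
    for variant_id, p in ranked:
        chosen.append(variant_id)
        mass += p / total  # normalize so the locus sums to 1
        if mass >= level:
            break
    return chosen


# Hypothetical locus with four variants.
locus = {"rs_a": 0.60, "rs_b": 0.25, "rs_c": 0.11, "rs_d": 0.04}
print(credible_set(locus))
```

With these made-up values the set contains the three highest-ranked variants, because the first two only account for 0.85 of the posterior mass.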
Comparative validation of the D. melanogaster modENCODE transcriptome annotation
Accurate gene model annotation of reference genomes is critical for making them useful. The modENCODE project has improved the D. melanogaster genome annotation by using deep and diverse high-throughput data. Since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function, we have performed large-scale interspecific comparisons to increase confidence in predicted annotations. To support comparative genomics, we filled in divergence gaps in the Drosophila phylogeny by generating draft genomes for eight new species. For comparative transcriptome analysis, we generated mRNA expression profiles on 81 samples from multiple tissues and developmental stages of 15 Drosophila species, and we performed cap analysis of gene expression in D. melanogaster and D. pseudoobscura. We also describe conservation of four distinct core promoter structures composed of combinations of elements at three positions. Overall, each type of genomic feature shows a characteristic divergence rate relative to neutral models, highlighting the value of multispecies alignment in annotating a target genome that should prove useful in the annotation of other high-priority genomes, especially human and other mammalian genomes that are rich in noncoding sequences. We report that the vast majority of elements in the annotation are evolutionarily conserved, indicating that the annotation will be an important springboard for functional genetic testing by the Drosophila community. National Institutes of Health (U.S.) (NIDDK (DK015600-18)); National Institutes of Health (U.S.) (Extramural program (1ROIGM082843)); National Institutes of Health (U.S.) (Extramural program (U01HB004271))
Systematic chromatin state comparison of epigenomes associated with diverse properties including sex and tissue type
Epigenomic data sets provide critical information about the dynamic role of chromatin states in gene regulation, but a key question of how chromatin state segmentations vary under different conditions across the genome has remained unaddressed. Here we present ChromDiff, a group-wise chromatin state comparison method that generates an information-theoretic representation of epigenomes and corrects for external covariate factors to better isolate relevant chromatin state changes. By applying ChromDiff to the 127 epigenomes from the Roadmap Epigenomics and ENCODE projects, we provide novel group-wise comparative analyses across sex, tissue type, state and developmental age. Remarkably, we find that distinct sets of epigenomic features are maximally discriminative for different group-wise comparisons, in each case revealing distinct enriched pathways, many of which do not show gene expression differences. Our methodology should be broadly applicable for epigenomic comparisons and provides a powerful new tool for studying chromatin state differences at the genome scale. National Science Foundation (U.S.). Graduate Research Fellowship; National Institutes of Health (U.S.) (U54 HG006991); National Institutes of Health (U.S.) (U41 HG007000); National Institutes of Health (U.S.) (5U01ES017156); National Institutes of Health (U.S.) (RC1-HG005334); National Institutes of Health (U.S.) (HG004570); National Institutes of Health (U.S.) (HG006911); National Institutes of Health (U.S.) (R01 HG004037)
Interplay between chromatin state, regulator binding, and regulatory motifs in six human cell types
The regions bound by sequence-specific transcription factors can be highly variable across different cell types despite the static nature of the underlying genome sequence. This has been partly attributed to changes in chromatin accessibility, but a systematic picture has been hindered by the lack of large-scale data sets. Here, we use 456 binding experiments for 119 regulators and 84 chromatin maps generated by the ENCODE project in six human cell types, and relate those to a global map of regulatory motif instances for these factors. We find specific and robust chromatin state preferences for each regulator beyond the previously reported open-chromatin association, suggesting a much richer chromatin landscape beyond simple accessibility. The preferentially bound chromatin states of regulators were enriched for sequence motifs of regulators relative to all states, suggesting that these preferences are at least partly encoded by the genomic sequence. Relative to all regions bound by a regulator, however, regulatory motifs were surprisingly depleted in the regulator's preferentially bound states, suggesting additional non-sequence-specific binding beyond the level predicted by the regulatory motifs. Such permissive binding was largely restricted to open-chromatin regions showing histone modification marks characteristic of active enhancer and promoter regions, whereas open-chromatin regions lacking such marks did not show permissive binding. Lastly, the vast majority of cobinding of regulator pairs is predicted by the chromatin state preferences of individual regulators. Overall, our results suggest a joint role of sequence motifs and specific chromatin states beyond mere accessibility in mediating regulator binding dynamics across different cell types. National Institutes of Health (U.S.) (Grant R01HG004037); National Institutes of Health (U.S.) (Grant RC1HG005334)
Comprehensive analysis of the chromatin landscape in Drosophila melanogaster
Chromatin is composed of DNA and a variety of modified histones and non-histone proteins, which have an impact on cell differentiation, gene regulation and other key cellular processes. Here we present a genome-wide chromatin landscape for Drosophila melanogaster based on eighteen histone modifications, summarized by nine prevalent combinatorial patterns. Integrative analysis with other data (non-histone chromatin proteins, DNase I hypersensitivity, GRO-Seq reads produced by engaged polymerase, short/long RNA products) reveals discrete characteristics of chromosomes, genes, regulatory elements and other functional domains. We find that active genes display distinct chromatin signatures that are correlated with disparate gene lengths, exon patterns, regulatory functions and genomic contexts. We also demonstrate a diversity of signatures among Polycomb targets that include a subset with paused polymerase. This systematic profiling and integrative analysis of chromatin signatures provides insights into how genomic elements are regulated, and will serve as a resource for future experimental investigations of genome structure and function. United States. Dept. of Energy (Contract DE-AC02-05CH11231); RC2 HG005639; U01 HG004279; R01 GM082798; R37 GM45744; R01 GM071923; U54 HG004592; National Science Foundation (U.S.) (NSF 0905968)
Disruption of a Large Intergenic Noncoding RNA in Subjects with Neurodevelopmental Disabilities
Large intergenic noncoding (linc) RNAs represent a newly described class of ribonucleic acid whose importance in human disease remains undefined. We identified a severely developmentally delayed 16-year-old female with karyotype 46,XX,t(2;11)(p25.1;p15.1)dn in the absence of clinically significant copy number variants (CNVs). DNA capture followed by next-generation sequencing of the translocation breakpoints revealed disruption of a single noncoding gene on chromosome 2, LINC00299, whose RNA product is expressed in all tissues measured, but most abundantly in brain. Among a series of additional, unrelated subjects referred for clinical diagnostic testing who showed CNVs affecting this locus, we identified four with exon-crossing deletions in association with neurodevelopmental abnormalities. No disruption of the LINC00299 coding sequence was seen in almost 14,000 control subjects. Together, these subjects with disruption of LINC00299 implicate this particular noncoding RNA in brain development and raise the possibility that, as a class, abnormalities of lincRNAs may play a significant role in human developmental disorders.
Unified Modeling of Gene Duplication, Loss, and Coalescence Using a Locus Tree
Gene phylogenies provide a rich source of information about the way evolution shapes genomes, populations, and phenotypes. In addition to substitutions, evolutionary events such as gene duplication and loss (as well as horizontal transfer) play a major role in gene evolution, and many phylogenetic models have been developed in order to reconstruct and study these events. However, these models typically make the simplifying assumption that population-related effects such as incomplete lineage sorting (ILS) are negligible. While this assumption may have been reasonable in some settings, it has become increasingly problematic as increased genome sequencing has led to denser phylogenies, where effects such as ILS are more prominent. To address this challenge, we present a new probabilistic model, DLCoal, that defines gene duplication and loss in a population setting, such that coalescence and ILS can be directly addressed. Interestingly, this model implies that in addition to the usual gene tree and species tree, there exists a third tree, the locus tree, which will likely have many applications. Using this model, we develop the first general reconciliation method that accurately infers gene duplications and losses in the presence of ILS, and we show its improved inference of orthologs, paralogs, duplications, and losses for a variety of clades, including flies, fungi, and primates. Also, our simulations show that gene duplications increase the frequency of ILS, further illustrating the importance of a joint model. Going forward, we believe that this unified model can offer insights to questions in both phylogenetics and population genetics. National Science Foundation (U.S.) (Career award NSF 0644282)
Evidence of Abundant Purifying Selection in Humans for Recently Acquired Regulatory Functions
Although only 5% of the human genome is conserved across mammals, a substantially larger portion is biochemically active, raising the question of whether the additional elements evolve neutrally or confer a lineage-specific fitness advantage. To address this question, we integrate human variation information from the 1000 Genomes Project and activity data from the ENCODE Project. A broad range of transcribed and regulatory nonconserved elements show decreased human diversity, suggesting lineage-specific purifying selection. Conversely, conserved elements lacking activity show increased human diversity, suggesting that some recently became nonfunctional. Regulatory elements under human constraint in nonconserved regions were found near color vision and nerve-growth genes, consistent with purifying selection for recently evolved functions. Our results suggest continued turnover in regulatory regions, with at least an additional 4% of the human genome subject to lineage-specific constraint. National Institutes of Health (U.S.) (Grant R01HG004037); National Institutes of Health (U.S.) (Grant RC1HG005334); National Science Foundation (U.S.) (CAREER Grant 0644282)
ChromHMM: automating chromatin-state discovery and characterization
To the Editor:
Chromatin-state annotation using combinations of chromatin modification patterns has emerged as a powerful approach for discovering regulatory regions and their cell type–specific activity patterns and for interpreting disease-association studies [1-5]. However, the computational challenge of learning chromatin-state models from large numbers of chromatin modification datasets in multiple cell types still requires extensive bioinformatics expertise. To address this challenge, we developed ChromHMM, an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets and visualizing the resulting genome-wide maps of chromatin-state annotations. Massachusetts Institute of Technology. Computational and Systems Biology Initiative; National Science Foundation (U.S.) (postdoctoral fellowship 0905968); National Institutes of Health (U.S.) (1-RC1-HG005334); National Institutes of Health (U.S.) (1 U54 HG004570)
