Inversion polymorphism in a complete human genome assembly
The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity, improving mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1–23.1, and 22q11.21
Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies.
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment
Impact and characterization of serial structural variations across humans and great apes.
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals ( https://github.com/WHops/NAHRwhals ), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease
The Effect of Structural Variation on Gene Expression
Short-read RNA sequencing captures only fragments of gene transcripts, requiring computational reconstruction and limiting transcript diversity knowledge. In contrast, long-read sequencing provides full-length RNA transcripts without reconstruction. Dr. Mahmoud et al. identified 389 medically relevant genes, selecting glucose-6-phosphate isomerase (GPI) on chromosome 19 and its neighboring genes (GARRE1, PDCD2L, UBA2) due to their links to genetic disorders. For 130 haplotypes from 65 diverse individuals in the Human Genome Structural Variation Consortium (HGSVC), repeat masking was performed using RepeatMasker with the Dfam library. Gene and exon locations were determined using Ensembl (release 113). A custom exon library was created, retaining only exon hits with less than 5% divergence. For each haplotype, the intronic region between exons 9 and 10 of GPI was identified and analyzed using k-mers (15-79 bases). The 64 unique k-mers were aligned with MUSCLE, producing a 50-base consensus sequence dubbed the dark region repeat consensus sequence. This sequence was analyzed with nhmmer, retaining hits with e-values below 0.01. Multiple sequence alignment using Clustal Omega revealed eight network-based component consensus sequences (NCCs), used to reannotate the region, yielding 13 unique GPI dark region haplotypes. A key structural variation found was a deletion in one individual's haplotype, leading to the loss of a novel isoform. This study highlights long-read sequencing's ability to uncover previously unknown transcript variations
Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement.
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features
An integrative TAD catalog in lymphoblastoid cell lines discloses the functional impact of deletions and insertions in human genomes.
The human genome is packaged within a three-dimensional (3D) nucleus and organized into structural units known as compartments, topologically associating domains (TADs), and loops. TAD boundaries, separating adjacent TADs, have been found to be well conserved across mammalian species and more evolutionarily constrained than TADs themselves. Recent studies show that structural variants (SVs) can modify 3D genomes through the disruption of TADs, which play an essential role in insulating genes from aberrant regulation by outside regulatory elements. However, how SVs affect the 3D genome structure and their association with different aspects of gene regulation and candidate cis-regulatory elements (cCREs) have rarely been studied systematically. Here, we assess the impact of SVs intersecting with TAD boundaries by developing an integrative Hi-C analysis pipeline, which enables the generation of an in-depth catalog of TADs and TAD boundaries in human lymphoblastoid cell lines (LCLs) to fill the gap of limited resources. Our catalog contains 18,865 TADs, including 4596 sub-TADs, with 185 SVs (TAD–SVs) that alter chromatin architecture. By leveraging the ENCODE registry of cCREs in humans, we determine that 34 of 185 TAD–SVs intersect with cCREs and observe significant enrichment of TAD–SVs within cCREs. This study provides a database of TADs and TAD–SVs in the human genome that will facilitate future investigations of the impact of SVs on chromatin structure and gene regulation in health and disease
A global map for introgressed structural variation and selection in humans
Genetic introgression from Neanderthals and Denisovans has shaped modern human genomes; however, introgressed structural variants (SVs ≥50 base pairs) remain challenging to discover. We integrated high-quality phased assemblies from four new Papua New Guinea (PNG) genomes with 94 published assemblies of diverse ancestry to infer an archaic introgressed SV map. Introgressed SVs are overall enriched in genes (44%, n=1,592), including critical genomic disorder regions, and most abundant in PNG. We identify 11 centromeres likely derived from archaic hominins, adding unexplored diversity to centromere genomics. Pangenome genotyping across 1,363 samples reveals 16 candidate adaptive SVs, many associated with immune-related genes and their expression, in the PNG. We hypothesize that archaic SV introgression contributed to reproductive success, underscoring introgression as a significant force in human adaptive evolution. INTRODUCTION Evidence from over a decade of research unequivocally supports that interbreeding occurred between archaic hominins, such as Neanderthals(1-3) and Denisovans(4), and the ancestors of modern humans, likely through multiple points of contact over the past 100,000 years of human evolution (5, 6). Genomic studies have largely focused on using single-nucleotide variants (SNVs) from diverse populations to establish patterns of archaic introgression in our genome. Modern humans in Eurasia today derive 2-5% (or 120-300 million base pairs [Mbp] per diploid genome) of their ancestry from archaic hominins, with the highest levels observed in Papua New Guinea (PNG)(5, 7). Despite the evidence of selection against deleterious archaic alleles in the human genome, some archaic sequences likely contributed to human phenotypic variation(8, 9). Consistent with these functional implications, many introgressed loci in our genome show signatures of positive selection.
Some of those loci encompass candidate genes that are known to be functionally associated with differential gene expression, altitude, immunity, metabolism, and disease, highlighting contributions of introgressed alleles to the evolution of our species(10-16). Notwithstanding these discoveries, our understanding of the contribution of archaic introgression to human genomic variation and evolution remains far from complete due to the incomplete characterization of all classes of genetic variation. Structural variants (SVs), especially in complex loci, including segmental duplications (SDs) and centromeres(17-19), have been challenging to assess because of the highly fragmented nature of ancient DNA (~50 bp) and the near impossibility of systematically discovering SVs from such ancient DNA, especially in gene-rich regions associated with repeats(14). SVs, such as insertions, deletions, and inversions, contribute disproportionately to human genetic diversity by affecting more genomic sequences than SNVs and can significantly disrupt genes and regulatory elements, leading to relatively larger effects on gene expression and phenotype as defined by association studies (17, 20). Indeed, the effect size for this particular class has been estimated to be more than an order of magnitude greater than that of single-nucleotide polymorphisms. In humans, many SVs have been strongly implicated in a variety of diseases, such as neurodevelopmental disorders (e.g., 22q11.2 deletion syndrome, 22q11.2DS)(20) and coronary heart diseases (LPA)(21). Conversely, human-specific SVs have been shown to play important roles in the adaptive evolution of our species, including the adaptations to diet(22) and the expansion of human neocortex(23). Furthermore, several recent studies provided some of the first evidence for adaptive SV introgression (12, 14, 16) and reported novel protein-coding genes with positively selected sites within introgressed regions(14).
Notably, these studies relied primarily on short-read sequencing data, with limited long-read data available, inadequately capturing the full spectrum of variations, particularly in complex loci, due to the intricate and repetitive nature of many SVs(17, 19, 21, 24). Therefore, advancements in data and inference approaches are still required for comprehensive analysis of SV evolution. Highly accurate long-read sequencing technologies have now made it possible to completely resolve complex repetitive regions in the human genome for the first time, including most SVs (17, 19, 21, 24, 25). Recent long-read sequencing efforts from the Human Pangenome Reference Consortium (HPRC)(21) and the Human Genome Structural Variation Consortium (HGSVC)(17) reveal that over 70% of SVs are inaccessible to short-read sequencing, with many novel SVs located in genes or regions
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
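The difference between the two counting schemes can be illustrated with a minimal Python sketch. It is not the study's implementation: the function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical. It also assumes that two authors are co-cited once per citing paper whenever both appear among the credited authors of that paper's references, including when they are co-authors of the same cited work.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations (hypothetical helper, for illustration only).

    citing_papers: list of reference lists; each reference is an ordered
                   list of author names for one cited work.
    max_authors:   how many leading authors of each cited work are credited
                   (1 = traditional first-author counting, 5 = first five).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect every author credited by this citing paper.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of credited authors is co-cited once per paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Toy data: two citing papers, each citing two two-author works.
papers = [
    [["Smith", "Lee"], ["Jones", "Kim"]],
    [["Smith", "Kim"], ["Lee", "Jones"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(len(first_author), len(first_five))
```

On this toy input, first-author counting links only the first authors of each cited pair, while counting up to five authors per work also links authors who never appear in first position.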
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
