
    Genomics of development and disease

    The data will support research on population genetic variation, health and disease genomics, cancer genomics and transcriptomics. They include public-domain data frequently used by members of the research group, as well as data generated and analysis results produced as the project progresses. The aim is to integrate the data collection into NeCTAR, which further facilitates genomic and medical research in Australia. Part of the data collection may be shared with researchers nationally, subject to ethics clearance.

    cnvHiTSeq: Integrative models for high-resolution copy number variation detection and genotyping using population sequencing data

    Recent advances in sequencing technologies provide the means for identifying copy number variation (CNV) at an unprecedented resolution. A single next-generation sequencing experiment offers several features that can be used to detect CNV, yet current methods do not incorporate all available signatures into a unified model. cnvHiTSeq is an integrative probabilistic method for CNV discovery and genotyping that jointly analyzes multiple features at the population level. By combining evidence from complementary sources, cnvHiTSeq achieves high genotyping accuracy and a substantial improvement in CNV detection sensitivity over existing methods, while maintaining a low false discovery rate. cnvHiTSeq is available at http://sourceforge.net/projects/cnvhitseq

    Statistical approaches for copy number variation detection and association with complex human phenotypes

    Copy number variants (CNVs) play an important role in disease pathogenesis, including epilepsy, diabetes and many other conditions. CNVs are also known to affect cellular phenotypes through several mechanisms, such as gene dosage. Next-generation technologies for sequencing (DNA and RNA) and metabolite profiling (metabolomics) have led to the systematic discovery and evaluation of various genomic variants and their relationship to multiple phenotypes. Such approaches often involve the application of several statistical and machine learning methods to uncover new relationships between genomic variants and phenotypes, i.e., disease outcomes or quantitative traits characterized at the molecular level. This thesis explores and develops several statistical methods for CNV detection and association with complex human phenotypes, in particular epilepsy drug response, epilepsy susceptibility, metabolomics and gene expression. Chapter 3 describes a genome-wide CNV association analysis for two phenotypes: epilepsy susceptibility and epilepsy drug response. I identified several important candidate genes for these two phenotypes, including the top associated genes, SLC9A1 (P-value=6.69E-15) for epilepsy susceptibility and WWOX (P-value=1.93E-3) for epilepsy drug response. These associations were replicated in a separate Australian cohort and further validated in the lab and in silico, yielding both positive and negative confirmations. In chapter 4, I present CNV association with metabolomic data in the exonic regions of the TSPAN8 gene. A strong association signal was detected in the 6th and 7th exons of TSPAN8, where a large proportion of metabonomic lipid phenotypes were found to be associated using univariate (P-value=7.64E-4) and multivariate (P-value=1.33E-6) approaches. These CNVs were also found to be nominally associated with type 2 diabetes (P-value=3.32E-7).
I also carried out advanced multivariate association analyses to corroborate these results and reported sequencing-based validation results for TSPAN8 exonic CNVs in different human populations from the 1000 Genomes Project. In chapter 5, I report a genome-wide CNV association analysis with gene expression in ten different regions of the human brain. I identified a novel CNV near the DRD5 gene that was strongly associated with gene expression, and I report ongoing efforts to replicate and validate this finding. Each of the phenotype categories analysed posed its own unique challenges and required specific approaches for analysis and interpretation.

    Statistical methods for elucidating copy number variation in high-throughput sequencing studies

    Copy number variation (CNV) is pervasive in the human genome and has been shown to contribute significantly to phenotypic diversity and disease aetiology. High-throughput sequencing (HTS) technologies have allowed for the systematic investigation of CNV at an unprecedented resolution. HTS studies offer multiple distinct features that can provide evidence for the presence of CNV. We have developed an integrative statistical framework that jointly analyses multiple sequencing features at the population level to achieve sensitive and precise discovery of CNV. First, we applied our framework to low-coverage whole-genome sequencing experiments and used data from the 1000 Genomes Project to demonstrate a substantial improvement in CNV detection accuracy over existing methods. Next, we extended our approach to targeted HTS experiments, which offer improved cost-efficiency by focusing on a predetermined subset of the genome. Targeted HTS involves an enrichment step that introduces non-uniformity in sequencing coverage across target regions and thus hinders CNV identification. To that end, we designed a customized normalization procedure that counteracts the effects of enrichment bias and enhances the underlying CNV signal. Our extended framework was benchmarked on contiguous capture datasets, where it was shown to outperform competing strategies by a wide margin. Capture sequencing can also generate large amounts of data in untargeted genomic regions. Although these off-target results can be a valuable source of CNV evidence, they are subject to complex enrichment patterns that confound their interpretation. Therefore, we developed the first normalization strategy that can adapt to the highly heterogeneous nature of off-target capture and thus facilitate CNV investigation in untargeted regions. 
Overall, we present a generalized CNV detection toolset that has been shown to achieve robust performance across datasets and sequencing platforms and can therefore provide valuable insight into the prevalence and impact of CNV.
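The abstract does not detail its enrichment-bias normalization. As a rough illustration only, the sketch below shows one common generic approach for targeted read-depth data, which is not claimed to be the authors' procedure: scale each sample to counts per million, then divide each target's depth by its median across samples so that capture efficiency shared by all samples cancels out, leaving log2 ratios in which a one-copy deletion tends toward -1. The function name `normalize_target_depth` and the pseudocount are hypothetical choices.

```python
import math
from statistics import median


def normalize_target_depth(counts, pseudocount=1.0):
    """Log2 depth ratios for a targets x samples read-count matrix.

    counts[t][s] is the number of reads on target t in sample s.
    Hypothetical sketch: library-size scaling, then division by each
    target's median across samples (a proxy for its capture efficiency).
    """
    n_targets, n_samples = len(counts), len(counts[0])
    # Total reads per sample, used to remove sequencing-depth differences.
    totals = [sum(counts[t][s] for t in range(n_targets)) for s in range(n_samples)]
    ratios = []
    for t in range(n_targets):
        cpm = [counts[t][s] / totals[s] * 1e6 for s in range(n_samples)]
        # Median across samples approximates this target's enrichment efficiency,
        # which is shared by all samples and therefore cancels in the ratio.
        reference = median(cpm)
        ratios.append([math.log2((v + pseudocount) / (reference + pseudocount)) for v in cpm])
    return ratios
```

Because the per-target median absorbs differences in capture efficiency between targets, only sample-specific departures such as a deletion remain. A real pipeline would also need to handle GC content, outlier samples and off-target bins, which this sketch ignores.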

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type that takes into account the first five authors of a cited work. Results indicate that the picture produced by this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced by traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
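The two counting schemes compared in this study can be illustrated with a short sketch. It assumes one simple rule that the abstract does not spell out: each citing paper contributes at most one count to a pair of authors, and two authors are co-cited by that paper when both appear among the counted authors of its references. Setting the hypothetical parameter `max_authors` to 1 gives first-author counting; setting it to 5 approximates the first-five-author variant. Author names in the example are placeholders.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations (hypothetical counting rule, see lead-in).

    citing_papers: list of reference lists; each reference is the ordered
    author list of one cited work. Only the first `max_authors` authors of
    each cited work are counted (1 = traditional first-author ACA).
    """
    counts = Counter()
    for references in citing_papers:
        # Authors this citing paper "cites" under the chosen counting rule.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of distinct cited authors is co-cited once per paper.
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


papers = [
    [["Ahn", "Chen"], ["Baker", "Diaz"]],  # citing paper 1
    [["Ahn"], ["Baker", "Chen"]],          # citing paper 2
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting, the two toy citing papers yield a single pair (`Ahn`, `Baker`) co-cited twice; with `max_authors=5`, the same papers also link `Chen` and `Diaz` to the others, which shows how counting more authors per cited work produces denser author-to-author links.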

    Genetic and environmental correlates of growth patterns leading to obesity

    The intrauterine period is a vulnerable period of development. Any adverse environment can permanently change the body’s organ structure and function, expressed as an increased disease risk later in life. Studies show that variability in growth patterns in early life is associated with obesity and cardiovascular disease in adulthood, but the genetic and environmental determinants of these processes are largely unknown. The main objectives of this study were to identify genetic and environmental pre- and postnatal factors associated with early growth in infancy and childhood and with later metabolic outcomes in adulthood in the Northern Finland Birth Cohorts (NFBCs). Several maternal and paternal factors, such as height, smoking, parity and pre-eclampsia, were directly associated with faster postnatal height growth, and some of these associations were mediated by size-at-birth variables. An obesogenic environment in utero and during childhood growth was observed to exert a ‘programming’ effect on the glucose-insulin axis as well as on other cardiovascular risk factors in adolescence. Moreover, the study shows that leukocyte telomere length (LTL) at 31 years, a marker of aging, is inversely associated with multiple measures of adiposity in both men and women, and that a BMI increase in women from childhood to adulthood is associated with shorter telomeres at age 31. Two new genetic variants in/near the SBNO1 and HMGA2 genes are associated with infant head circumference, which may indicate an influence on brain growth and neurodevelopment in early life. Variants in/near LEPR-LEPROT, FTO, TFAP2B and GNPDA2 showed an age-dependent association with adiposity in early childhood, while three loci (FTO, TFAP2B and GNPDA2) had their effect on adult adiposity mediated by early growth phenotypes.
This study emphasises the clinical importance of early growth markers, as they may inform public health policy aimed at improving the pre-pregnancy environment and monitoring childhood growth during the first few years of development.

    Pathway and gene-based analysis of genome wide association studies (GWAS)

    My PhD thesis comprises the development and application of novel strategies to analyse genome-wide association studies (GWAS) in the context of functional pathways. I propose pathway- and gene-centric methodologies as complementary tools to conventional single-marker analyses, to extract further information hidden in GWAS data. I developed the cumulative trend (CT) test statistic, which assesses the cumulative genetic variation of single nucleotide polymorphisms (SNPs) in genes that interact in the same biological pathway and tests the association between a disease and the pathway as an entity. I applied this methodology to the genotypic data of the Wellcome Trust Case Control Consortium (WTCCC) study on Crohn’s disease (CD), type 1 diabetes (T1D), rheumatoid arthritis (RA), bipolar disorder, hypertension, type 2 diabetes and coronary artery disease. I identified highly significant associations between the autoimmune diseases (CD, T1D, RA) and inflammatory pathways, whereas almost no association was identified between the same pathways and the non-inflammatory conditions. I extended my approach to a pathway-based gene stability selection methodology, which selects associated genes in the context of associated pathways. This methodology can be used to prioritise genes for follow-up studies. I applied it to two GWAS of RA with different ethnic backgrounds, typed on different platforms, and demonstrated replication at the pathway, gene and in-silico functional levels. Finally, I extended my approach to GWAS with a family-trio design. I applied it to two case-control and family-trio datasets of Kawasaki disease (KD) and explored the association between the TGF-β pathway and KD susceptibility. The involvement of this pathway in KD was further validated at the gene expression and protein levels. My proposed methodologies were tested on real datasets and provided reproducible results, which suggests that they are rigorous and robust.
I therefore suggest applying them to single or multiple GWAS as a complement to conventional single-SNP analysis.
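The abstract does not give the formula for the CT statistic. As a hedged sketch of one plausible reading, the code below computes the standard Cochran-Armitage trend test for each SNP (additive weights 0, 1, 2 on genotype counts) and sums the resulting chi-square values over the SNPs assigned to a pathway's genes. The function names are hypothetical, and the thesis' actual definition, SNP-to-gene mapping and significance procedure may differ; because SNPs in linkage disequilibrium are correlated, a summed statistic like this would typically need an empirical null, e.g. from permuting case-control labels, rather than a chi-square reference distribution.

```python
from itertools import combinations


def trend_chi2(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) for one SNP.

    cases/controls: genotype counts ordered (hom. ref, het, hom. alt).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    cols = [r + s for r, s in zip(cases, controls)]  # genotype column totals
    # Trend statistic T = sum_i w_i * (r_i * S - s_i * R)
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    # Variance of T under the null hypothesis of no association
    var = (R * S / N) * (
        sum(w * w * c * (N - c) for w, c in zip(weights, cols))
        - 2 * sum(weights[i] * weights[j] * cols[i] * cols[j]
                  for i, j in combinations(range(len(cols)), 2))
    )
    return T * T / var if var > 0 else 0.0


def cumulative_trend(pathway_snps):
    """Hypothetical pathway score: sum of per-SNP trend chi-squares.

    pathway_snps: iterable of (case_counts, control_counts) pairs for the
    SNPs assigned to genes in one pathway.
    """
    return sum(trend_chi2(cases, controls) for cases, controls in pathway_snps)
```

For example, genotype counts of (10, 20, 20) in cases and (20, 20, 10) in controls give a trend chi-square of 20/3 (about 6.67), while a SNP with identical case and control distributions contributes 0 to the pathway total.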