
    Supplemental Materials for the manuscript entitled "Accumulation in visceral adipose tissue over six years is associated with lower paraspinal muscle density"


    Dimension reduction methods for nonlinear association analysis with applications to omics data

    With advances in high-throughput techniques, the availability of large-scale omics data has revolutionized the fields of medicine and biology, and has offered a better understanding of the underlying biological mechanisms. However, the high-dimensionality and the unknown association structure between different data types make statistical integration analyses challenging. In this dissertation, we develop three dimensionality reduction methods to detect nonlinear association structure using omics data. First, we propose a method for variable selection in a nonparametric additive quantile regression framework. We enforce a network regularization to incorporate information encoded by known networks. To account for nonlinear associations, we approximate the additive functional effect of each predictor with the expansion of a B-spline basis. We implement the group Lasso penalty to achieve sparsity. We define the network-constrained penalty by regulating the difference between the effect functions of any two linked genes (predictors) in the network. Simulation studies show that our proposed method performs well in identifying truly associated genes with fewer falsely associated genes than alternative approaches. Second, we develop a canonical correlation analysis (CCA)-based method, canonical distance correlation analysis (CDCA), and leverage the distance correlation to capture the overall association between two sets of variables. The CDCA allows untangling linear and nonlinear dependence structures. Third, we develop the sparse CDCA (sCDCA) method to achieve sparsity and improve result interpretability by adding penalties on the loadings from the CDCA. The sCDCA method can be applied to data with large dimensionality and small sample size. We develop iterative majorization-minimization-based coordinate descent algorithms to compute the loadings in the CDCA and sCDCA methods. 
Simulation studies show that the proposed CDCA and sCDCA approaches outperform classical CCA and sparse CCA (sCCA) in nonlinear settings and perform similarly in linear association settings. We apply the proposed methods to the Framingham Heart Study (FHS) to identify body mass index (BMI)-associated genes, the association structure between metabolic disorders and metabolite profiles, and a subset of metabolites and their associated type 2 diabetes (T2D)-related genes.
    2023-11-05T00:00:00
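The CDCA builds on distance correlation, which (for variables with finite first moments) is zero in the population only under independence, so it can pick up nonlinear dependence that Pearson correlation misses. The minimal sketch below computes the standard sample distance correlation of Székely, Rizzo and Bakirov (2007) for two univariate samples in plain Python. It illustrates only the dependence measure, not the CDCA or sCDCA loading estimation, and the function names are hypothetical.

```python
import math

def _double_centered(values):
    """Double-centred distance matrix: A_kl = a_kl - mean_k - mean_l + grand_mean."""
    n = len(values)
    d = [[abs(values[k] - values[l]) for l in range(n)] for k in range(n)]
    row_means = [sum(row) / n for row in d]
    # The distance matrix is symmetric, so column means equal row means.
    grand_mean = sum(row_means) / n
    return [[d[k][l] - row_means[k] - row_means[l] + grand_mean for l in range(n)]
            for k in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two equal-length samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a = _double_centered(x)
    b = _double_centered(y)
    dcov2_xy = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n ** 2
    dvar2_x = sum(v * v for row in a for v in row) / n ** 2
    dvar2_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    # dCor^2 = dCov^2(x, y) / sqrt(dVar^2(x) * dVar^2(y)); clamp tiny negative rounding
    return math.sqrt(max(dcov2_xy, 0.0) / denom)

def pearson(x, y):
    """Ordinary Pearson correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

For x = [-1, 0, 1] and y = x², Pearson correlation is exactly 0 while the distance correlation is 10^(-1/4) ≈ 0.562; for an exact linear relation such as y = 2x + 1 the distance correlation equals 1.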

    Using functional annotation to characterize genome-wide association results

    Full text link
    Genome-wide association studies (GWAS) have successfully identified thousands of variants robustly associated with hundreds of complex traits, but the biological mechanisms driving these results remain elusive. Functional annotation, describing the roles of known genes and regulatory elements, provides additional information about associated variants. This dissertation explores the potential of these annotations to explain the biology behind observed GWAS results. The first project develops a random-effects approach to genetic fine mapping of trait-associated loci. Functional annotation and estimates of the enrichment of genetic effects in each annotation category are integrated with linkage disequilibrium (LD) within each locus and GWAS summary statistics to prioritize variants with plausible functionality. Applications of this method to simulated and real data show good performance in a wider range of scenarios relative to previous approaches. The second project focuses on the estimation of enrichment by annotation categories. I derive the distribution of GWAS summary statistics as a function of annotations and LD structure and perform maximum likelihood estimation of enrichment coefficients in two simulated scenarios. The resulting estimates are less variable than previous methods, but the asymptotic theory of standard errors is often not applicable due to non-convexity of the likelihood function. In the third project, I investigate the problem of selecting an optimal set of tissue-specific annotations with greatest relevance to a trait of interest. I consider three selection criteria defined in terms of the mutual information between functional annotations and GWAS summary statistics. 
These algorithms correctly identify enriched categories in simulated data. In the application to a GWAS of BMI, however, the penalty for redundant features outweighs the modest relationships with the outcome, yielding empty selected feature sets; this reflects the weaker overall association and the high similarity between tissue-specific regulatory features. All three projects require few prior hypotheses about the mechanism of genetic effects. These data-driven approaches have the potential to illuminate unanticipated biological relationships, but they are also limited by the high dimensionality of the data relative to the moderate strength of the signals under investigation. These approaches expand the set of tools available to researchers for drawing biological insights from GWAS results.
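The three mutual-information criteria used in the dissertation are not specified here. As an assumed stand-in, the sketch below uses the greedy minimum-redundancy maximum-relevance (mRMR) difference criterion of Peng, Long and Ding (2005) with plug-in mutual information on discrete data, and stops when no remaining candidate has relevance exceeding its average redundancy with the features already chosen. It shows how a redundancy penalty can halt selection, and how weak relevance can leave the selected set empty. Feature names and data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Plug-in mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    # Sum over observed cells of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())

def mrmr_score(name, features, target, selected):
    """Relevance to the target minus mean redundancy with already-selected features."""
    relevance = mutual_information(features[name], target)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(features[name], features[s])
                     for s in selected) / len(selected)
    return relevance - redundancy

def greedy_mrmr(features, target):
    """Greedy forward selection; stops once the best remaining score is not positive."""
    selected = []
    remaining = list(features)  # dict keys, in insertion order
    while remaining:
        scores = {name: mrmr_score(name, features, target, selected) for name in remaining}
        best = max(remaining, key=scores.get)  # ties go to the earliest candidate
        if scores[best] <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

With target [0, 0, 1, 1] and candidates "copy" = [0, 0, 1, 1], "duplicate" = [0, 0, 1, 1] and "noise" = [0, 1, 0, 1], only "copy" is selected: the duplicate's relevance is exactly cancelled by its redundancy, and the noise feature carries no information about the target.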

    Statistical methods for genetic association studies: detecting gene x environment interaction in rare variant analysis

    Investigators have discovered thousands of genetic variants associated with various traits using genome-wide association studies (GWAS). These discoveries have substantially improved our understanding of the genetic architecture of many complex traits. Despite this success, the trait-associated loci collectively explain relatively little of disease risk. Many reasons for this unexplained heritability have been suggested, and two understudied components are hypothesized to play a role in complex disease etiology: rare variants and gene-environment (GE) interactions. Advances in next-generation sequencing have made it possible to comprehensively investigate the contribution of rare variants to complex traits. Complex diseases are multifactorial, suggesting an interplay between genetic and environmental factors, but most GWAS have focused on the main effects of genetic variants and disregarded GE interactions. In this dissertation, we develop statistical methods to detect GE interactions in rare variant analysis for various types of outcomes in both independent and related samples. We leverage the joint information across a set of rare variants and implement variance component score tests to reduce the computational burden. First, we develop a GE interaction test for rare variants for binary and continuous traits in related individuals, which avoids restricting the analysis to unrelated individuals and thereby retains more samples. Next, we propose a method to test GE interactions in rare variants for time-to-event outcomes. Rare variant tests for survival outcomes remain underdeveloped, despite their importance in medical studies. We use a shrinkage method that imposes a ridge penalty on the genetic main effects to deal with potential multicollinearity.
Finally, we compare different types of penalties, such as the least absolute shrinkage and selection operator (LASSO) and elastic net regularization, to examine the performance of our second method under various simulation scenarios. We illustrate applications of the proposed methods by detecting gene x smoking interactions influencing body mass index and time to fracture in the Framingham Heart Study. Our proposed methods can be readily applied to a wide range of phenotypes and genetic epidemiologic studies, thereby providing insight into the biological mechanisms of complex diseases, identifying high-penetrance subgroups, and eventually contributing to the development of better diagnostics and therapeutic interventions.
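Variance component score tests for a set of rare variants typically aggregate per-variant scores into a weighted quadratic form. The sketch below computes a SKAT-style interaction statistic Q = Σ_j w_j² (Σ_i G_ij E_i r_i)², where r_i are residuals from a null model that already contains the genetic and environmental main effects. It is an illustrative assumption, not the dissertation's exact test: fitting the null model (for example with the ridge penalty mentioned above, a kinship-based covariance for related samples, or martingale residuals for a Cox model) and computing the p-value from the mixture-of-chi-squares null distribution are both omitted. The Beta(1, 25) weight on minor allele frequency is a common SKAT default, used here as an assumption; function names are hypothetical.

```python
def beta_1_25_weight(maf):
    """Beta(1, 25) density at the minor allele frequency; up-weights rarer variants."""
    return 25.0 * (1.0 - maf) ** 24

def gxe_score_statistic(genotypes, env, residuals, weights):
    """Q = sum_j w_j^2 * (sum_i G_ij * E_i * r_i)^2 for one variant set.

    genotypes: n x p nested list (rows = individuals, columns = variants)
    env:       length-n environmental exposure values
    residuals: length-n residuals y - mu_hat from a null model with G and E main effects
    weights:   length-p variant weights
    """
    n = len(genotypes)
    q = 0.0
    for j, w in enumerate(weights):
        # Score for the G_j x E interaction term under the null of no interaction.
        score_j = sum(genotypes[i][j] * env[i] * residuals[i] for i in range(n))
        q += (w * score_j) ** 2
    return q
```

For three individuals with genotypes [[0, 1], [1, 0], [2, 1]], exposure [1, 0, 1], residuals [0.5, -1.0, 0.5] and weights [1.0, 2.0], both interaction scores equal 1.0, so Q = 1² + (2 × 1)² = 5.0.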

    Novel statistical methods to improve precision medicine

    2024
    Precision medicine, also known as personalized medicine, refers to tailoring therapeutic or preventive interventions to specific subpopulations of patients based on their characteristics. Accurate disease subtyping could be essential for precision medicine, which aims to provide individualized treatments to patients. The development of precision medicine thus relies on a sufficient understanding of the underlying mechanisms of diseases. Recent technological advances, especially in genomics and molecular biology, have provided unprecedented opportunities to gain greater insight into disease subtypes and underlying mechanisms. However, translating this wealth of knowledge into clinical practice for precision medicine remains challenging. This dissertation aims to improve precision medicine from two statistical perspectives. The first is to identify disease subtypes and related biomarkers by clustering multi-omics data, which could be a first step toward precision medicine. The second is to accurately stratify patients into subtypes with companion diagnostic devices (CDx) in clinical trials, which is directly related to developing targeted precision medicine therapies. We propose two novel convex clustering methods that allow the incorporation of prior information or knowledge and generate stable clustering results. One, information-incorporated Sparse Convex Clustering (iSCC), uses a text mining approach to retrieve existing information from previously published studies in sources such as PubMed, in order to identify disease-related biomarkers and improve disease subtyping. The other, Prior Knowledge-assisted Integrative Convex Clustering (PK-ICC), incorporates prior biological knowledge about grouping among features, such as biological pathways and gene regulatory mechanisms, through a group lasso penalty, improving disease subtyping and selecting relevant groups of features simultaneously.
Both simulations and real data analyses have demonstrated that our proposed methods can identify more accurate disease subtypes and biologically meaningful biomarkers. We also propose a finite mixture model framework to quantify the impact of CDx measurement performance on clinical trials with binary or time-to-event outcomes, which can inform the design of future trials that use CDx. Overall, this dissertation proposes statistical methods that may improve the identification of disease subtypes and the design of CDx-incorporated trials, which may lead to better clinical outcomes through precision medicine.
    2027-02-12T00:00:00
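One way to see why CDx measurement performance matters is that the CDx-positive group is a mixture of truly marker-positive patients and marker-negative patients misclassified as positive. The sketch below is a simplified illustration under assumed inputs (marker prevalence, assay sensitivity and specificity, and a treatment effect on the risk-difference scale within each true marker group); it is not the dissertation's finite mixture framework, and the function names are hypothetical.

```python
def ppv(prevalence, sensitivity, specificity):
    """Probability that a CDx-positive patient is truly marker-positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

def observed_effect_in_cdx_positive(prevalence, sensitivity, specificity,
                                    effect_marker_pos, effect_marker_neg):
    """Treatment effect among CDx-positive patients as a two-component mixture.

    Assumes the effect is homogeneous within each true marker group.
    """
    w = ppv(prevalence, sensitivity, specificity)
    return w * effect_marker_pos + (1.0 - w) * effect_marker_neg
```

With prevalence 0.3, sensitivity 0.9, specificity 0.8, a risk difference of 0.30 in truly marker-positive patients and none in marker-negative patients, the positive predictive value is 0.27 / 0.41 ≈ 0.659 and the effect among CDx-positive patients shrinks to about 0.198, so a trial sized for an effect of 0.30 would be underpowered. With a perfect assay the observed effect stays at 0.30.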

    Graphical models for directed acyclic graphs

    Graphical models are a family of models commonly used to represent the conditional independence structure among variables of interest. Directed acyclic graphs (DAGs) provide a representation of causal relationships and can be helpful for research in epidemiology and other public health areas. When modeling causal relationships, issues such as effect measure modification and potential unmeasured confounders need to be considered. Recent advances in biomedical research and technology have made more data available, such as multi-omics data, biomarker profiles, and biological pathway information. We therefore developed three graphical models for DAGs to better leverage these diverse data while accounting for effect measure modification and potential unmeasured confounders. First, we generalized the Bayesian graphical regression of Ni et al. (2018), using a Gaussian copula to link a latent variable to multiple types of observed data. The proposed method accommodates multiple data types while estimating a graph structure that depends on potential effect measure modification. Simulation studies showed that this method outperforms the method of Ni et al. (2018) when there are multiple data types. Second, we extended the structural factor equation model of Zhou et al. (2021) and proposed an information-aided graphical model. The proposed method incorporates group information via the group Lasso penalty while accounting for potential unmeasured confounders. Simulations demonstrated that it performs better than the original method, which does not incorporate group information. Third, we additionally imposed a within-group sparsity constraint on our second method, yielding sparsity both at the group level and among variables within groups while incorporating the group information. The proposed method is shown to be robust to the proportion of variables with no effect within a group.
We illustrated our proposed methods with data from the Framingham Heart Study to explore the relationships between metabolic syndromes, important inflammation biomarkers, and individual demographic characteristics. We also explored the gene regulatory networks of genes related to inflammation and adipose tissue. The findings may offer helpful insights into the mechanisms of metabolic syndrome and into patient-specific health management strategies.
    2025-01-23T00:00:00
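The second and third graphical models incorporate group information through a group Lasso penalty, and the third adds within-group sparsity. In proximal or coordinate-wise algorithms, the combined penalty λ₁‖b‖₁ + λ₂‖b‖₂ on one group is commonly handled by the sparse group lasso proximal operator: soft-threshold each coefficient, then shrink the whole group toward zero, dropping it entirely when its norm falls to λ₂ or below. The sketch below implements only that operator; it is a generic illustration, not the structural factor equation model or the authors' estimation algorithm, and the function names are hypothetical.

```python
import math

def soft_threshold(value, threshold):
    """Elementwise lasso shrinkage: sign(v) * max(|v| - threshold, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sparse_group_prox(group, lambda_l1, lambda_group):
    """Proximal operator of lambda_l1 * ||b||_1 + lambda_group * ||b||_2 for one group."""
    # Step 1: within-group sparsity from the L1 part.
    shrunk = [soft_threshold(v, lambda_l1) for v in group]
    norm = math.sqrt(sum(v * v for v in shrunk))
    # Step 2: group-level sparsity from the L2 (group lasso) part.
    if norm <= lambda_group:
        return [0.0] * len(group)  # the whole group is removed
    scale = 1.0 - lambda_group / norm
    return [scale * v for v in shrunk]
```

For example, the group [3.0, -0.5, 1.0] with λ₁ = λ₂ = 1 becomes [1.0, 0.0, 0.0], keeping the group but zeroing two of its members, while [0.5, -0.5] with λ₁ = 0.1 and λ₂ = 1 is removed entirely. Setting λ₁ = 0 recovers the plain group lasso shrinkage, e.g. [3.0, 4.0] with λ₂ = 2.5 becomes [1.5, 2.0].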

    Mendelian randomization with longitudinal data using functional data analysis approaches

    Over the past few decades, evaluating causal relationships has attracted growing interest as a way to understand underlying disease mechanisms. Mendelian randomization (MR) uses genetic variants as instrumental variables to investigate causal relationships between exposures and complex traits and can potentially overcome confounding in epidemiological studies. However, the conventional MR method uses only cross-sectional data. Because data in observational studies are often collected repeatedly over time, failing to incorporate such longitudinal measurements into the analysis discards substantial information. Moreover, treating the exposure or covariates as time-constant neglects their time-varying effects. Functional data analysis is a growing field that treats data as functions; in the longitudinal setting, repeated measurements can be viewed as functions of time. In this dissertation, we develop methods that leverage longitudinal information from repeatedly measured variables to evaluate the causal relationship between an exposure and an outcome using functional data analysis approaches. First, we propose multivariable functional MR models that use functional principal component analysis (FPCA) to handle multiple time-varying exposures in a multivariable MR framework. We also introduce the concept of a mean functional exposure, which yields interpretable causal effect estimates. Our simulation study demonstrates that the proposed models outperform alternative methods that use only a single measurement, in terms of both statistical power and bias of the effect estimate. Second, we develop methods that combine FPCA and functional regression to handle a time-varying exposure and time-varying covariates simultaneously in an MR model.
Specifically, we apply an FPCA-based method to continuous time-varying variables and sparse logistic functional principal component analysis to binary time-varying variables. Through simulation studies, we show that our proposed models outperform models that treat the exposure and/or covariates as static measurements in terms of both power and mean squared error. Finally, because the outcome is sometimes a disease status (usually a binary variable), we extend the proposed multivariable functional MR models to binary outcomes by integrating the multivariable functional MR framework with the two-stage residual inclusion method. We illustrate the application of our proposed models with data from the Framingham Heart Study Offspring cohort to study the causal relationship between obesity indices and various bone health-related measures or fractures. Our proposed methods advance causal inference research by making better use of longitudinal information and thus can provide more insight into the relationship between exposures and outcomes of interest.
    2026-09-17T00:00:00
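FPCA summarizes each subject's trajectory by its scores on a few eigenfunctions of the covariance of the curves. The sketch below is a deliberately simplified, discretized version under stated assumptions: all curves are observed densely on the same grid, there is no smoothing or quadrature weighting, and only the leading component is found, by power iteration on the sample covariance matrix. Sparse, irregular cohort visits would need a smoothing-based FPCA instead. In an MR setting, scores like these could stand in for a time-varying exposure, but how the dissertation's models use them is not shown here; the function name is hypothetical.

```python
import math

def fpca_leading_component(curves, n_iter=500, tol=1e-12):
    """Mean curve, leading eigenvector, eigenvalue and scores for curves on a common grid.

    Discretized sketch: dense shared grid, no smoothing, no quadrature weights.
    """
    n = len(curves)
    if n < 2:
        raise ValueError("need at least two curves")
    m = len(curves[0])
    mean = [sum(c[k] for c in curves) / n for k in range(m)]
    centred = [[c[k] - mean[k] for k in range(m)] for c in curves]
    # Sample covariance across subjects at each pair of grid points.
    cov = [[sum(row[k] * row[l] for row in centred) / (n - 1) for l in range(m)]
           for k in range(m)]
    v = [1.0] * m  # start vector; fails if orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[k][l] * v[l] for l in range(m)) for k in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("covariance vanishes in the start direction")
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient gives the eigenvalue; projections give the FPC scores.
    eigenvalue = sum(v[k] * sum(cov[k][l] * v[l] for l in range(m)) for k in range(m))
    scores = [sum(row[k] * v[k] for k in range(m)) for row in centred]
    return mean, v, eigenvalue, scores
```

For curves x_i(t_k) = 10 + a_i d_k with a = (-2, -1, 0, 1, 2) and d = (1, 2, 3, 4, 5), the sample covariance is 2.5 d dᵀ, so the leading eigenvector is d/√55, the eigenvalue is 2.5 × 55 = 137.5, and the scores are a_i √55.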