Collection Of Biostatistics Research Archive
1589 research outputs found
Variance Estimation in Inverse Probability Weighted Cox Models
Inverse probability weighted Cox models can be used to estimate marginal hazard ratios under different treatment interventions in observational studies. To obtain variance estimates, the robust sandwich variance estimator is often recommended to account for the induced correlation among weighted observations. However, this estimator does not incorporate the uncertainty in estimating the weights and tends to overestimate the variance, leading to inefficient inference. Here we propose a new variance estimator that combines the estimation procedures for the hazard ratio and weights using stacked estimating equations, with additional adjustments for the sum of non-independent and identically distributed terms in a Cox partial likelihood score equation. We prove analytically that the robust sandwich variance estimator is conservative and establish the asymptotic equivalence between the proposed variance estimator and one obtained through linearization by Hajage et al. (2018). In addition, we extend our proposed variance estimator to accommodate clustered data. We compare the finite sample performance of the proposed method with alternative methods through simulation studies. We illustrate these different variance methods in an inverse probability weighted application to estimate the marginal hazard ratio for postoperative hospitalization under sleeve gastrectomy versus Roux-en-Y gastric bypass in a large medical claims and billing database. To facilitate implementation of the proposed method, we have developed an R package, ipwCoxCSV.
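The abstract does not say how the weights are constructed, so the following is only a minimal sketch of one common choice, stabilized inverse probability of treatment weights, assuming the propensity scores have already been estimated. The function name and the toy data are hypothetical, and nothing here reproduces the interface of the ipwCoxCSV package. Because the propensity scores are themselves estimates, the resulting weights carry sampling uncertainty, which is the quantity the robust sandwich estimator ignores.

```python
def stabilized_weights(treated, propensity):
    """Return stabilized inverse probability of treatment weights.

    treated    -- list of 0/1 treatment indicators (hypothetical data)
    propensity -- list of estimated P(treated = 1 | covariates), each in (0, 1)
    """
    if len(treated) != len(propensity):
        raise ValueError("treated and propensity must have the same length")
    if not treated:
        raise ValueError("at least one observation is required")

    # Marginal probability of treatment, used as the stabilizing numerator.
    p_treat = sum(treated) / len(treated)

    weights = []
    for a, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if a == 1:
            weights.append(p_treat / e)
        else:
            weights.append((1.0 - p_treat) / (1.0 - e))
    return weights


# Toy example: four subjects with assumed (not real) propensity scores.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
weights = stabilized_weights(treated, propensity)
```

These weights would then be passed to a weighted Cox fit; the variance of the resulting hazard ratio is where the estimators compared in the abstract differ.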
Potential health risks linked to emerging contaminants in major rivers and treated waters
The presence of endocrine-disrupting chemicals (EDCs) in our local waterways is becoming an increasing threat to the surrounding population. These compounds and their degradation products (found in pesticides, herbicides, and plastic waste) are known to interfere with a range of biological functions from reproduction to differentiation. To better understand these effects, we used an in silico ontological pathway analysis to identify the genes affected by the most commonly detected EDCs in large river water supplies, which we grouped together based on four common functions: organismal injuries, cell death, cancer, and behavior. In addition to EDCs, we included the opioid buprenorphine in our study, as this similar ecological threat is increasingly detected in river water supplies. Through the identification of the pleiotropic biological effects associated with both the acute and chronic exposure to EDCs and opioids in local water supplies, our results highlight a serious health threat worthy of additional investigations with a potential emphasis on the effects linked to increased DNA damage.
General approach of causal mediation analysis with causally ordered multiple mediators and survival outcome
Causal mediation analysis with multiple mediators (causal multi-mediation analysis) is critical in understanding why an intervention works, especially in medical research. Deriving the path-specific effects (PSEs) of exposure on the outcome through a certain set of mediators can detail the causal mechanism of interest. However, the existing models of causal multi-mediation analysis are usually restricted to partial decomposition, which can only evaluate the cumulative effect of several paths. Moreover, the general form of PSEs for an arbitrary number of mediators has not been proposed. In this study, we provide a generalized definition of PSE for partial decomposition (partPSE) and for complete decomposition, which are extended to the survival outcome. We apply the interventional analogues of PSE (iPSE) for complete decomposition to address the difficulty of non-identifiability. Based on Aalen’s additive hazards model and Cox’s proportional hazards model, we derive the generalized analytic forms and illustrate the asymptotic properties of both iPSEs and partPSEs for survival outcomes. A simulation study is conducted to evaluate the performance of estimation in several scenarios. We apply the new methodology to investigate the mechanism of methylation signals on mortality mediated through the expression of three nested genes among lung cancer patients.
Generalized interventional approach for causal mediation analysis with causally ordered multiple mediators
Causal mediation analysis has demonstrated the advantage of mechanism investigation. In conditions with causally ordered mediators, path-specific effects (PSEs) are introduced for specifying the effect subject to a certain combination of mediators. However, most PSEs are unidentifiable. To address this, an alternative approach, termed the interventional analogue of PSE (iPSE), is widely applied to effect decomposition. Previous studies that have considered multiple mediators have mainly focused on two-mediator cases due to the complexity of the mediation formula. This study proposes a generalized interventional approach for settings with an arbitrary number of ordered mediators, to study causal parameter identification as well as statistical estimation. It provides a general definition of iPSEs with a recursive formula, assumptions for nonparametric identification, a regression-based method, and a g-computation algorithm to estimate all iPSEs. We demonstrate that each iPSE reduces to the result of linear structural equation modeling subject to linear or log-linear models. This approach is applied to a Taiwanese cohort study for exploring the mechanism by which hepatitis C virus infection affects mortality through hepatitis B virus infection, liver function, and hepatocellular carcinoma. Software based on a g-computation algorithm allows users to easily apply this method for data analysis subject to various model choices according to the substantive knowledge for each variable. All methods and software proposed in this study contribute to comprehensively decomposing a causal effect confirmed by data science and help disentangle causal mechanisms when the natural pathways are complicated.
Model-Robust Inference for Clinical Trials that Improve Precision by Stratified Randomization and Adjustment for Additional Baseline Variables
We focus on estimating the average treatment effect in clinical trials that involve stratified randomization, which is commonly used. It is important to understand the large sample properties of estimators that adjust for stratum variables (those used in the randomization procedure) and additional baseline variables, since this can lead to substantial gains in precision and power. Surprisingly, to the best of our knowledge, this is an open problem. It was only recently that a simpler problem was solved by Bugni et al. (2018) for the case with no additional baseline variables, continuous outcomes, the analysis of covariance (ANCOVA) estimator, and no missing data. We generalize their results in three directions. First, in addition to continuous outcomes, we handle binary and time-to-event outcomes; this broadens the applicability of the results. Second, we allow adjustment for an additional, preplanned set of baseline variables, which can improve precision. Third, we handle missing outcomes under the missing at random assumption. We prove that a wide class of estimators is asymptotically normally distributed under stratified randomization and has equal or smaller asymptotic variance than under simple randomization. For each estimator in this class, we give a consistent variance estimator. This is important in order to fully capitalize on the combined precision gains from stratified randomization and adjustment for additional baseline variables. The above results also hold for the biased-coin covariate-adaptive design. We demonstrate our results using completed trial data sets of treatments for substance use disorder, where adjustment for additional baseline variables brings substantial variance reduction.
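To make the idea of adjusting for stratum variables concrete, here is a small sketch of one simple stratum-adjusted estimator of the average treatment effect: the within-stratum difference in mean outcomes, averaged with weights proportional to stratum size. This is an illustrative assumption, not the class of estimators analysed in the abstract, and it computes no variance. The function name and the toy records are hypothetical.

```python
from collections import defaultdict


def stratified_difference_in_means(records):
    """Stratum-size-weighted average of within-stratum mean differences.

    records -- iterable of (stratum, arm, outcome) tuples, with arm 1 for
               treatment and arm 0 for control (hypothetical data layout)
    """
    sums = defaultdict(lambda: [0.0, 0.0])   # outcome totals per stratum and arm
    counts = defaultdict(lambda: [0, 0])     # sample sizes per stratum and arm
    for stratum, arm, outcome in records:
        sums[stratum][arm] += outcome
        counts[stratum][arm] += 1

    total = sum(n0 + n1 for n0, n1 in counts.values())
    if total == 0:
        raise ValueError("no records supplied")

    estimate = 0.0
    for stratum, (n0, n1) in counts.items():
        if n0 == 0 or n1 == 0:
            raise ValueError(f"stratum {stratum!r} is missing one arm")
        diff = sums[stratum][1] / n1 - sums[stratum][0] / n0
        estimate += (n0 + n1) / total * diff
    return estimate


# Toy trial with two randomization strata, "A" and "B".
records = [
    ("A", 1, 3.0), ("A", 0, 1.0),
    ("B", 1, 6.0), ("B", 1, 8.0), ("B", 0, 4.0), ("B", 0, 4.0),
]
estimate = stratified_difference_in_means(records)
```

In the toy data the within-stratum differences are 2.0 and 3.0, with stratum weights 2/6 and 4/6.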
Supervised Dimension Reduction for Large-scale Omics Data with Censored Survival Outcomes Under Possible Non-proportional Hazards
The past two decades have witnessed significant advances in high-throughput "omics" technologies such as genomics, proteomics, metabolomics, transcriptomics and radiomics. These technologies have enabled simultaneous measurement of the expression levels of tens of thousands of features from individual patient samples and have generated enormous amounts of data that require analysis and interpretation. One specific area of interest has been in studying the relationship between these features and patient outcomes, such as overall and recurrence-free survival, with the goal of developing a predictive "omics" profile. Large-scale studies often suffer from the presence of a large fraction of censored observations and potential time-varying effects of features, and methods for handling them have been lacking. In this paper, we propose supervised methods for feature selection and survival prediction that simultaneously deal with both issues. Our approach utilizes continuum power regression (CPR) - a framework that includes a variety of regression methods - in conjunction with the parametric or semi-parametric accelerated failure time (AFT) model. Both CPR and AFT fall within the linear models framework and, unlike black-box models, the proposed prognostic index has a simple yet useful interpretation. We demonstrate the utility of our methods using simulated and publicly available cancer genomics data.
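For readers unfamiliar with the AFT model, its usual textbook form is shown below. The notation is an assumption introduced here for illustration (survival time T_i, feature vector x_i, coefficients beta, scale sigma, error epsilon_i) and may differ from the paper's.

```latex
\[
  \log T_i \;=\; \beta_0 + x_i^{\top}\beta + \sigma\,\varepsilon_i ,
  \qquad i = 1, \dots, n .
\]
```

Because the model is linear in the features on the log-time scale, the linear predictor x_i'beta can serve directly as a prognostic index. A parametric AFT model fixes the distribution of the error term, whereas a semi-parametric one leaves it unspecified.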
Components of the ribosome biogenesis pathway underlie establishment of telomere length set point in Arabidopsis
Telomeres cap the physical ends of eukaryotic chromosomes to ensure complete DNA replication and genome stability. Heritable natural variation in telomere length exists in yeast, mice, plants and humans at birth; however, major effect loci underlying such polymorphism remain elusive. Here, we employ quantitative trait locus (QTL) mapping and transgenic manipulations to identify genes controlling telomere length set point in a multi-parent Arabidopsis thaliana mapping population. We detect several QTL explaining 63.7% of the total telomere length variation in the Arabidopsis MAGIC population. Loss-of-function mutants of the NOP2A candidate gene located inside the largest effect QTL and of two other ribosomal genes RPL5A and RPL5B establish a shorter telomere length set point than wild type. These findings indicate that evolutionarily conserved components of ribosome biogenesis and cell proliferation pathways promote telomere elongation.
Trabecular bone fraction variation in modern humans, fossil hominins and other primates
Evidence suggests that recent modern humans (Holocene) have low trabecular bone density (i.e., trabecular bone fraction, TBF) compared with other extant primates and fossil hominins. However, the extent to which TBF in recent humans with varying subsistence strategies differs from that of fossil hominins, and in turn, how hominins differ from various extant catarrhines is unclear. This study tests two hypotheses: first, that populations with subsistence strategies demanding high physical activity exhibit greater TBF than sedentary populations and are more similar to fossil Homo; and second, that australopiths have TBF that is more similar to that of nonhuman primates because of the greater mechanical loading on their skeletons. The study quantifies TBF in the limb epiphyses of recent humans, hominoids, cercopithecines, and fossil hominins. The results show overall a significant decrease in TBF among recent humans, whereas hominins, hominoids, and cercopithecines have similar, high TBF values. In addition, active human populations display TBF that is more similar to fossil Homo. The results suggest that this TBF decline reflects a reduction in activity levels among sedentary populations, although a systemic decline cannot be ruled out. These findings support the recent evolution of low trabecular density because of a decline in activity levels and underscore the utility of comparing multiple skeletal elements across a diverse set of recent modern humans when drawing conclusions about changes in trabecular bone in the human skeleton.
Using molecular diet analysis to inform invasive species management: A case study of introduced rats consuming endemic New Zealand frogs
The decline of amphibians has been of international concern for more than two decades, and the global spread of introduced fauna is a major factor in this decline. Conservation management decisions to implement control of introduced fauna are often based on diet studies. One of the most common metrics to report in diet studies is Frequency of Occurrence (FO), but this can be difficult to interpret, as it does not include a temporal perspective. Here, we examine the potential for FO data derived from molecular diet analysis to inform invasive species management, using invasive ship rats (Rattus rattus) and endemic frogs (Leiopelma spp.) in New Zealand as a case study. Only two endemic frog species persist on the mainland. One of these, Leiopelma archeyi, is Critically Endangered (IUCN 2017) and ranked as the world's most evolutionarily distinct and globally endangered amphibian (EDGE, 2018). Ship rat stomach contents were collected by kill-trapping and subjected to three methods of diet analysis (one morphological and two DNA-based). A new primer pair was developed targeting all anuran species that exhibits good coverage, high taxonomic resolution, and reasonable specificity. Incorporating a temporal parameter allowed us to calculate the minimum number of ingestion events per rat per night, providing a more intuitive metric than the more commonly reported FO. We are not aware of other DNA-based diet studies that have incorporated a temporal parameter into FO data. The usefulness of such a metric will depend on the study system, in particular the feeding ecology of the predator. Ship rats are consuming both species of native frogs present on mainland New Zealand, and this study provides the first detections of remains of these species in mammalian stomach contents.
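As a point of reference, Frequency of Occurrence is conventionally the proportion of sampled stomachs in which a prey item is detected. The sketch below computes only that conventional quantity. The abstract does not give the formula for its temporal metric (ingestion events per rat per night), so that metric is not reproduced here. The function name and the sample data are hypothetical.

```python
def frequency_of_occurrence(samples, prey):
    """Proportion of samples in which `prey` was detected.

    samples -- list of sets, one per stomach, holding the detected prey taxa
    prey    -- the taxon of interest
    """
    if not samples:
        raise ValueError("at least one sample is required")
    hits = sum(1 for items in samples if prey in items)
    return hits / len(samples)


# Four hypothetical stomach samples; an empty set means nothing was detected.
stomachs = [{"frog", "weta"}, {"seeds"}, {"frog"}, set()]
fo = frequency_of_occurrence(stomachs, "frog")
```

FO counts a stomach once regardless of how many prey individuals were eaten or when, which is why it carries no temporal information on its own.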
Generalized Matrix Decomposition Regression: Estimation and Inference for Two-way Structured Data
Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of interest. In this article, we first propose the GMD regression (GMDR) as an estimation/prediction tool that seamlessly incorporates two-way structures into high-dimensional linear models. The proposed GMDR directly regresses the outcome on a set of GMD components, selected by a novel procedure that guarantees the best prediction performance. We then propose the GMD inference (GMDI) framework to identify variables that are associated with the outcome for any model in a large family of regression models that includes GMDR. As opposed to most existing tools for high-dimensional inference, GMDI efficiently accounts for pre-specified two-way structures and can provide asymptotically valid inference even for non-sparse coefficient vectors. We study the theoretical properties of GMDI in terms of both the type-I error rate and power. We demonstrate the effectiveness of GMDR and GMDI on simulated data and an application to microbiome data.
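To show how the GMD extends the SVD, the decomposition can be written as the following optimization problem. This is a sketch in notation assumed here for illustration, not taken from the article: X is the n x p data matrix, and Q (n x n) and R (p x p) are positive semi-definite matrices encoding the structure among samples and among variables.

```latex
\[
  \min_{U,\,D,\,V}\; \bigl\lVert X - U D V^{\top} \bigr\rVert_{Q,R}^{2}
  \quad \text{subject to} \quad
  U^{\top} Q\, U = I_K,\;\;
  V^{\top} R\, V = I_K,\;\;
  D = \operatorname{diag}(d_1,\dots,d_K),\; d_k \ge 0,
\]
\[
  \text{where} \quad
  \lVert A \rVert_{Q,R}^{2} \;=\; \operatorname{tr}\!\bigl(Q\, A\, R\, A^{\top}\bigr).
\]
```

Setting Q and R to identity matrices reduces the norm to the Frobenius norm and the constraints to ordinary orthonormality, so the problem reduces to the rank-K truncated SVD.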