Mixtures of Probit Regression Models with Overlapping Clusters
Studies with binary outcomes on a heterogeneous population are quite common. Typically, the heterogeneity is modelled through varying effect coefficients within some binary regression setting combined with a clustering procedure. Most of the existing methods assign statistical units to distinct and nonoverlapping clusters. However, there are scenarios where units exhibit a more complex organization and the clusters can be thought of as partially overlapping. In this case, the standard approach does not work. In this paper, we define a mixture of regression models that allows overlapping clusters. This approach involves an overlap function that maps the regression coefficients, either at the unit or response level, of the parent clusters into the coefficients of the multiple allocation clusters. In order to deal with this intrinsic heterogeneity, regression analyses have to be stratified for different groups of observations or clusters. We present a computationally efficient Markov chain Monte Carlo (MCMC) scheme for the case of a mixture of probit regressions. A simulation study shows the overall performance of the method. We conclude with two illustrative examples of modelling voting behavior, involving United States (US) Supreme Court justices over a number of topics and members of the United Kingdom (UK) parliament over divisions related to Brexit. These applications provide insights into the usefulness of the method in real applications. The method described can be extended to the case of a generic mixture of multivariate generalized linear models under overlapping clusters.
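The idea of an overlap function can be illustrated with a minimal Python sketch. The abstract does not say which overlap function the paper uses, so the coefficient averaging below is purely an assumption, and the names `overlap_mean` and `probit_probability` are hypothetical. The sketch only shows how the coefficients of the parent clusters a unit belongs to could be combined into a single coefficient vector that then enters a probit link.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, the probit link's inverse."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def overlap_mean(parent_coefs, membership):
    """Combine parent-cluster coefficients for one unit.

    ASSUMPTION: the overlap function is an element-wise average over
    the parent clusters the unit belongs to. The paper may use a
    different function.
    """
    active = [c for c, m in zip(parent_coefs, membership) if m]
    if not active:
        raise ValueError("unit must belong to at least one parent cluster")
    dim = len(active[0])
    return [sum(c[j] for c in active) / len(active) for j in range(dim)]


def probit_probability(x, parent_coefs, membership):
    """P(y = 1 | x) for a unit with the given multiple allocation."""
    coefs = overlap_mean(parent_coefs, membership)
    linear = sum(b * v for b, v in zip(coefs, x))
    return normal_cdf(linear)


# Two parent clusters with opposite effects on the first covariate.
parent_coefs = [[1.0, 0.0], [-1.0, 0.0]]
p_single = probit_probability([1.0, 2.0], parent_coefs, (1, 0))
p_both = probit_probability([1.0, 2.0], parent_coefs, (1, 1))
```

Under this assumed averaging, a unit allocated to both parent clusters has its opposing effects cancel, so `p_both` equals 0.5, while a unit in the first cluster alone keeps the positive effect.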
Clustering Two-Mode Binary Network Data with Overlapping Mixture Model and Covariates Information
NEAT: An efficient network enrichment analysis test
Background: Network enrichment analysis is a powerful method that integrates gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, can be computationally slow, and are based on normality assumptions. Results: We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as that of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. Conclusions: NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN (https://cran.r-project.org/package=neat).
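A test based on the hypergeometric distribution can be sketched in a few lines of Python. This is a generic upper-tail hypergeometric p-value, not the neat package's implementation. How NEAT maps network quantities (such as link counts between gene sets) onto the distribution's parameters is not stated in the abstract, so the parameter names `population`, `successes` and `draws` are hypothetical placeholders.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when `draws` items are taken without replacement from
    `population` items, of which `successes` are marked."""
    lowest = max(0, draws - (population - successes))
    highest = min(draws, successes)
    if k < lowest or k > highest:
        return 0.0
    return (comb(successes, k)
            * comb(population - successes, draws - k)
            / comb(population, draws))


def enrichment_p_value(observed, population, successes, draws):
    """Upper-tail p-value P(X >= observed) under the hypergeometric null."""
    highest = min(draws, successes)
    return sum(hypergeom_pmf(k, population, successes, draws)
               for k in range(observed, highest + 1))


# Toy numbers (hypothetical): 10 items, 3 marked, 4 drawn, 3 marked observed.
p_value = enrichment_p_value(3, population=10, successes=3, draws=4)
```

With these toy numbers the only outcome at least as extreme as the observed one is drawing all three marked items, so `p_value` is 7/210, about 0.033. Because the p-value is a closed-form sum, no resampling is needed, which is consistent with the speed advantage the abstract reports over resampling-based tests.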
cglasso: An R Package for Conditional Graphical Lasso Inference with Censored and Missing Values
Sparse graphical models have revolutionized multivariate inference. With the advent of high-dimensional multivariate data in many applied fields, these methods are able to detect a much lower-dimensional structure, often represented via a sparse conditional independence graph. There have been numerous extensions of such methods in the past decade. Many practical applications have additional covariates or suffer from missing or censored data. Despite the development of these extensions of sparse inference methods for graphical models, there have so far been no implementations for, e.g., conditional graphical models. Here we present the general-purpose package cglasso for estimating sparse conditional Gaussian graphical models with potentially missing or censored data. The method employs an efficient expectation-maximization estimation of an ℓ1-penalized likelihood via a block-coordinate descent algorithm. The package has a user-friendly data manipulation interface. It estimates a solution path and includes various automatic selection algorithms for the two ℓ1 tuning parameters, associated with the sparse precision matrix and sparse regression coefficients, respectively. The package pays particular attention to the visualization of the results, both by means of marginal tables and figures, and of the inferred conditional independence graphs. This package provides a unique and computationally efficient implementation of a conditional Gaussian graphical model that is able to deal with the additional complications of missing and censored data. As such it constitutes an important contribution for empirical scientists wishing to detect sparse structures in high-dimensional data.
Using Differential Geometry for Sparse High-Dimensional Risk Regression Models
With the introduction of high-throughput technologies in clinical and epidemiological studies, the need for inferential tools that are able to deal with fat data structures, i.e., a relatively small number of observations compared to the number of features, is becoming more prominent. In this paper we propose an extension of the dgLARS method to high-dimensional risk regression models. The main idea of the proposed method is to use the differential geometric structure of the partial likelihood function in order to select the optimal subset of covariates.
Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data
Model-based clustering is a technique widely used to group a collection of units into mutually exclusive groups. There are, however, situations in which an observation could in principle belong to more than one cluster. In the context of next-generation sequencing (NGS) experiments, for example, the signal observed in the data might be produced by two (or more) different biological processes operating together and a gene could participate in both (or all) of them. We propose a novel approach to cluster NGS discrete data, coming from a ChIP-Seq experiment, with a mixture model, allowing each unit to belong potentially to more than one group: these multiple allocation clusters can be flexibly defined via a function combining the features of the original groups without introducing new parameters. The formulation naturally gives rise to a 'zero-inflation group' in which values close to zero can be allocated, acting as a correction for the abundance of zeros that manifest in this type of data. We take into account the spatial dependency between observations, which is described through a latent conditional autoregressive process that can reflect different dependency patterns. We assess the performance of our model within a simulation environment and then we apply it to real ChIP-Seq data.
Penalized inference of the hematopoietic cell differentiation network via high-dimensional clonal tracking
Background: During their lifespan, stem or progenitor cells have the ability to differentiate into more committed cell lineages. Understanding this process can be key in treating certain diseases. However, up until now only limited information about the cell differentiation process is known. Aim: The goal of this paper is to present a statistical framework able to describe the cell differentiation process at the single clone level and to provide a corresponding inferential procedure for parameter estimation and structure reconstruction of the differentiation network. Approach: We propose a multidimensional, continuous-time Markov model with density-dependent transition probabilities linear in sub-population sizes and rates. The inferential procedure is based on an iterative calculation of approximated solutions for two systems of ordinary differential equations, describing the evolution of the process moments over time, that are analytically derived from the process's master equation. Network sparsity is induced by adding a SCAD-based penalization term in the generalized least squares objective function. Results: The methods proposed here have been tested by means of a simulation study and then applied to a data set derived from a gene therapy clinical trial, in order to investigate hematopoiesis in humans, in vivo. The estimated hematopoietic structure contradicts the classical dichotomy theory of cell differentiation and supports a novel myeloid-based model recently proposed in the literature.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
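The two counting schemes compared above can be sketched as follows. This is a minimal Python illustration and not the study's exact procedure: it assumes that a pair of authors is counted once per citing paper, and that co-authors of the same cited work also count as co-cited when more than one author per work is considered. The function name `cocitation_counts` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors=1):
    """Count author co-citations over a collection of citing papers.

    reference_lists: one entry per citing paper; each entry is a list of
        cited works, and each cited work is an ordered list of authors.
    max_authors: how many leading authors of each cited work to count
        (1 = traditional first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for cited_works in reference_lists:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        # ASSUMPTION: each author pair counts once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Two citing papers, each citing two works (toy data).
reference_lists = [
    [["A", "B"], ["C"]],
    [["A"], ["C", "D"]],
]
first_author = cocitation_counts(reference_lists, max_authors=1)
first_five = cocitation_counts(reference_lists, max_authors=5)
```

In this toy data, first-author counting yields only the pair (A, C), co-cited twice. Counting the first five authors keeps that pair and adds four more, each co-cited once, which shows how the non-traditional scheme brings authors who are never listed first into the analysis.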
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourse rather than expressing opinions; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
