    Graphical Models for Categorical Data

    This book arises out of a short course given at a Séminaires Européens de Statistiques (SemStat) meeting at the European Institute for Statistics, Probability, Stochastic Operations Research and their Applications (EURANDOM) in Eindhoven, The Netherlands, over March 7–10, 2017. This SemStat meeting was organized as a part of the COST Action “European Cooperation for Statistics of Network Data Science” (COSTNET, CA15109) with the aim of introducing early career researchers to the field of statistical network science. From this perspective, the material presented here concerns the theory of graphical models and includes well-established methodology from the early developments in this field, but also the theory of models introduced more recently in the graphical model literature. The focus is on the discrete case where all the variables involved in the analysis are categorical and, in this context, classical and more recent results are presented in a unified way.

    A Unified Approach to the Characterisation of Equivalence Classes of DAGs, Chain Graphs with no Flags and Chain Graphs

    A Markov property associates a set of conditional independencies to a graph. Two alternative Markov properties are available for chain graphs (CGs), the Lauritzen–Wermuth–Frydenberg (LWF) and the Andersson–Madigan–Perlman (AMP) Markov properties, which are different in general but coincide for the subclass of CGs with no flags. Markov equivalence induces a partition of the class of CGs into equivalence classes, and every equivalence class contains a, possibly empty, subclass of CGs with no flags, itself containing a, possibly empty, subclass of directed acyclic graphs (DAGs). LWF-Markov equivalence classes of CGs can be naturally characterized by means of the so-called largest CGs, whereas a graphical characterization of equivalence classes of DAGs is provided by the essential graphs. In this paper, we show the existence of largest CGs with no flags that provide a natural characterization of equivalence classes of CGs of this kind, with respect to both the LWF- and the AMP-Markov properties. We propose a procedure for the construction of the largest CGs, the largest CGs with no flags and the essential graphs, thereby providing a unified approach to the problem. As by-products we obtain a characterization of graphs that are largest CGs with no flags and an alternative characterization of graphs which are largest CGs. Furthermore, a known characterization of the essential graphs is shown to be a special case of our more general framework. The three graphical characterizations have a common structure: they use two versions of a locally verifiable graphical rule. Moreover, in the case of DAGs, an immediate comparison of the three characterizing graphs is possible.
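
    As a small illustration of the DAG case only, the sketch below tests Markov equivalence of two DAGs with the standard criterion that equivalent DAGs share the same skeleton and the same immoralities (two non-adjacent parents of a common child). This criterion is not stated in the abstract; it is the well-known result usually attributed to Verma and Pearl. The function names and the edge-set representation are assumptions made for the example, and the sketch does not cover chain graphs, flags, or the construction of largest CGs or essential graphs.

```python
from itertools import combinations

# A DAG is represented (by assumption) as a set of (parent, child) pairs.
# Both DAGs are assumed to be defined on the same vertex set.

def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(edge) for edge in edges}

def immoralities(edges):
    """All configurations a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), child))
    return found

def markov_equivalent(dag1, dag2):
    """Same skeleton and same immoralities (criterion for DAGs only)."""
    return (skeleton(dag1) == skeleton(dag2)
            and immoralities(dag1) == immoralities(dag2))

chain = {("a", "b"), ("b", "c")}      # a -> b -> c
fork = {("b", "a"), ("b", "c")}       # a <- b -> c
collider = {("a", "b"), ("c", "b")}   # a -> b <- c

print(markov_equivalent(chain, fork))      # chain and fork are equivalent
print(markov_equivalent(chain, collider))  # the collider is not
```

    The chain and the fork encode the same conditional independence, so they fall in one equivalence class, while the collider forms a class of its own.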

    On the application of Gaussian graphical models to paired data problems

    Gaussian graphical models are nowadays commonly applied to the comparison of groups sharing the same variables, by jointly learning their independence structures. We consider the case where there are exactly two dependent groups and the association structure is represented by a family of coloured Gaussian graphical models suited to deal with paired data problems. To learn the two dependent graphs, together with their across-graph association structure, we implement a fused graphical lasso penalty. We carry out a comprehensive analysis of this approach, with special attention to the role played by some relevant submodel classes. In this way, we provide a broad set of tools for the application of Gaussian graphical models to paired data problems. These include results useful for the specification of penalty values in order to obtain a path of lasso solutions and an ADMM algorithm that solves the fused graphical lasso optimization problem. Finally, we carry out a simulation study to compare our method with the traditional graphical lasso, and present an application of our method to cancer genomics where it is of interest to compare cancer cells with a control sample from histologically normal tissues adjacent to the tumor. All the methods described in this article are implemented in the R package pdglasso available at https://github.com/savranciati/pdglasso

    Log-mean linear parameterization for discrete graphical models of marginal independence and the analysis of dichotomizations

    We extend the log-mean linear parameterization for binary data to discrete variables with an arbitrary number of levels and show that also in this case it can be used to parameterize bi-directed graph models. Furthermore, we show that the log-mean linear parameterization allows one to simultaneously represent marginal independencies among variables and marginal independencies that only appear when certain levels are collapsed into a single one. We illustrate the application of this property by means of an example based on genetic association studies involving single-nucleotide polymorphisms. More generally, this feature provides a natural way to reduce the parameter count, while preserving the independence structure, by means of substantive constraints that give additional insight into the association structure of the variables.

    The networked partial correlation and its application to the analysis of genetic interactions

    Genetic interactions confer robustness on cells in response to genetic perturbations. This often occurs through molecular buffering mechanisms that can be predicted by using, among other features, the degree of coexpression between genes, which is commonly estimated through marginal measures of association such as Pearson or Spearman correlation coefficients. However, marginal correlations are sensitive to indirect effects and often partial correlations are used instead. Yet, partial correlations convey no information about the (linear) influence of the coexpressed genes on the entire multivariate system, which may be crucial to discriminate functional associations from genetic interactions. To address these two shortcomings, here we propose to use the edge weight derived from the covariance decomposition over the paths of the associated gene network. We call this new quantity the networked partial correlation and use it to analyse genetic interactions in yeast. We acknowledge the support of the Spanish Ministry of Economy and Competitiveness (TIN2015-71079-P), the Catalan Agency for Management of University and Research Grants (SGR14-1121) and the European Cooperation in Science and Technology (CA15109).
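
    The contrast between marginal and partial correlation can be seen on a toy example. The sketch below assumes a three-variable Gaussian model whose concentration (inverse covariance) matrix K has a zero in position (0, 2), so variables 0 and 2 are linked only through variable 1. The matrix and the helper names are made up for illustration. The partial correlation uses the standard identity -k_ij / sqrt(k_ii k_jj); the networked partial correlation proposed in the abstract is not implemented here.

```python
from math import sqrt

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of the column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def marginal_correlation(sigma, i, j):
    """Pearson correlation computed from a covariance matrix."""
    return sigma[i][j] / sqrt(sigma[i][i] * sigma[j][j])

def partial_correlation(k, i, j):
    """Correlation of i and j given all other variables, from the concentration matrix."""
    return -k[i][j] / sqrt(k[i][i] * k[j][j])

# Hypothetical concentration matrix for the chain 0 - 1 - 2 (no edge 0 - 2).
K = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
Sigma = invert(K)

marginal = marginal_correlation(Sigma, 0, 2)
partial = partial_correlation(K, 0, 2)
print(marginal, partial)
```

    In this toy model the marginal correlation between variables 0 and 2 is nonzero purely because of the indirect route through variable 1, while their partial correlation vanishes.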

    Log-mean linear regression models for binary responses with an application to multimorbidity

    In regression models for categorical data a linear model is typically related to the response variables via a transformation of probabilities called the link function. We introduce an approach based on two link functions for binary data named the log-mean and the log-mean linear methods. The choice of the link function plays a key role in the interpretation of the model, and our approach is especially appealing in terms of interpretation of the effects of covariates on the association of responses. Similarly to Poisson regression, the log-mean and log-mean linear regression coefficients of single outcomes are log-relative-risks, and we show that the relative risk interpretation is maintained also in the regressions of the association of responses. Furthermore, certain collections of zero log-mean linear regression coefficients imply that the relative risks for joint responses factorize with respect to the corresponding relative risks for marginal responses. This work is motivated by the analysis of a data set obtained from a case–control study aimed at investigating the effect of human immunodeficiency virus infection on multimorbidity, i.e. the simultaneous presence of two or more non-infectious comorbidities in one patient.

    Path weights in concentration graphs

    A graphical model provides a compact and efficient representation of the association structure in a multivariate distribution by means of a graph. Relevant features of the distribution are represented by vertices, edges and higher-order graphical structures such as cliques or paths. Typically, paths play a central role in these models because they determine the dependence relationships between variables. However, while a theory of path coefficients is available for directed graph models, little research exists on the strength of the association represented by a path in an undirected graph. Essentially, it has been shown that the covariance between two variables can be decomposed into a sum of weights associated with each of the paths connecting the two variables in the corresponding concentration graph. In this context, we consider concentration graph models and provide an extensive analysis of the properties of path weights and their interpretation. Specifically, we give an interpretation of covariance weights through their factorization into a partial covariance and an inflation factor. We then extend the covariance decomposition over the paths of an undirected graph to other measures of association, such as the marginal correlation coefficient and a quantity that we call the inflated correlation. Application of these results is illustrated with an analysis of dietary intake networks.
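
    The covariance decomposition over paths can be checked numerically on a small graph. The abstract does not give the formula; the sketch below assumes the decomposition commonly attributed to Jones and West (2005), in which a path P with m vertices has weight (-1)^(m+1) times the product of the concentration entries along its edges, times det(K with the rows and columns of P removed) divided by det(K). The matrix K and all function names are hypothetical, and exact fractions are used so that the identity can be verified without rounding.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion; the empty matrix has determinant 1."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        if entry != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def simple_paths(k, x, y):
    """Yield every simple path from x to y along nonzero off-diagonal entries of k."""
    def extend(path):
        last = path[-1]
        if last == y:
            yield path
            return
        for v in range(len(k)):
            if v not in path and k[last][v] != 0:
                yield from extend(path + [v])
    yield from extend([x])

def path_weight(k, path):
    """Covariance weight of one path (formula assumed, see the text above)."""
    rest = [i for i in range(len(k)) if i not in path]
    sub = [[k[i][j] for j in rest] for i in rest]
    product = Fraction(1)
    for a, b in zip(path, path[1:]):
        product *= k[a][b]
    return (-1) ** (len(path) + 1) * product * det(sub) / det(k)

def covariance(k, x, y):
    """Entry (x, y) of the inverse of k, via the cofactor formula."""
    n = len(k)
    minor = [[k[i][j] for j in range(n) if j != x] for i in range(n) if i != y]
    return (-1) ** (x + y) * det(minor) / det(k)

# Hypothetical concentration matrix whose graph is the 4-cycle 0 - 1 - 2 - 3 - 0.
K = [[3, -1, 0, -1],
     [-1, 3, -1, 0],
     [0, -1, 3, -1],
     [-1, 0, -1, 3]]

paths = list(simple_paths(K, 0, 2))
weights = [path_weight(K, p) for p in paths]
print(paths)
print(sum(weights) == covariance(K, 0, 2))
```

    Vertices 0 and 2 are not adjacent, so their covariance is carried entirely by the two paths through vertices 1 and 3, and the two weights add up to the corresponding entry of the covariance matrix.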

    A graphical representation of equivalence classes of AMP chain graphs

    This paper deals with chain graph models under the alternative AMP interpretation. A new representative of an AMP Markov equivalence class, called the largest deflagged graph, is proposed. The representative is based on the revealed internal structure of the AMP Markov equivalence class. More specifically, the AMP Markov equivalence class decomposes into finer strong equivalence classes and there exists a distinguished strong equivalence class among those forming the AMP Markov equivalence class. The largest deflagged graph is the largest chain graph in that distinguished strong equivalence class. A composed graphical procedure to get the largest deflagged graph on the basis of any AMP Markov equivalent chain graph is presented. In general, the largest deflagged graph differs from the AMP essential graph, which is another representative of the AMP Markov equivalence class.