    Classification of Microarray Data with Factor Mixture Models.

    The classification of few tissue samples on a very large number of genes represents a non-standard problem in statistics but a usual one in microarray expression data analysis. In fact, the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. We consider high-density oligonucleotide microarray data, where the expression level is associated with an ‘absolute call’, which represents a qualitative indication of whether or not a transcript is detected within a sample. The ‘absolute call’ is generally not taken into consideration in analyses. Results: In contrast to the cluster analysis methods frequently used to analyze gene expression data, we consider the problem of classifying tissues and selecting variables. We adopted methodologies formulated by Ghahramani and Hinton and Rocci and Vichi for simultaneous dimensional reduction of genes and classification of tissues, trying to identify genes (denominated ‘markers’) that are able to distinguish between two known different classes of tissue samples. In this respect, we propose a generalization of the approach proposed by McLachlan et al., advising estimation of the distribution of the log LR statistic for testing the one- versus two-component hypothesis in the mixture model for each gene considered individually, using a parametric bootstrap approach. We compare conditional (on ‘absolute call’) and unconditional analyses performed on the dataset described in Golub et al. We show that the proposed techniques improve the results of classification of tissue samples with respect to known results on the same benchmark dataset. Availability: The software of Ghahramani and Hinton is written in Matlab and available in ‘Mixture of Factor Analyzers’ on http://www.gatsby.ucl.ac.uk/zoubin/software.html while the software of Rocci and Vichi is available upon request from the author
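
The per-gene parametric bootstrap mentioned in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes univariate Gaussian components with a common variance, a plain EM fit, and hypothetical function names (`loglik_one`, `loglik_two`, `lr_statistic`, `bootstrap_pvalue`). The idea is to fit the one-component model, simulate data from it, and compare the observed log LR statistic with its simulated null distribution.

```python
import math
import random

def loglik_one(x):
    """Maximised log-likelihood of a single Gaussian (MLE mean and variance)."""
    n = len(x)
    mu = sum(x) / n
    var = max(sum((v - mu) ** 2 for v in x) / n, 1e-8)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def loglik_two(x, iters=100):
    """Log-likelihood of a two-component, common-variance mixture fitted by EM."""
    xs = sorted(x)
    n, half = len(xs), len(xs) // 2
    # Assumed initialisation: means of the lower and upper halves of the data.
    m1, m2 = sum(xs[:half]) / half, sum(xs[half:]) / (n - half)
    grand = sum(xs) / n
    var = max(sum((v - grand) ** 2 for v in xs) / n, 1e-8)
    w, ll = 0.5, float("-inf")
    for _ in range(iters):
        # E-step: responsibilities of component 1, and the current log-likelihood.
        resp, ll = [], 0.0
        for v in xs:
            a = w * math.exp(-0.5 * (v - m1) ** 2 / var)
            b = (1 - w) * math.exp(-0.5 * (v - m2) ** 2 / var)
            tot = max(a + b, 1e-300)
            resp.append(a / tot)
            ll += math.log(tot) - 0.5 * math.log(2 * math.pi * var)
        # M-step: update mixing weight, means and the shared variance.
        s1 = min(max(sum(resp), 1e-8), n - 1e-8)
        m1 = sum(r * v for r, v in zip(resp, xs)) / s1
        m2 = sum((1 - r) * v for r, v in zip(resp, xs)) / (n - s1)
        var = max(sum(r * (v - m1) ** 2 + (1 - r) * (v - m2) ** 2
                      for r, v in zip(resp, xs)) / n, 1e-8)
        w = s1 / n
    return ll  # evaluated at the parameters entering the final E-step

def lr_statistic(x):
    """-2 log LR for one versus two components, clipped at zero."""
    return max(0.0, 2.0 * (loglik_two(x) - loglik_one(x)))

def bootstrap_pvalue(x, n_boot=50, seed=0):
    """Parametric bootstrap p-value: simulate from the fitted one-component null."""
    rng = random.Random(seed)
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(max(sum((v - mu) ** 2 for v in x) / n, 1e-8))
    t_obs = lr_statistic(x)
    exceed = sum(
        1 for _ in range(n_boot)
        if lr_statistic([rng.gauss(mu, sd) for _ in range(n)]) >= t_obs
    )
    return (1 + exceed) / (n_boot + 1)
```

Applied gene by gene, a small p-value would flag a gene whose expression values look bimodal across tissues. The number of bootstrap replicates bounds the smallest attainable p-value, so `n_boot=50` is only suitable for a quick illustration.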

    Clustering of microarray data using finite mixture models

    The study of the molecular variation among diseases is rapidly growing thanks to the development of microarray-based technologies. In fact, such technologies allow us to simultaneously measure thousands of gene expression levels from biological tissue samples. A major task in this context is the classification of samples to improve the diagnoses of patients and, therefore, the quality of treatments. We discuss a model-based approach which allows us both to reduce the dimension of genes and to cluster the tissue samples, simultaneously. We adopt statistical techniques formulated by Ghahramani and Hinton (1996) and Rocci and Vichi (2002). The performance of the proposed models is illustrated on a well-known data set in the microarray literature: the leukaemia data, containing classes that are well known to be easily separable (Golub et al., 1999)

    Clustering microarray data using model-based double K-means

    The microarray technology allows the measurement of expression levels of thousands of genes simultaneously. The dimension and complexity of gene expression data obtained by microarrays create challenging data analysis and management problems ranging from the analysis of images produced by microarray experiments to biological interpretation of results. Therefore, statistical and computational approaches are beginning to assume a substantial position within the molecular biology area. We consider the problem of simultaneously clustering genes and tissue samples (in general conditions) of a microarray data set. This can be useful for revealing groups of genes involved in the same molecular process as well as groups of conditions where this process takes place. The need to find a subset of genes and tissue samples defining a homogeneous block has led to the application of double clustering techniques on gene expression data. Here, we focus on an extension of standard K-means to simultaneously cluster observations and features of a data matrix, namely double K-means introduced by Vichi (2000). We introduce this model in a probabilistic framework and discuss the advantages of using this approach. We also develop a coordinate ascent algorithm and test its performance via simulation studies and a real data set. Finally, we validate the results obtained on the real data set by building resampling confidence intervals for block centroids. © 2012 Copyright Taylor and Francis Group, LLC
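
As a rough illustration of simultaneous row and column clustering, the sketch below implements a plain least-squares double K-means by alternating updates. It is not the probabilistic, model-based version the abstract above describes. The function names, the random initialisation, and the handling of empty blocks are assumptions made for this example.

```python
import random

def block_means(X, rows, cols, row_k, col_k):
    """Mean of the entries in each (row cluster, column cluster) block."""
    total = [[0.0] * col_k for _ in range(row_k)]
    count = [[0] * col_k for _ in range(row_k)]
    for i, g in enumerate(rows):
        for j, h in enumerate(cols):
            total[g][h] += X[i][j]
            count[g][h] += 1
    # Empty blocks get centroid 0.0 (an arbitrary choice for this sketch).
    return [[total[g][h] / count[g][h] if count[g][h] else 0.0
             for h in range(col_k)] for g in range(row_k)]

def loss(X, rows, cols, C):
    """Sum of squared deviations of each entry from its block centroid."""
    return sum((X[i][j] - C[g][h]) ** 2
               for i, g in enumerate(rows) for j, h in enumerate(cols))

def double_kmeans(X, row_k, col_k, iters=20, seed=0):
    """Alternate row, column and centroid updates; return labels and loss history."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    rows = [rng.randrange(row_k) for _ in range(n)]
    cols = [rng.randrange(col_k) for _ in range(p)]
    C = block_means(X, rows, cols, row_k, col_k)
    history = [loss(X, rows, cols, C)]
    for _ in range(iters):
        # Reassign each row to the row cluster with the closest centroid profile.
        rows = [min(range(row_k),
                    key=lambda g: sum((X[i][j] - C[g][cols[j]]) ** 2 for j in range(p)))
                for i in range(n)]
        C = block_means(X, rows, cols, row_k, col_k)
        # Reassign each column in the same way, holding the row labels fixed.
        cols = [min(range(col_k),
                    key=lambda h: sum((X[i][j] - C[rows[i]][h]) ** 2 for i in range(n)))
                for j in range(p)]
        C = block_means(X, rows, cols, row_k, col_k)
        history.append(loss(X, rows, cols, C))
    return rows, cols, C, history
```

Each update minimises the squared-error loss with the other quantities held fixed, so the recorded loss should not increase from one iteration to the next. Like ordinary K-means, the result depends on the starting partition, so several random restarts would normally be used.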

    Hidden Markov of factor analyzers for biclustering of microarray time course data in multiple conditions.

    A challenging task in time course microarray data is to discover groups of genes that show homogeneous temporal expression patterns when time course experiments are collected in multiple biological conditions. In such a case, an appealing goal is to discover local structures composed of sets of genes that show homogeneous expression patterns across subsets of biological conditions and that also capture the history of the gene's and condition's dynamic behavior across time. To address this, at each time point one could apply any biclustering method for identifying differentially expressed genes across biological conditions. However, considering each time point in isolation can be inefficient, because it does not use the information contained in the dependence structure of the time course data. Our proposal is an extension of the Hidden Markov of factor analyzers model allowing for simultaneous clustering of genes and biological conditions. The proposed model is rath

    Identifying partitions of genes and tissue samples in microarray data.

    An important challenge in microarray data analyses is the detection of genes which are differentially expressed across different types of experimental conditions. We provide an extension of a finite mixture model to the clustering of genes and experimental conditions, where the partition of experimental conditions may be known or unknown. In particular, the idea is to adopt a finite mixture approach with mean/covariance reparameterization, where an explicit distinction among up-regulated genes, down-regulated genes, and non-regulated genes (with respect to a reference) is made; moreover, within each of these groups, genes that are differentially expressed between two or more types of experimental conditions may be identified

    Model-based double clustering

    In this paper a new methodology for simultaneous clustering of objects and variables of a two-way data matrix is proposed in a model-based framework and with a maximum likelihood approach. The methodology is illustrated by both a real application and simulation studies

    A biclustering approach for discrete outcomes

    We discuss an extension of mixtures of factor analyzers (MFA) to allow for simultaneous clustering of subjects and variables where discrete manifest variables are available. To estimate model parameters, we propose a modified EM algorithm in a ML framework

    Model-based approaches to synthesize microarray data: a unifying review using mixture SEM

    A considerable number of different approaches have been proposed for synthesizing gene expression data obtained from microarray experiments. In this paper, we have a closer look at various types of Gaussian mixture models which have recently been proposed in the gene expression level literature. It is shown that these are, in fact, special cases of a more general model; that is, the mixture structural equation model developed in psychometrics (Arminger and Stein, 1997; Dolan and van der Maas, 1998). This model combines mixture modeling and SEM by assuming that within each mixture component the model parameters are subject to a structural equation model (SEM). A SEM is a very general model for a multivariate Gaussian mean vector and covariance matrix. The various Gaussian mixture methodologies for microarray analysis (such as mixture factor analyzers) are special cases of mixture SEM which can be obtained by imposing specific restrictions on the SEM model parameters; item intercepts