Classification of Microarray Data with Factor Mixture Models.
The classification of a few tissue samples on a very large
number of genes represents a non-standard problem in statistics but a
usual one in microarray expression data analysis. In fact, the dimension
of the feature space (the number of genes) is typically much greater
than the number of tissues. We consider high-density oligonucleotide
microarray data, where the expression level is associated with an
‘absolute call’, which represents a qualitative indication of whether or not a
transcript is detected within a sample. The ‘absolute call’ is generally
not taken into consideration in analyses.
Results: In contrast to the cluster analysis methods frequently used to
analyze gene expression data, we consider the problem of classifying
tissues together with variable selection. We adopted methodologies
formulated by Ghahramani and Hinton and by Rocci and Vichi for
simultaneous dimensional reduction of genes and classification of tissues,
trying to identify genes (denominated ‘markers’) that are able to
distinguish between two known different classes of tissue samples. In this
respect, we propose a generalization of the approach proposed by
McLachlan et al.: for each gene considered individually, we estimate the
distribution of the log LR statistic for testing the one-component versus
the two-component hypothesis in the mixture model, using a parametric
bootstrap approach. We compare conditional (on ‘absolute call’) and
unconditional analyses performed on the dataset described in Golub et al.
We show that the proposed techniques improve the results of
classification of tissue samples with respect to known results on the same
benchmark dataset.
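The per-gene parametric bootstrap described above can be sketched roughly as follows. This is a minimal illustration, not the authors' software: it assumes normal components with a common variance (a simplification chosen here so the likelihood stays bounded), and the function names `fit_one`, `fit_two` and `bootstrap_lrt` are hypothetical.

```python
import math
import random


def normal_logpdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def fit_one(xs):
    """Maximised log-likelihood, mean and ML variance of a single normal."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-8)
    return sum(normal_logpdf(x, mu, var) for x in xs), mu, var


def fit_two(xs, n_iter=100):
    """EM for a two-component normal mixture with a common variance.

    Assumption of this sketch: components share one variance and EM starts
    from a split at the median. Returns the log-likelihood evaluated in the
    last E-step.
    """
    n = len(xs)
    ordered = sorted(xs)
    half = n // 2
    mu1 = sum(ordered[:half]) / half
    mu2 = sum(ordered[half:]) / (n - half)
    _, _, var = fit_one(xs)
    pi1 = 0.5
    loglik = 0.0
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1
        resp, loglik = [], 0.0
        for x in xs:
            a = math.log(pi1) + normal_logpdf(x, mu1, var)
            b = math.log(1.0 - pi1) + normal_logpdf(x, mu2, var)
            top = max(a, b)
            log_mix = top + math.log(math.exp(a - top) + math.exp(b - top))
            loglik += log_mix
            resp.append(math.exp(a - log_mix))
        # M-step: update mixing weight, means and the common variance
        n1 = min(max(sum(resp), 1e-6), n - 1e-6)
        pi1 = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / (n - n1)
        var = max(sum(r * (x - mu1) ** 2 + (1.0 - r) * (x - mu2) ** 2
                      for r, x in zip(resp, xs)) / n, 1e-8)
    return loglik


def bootstrap_lrt(xs, n_boot=99, seed=0):
    """Observed -2 log LR (one vs two components) and its bootstrap p-value."""
    rng = random.Random(seed)
    ll1, mu, var = fit_one(xs)
    observed = max(0.0, 2.0 * (fit_two(xs) - ll1))
    sd = math.sqrt(var)
    exceed = 0
    for _ in range(n_boot):
        # Simulate expression values for one gene under the one-component null
        sim = [rng.gauss(mu, sd) for _ in xs]
        sim_ll1, _, _ = fit_one(sim)
        stat = max(0.0, 2.0 * (fit_two(sim) - sim_ll1))
        exceed += stat >= observed
    return observed, (exceed + 1) / (n_boot + 1)
```

In a gene-screening setting, `bootstrap_lrt` would be called once per gene on that gene's expression values across tissues, and genes with small bootstrap p-values would be retained as candidate markers.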
Availability: The software of Ghahramani and Hinton is written in
Matlab and available in ‘Mixture of Factor Analyzers’ on
http://www.gatsby.ucl.ac.uk/zoubin/software.html while the software of Rocci
and Vichi is available upon request from the author.
Clustering of microarray data using finite mixture models
The study of the molecular variation among diseases is rapidly growing thanks
to the development of microarray-based technologies. In fact, such technologies allow
us to simultaneously measure thousands of gene expression levels from biological tissue
samples.
A major task in this context is the classification of samples to improve the diagnoses
of patients and, therefore, the quality of treatments.
We discuss a model-based approach that both reduces the dimension of the
gene space and clusters the tissue samples, simultaneously. We adopt statistical techniques
formulated by Ghahramani and Hinton (1996) and Rocci and Vichi (2002).
The performance of the proposed models is illustrated on a well-known data set in
the microarray literature: the leukaemia data, containing classes that are well known to be
easily separable (Golub et al., 1999).
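For orientation, the mixture of factor analyzers of Ghahramani and Hinton can be written in the following standard form. The notation is an assumption of this note rather than taken from the abstract: $p$ genes, $q \ll p$ latent factors, and $k$ mixture components with weights $\pi_j$.

```latex
% Within component j, a p-dimensional expression profile x is generated from
% q latent factors z plus idiosyncratic noise u:
\[
  x = \mu_j + \Lambda_j z + u, \qquad
  z \sim N(0, I_q), \qquad
  u \sim N(0, \Psi), \quad \Psi \text{ diagonal},
\]
% so that the marginal density of x is a Gaussian mixture with
% low-rank-plus-diagonal covariance matrices:
\[
  f(x) = \sum_{j=1}^{k} \pi_j \,
         N\!\left(x;\ \mu_j,\ \Lambda_j \Lambda_j^{\top} + \Psi\right),
  \qquad \sum_{j=1}^{k} \pi_j = 1 .
\]
```

The factor loadings $\Lambda_j$ provide the dimension reduction, while the component memberships provide the clustering of tissue samples.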
Clustering microarray data using model-based double K-means
Microarray technology allows the measurement of expression levels of thousands of genes simultaneously. The dimension and complexity of gene expression data obtained by microarrays create challenging data analysis and management problems, ranging from the analysis of images produced by microarray experiments to the biological interpretation of results. Therefore, statistical and computational approaches are beginning to assume a substantial position within the molecular biology area. We consider the problem of simultaneously clustering genes and tissue samples (or, more generally, conditions) of a microarray data set. This can be useful for revealing groups of genes involved in the same molecular process as well as groups of conditions where this process takes place. The need to find a subset of genes and tissue samples defining a homogeneous block has led to the application of double clustering techniques to gene expression data. Here, we focus on an extension of standard K-means that simultaneously clusters observations and features of a data matrix, namely the double K-means introduced by Vichi (2000). We introduce this model in a probabilistic framework and discuss the advantages of using this approach. We also develop a coordinate ascent algorithm and test its performance via simulation studies and a real data set. Finally, we validate the results obtained on the real data set by building resampling confidence intervals for block centroids. © 2012 Copyright Taylor and Francis Group, LLC
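The idea of clustering rows and columns at the same time can be sketched with a plain least-squares double K-means, which alternates between updating row memberships, column memberships and block centroids. This is only an illustration under stated assumptions: it is not the probabilistic model or the coordinate ascent algorithm of the paper, and the function names (`block_means`, `sse`, `double_kmeans`) and the random initialisation are hypothetical choices made here.

```python
import random


def block_means(X, rows, cols, k_rows, k_cols):
    """Mean of X within each (row cluster, column cluster) block.

    Empty blocks fall back to the grand mean (an assumption of this sketch).
    """
    total = [[0.0] * k_cols for _ in range(k_rows)]
    count = [[0] * k_cols for _ in range(k_rows)]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            total[rows[i]][cols[j]] += value
            count[rows[i]][cols[j]] += 1
    grand = sum(map(sum, X)) / (len(X) * len(X[0]))
    return [[total[g][h] / count[g][h] if count[g][h] else grand
             for h in range(k_cols)] for g in range(k_rows)]


def sse(X, rows, cols, C):
    """Sum of squared deviations of each entry from its block centroid."""
    return sum((value - C[rows[i]][cols[j]]) ** 2
               for i, row in enumerate(X) for j, value in enumerate(row))


def double_kmeans(X, k_rows, k_cols, n_iter=50, seed=0):
    """Alternate row and column reassignment until the partitions stabilise."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    rows = [rng.randrange(k_rows) for _ in range(n)]
    cols = [rng.randrange(k_cols) for _ in range(p)]
    for _ in range(n_iter):
        # Reassign each row to the row cluster with the closest centroid profile
        C = block_means(X, rows, cols, k_rows, k_cols)
        new_rows = [min(range(k_rows),
                        key=lambda g: sum((X[i][j] - C[g][cols[j]]) ** 2
                                          for j in range(p)))
                    for i in range(n)]
        # Recompute centroids, then reassign each column in the same way
        C = block_means(X, new_rows, cols, k_rows, k_cols)
        new_cols = [min(range(k_cols),
                        key=lambda h: sum((X[i][j] - C[new_rows[i]][h]) ** 2
                                          for i in range(n)))
                    for j in range(p)]
        converged = new_rows == rows and new_cols == cols
        rows, cols = new_rows, new_cols
        if converged:
            break
    return rows, cols, block_means(X, rows, cols, k_rows, k_cols)
```

Because the result depends on the random start, a practical analysis would typically run several seeds and keep the partition with the smallest `sse`.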
Hidden Markov of factor analyzers for biclustering of microarray time course data in multiple conditions.
A challenging task in time course microarray data analysis is to discover groups of genes that show homogeneous
temporal expression patterns when time course experiments are collected in multiple biological
conditions. In such a case, an appealing goal would be to discover local structures composed of sets
of genes that show homogeneous expression patterns across subsets of biological conditions and that also
capture the history of the genes' and conditions' dynamic behavior across time. To address this, at each
time point one could apply any biclustering method for identifying differentially expressed genes
across biological conditions. However, considering each time point in isolation can be inefficient,
because it does not use the information contained in the dependence structure of the time course data. Our
proposal is an extension of the Hidden Markov of factor analyzers model allowing for simultaneous
clustering of genes and biological conditions. The proposed model is rath
Identifying partitions of genes and tissue samples in microarray data.
An important challenge in microarray data analyses is the detection of
genes which are differentially expressed across different types of experimental conditions.
We provide an extension of a finite mixture model to the clustering of genes
and experimental conditions, where the partition of experimental conditions may
be known or unknown. In particular, the idea is to adopt a finite mixture approach
with mean/covariance reparameterization, where an explicit distinction among
up-regulated genes, down-regulated genes and non-regulated genes (with respect to a reference)
is made; moreover, within each of these groups, genes that are differentially
expressed between two or more types of experimental conditions may be identified.
Model-based double clustering
In this paper a new methodology for the simultaneous clustering of objects and
variables of a two-way data matrix is proposed in a model-based framework
and with a maximum likelihood approach. The methodology is illustrated
by both a real application and simulation studies.
A biclustering approach for discrete outcomes
We discuss an extension of mixtures of factor analyzers (MFA) to allow
for simultaneous clustering of subjects and variables where discrete manifest variables are available. To estimate model parameters, we propose a modified EM algorithm in a ML framework.
Model-based approaches to synthesize microarray data: a unifying review using mixture SEM
A considerable number of different approaches have been proposed
for synthesizing gene expression data obtained from microarray experiments. In this
paper, we take a closer look at various types of Gaussian mixture models which
have recently been proposed in the gene expression literature. It is shown
that these are, in fact, special cases of a more general model; that is, the mixture
structural equation model developed in psychometrics (Arminger and Stein, 1997;
Dolan and van der Maas, 1998). This model combines mixture modeling and SEM by
assuming that within each mixture component the model parameters are subject to a
structural equation model (SEM). A SEM is a very general model for a multivariate
Gaussian mean vector and covariance matrix.
The various Gaussian mixture methodologies for microarray analysis (such as
mixture factor analyzers) are special cases of mixture SEM which can be obtained by
imposing specific restrictions on the SEM model parameters; item intercepts
- …
