Mixture models for mixed-type data through a composite likelihood approach
A mixture model is proposed for classifying continuous and/or ordinal variables. Both the continuous and the ordinal variables are assumed to follow a heteroscedastic Gaussian mixture model which, for the ordinal variables, is only partially observed. More specifically, the ordinal variables are assumed to be a discretization of some of the mixture variables. Computationally, this complicates maximum likelihood estimation of the model parameters: the likelihood function involves multidimensional integrals whose evaluation becomes increasingly demanding as the number of ordinal variables grows. The proposal is to replace this cumbersome likelihood with a surrogate objective function that is easier to maximize. A composite likelihood approach is used. In particular, the original joint distribution is replaced by the product of three blocks: the marginal distribution of the continuous variables, all bivariate marginal distributions of the ordinal variables, and the marginal distributions of all continuous variables together with a single ordinal variable. This leads to a surrogate function that is the sum of the log contributions of each block. The model parameters are estimated by maximizing the surrogate function within an EM-like algorithm. The effectiveness of the proposal is investigated through a simulation study and two applications to real data.
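As a reading aid, the three-block surrogate described in this abstract can be written out as follows. The notation is an assumption introduced here, not taken from the paper: $y_i$ is the vector of continuous variables for observation $i$, $x_{ij}$ is its $j$-th ordinal variable ($j = 1, \dots, P$), $\theta$ collects the mixture parameters, and any block weights the authors may use are omitted.

```latex
% Composite log-likelihood as the sum of the three blocks named in the abstract
% (notation assumed for illustration; block weights, if any, omitted)
c\ell(\theta) = \sum_{i=1}^{n} \Bigg[
    \underbrace{\log f(y_i;\theta)}_{\text{continuous margin}}
  + \underbrace{\sum_{j=1}^{P-1}\sum_{k=j+1}^{P} \log \Pr(x_{ij}, x_{ik};\theta)}_{\text{ordinal pairs}}
  + \underbrace{\sum_{j=1}^{P} \log f(y_i, x_{ij};\theta)}_{\text{continuous + one ordinal}}
\Bigg]
```

Under this reading, no term requires an integral over more than two latent ordinal dimensions, which is what makes the surrogate cheaper than the full likelihood.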
Mixture models for ordinal data: a pairwise likelihood approach
A latent Gaussian mixture model to classify ordinal data is proposed. The observed categorical variables are considered as a discretization of an underlying finite mixture of Gaussians. The model is estimated within the expectation-maximization (EM) framework by maximizing a pairwise likelihood. This overcomes the computational problems that arise in the full maximum likelihood approach, which requires evaluating multidimensional integrals that cannot be written in closed form. Moreover, a method is suggested for clustering the observations on the basis of the posterior probabilities produced by the pairwise EM algorithm. The effectiveness of the proposal is shown by comparing the pairwise likelihood approach with full maximum likelihood and with maximum likelihood for continuous data that ignores the ordinal nature of the variables. The comparison is made by means of a simulation study; applications to real data are provided.
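To make the pairwise objective concrete, here is a minimal sketch in pure Python of how a pairwise log-likelihood for ordinal data could be evaluated under a latent Gaussian mixture. It is not the authors' implementation: the function names (`rect_prob`, `pairwise_loglik`), the argument layout, and the use of Simpson's rule for the bivariate normal rectangle probabilities are all assumptions made for illustration, and the EM updates themselves are not shown.

```python
import math
from itertools import combinations

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rect_prob(a1, b1, a2, b2, rho, n=200):
    """P(a1 < Z1 < b1, a2 < Z2 < b2) for a standard bivariate normal with
    correlation rho (|rho| < 1). Integrates over Z1 with Simpson's rule,
    clipping the Z1 range to [-8, 8]; n must be even."""
    lo, hi = max(a1, -8.0), min(b1, 8.0)
    if hi <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def g(z):
        # density of Z1 times the conditional probability of Z2 in (a2, b2)
        return phi(z) * (Phi((b2 - rho * z) / s) - Phi((a2 - rho * z) / s))

    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def pairwise_loglik(data, thresholds, weights, means, covs):
    """Sum over observations and variable pairs of the log bivariate
    mixture probability of the observed pair of categories.
    data[i][j]      : 0-based category of variable j for observation i
    thresholds[j]   : increasing cut points, starting at -inf, ending at +inf
    weights[g], means[g][j], covs[g][j][k] : mixture parameters (assumed layout)
    """
    total = 0.0
    for row in data:
        for j, k in combinations(range(len(thresholds)), 2):
            p = 0.0
            for w, mu, S in zip(weights, means, covs):
                sj, sk = math.sqrt(S[j][j]), math.sqrt(S[k][k])
                rho = S[j][k] / (sj * sk)
                # standardize the cut points that bracket the observed categories
                aj = (thresholds[j][row[j]] - mu[j]) / sj
                bj = (thresholds[j][row[j] + 1] - mu[j]) / sj
                ak = (thresholds[k][row[k]] - mu[k]) / sk
                bk = (thresholds[k][row[k] + 1] - mu[k]) / sk
                p += w * rect_prob(aj, bj, ak, bk, rho)
            total += math.log(p)
    return total
```

Each term involves only a two-dimensional normal probability, whereas the full likelihood would need one integral whose dimension equals the number of ordinal variables.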
Composite likelihood methods for parsimonious model-based clustering of mixed-type data
In this paper, we propose twelve parsimonious models for clustering mixed-type (ordinal and continuous) data. The dependence among the different types of variables is modeled by assuming that ordinal and continuous data follow a multivariate finite mixture of Gaussians, where the ordinal variables are a discretization of some continuous variates of the mixture. The general class of parsimonious models is based on a factor decomposition of the component-specific covariance matrices. Parameter estimation is carried out using an EM-type algorithm based on composite likelihood. The proposal is evaluated through a simulation study and an application to real data.
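The abstract does not spell out the factor decomposition, so the sketch below assumes the usual factor-analytic form, in which a component covariance is a loading matrix times its transpose plus a diagonal matrix of uniquenesses. The function names and the parameter count (the standard one for a p-variable, q-factor model) are illustrative assumptions, and the constraints that define the twelve models are not reproduced here.

```python
def factor_covariance(loadings, uniquenesses):
    """Build Sigma = Lambda * Lambda^T + diag(psi) from a p x q loading
    matrix (list of rows) and a length-p list of uniquenesses."""
    p = len(loadings)
    sigma = [[sum(a * b for a, b in zip(loadings[r], loadings[c]))
              for c in range(p)] for r in range(p)]
    for r in range(p):
        sigma[r][r] += uniquenesses[r]
    return sigma

def n_cov_params_full(p):
    """Free parameters in an unconstrained p x p covariance matrix."""
    return p * (p + 1) // 2

def n_cov_params_factor(p, q):
    """Free parameters in the factor form: p*q loadings, minus q(q-1)/2
    for rotational invariance, plus p uniquenesses."""
    return p * q - q * (q - 1) // 2 + p
```

For example, with 10 variables and 2 factors the factor form uses 29 covariance parameters per component instead of 55, which illustrates how this kind of structure yields parsimony.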
Standard and novel model selection criteria in the pairwise likelihood estimation of a mixture model for ordinal data
In this paper, we provide an overview of the underlying response variable (URV) model-based approach to clustering ordinal and, optionally, continuous variables, with optional simultaneous dimension reduction. We summarize and compare its main features and discuss some key issues. An application to real data is presented, comparing and discussing the clustering performance of the methods.
Dimension reduction for longitudinal multivariate data by optimizing class separation of projected latent Markov models
We present a method for dimension reduction of multivariate longitudinal data, where new variables are assumed to follow a latent Markov model. New variables are obtained as linear combinations of the multivariate outcome as usual. Weights of each linear combination maximize a measure of separation of the latent intercepts, subject to orthogonality constraints. We evaluate our proposal in a simulation study and illustrate it using an EU-level data set on income and living conditions, where dimension reduction leads to an optimal scoring system for material deprivation. An R implementation of our approach can be downloaded from https://github.com/afarcome/LMdim
A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data
The literature on clustering for continuous data is rich and wide; by contrast, that developed for categorical data is still limited. In some cases, the clustering problem is made more difficult by the presence of noise variables/dimensions that contain no information about the clustering structure and could mask it. The aim of this paper is to propose a model for simultaneous clustering and dimensionality reduction of ordered categorical data that is able to detect the discriminative dimensions and discard the noise ones. Following the underlying response variable approach, the observed variables are considered as a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To distinguish discriminative from noise dimensions, these variables are taken to be linear combinations of two independent sets of second-order latent variables, only one of which carries information about the cluster structure, while the other contains the noise dimensions. The model specification involves multidimensional integrals that make maximum likelihood estimation cumbersome and in some cases infeasible. To overcome this issue, parameter estimation is carried out through an EM-like algorithm maximizing a composite log-likelihood based on low-dimensional margins. Applications to real and simulated data are presented to show the effectiveness of the proposal.
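The generative structure described in this abstract can be illustrated with a small simulation sketch. Everything specific here is an assumption made for illustration rather than the paper's specification: the function name `simulate_ordinal`, unit variances for the second-order latent variables, the absence of an extra error term, and the argument layout.

```python
import random
from bisect import bisect_right

def simulate_ordinal(n, weights, means, loadings, cuts, seed=0):
    """Simulate ordinal data from a hypothetical two-level latent structure.
    weights[g]   : mixing proportions of the clusters
    means[g][d]  : cluster means of the informative second-order variables
    loadings[j]  : row of coefficients mapping all second-order variables
                   (informative first, then noise) to first-order variable j
    cuts[j]      : increasing finite thresholds that discretize variable j
    Returns (rows of 0-based categories, true cluster labels)."""
    rng = random.Random(seed)
    q_info = len(means[0])   # informative second-order dimensions
    q = len(loadings[0])     # informative + noise dimensions
    rows, labels = [], []
    for _ in range(n):
        g = rng.choices(range(len(weights)), weights=weights)[0]
        # informative part follows the cluster-specific Gaussian
        y = [rng.gauss(means[g][d], 1.0) for d in range(q_info)]
        # noise part has the same distribution in every cluster
        y += [rng.gauss(0.0, 1.0) for _ in range(q - q_info)]
        # first-order latent variables are linear combinations of y
        z = [sum(a * v for a, v in zip(row, y)) for row in loadings]
        # observed category = number of thresholds at or below z_j
        rows.append([bisect_right(cuts[j], z[j]) for j in range(len(z))])
        labels.append(g)
    return rows, labels
```

In this sketch, only the first `q_info` second-order variables differ across clusters, so any separation visible in the ordinal data comes through the loadings on those dimensions.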
Nonparametric Estimation of Utilized Surface Area for Cereals Production at Agrarian Region Level in Tuscany
- …
