    Mixture models for mixed-type data through a composite likelihood approach

    A mixture model is proposed for classifying continuous and/or ordinal variables. Under this model, the continuous and ordinal variables are jointly assumed to follow a heteroscedastic Gaussian mixture that, for the ordinal variables, is only partially observed. More specifically, the ordinal variables are assumed to be a discretization of some of the mixture variables. Computationally, this complicates maximum likelihood estimation of the model parameters: the likelihood function involves multidimensional integrals whose evaluation becomes increasingly demanding as the number of ordinal variables grows. The proposal is to replace this cumbersome likelihood with a surrogate objective function that is easier to maximize. A composite likelihood approach is used: the original joint distribution is replaced by the product of three blocks, namely the marginal distribution of the continuous variables, all bivariate marginal distributions of the ordinal variables, and the marginal distributions of all continuous variables together with a single ordinal variable. This leads to a surrogate function that is the sum of the log contributions of each block. The model parameters are estimated by maximizing the surrogate function within an EM-like algorithm. The effectiveness of the proposal is investigated through a simulation study and two applications to real data.
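
    The three-block surrogate described in this abstract can be written out as a worked formula. The notation is an assumption made here for illustration, not taken from the paper: x_i collects the continuous variables of observation i, y_ij is its j-th ordinal variable (j = 1, ..., O), and theta gathers all mixture parameters.

```latex
\[
c\ell(\theta) \;=\; \sum_{i=1}^{n} \Big[
    \underbrace{\log f(\mathbf{x}_i;\theta)}_{\text{continuous block}}
  \;+\; \underbrace{\sum_{j<k} \log P(y_{ij},\, y_{ik};\theta)}_{\text{ordinal pairs}}
  \;+\; \underbrace{\sum_{j=1}^{O} \log f(\mathbf{x}_i,\, y_{ij};\theta)}_{\text{continuous + one ordinal}}
\Big]
\]
```

    Under this reading, every term requires at most a bivariate Gaussian integral (ordinal pairs) or a univariate one (continuous variables plus one ordinal variable), which is what makes the surrogate cheaper than the full likelihood.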

    Mixture models for ordinal data: a pairwise likelihood approach

    A latent Gaussian mixture model to classify ordinal data is proposed. The observed categorical variables are considered as a discretization of an underlying finite mixture of Gaussians. The model is estimated within the expectation-maximization (EM) framework by maximizing a pairwise likelihood. This overcomes the computational problems that arise in the full maximum likelihood approach, which requires evaluating multidimensional integrals that cannot be written in closed form. Moreover, a method is suggested for clustering the observations on the basis of the posterior probabilities produced by the pairwise EM algorithm. The effectiveness of the proposal is shown by comparing the pairwise likelihood approach with full maximum likelihood and with maximum likelihood for continuous data that ignores the ordinal nature of the variables. The comparison is made by means of a simulation study; applications to real data are provided.
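
    As a rough illustration of the pairwise likelihood idea in this abstract, the sketch below evaluates a pairwise log-likelihood for ordinal data generated by thresholding a latent Gaussian mixture. It is not the authors' implementation: the function names, the parameterization (per-component means, standard deviations and correlation matrices, plus per-variable cut points) and the Simpson-rule quadrature are all assumptions chosen to keep the example dependency-free, and the EM maximization step is omitted.

```python
import math
from itertools import combinations


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rect_prob(lo1, hi1, lo2, hi2, rho, n=400):
    """P(lo1 < Z1 <= hi1, lo2 < Z2 <= hi2) for a standard bivariate normal
    with correlation rho (|rho| < 1), via Simpson's rule over Z1.
    Infinite limits on Z1 are truncated at +/- 8 (an assumption)."""
    a, b = max(lo1, -8.0), min(hi1, 8.0)
    if a >= b:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def g(z):
        # Density of Z1 times the conditional probability of Z2's interval.
        upper = norm_cdf((hi2 - rho * z) / s)
        lower = norm_cdf((lo2 - rho * z) / s)
        return norm_pdf(z) * (upper - lower)

    h = (b - a) / n  # n must be even for Simpson's rule
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def pair_cell_prob(a, b, j, k, weights, means, sds, corrs, cuts):
    """Mixture probability that variable j is in category a and k in b."""
    bj = [-math.inf] + list(cuts[j]) + [math.inf]
    bk = [-math.inf] + list(cuts[k]) + [math.inf]
    prob = 0.0
    for w, m, s, r in zip(weights, means, sds, corrs):
        # Standardize the thresholds within each mixture component.
        prob += w * rect_prob(
            (bj[a] - m[j]) / s[j], (bj[a + 1] - m[j]) / s[j],
            (bk[b] - m[k]) / s[k], (bk[b + 1] - m[k]) / s[k],
            r[j][k],
        )
    return prob


def pairwise_loglik(Y, weights, means, sds, corrs, cuts):
    """Sum of log bivariate cell probabilities over all variable pairs.
    Y holds zero-based category codes, one row per observation."""
    ll = 0.0
    for j, k in combinations(range(len(cuts)), 2):
        cache = {}  # each distinct cell is integrated only once per pair
        for row in Y:
            cell = (row[j], row[k])
            if cell not in cache:
                cache[cell] = pair_cell_prob(
                    cell[0], cell[1], j, k, weights, means, sds, corrs, cuts
                )
            ll += math.log(cache[cell])
    return ll
```

    Whatever the number of ordinal variables, each term needs only a two-dimensional rectangle probability, here reduced to a one-dimensional quadrature. A production implementation would more likely call a dedicated bivariate normal routine than the hand-rolled integration used in this sketch.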

    Composite likelihood methods for parsimonious model-based clustering of mixed-type data

    In this paper, we propose twelve parsimonious models for clustering mixed-type (ordinal and continuous) data. The dependence among the different types of variables is modeled by assuming that the ordinal and continuous data follow a multivariate finite mixture of Gaussians, where the ordinal variables are a discretization of some continuous variates of the mixture. The general class of parsimonious models is based on a factor decomposition of the component-specific covariance matrices. Parameter estimation is carried out using an EM-type algorithm based on composite likelihood. The proposal is evaluated through a simulation study and an application to real data.
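
    For orientation, a factor decomposition of a component covariance matrix is commonly written as below. The symbols are assumptions made here (p variables, q < p latent factors, loading matrix Lambda_g, diagonal noise matrix Psi_g); the abstract does not list the twelve models, so the constraints mentioned afterwards are only typical examples from this model family.

```latex
\[
\Sigma_g \;=\; \Lambda_g \Lambda_g^{\top} + \Psi_g ,
\qquad \Lambda_g \in \mathbb{R}^{p \times q},
\quad \Psi_g = \operatorname{diag}(\psi_{g1}, \dots, \psi_{gp}),
\]
\[
\underbrace{pq - \tfrac{q(q-1)}{2} + p}_{\text{free parameters, factor form}}
\quad \text{versus} \quad
\underbrace{\tfrac{p(p+1)}{2}}_{\text{unconstrained } \Sigma_g}.
\]
```

    Parsimonious variants in this family are usually obtained by constraining the loadings or the noise matrix across components, for example Lambda_g = Lambda or Psi_g = Psi. Which combinations make up the twelve models is not stated in the abstract.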

    Standard and novel model selection criteria in the pairwise likelihood estimation of a mixture model for ordinal data

    In this paper, we provide an overview of the underlying response variable (URV) model-based approach to clustering ordinal and, optionally, continuous variables, with optional simultaneous dimension reduction. We summarize and compare its main features and discuss some key issues. An application to real data is presented, in which clustering performances are compared and discussed.

    Dimension reduction for longitudinal multivariate data by optimizing class separation of projected latent Markov models

    We present a method for dimension reduction of multivariate longitudinal data, where new variables are assumed to follow a latent Markov model. New variables are obtained as linear combinations of the multivariate outcome as usual. Weights of each linear combination maximize a measure of separation of the latent intercepts, subject to orthogonality constraints. We evaluate our proposal in a simulation study and illustrate it using an EU-level data set on income and living conditions, where dimension reduction leads to an optimal scoring system for material deprivation. An R implementation of our approach can be downloaded from https://github.com/afarcome/LMdim

    A Model-Based Approach to Simultaneous Clustering and Dimensional Reduction of Ordinal Data

    The literature on clustering continuous data is rich and wide; by contrast, that on categorical data is still limited. In some cases, the clustering problem is made more difficult by the presence of noise variables/dimensions that carry no information about the clustering structure and could mask it. The aim of this paper is to propose a model for simultaneous clustering and dimensionality reduction of ordered categorical data that detects the discriminative dimensions and discards the noise ones. Following the underlying response variable approach, the observed variables are considered as a discretization of underlying first-order latent continuous variables distributed as a Gaussian mixture. To distinguish discriminative from noise dimensions, these variables are taken to be linear combinations of two independent sets of second-order latent variables, of which only one contains the information about the cluster structure while the other contains the noise dimensions. The model specification involves multidimensional integrals that make maximum likelihood estimation cumbersome and, in some cases, infeasible. To overcome this issue, parameter estimation is carried out through an EM-like algorithm maximizing a composite log-likelihood based on low-dimensional margins. Applications to real and simulated data show the effectiveness of the proposal.