Penalized Estimation of a Finite Mixture of Linear Regression Models
Finite mixtures of linear regressions are often used in practice to classify a set of observations and/or explain unobserved heterogeneity. Their application poses two major challenges. The first concerns maximum likelihood estimation, which is, in theory, impossible with Gaussian errors and component-specific variances, because the likelihood is unbounded. The second concerns covariate selection: as in every regression model, there are several candidate predictors, and the best subset must be chosen among them. These two problems can share a similar solution, which is the motivation for the present paper. The pathway is to add an appropriate penalty to the likelihood. We review possible approaches, discussing and comparing their main features.
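The unboundedness of the likelihood is easy to reproduce numerically. The sketch below uses made-up toy data. One component's regression line passes exactly through a single observation, and that component's standard deviation is driven towards zero, so the log-likelihood grows without bound. Subtracting a variance penalty of the form a * sum_k (s^2 / sigma_k^2 + log sigma_k^2), where s^2 is the sample variance of y, keeps the objective bounded. This particular penalty, the value a = 1/n and the function names are illustrative assumptions, not necessarily the approaches reviewed in the paper.

```python
import numpy as np

def mixreg_loglik(y, X, betas, sigmas, pis):
    """Log-likelihood of a K-component mixture of Gaussian linear regressions."""
    resid = y[:, None] - X @ betas.T                        # n x K residuals
    dens = pis * np.exp(-0.5 * (resid / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)
    return float(np.sum(np.log(dens.sum(axis=1))))

def penalised_loglik(y, X, betas, sigmas, pis, a):
    """Log-likelihood minus an illustrative (assumed) penalty a * sum_k (s^2/sigma_k^2 + log sigma_k^2)."""
    s2 = np.var(y)
    penalty = a * np.sum(s2 / sigmas ** 2 + np.log(sigmas ** 2))
    return mixreg_loglik(y, X, betas, sigmas, pis) - penalty

# Toy data, made up for illustration: y is roughly 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.2, 4.9, 7.1, 9.0])
X = np.column_stack([np.ones_like(x), x])
# Component 1 is the flat line through the first point; component 2 follows the trend.
betas = np.array([[y[0], 0.0], [1.0, 2.0]])
pis = np.array([0.5, 0.5])

for s1 in [1e-1, 1e-2, 1e-3, 1e-6]:
    sig = np.array([s1, 1.0])
    print(s1, mixreg_loglik(y, X, betas, sig, pis),
          penalised_loglik(y, X, betas, sig, pis, a=1 / len(y)))
```

As sigma_1 shrinks, the plain log-likelihood keeps increasing, while the penalised version falls because the s^2 / sigma_1^2 term dominates.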
Nonparametric tests and confidence regions for intrinsic diversity profiles of biological populations
LASSO-penalized clusterwise linear regression modelling: a two-step approach
In clusterwise regression analysis, the goal is to predict a response variable from a set of explanatory variables, each with cluster-specific effects. In many real-life problems, the number of candidate predictors is large, and perhaps only a few of them contribute meaningfully to the prediction. A well-known method for variable selection is the LASSO, with calibration done by minimizing the Bayesian Information Criterion (BIC). However, existing LASSO-penalized estimators are problematic for several reasons. First, only certain types of penalties are considered. Second, the computations sometimes involve approximate schemes. Third, variable selection is usually time-consuming, owing to a complex calibration of the penalty term that may require multiple evaluations of an estimator for each plausible value of the tuning parameter(s). We introduce a two-step approach to fill these gaps. In step 1 (Fit step), we fit LASSO clusterwise linear regressions with a pre-specified level of penalization. In step 2 (Selection step), we perform covariate selection locally, i.e. on the weighted data, with weights equal to the posterior probabilities from the previous step. This is done with a generalization of the Least Angle Regression (LARS) algorithm, which permits covariate selection with a single evaluation of the estimator. In addition, both the Fit and Selection steps leverage an Expectation-Maximization (EM) algorithm, fully in closed form, designed with a very general version of the LASSO penalty. The advantages of our proposal, in terms of reduced computation time and accuracy of model estimation and selection, are shown in a simulation study and illustrated with a real data application.
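The Fit step rests on an EM algorithm with a LASSO penalty. Below is a minimal sketch of such a fit, written as an ECM loop, under the following assumptions:

- It uses a plain L1 penalty on each component's slopes rather than the more general penalty of the paper.
- The coefficient update is solved by weighted coordinate descent with soft-thresholding.
- The function names `weighted_lasso` and `em_lasso_cwr` are hypothetical.

The returned posterior matrix is the kind of weights the Selection step applies to the data. The generalized LARS selection itself is not reproduced.

```python
import numpy as np

def soft_threshold(z, t):
    # Closed-form solution of the one-dimensional LASSO problem.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, w, lam, beta_init, n_sweeps=50):
    """Coordinate descent for min 0.5*sum_i w_i*(y_i - b0 - x_i'b)^2 + lam*||b||_1 (b0 unpenalised)."""
    b0, b = float(beta_init[0]), np.array(beta_init[1:], dtype=float)
    col_ss = (w[:, None] * X ** 2).sum(axis=0)       # weighted squared norm of each column
    r = y - b0 - X @ b                                # current residuals
    for _ in range(n_sweeps):
        shift = np.sum(w * r) / np.sum(w)            # exact update of the intercept
        b0 += shift
        r -= shift
        for j in range(X.shape[1]):
            old = b[j]
            z = np.sum(w * X[:, j] * (r + X[:, j] * old))   # partial residual without x_j
            b[j] = soft_threshold(z, lam) / col_ss[j]
            r -= X[:, j] * (b[j] - old)
    return np.concatenate(([b0], b))

def em_lasso_cwr(X, y, K, lam, n_iter=100, seed=0):
    """ECM fit of a K-cluster linear regression with an L1 penalty on the slopes."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    post = rng.dirichlet(np.ones(K), size=n)         # random initial posteriors (n x K)
    betas = np.zeros((K, p + 1))                     # column 0 holds the intercepts
    sigma2 = np.full(K, np.var(y))
    for _ in range(n_iter):
        # M-step: mixing weights, penalised coefficients, then variances.
        pis = post.mean(axis=0)
        for k in range(K):
            w = post[:, k]
            # lam * sigma2_k is the L1 threshold once the penalised log-likelihood
            # of component k is rescaled to a weighted least-squares problem.
            betas[k] = weighted_lasso(X, y, w, lam * sigma2[k], betas[k])
            r = y - betas[k, 0] - X @ betas[k, 1:]
            sigma2[k] = max(np.sum(w * r ** 2) / w.sum(), 1e-8)   # crude floor against degeneracy
        # E-step: posterior probabilities, computed on the log scale for stability.
        mu = betas[:, 0] + X @ betas[:, 1:].T          # n x K fitted values
        logd = (np.log(pis) - 0.5 * np.log(2 * np.pi * sigma2)
                - 0.5 * (y[:, None] - mu) ** 2 / sigma2)
        logd -= logd.max(axis=1, keepdims=True)
        post = np.exp(logd)
        post /= post.sum(axis=1, keepdims=True)
    return pis, betas, sigma2, post
```

The variance floor is only a guard against the degeneracy discussed for unbounded mixture likelihoods. It is not a principled constraint.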
A functional approach to diversity profiles
Diversity plays a central role in ecological theory, and its conservation and management are important issues for the wellbeing and stability of ecosystems. The aim of this work is to provide a reliable theoretical framework for the statistical analysis of ecological diversity through the joint use of diversity profiles and functional data analysis. We point out that ecological diversity is a multivariate concept, as it is a function of the relative abundances of species in a biological community. For this reason, several researchers have suggested using parametric families of diversity indices to obtain more information from the data. Patil and Taillie introduced the concept of intrinsic diversity ordering, which can be determined by using the diversity profile. Note that the diversity profile is a non-negative, convex curve consisting of a sequence of measurements as a function of a given parameter. Thus, diversity profiles can be explained through a process described in a functional setting. Recent developments in environmental studies have focused on the opportunity to evaluate changes in community diversity over space and/or the correlation of diversity with environmental characteristics. To this end, we develop an innovative analysis of diversity based on a functional data approach. Whereas conventional statistical methods process data as a sequence of individual observations, functional data analysis is designed to process a collection of functions or curves. Moreover, unconstrained models may lead to negative and/or non-convex estimates of the diversity profiles. To overcome this problem, a transformation is proposed which can be constrained to be non-negative and convex. We focus on some applications showing how functional data analysis provides an alternative way of understanding biological diversity and its interaction with natural and/or human factors. Copyright (c) 2009 Royal Statistical Society.
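For reference, the Patil-Taillie diversity profile is usually written as Delta_beta = (1 - sum_i p_i^(beta+1)) / beta for beta >= -1. The Shannon index is its limit at beta = 0, Delta_-1 is the number of species minus one, and Delta_1 is the Simpson index 1 - sum_i p_i^2. Community A is intrinsically more diverse than B when A's profile lies on or above B's for every beta.

The minimal sketch below evaluates the profile on a grid and checks that ordering pointwise. The grid check only approximates the ordering over all beta >= -1, and the function names are hypothetical. The paper's constrained functional-data transformation is not reproduced.

```python
import numpy as np

def diversity_profile(abundances, betas):
    """Patil-Taillie profile Delta_beta = (1 - sum p_i^(beta+1)) / beta, for beta >= -1."""
    counts = np.asarray(abundances, dtype=float)
    counts = counts[counts > 0]                  # absent species would count at beta = -1 (0**0 == 1)
    p = counts / counts.sum()                    # relative abundances
    out = []
    for b in betas:
        if abs(b) < 1e-12:
            out.append(-np.sum(p * np.log(p)))   # beta -> 0 limit: Shannon entropy
        else:
            out.append((1.0 - np.sum(p ** (b + 1.0))) / b)
    return np.array(out)

def intrinsically_more_diverse(a, b, betas):
    """True if community a's profile lies on or above b's at every grid point."""
    return bool(np.all(diversity_profile(a, betas) >= diversity_profile(b, betas) - 1e-12))

grid = np.linspace(-1.0, 3.0, 41)
even = [10, 10, 10, 10]      # four equally abundant species
skewed = [70, 10, 10, 10]    # same species, one dominant
print(intrinsically_more_diverse(even, skewed, grid))
```

Both communities share the value 3 at beta = -1, because they have the same number of species. The even community lies above the skewed one elsewhere on the grid.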
Clusterwise linear regression modeling with soft scale constraints
Constrained approaches to maximum likelihood estimation in the context of finite mixtures of normals have been presented in the literature. A fully data-dependent, soft-constrained method for maximum likelihood estimation of clusterwise linear regression is proposed, which extends previous work on equivariant data-driven estimation of finite mixtures of normals. The method imposes soft scale bounds based on the homoscedastic variance and a cross-validated tuning parameter c. In our simulation studies and real data examples, we show that the selected c yields clusterwise linear regressions and a clustering that are best suited to the data, lying between the homoscedastic and the heteroscedastic models.
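The abstract does not spell out the form of the bounds. The minimal sketch below shows how scale bounds can enter the variance update of an M-step. It assumes bounds of the form c * sigma2 <= sigma2_k <= sigma2 / c with 0 < c <= 1, where sigma2 is a homoscedastic variance estimate supplied by the caller. This parametrization and the name `constrained_variances` are assumptions, and the cross-validation of c is not shown.

```python
import numpy as np

def constrained_variances(post, resid, sigma2_target, c):
    """Variance update under assumed scale bounds c*sigma2_target <= sigma2_k <= sigma2_target/c.

    post:  n x K posterior probabilities from the E-step
    resid: n x K residuals of each observation under each component's regression
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    nk = post.sum(axis=0)                                  # effective cluster sizes
    unconstrained = (post * resid ** 2).sum(axis=0) / nk   # usual heteroscedastic update
    # Q_k(s) = -nk/2 * log s - RSS_k / (2 s) is unimodal in s with its peak at
    # RSS_k / nk, so with sigma2_target held fixed the constrained maximiser is the clipped value.
    return np.clip(unconstrained, c * sigma2_target, sigma2_target / c)

post = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
resid = np.array([[0.1, 5.0], [-0.1, 5.0], [5.0, 2.0], [5.0, -2.0]])
for c in [1.0, 0.5, 1e-3]:
    print(c, constrained_variances(post, resid, sigma2_target=1.0, c=c))
```

With c = 1, every component receives the common variance, which gives the homoscedastic model. As c approaches 0, the bounds stop binding and the heteroscedastic update is recovered. Intermediate values of c give the in-between solutions described above.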
Adaptive cluster sampling with a data driven stopping rule
Adaptive cluster sampling, Monte Carlo simulation, Stopping rule, Efficiency
A New Dimension Reduction Method: Factor Discriminant K-means
Cluster analysis, Dimension reduction, K-Means, Principal Component Analysis
