Modelling Lorenz Curves: Robust and Semi-parametric Issues
Modelling Lorenz curves (LC) for stochastic dominance comparisons is central to the analysis of income distributions. It is conventional to use non-parametric statistics based on empirical income cumulants, which are used in the construction of LC and other related second-order dominance criteria. However, although attractive because of its simplicity and its apparent flexibility, this approach suffers from important drawbacks. While no assumptions need to be made regarding the data-generating process (income distribution model), the empirical LC can be very sensitive to data particularities, especially in the upper tail of the distribution. This robustness problem can lead in practice to a “wrong” interpretation of dominance orders. A possible remedy for this problem is the use of parametric or semi-parametric models for the data-generating process and robust estimators to obtain parameter estimates. In this paper, we focus on the robust estimation of semi-parametric LC and investigate issues such as the sensitivity of LC estimators to data contamination (Cowell and Victoria-Feser, 2002), trimmed LC (Cowell and Victoria-Feser, 2006), inference for trimmed LC (Cowell and Victoria-Feser, 2003), robust semi-parametric estimation for LC (Cowell and Victoria-Feser, 2007), and selection of optimal thresholds for (robust) semi-parametric modelling (Dupuis and Victoria-Feser, 2006). We use both simulations and real data to illustrate these points.
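The sensitivity of the empirical LC to the upper tail can be illustrated with a small sketch. The income values, the single contaminated observation, and the symmetric trimming rule below are assumptions made for illustration only; the trimming shown is a naive version and not necessarily the definition used in the cited papers.

```python
def lorenz_curve(incomes):
    """Return the empirical Lorenz ordinates L(i/n) for i = 0, ..., n."""
    xs = sorted(incomes)
    total = sum(xs)
    ordinates = [0.0]
    running = 0.0
    for x in xs:
        running += x  # cumulative income of the poorest i units
        ordinates.append(running / total)
    return ordinates


def trimmed_lorenz_curve(incomes, alpha):
    """Lorenz ordinates after dropping the lowest and highest alpha fraction.

    Naive symmetric trimming, used here only as an illustration.
    """
    xs = sorted(incomes)
    k = int(alpha * len(xs))
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return lorenz_curve(kept)


# Hypothetical data: ten incomes, then the same sample with one gross error.
clean = [10, 12, 15, 18, 20, 22, 25, 30, 35, 40]
contaminated = clean[:-1] + [4000]

lc_clean = lorenz_curve(clean)
lc_contaminated = lorenz_curve(contaminated)

# Income share of the poorest half under each sample.
share_clean = lc_clean[5]
share_contaminated = lc_contaminated[5]

# With 10% trimming in each tail the gross error is discarded.
trimmed_clean = trimmed_lorenz_curve(clean, 0.1)
trimmed_contaminated = trimmed_lorenz_curve(contaminated, 0.1)
```

In this toy sample a single extreme value in the upper tail pulls the whole curve down, while the two trimmed curves coincide because the contaminated observation falls in the discarded tail.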
De-Biasing Weighted MLE via Indirect Inference: The Case of Generalized Linear Latent Variable Models
In this paper we study bias corrections to the weighted MLE (Dupuis and Morgenthaler, 2002), a robust estimator simply defined through a weighted score function. Although the WMLE is relatively simple to compute, for most models it is not consistent and hence not very helpful. For example, the model we consider in this paper is the generalized linear latent variable model (GLLVM) proposed in Moustaki and Knott (2000) (see also Moustaki, 1996; Sammel, Ryan, and Legler, 1997; and Bartholomew and Knott, 1999). The score functions of this model are very complicated. They contain integrals that need to be evaluated. Moreover, they are highly nonlinear in the parameters, which makes the use of complicated robust estimators nearly impossible in practice. Moustaki and Victoria-Feser (2006) propose to use a weighted MLE and develop indirect inference (Gouriéroux, Monfort, and Renault, 1993; Gallant and Tauchen, 1996; see also Genton and de Luna, 2000; Genton and Ronchetti, 2003) to remove the bias. The bias-corrected estimator can be computed in a simple iterative fashion. In this paper, we focus on indirect inference for bias correction in general, relying heavily on the findings of Moustaki and Victoria-Feser (2006)
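The idea of an iterative, simulation-based bias correction can be sketched on a toy problem. The example below does not use the WMLE or a GLLVM: as a stand-in, it takes the variance estimator with divisor n (biased downwards) as the inconsistent auxiliary estimator and a normal model with known zero mean as the data-generating process. The function names, the data, and the tuning constants are all hypothetical.

```python
import random


def biased_variance(sample):
    """Auxiliary estimator: sample variance with divisor n (biased)."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def indirect_bias_correction(pi_hat, n, n_sim=200, n_iter=50, seed=0):
    """Find theta such that the simulated mean of the auxiliary estimator
    under theta matches the observed value pi_hat."""
    rng = random.Random(seed)
    # Common random numbers: the noise is drawn once and reused for every
    # candidate theta, so the simulated mean is a smooth function of theta.
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_sim)]

    def mean_auxiliary(theta):
        scale = theta ** 0.5
        values = [biased_variance([scale * z for z in zs]) for zs in noise]
        return sum(values) / n_sim

    theta = pi_hat  # start from the uncorrected estimate
    for _ in range(n_iter):
        # Shift theta by the gap between observed and simulated estimates.
        theta = theta + (pi_hat - mean_auxiliary(theta))
    return theta, mean_auxiliary(theta)


# Hypothetical observed sample of size 10.
data = [1.2, -0.7, 3.1, 0.4, -2.5, 1.9, -1.1, 0.3, 2.2, -1.8]
pi_hat = biased_variance(data)
theta_corrected, fitted = indirect_bias_correction(pi_hat, len(data))
```

At convergence the simulated auxiliary estimate reproduces the observed one, which is the matching condition behind indirect inference; in this toy case the corrected value is roughly the observed one rescaled by n/(n-1), up to simulation noise.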
Fast Algorithms for Computing High Breakdown Covariance Matrices with Missing Data
Robust estimation of covariance matrices when some of the data at hand are missing is an important problem. It has been studied by Little and Smith (1987) and more recently by Cheng and Victoria-Feser (2002). The latter propose the use of high breakdown estimators and so-called hybrid algorithms (see, e.g., Woodruff and Rocke, 1994). In particular, the minimum volume ellipsoid of Rousseeuw (1984) is adapted to the case of missing data. To compute it, they use (a modified version of) the forward search algorithm (see e.g. Atkinson, 1994). In this paper, we propose to use instead a modification of the C-step algorithm proposed by Rousseeuw and Van Driessen (1999) which is actually a lot faster. We also adapt the orthogonalized Gnanadesikan-Kettenring (OGK) estimator proposed by Maronna and Zamar (2002) to the case of missing data and use it as a starting point for an adapted S-estimator. Moreover, we conduct a simulation study to compare different robust estimators in terms of their efficiency and breakdown
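For orientation, a plain concentration step (C-step) for complete bivariate data can be sketched as follows. This is not the missing-data modification proposed in the abstract: it is only the basic idea of recomputing the location and scatter from the h points with the smallest Mahalanobis distances, which cannot increase the covariance determinant. The simulated data, the subset size h, and all function names are assumptions for illustration.

```python
import random


def mean_cov(points):
    """Mean and covariance (divisor n) of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def det(cov):
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2


def mahalanobis_sq(p, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det(cov)


def c_step(points, subset, h):
    """One C-step: keep the h points closest to the current fit."""
    center, cov = mean_cov([points[i] for i in subset])
    order = sorted(range(len(points)),
                   key=lambda i: mahalanobis_sq(points[i], center, cov))
    return order[:h]


def concentrate(points, h, seed=0, max_iter=100):
    """Iterate C-steps from a random start until the subset is stable."""
    rng = random.Random(seed)
    subset = rng.sample(range(len(points)), h)
    dets = []
    for _ in range(max_iter):
        dets.append(det(mean_cov([points[i] for i in subset])[1]))
        new_subset = c_step(points, subset, h)
        if set(new_subset) == set(subset):
            break
        subset = new_subset
    return subset, dets


# Hypothetical data: 40 regular points plus 10 shifted outliers.
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(10)]
final_subset, dets = concentrate(points, h=30)
```

The sequence `dets` is non-increasing, which is the property that makes repeated C-steps converge quickly. A single random start only reaches a local minimum, so in practice many starts are usually tried.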
A simulation study to compare competing estimators in structural equation models with ordinal variables
Structural equation models have been around for a long time now. They are used intensively to analyze data from different fields such as psychology, social sciences, economics, management, etc. Their estimation can be performed using standard statistical packages such as LISREL. However, these implementations suffer from an important drawback: they are not suited for cases in which the variables are far from the normal distribution. This happens in particular with ordinal data that have a non-symmetric distribution, a situation often encountered in practice. An alternative approach would be to use generalized linear latent variable models (GLLVM) as defined for example in Bartholomew and Knott (1999) and Moustaki and Knott (2000). These models consider the data as they are, i.e. binary or ordinal, but the log-likelihood function is intractable and needs numerical approximations to compute it. Several approaches exist, such as Gauss-Hermite quadrature or simulation-based methods, as well as the Laplace approximation, i.e. the Laplace approximated maximum likelihood estimator (LAMLE) proposed by Huber, Ronchetti, and Victoria-Feser (2004) for these models. The advantage of the latter is that it is very fast and hence can cope with relatively complicated models. In this paper, we perform a simulation study to compare the parameter estimators provided by LISREL, which is taken as a benchmark, and the LAMLE when the data are generated from a confirmatory factor analysis model with normal variables which are then transformed into ordinal ones. We show that while LISREL can provide seriously biased estimators, the LAMLE not only is unbiased, but also allows one to recover an unbiased estimator of the correlation matrix of the original normal variables
A General Robust Approach to the Analysis of Income Distribution, Inequality and Poverty
Income distribution covers a large field of research subjects in economics. It is important to study how incomes are distributed among the members of a population in order, for example, to determine tax policies for redistribution to decrease inequality, or to implement social policies to reduce poverty. The available data come mostly from surveys (and not censuses, as is often believed) and are often subject to long debates about their reliability because the sources of errors are numerous. Moreover, the form in which the data are available is not always as one would expect, i.e. complete and continuous (micro data): one may have only data in a grouped form (in income classes) and/or truncated data, where a portion of the original data has been omitted from the sample or simply not recorded. Because of these data features, it is important to complement classical statistical procedures with robust ones. In this paper such methods are presented, especially for model selection, model fitting with several types of data, inequality and poverty analysis, and ordering tools. The approach is based on the Influence Function (IF) developed by Hampel (1974) and further developed by Hampel, Ronchetti, Rousseeuw, and Stahel (1986). It is also shown, through the analysis of real UK and Tunisian data, that robust techniques can give another picture of income distribution, inequality or poverty when compared to classical ones
Robust Inference with Binary Data
In this paper, the robustness properties of the maximum likelihood estimator (MLE) and several robust estimators for the logistic regression model with binary responses are analysed. It is found that the MLE and the classical Rao's score test can be misleading in the presence of model misspecification, which in the context of logistic regression means either misclassification errors in the responses or extreme data points in the design space. A general framework for robust estimation and testing is presented, together with a robust estimator and a robust testing procedure. It is shown that they are less influenced by model misspecifications than their classical counterparts. They are finally applied to the analysis of binary data from a study on breastfeeding
A Robust Test for Non-nested Hypotheses
We propose a robust version of Cox-type test statistics for the choice between two non-nested hypotheses. We first show that the influence of small amounts of contamination in the data on the test decision can be very large. Secondly, we build a robust test statistic by using the results on robust parametric tests that are available in the literature and show that the level of the robust test is stable. Finally, we show numerically not only the robustness of this new test statistic but also that its asymptotic distribution is a good approximation of its sample distribution, unlike for the classical test statistic. We apply our results to the choice between a Pareto and an exponential distribution as well as between two competing regressors in the simple linear regression model without intercept
Robust Income Distribution Estimation with Missing Data
With income distributions it is common to encounter the problem of missing data. When a parametric model is fitted to the data, the problem can be overcome by specifying the marginal distribution of the observed data. With classical methods of estimation such as the maximum likelihood (ML) an estimator of the parameters can be obtained in a straightforward manner. Unfortunately, it is well known that ML estimators are not robust estimators in the presence of contaminated data. In this paper, we propose a robust alternative to the ML estimator with truncated data, namely one based on M-estimators that we call the EMM estimator. We present an extensive simulation study where the EMM estimator based on optimal B-robust estimators (OBRE) is compared to a more conservative approach based on marginal density (MD) for truncated data, and show that the difference lies in the way the weights associated with each observation are computed. Finally, we also compare the EMM estimator based on the OBRE with the classical ML estimator when the data are contaminated, and show that contrary to the former, the latter can be seriously biased
