Robust methods for heteroskedastic regression
Heteroskedastic regression data are modelled using a parameterized variance function. This procedure is robustified using a method with high breakdown point and high efficiency, which provides a direct link between observations and the weights used in model fitting. This feature is vital for the application, the analysis of international trade data from the European Union. Heteroskedasticity is strongly present in such data, as are outliers. A further example shows that the new method outperforms ordinary least squares with heteroskedasticity-robust standard errors, even when the form of heteroskedasticity is mis-specified. A discussion of computational matters concludes the paper. An appendix presents the new scoring algorithm for estimation of the parameters of heteroskedasticity
Handbook of the design and analysis of experiments: designs for generalized linear models
The analysis of transformations for profit‐and‐loss data
We analyse data on the performance of investment funds, 99 out of 309 of which report a loss, and on the profitability of 1405 firms, 407 of which report losses. The problem in both cases is to use regression to predict performance from sets of explanatory variables. In one case, it is clear from scatter plots of the data that the negative responses have a lower variance than the positive responses and a different relationship with the explanatory variables. Because the data include negative responses, the Box–Cox transformation cannot be used. We develop a robust version of an extension to the Yeo–Johnson transformation which allows different transformations for positive and negative responses. Tests and graphical methods from our robust analysis enable the detection of outliers, the assessment of values of the two transformation parameters and the building of simple regression models. Performance comparisons are made with non-parametric transformations
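As a rough illustration of the idea in this abstract, the sketch below applies the standard (unnormalized) Yeo–Johnson transformation with one power for non-negative responses and another for negative ones. The function names `yeo_johnson` and `extended_yeo_johnson` and the parameters `lam_pos` and `lam_neg` are hypothetical, and the sketch covers only the transformation itself, not the robust estimation, tests, or graphics developed in the paper.

```python
import numpy as np

def yeo_johnson(y, lam):
    """Standard (unnormalized) Yeo-Johnson transformation with a single power lam."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    out = np.empty_like(y)
    pos = y >= 0
    # Non-negative responses: Box-Cox applied to y + 1.
    if abs(lam) > 1e-12:
        out[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])
    # Negative responses: Box-Cox with power 2 - lam applied to 1 - y, sign-flipped.
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = -(((1.0 - y[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam))
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

def extended_yeo_johnson(y, lam_pos, lam_neg):
    """Assumed extension: use lam_pos where y >= 0 and lam_neg where y < 0."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    return np.where(y >= 0, yeo_johnson(y, lam_pos), yeo_johnson(y, lam_neg))
```

With `lam_pos = lam_neg = 1` the transformation leaves the data unchanged, which gives a convenient sanity check before trying other pairs of powers.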
The Use of Prior Information in Very Robust Regression for Fraud Detection
Misinvoicing is a major tool in fraud including money laundering. We develop a method of detecting the patterns of outliers that indicate systematic mis-pricing. As the data only become available year by year, we develop a combination of very robust regression and the use of 'cleaned' prior information from earlier years, which leads to early and sharp indication of potentially fraudulent activity that can be passed to legal agencies to institute prosecution. As an example, we use yearly imports of a specific seafood into the European Union. This is only one of over one million annual data sets, each of which can currently potentially contain 336 observations. We provide a solution to the resulting big data problem, which requires analysis with the minimum of human intervention
Robust Transformations for Multiple Regression via Additivity and Variance Stabilization
Outliers can have a major effect on the estimated transformation of the response in linear regression models, as they can on the estimates of the coefficients of the fitted model. The effect is more extreme in the Generalized Additive Models (GAMs) that are the subject of this article, as the forms of terms in the model can also be affected. We develop, describe and illustrate robust methods for the nonparametric transformation of the response and estimation of the terms in the model. Numerical integration is used to calculate the estimated variance stabilizing transformation. Robust regression provides outlier free input to the polynomial smoothers used in the calculation of the response transformation and in the backfitting algorithm for estimation of the functions of the GAM. Our starting point was the AVAS (Additivity and VAriance Stabilization) algorithm of Tibshirani. Even if robustness is not required, we have made four further general optional improvements to AVAS which greatly improve the performance of Tibshirani’s original Fortran program. We provide a publicly available and fully documented interactive program for our procedure which is a robust form of Tibshirani’s AVAS that allows many forms of robust regression. We illustrate the efficacy of our procedure through data analyses. A refinement of the backfitting algorithm has interesting implications for robust model selection. Supplementary materials for this article are available online
Statistical and Proactive Analysis of an Inter-Laboratory Comparison: The Radiocarbon Dating of the Shroud of Turin.
We review the sampling and results of the radiocarbon dating of the archaeological cloth known as the Shroud of Turin, in the light of recent statistical analyses of both published and raw data. The statistical analyses highlight an inter-laboratory heterogeneity of the means and a monotone spatial variation of the ages of subsamples that suggest the presence of contaminants unevenly removed by the cleaning pretreatments. We consider the significance and overall impact of the statistical analyses on assessing the reliability of the dating results and the design of correct sampling. These analyses suggest that the 1988 radiocarbon dating does not match the current accuracy requirements. Should this be the case, it would be interesting to know the accurate age of the Shroud of Turin. Taking into account the whole body of scientific data, we discuss whether it makes sense to date the Shroud again
The Use of Modern Robust Regression Analysis with Graphics: An Example from Marketing
Routine least squares regression analyses may sometimes miss important aspects of data. To exemplify this point we analyse a set of 1171 observations from a questionnaire intended to illuminate the relationship between customer loyalty and perceptions of such factors as price and community outreach. Our analysis makes much use of graphics and data monitoring to provide a paradigmatic example of the use of modern robust statistical tools based on graphical interaction with data. We start with regression. We perform such an analysis and find significant regression on all factors. However, a variety of plots show that there are some unexplained features, which are not eliminated by response transformation. Accordingly, we turn to robust analyses, intended to give answers unaffected by the presence of data contamination. A robust analysis using a non-parametric model leads to the increased significance of transformations of the explanatory variables. These transformations provide improved insight into consumer behaviour. We provide suggestions for a structured approach to modern robust regression and give links to the software used for our data analyses
The power of monitoring: How to make the most of a contaminated multivariate sample
Diagnostic tools must rely on robust high-breakdown methodologies to avoid distortion in the presence of contamination by outliers. However, a disadvantage of having a single, even if robust, summary of the data is that important choices concerning parameters of the robust method, such as breakdown point, have to be made prior to the analysis. The effect of such choices may be difficult to evaluate. We argue that an effective solution is to look at several pictures, and possibly at a whole movie, of the available data. This can be achieved by monitoring, over a range of parameter values, the results computed through the robust methodology of choice. We show the information gain that monitoring provides in the study of complex data structures through the analysis of multivariate datasets using different high-breakdown techniques. Our findings support the claim that the principle of monitoring is very flexible and that it can lead to robust estimators that are as efficient as possible. We also address through simulation some of the tricky inferential issues that arise from monitoring
The Box-Cox Transformation: Review and Extensions
The Box-Cox power transformation family for non-negative responses in linear models has a long and interesting history in both statistical practice and theory, which we describe. The relationship between generalized linear models and log transformed data is illustrated. Extensions investigated include the transform both sides model and the Yeo-Johnson transformation for observations that can be positive or negative. The paper also describes an extended Yeo-Johnson transformation that allows positive and negative responses to have different power transformations. Analyses of data show this to be necessary. Robustness enters in the fan plot for which the forward search provides an ordering of the data. Plausible transformations are checked with an extended fan plot. These procedures are used to compare parametric power transformations with nonparametric transformations produced by smoothing
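For reference, the Box-Cox family named in this abstract is usually written in the following unnormalized form for a strictly positive response y and power λ; the notation here is an assumption and may differ from that used in the paper.

```latex
y(\lambda) =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \log y, & \lambda = 0 .
\end{cases}
```

The logarithm is the limit of the first branch as λ tends to 0, so the family is continuous in λ. Because log y is undefined for y ≤ 0, data that include negative responses need an extension such as the Yeo-Johnson transformation.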