cem: Software for Coarsened Exact Matching
This program is designed to improve causal inference via a method of matching that is
widely applicable in observational data and easy to understand and use (if you understand
how to draw a histogram, you will understand this method). The program implements the
coarsened exact matching (CEM) algorithm, described below. CEM may be used alone
or in combination with any existing matching method. This algorithm, and its statistical
properties, are described in Iacus, King, and Porro (2008).
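The histogram analogy above can be made concrete with a minimal sketch. It is not the cem package's implementation: it assumes fixed-width bins chosen by the analyst, and the names `coarsen`, `cem_match`, and the toy records are hypothetical. Each covariate is coarsened into bins, units are grouped by their bin signature, and only strata that contain at least one treated and one control unit are kept.

```python
import math
from collections import defaultdict


def coarsen(value, width):
    """Map a numeric value to the index of its fixed-width bin."""
    return math.floor(value / width)


def cem_match(units, widths):
    """Keep units whose coarsened stratum holds both treated and control units.

    units  : list of dicts with a boolean "treated" key and numeric covariates
    widths : dict mapping covariate name -> bin width (an analyst's choice)
    """
    strata = defaultdict(list)
    for unit in units:
        # The stratum signature is the tuple of bin indices, one per covariate.
        signature = tuple(coarsen(unit[name], widths[name]) for name in sorted(widths))
        strata[signature].append(unit)

    matched = []
    for members in strata.values():
        has_treated = any(u["treated"] for u in members)
        has_control = any(not u["treated"] for u in members)
        if has_treated and has_control:
            matched.extend(members)
    return matched


# Hypothetical toy data, for illustration only.
units = [
    {"id": 1, "treated": True, "age": 23, "income": 31.0},
    {"id": 2, "treated": False, "age": 27, "income": 38.5},
    {"id": 3, "treated": True, "age": 44, "income": 72.0},
    {"id": 4, "treated": False, "age": 61, "income": 20.0},
]
matched = cem_match(units, {"age": 10, "income": 20.0})
matched_ids = sorted(u["id"] for u in matched)
print(matched_ids)  # [1, 2]
```

Units 3 and 4 are pruned because no unit from the opposite group shares their stratum. The uncoarsened values of the retained units remain available for the subsequent analysis.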
Laboratorio di Statistica con R
R is an open-source statistical environment, complete and under constant development, with the distinctive feature of being a full-fledged programming language that can interface with other languages such as C, C++, and Fortran, as well as Java, SQL, and others.
The book covers the main topics taught in standard university courses in statistics (descriptive and inferential), probability, and simulation. The teaching approach lets readers try out in R, step by step, what they have just learned. This second edition aligns the code used in the text with the current version of R and also responds to requests for a more thorough description of R objects and of the software's package management.
The volume is supported by a website with resources for Windows and Macintosh, including the latest version of R available at the time of printing and the package containing all the functions used in the text.
Election Forecasting Techniques - Part I
This is the first of two special issues devoted to current topics and innovative approaches in the field of election forecasting techniques. The articles included in these special issues were submitted to the journal after a call for papers was circulated in mid-2013, soliciting contributions that advance the current state of the literature and/or promote novel approaches to political opinion polling, with special emphasis on techniques for forecasting election results.
The articles hosted in the two issues cover topics ranging from exit polls, explanatory statistical models based on structural variables (economic trends, government approval ratings, etc.), prediction markets, social media-based election forecasting, the web as a means to collect data on voting preferences, and measures of forecast accuracy.
In the first contribution appearing in this issue, titled “Evolving approaches to election forecasting” (the only invited article), Jocelyn Evans examines major approaches to electoral forecasting and discusses their distinctive traits and the constraints which render them variably useful in specific research contexts. He also addresses the growing use of forecasting tools, stressing the need to adapt techniques originally developed in order to achieve other goals and to not lose track of researchers’ major purpose when employing these techniques, which is to say a greater comprehension of how elections actually work.
As regards prediction markets, an interesting article submitted to the journal is “Accuracy and bias in European prediction markets”, by Sveinung Arnesen and Oliver Strijbis. The paper describes how prediction markets work, specifically the Iowa Electronic Markets (IEM), and provides a meta-analysis of the scores from 62 prediction market vote share contracts for elections in Switzerland, Germany, and Norway. The aim of the paper is to uncover potential biases in the forecasts by comparing them with the actual results. The authors show that there is an aggregate bias in the predictions: the actual outcomes tend to have more extreme values than predicted, which suggests that European prediction markets are biased. Specifically, they show that small-sized vote share contracts tend to be overpredicted, and large-sized vote share contracts tend to be underpredicted. The major result reported in this contribution appears to invite researchers to use the logarithmic market scoring rule (LMSR) with caution as an automated market maker in vote share markets.
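For readers unfamiliar with the LMSR mentioned above, the following is a minimal sketch of the standard rule in its textbook form, with cost function C(q) = b log(sum_i exp(q_i / b)) and prices given by the gradient of C. It is not the configuration studied by Arnesen and Strijbis; the liquidity parameter `b`, the three-outcome market, and the function names are illustrative assumptions.

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)), computed stably."""
    m = max(quantities)
    return m + b * math.log(sum(math.exp((q - m) / b) for q in quantities))


def lmsr_prices(quantities, b):
    """Instantaneous prices: the softmax of q / b, which always sums to one."""
    m = max(quantities)
    weights = [math.exp((q - m) / b) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def trade_cost(quantities, outcome, shares, b):
    """Amount a trader pays the market maker to buy `shares` of one outcome."""
    after = list(quantities)
    after[outcome] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)


# Hypothetical three-outcome market with no shares sold yet.
b = 100.0
outstanding = [0.0, 0.0, 0.0]
prices_before = lmsr_prices(outstanding, b)
cost = trade_cost(outstanding, outcome=0, shares=50.0, b=b)
prices_after = lmsr_prices([50.0, 0.0, 0.0], b)
print(prices_before, cost, prices_after)
```

Buying shares of an outcome raises its price and lowers the others, and a larger `b` makes prices move less per share traded.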
In “Assessing correct voting: A study based on a simulation of municipal elections in Italy”, Giancarlo Gasperoni and Debora Mantovani offer an empirical application of “correct voting” to the Italian political system. The authors estimate correct voting using data collected through the development of an on-line simulation of an Italian election campaign implemented via a “dynamic process-tracing environment”. A typology of voting behaviour is then proposed which combines both correct voting models and the traditional approach distinguishing between political subculture belonging and opinion-based voting among Italian voters. Four multinomial logistic regression models are developed in which the dependent variable is the above-mentioned typology of voting behaviour; the authors use these models to test hypotheses on voting behaviour according to which voters are more likely to vote “correctly” if they express high levels of interest in politics and high degrees of political competence, and if they are “active” seekers of information during the simulated election campaign. Findings show that voters are more likely to vote correctly if they express higher levels of interest in politics, but the effect of political competence is statistically insignificant. Moreover, voters are not more likely to vote correctly if they are “generally active” seekers of flow items concerning candidates’ issues orientations, but they are more likely to vote correctly if they are “specific active” seekers of information concerning their “correct” candidates’ issue orientations.
A further article deals with “Forecasting elections with high volatility”, by Antonio F. Alaminos. In the article, Alaminos proposes the use of a combination of aggregated electoral data from the 1994 German Bundestag elections and the 1998 German Allbus social survey to estimate four probabilistic models of forecasting the German 1998 general elections. The models are built following the logic of Markov chains which, according to the author, make it possible to account for the large electoral volatility observed in the German elections across the 1990s. The forecasts based on the four models perform better than those provided by other techniques, in terms of predicting the winning party and the position of the second and third parties. In addition, the author demonstrates that, among the four proposed models, the two corrected models – which assume that there are restrictions to electoral mobility – behave better than the two other pure Markov chain models, which assume that all voters can change their electoral choice.
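The Markov chain logic described above can be sketched as follows. This is an illustration under stated assumptions, not Alaminos's specification: the party labels A, B, and C, all numbers, and the particular way of restricting mobility (zeroing selected transitions and renormalising each row) are hypothetical.

```python
def forecast_shares(previous_shares, transition):
    """One-step forecast: share[j] = sum over i of previous_shares[i] * transition[i][j]."""
    parties = list(previous_shares)
    return {
        j: sum(previous_shares[i] * transition[i][j] for i in parties)
        for j in parties
    }


def restrict_mobility(transition, forbidden):
    """Set forbidden (origin, destination) moves to zero and renormalise each row.

    This rule is an assumption made for illustration only.
    """
    restricted = {}
    for origin, row in transition.items():
        kept = {dest: (0.0 if (origin, dest) in forbidden else p) for dest, p in row.items()}
        total = sum(kept.values())
        if total == 0.0:
            raise ValueError(f"all moves out of {origin!r} are forbidden")
        restricted[origin] = {dest: p / total for dest, p in kept.items()}
    return restricted


# Hypothetical previous vote shares and voter transition probabilities.
previous = {"A": 0.5, "B": 0.3, "C": 0.2}
transition = {
    "A": {"A": 0.8, "B": 0.1, "C": 0.1},
    "B": {"A": 0.2, "B": 0.7, "C": 0.1},
    "C": {"A": 0.1, "B": 0.1, "C": 0.8},
}

pure = forecast_shares(previous, transition)
corrected_transition = restrict_mobility(transition, {("A", "C"), ("C", "A")})
corrected = forecast_shares(previous, corrected_transition)
print(pure)
print(corrected)
```

In the pure model every voter may move to any party. In the restricted variant, voters of A and C cannot switch directly to each other, which changes the forecast while keeping the shares summing to one.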
This first issue of the double-issue set concludes with contributions drawn from a round table discussion dedicated to election forecasting, which took place on February 15, 2013, in Milan during a national conference on “The Value of Statistics for Businesses and Society: Opinion and Market Research” promoted by the Association for Applied Statistics (ASA), the Association for Market, Social, and Opinion Research (ASSIRM), the Italian Statistics Society (SIS), and the Catholic University of the Sacred Heart of Milan.
Random Recursive Partitioning and Rank-based proximities for data matching, missing data imputation and nonparametric classification and prediction
Data matching is a typical statistical problem in non-experimental and/or observational studies or, more generally, in cross-sectional studies in which one or more data sets are to be compared. Several methods are available in the literature, most of which are based on a particular metric or on statistical models, either parametric or nonparametric. We present two methods to calculate a proximity that is invariant under monotonic transformations. These methods require at most the notion of ordering. We provide open-source software in the form of an R package.
The software is available at: https://r-forge.r-project.org/projects/rrp/
See also: Porro G., Iacus S.M. (2008). Invariant and metric free proximities for data matching: an R package. Journal of Statistical Software, vol. 25 (11), p. 1-22, ISSN: 1548-766
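The invariance property claimed in the abstract can be illustrated with a simple rank-based proximity. The sketch below is not the RRP algorithm or the rrp package's API: the proximity definition (one minus the normalised rank difference), the function names, and the toy values are assumptions chosen only to show that a proximity built from orderings alone is unchanged by a strictly increasing transformation such as the logarithm.

```python
import math


def ranks(values):
    """Return the 0-based rank of each value (tied values share the lowest rank)."""
    first_position = {}
    for position, value in enumerate(sorted(values)):
        first_position.setdefault(value, position)
    return [first_position[v] for v in values]


def rank_proximity(values):
    """Pairwise proximity in [0, 1]: 1 minus the normalised absolute rank difference.

    Requires at least two observations.
    """
    n = len(values)
    r = ranks(values)
    return [[1 - abs(r[i] - r[j]) / (n - 1) for j in range(n)] for i in range(n)]


# Hypothetical toy values; only their ordering matters.
incomes = [12.0, 48.0, 15.5, 300.0]
log_incomes = [math.log(x) for x in incomes]

proximity_raw = rank_proximity(incomes)
proximity_log = rank_proximity(log_incomes)
print(proximity_raw == proximity_log)  # True
```

A distance computed on the raw values would be dominated by the outlying value 300.0, whereas the rank-based proximity depends only on the ordering of the observations.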
Special Issues on "Election Forecasting Techniques (Part II)" - Editorial
Short introduction to co-edited special issue
Matching for causal inference without balance checking
UNIMI - Research Papers in Economics, Business, and Statistics (Statistics, Working Paper 36)
Coarsened exact matching
This program is designed to improve the estimation of causal effects via an extremely powerful method of matching that is widely applicable and exceptionally easy to understand and use.
Matching is a nonparametric method of preprocessing data to control for some or all of the potentially confounding influence of pretreatment control variables by reducing imbalance between the treated and control groups. After preprocessing in this way, any method of analysis that would have been used without matching can be applied to estimate causal effects, although some methods will have even better properties. CEM is a Monotonic Imbalance Bounding (MIB) matching method, which means that the balance between the treated and control groups is chosen by the user ex ante rather than discovered through the usual laborious process of checking after the fact and repeatedly reestimating, and that adjusting the imbalance on one variable has no effect on the maximum imbalance of any other. CEM also strictly bounds through ex ante user choice both the degree of model dependence and the average treatment effect estimation error, eliminates the need for a separate procedure to restrict data to common empirical support, meets the congruence principle, is robust to measurement error, works well with multiple imputation methods for missing data, can be completely automated, and is extremely fast computationally even with very large data sets. After preprocessing data with CEM, the analyst may then use a simple difference in means or whatever statistical model they would have applied without matching. CEM also works well for multicategory treatments, determining blocks in experimental designs, and evaluating extreme counterfactuals.
Versions for open source R, Stata and SPSS are available here:
http://gking.harvard.edu/cem
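The monotonic imbalance bounding property described above can be checked numerically. The sketch below is an illustration, not the cem software: it assumes fixed-width bins, synthetic random data, and a hypothetical helper `max_within_stratum_gap`. Within any matched stratum, a treated and a control unit can differ on a covariate by at most that covariate's bin width, so the bound on age is set by the age width alone and does not move when the income width is changed.

```python
import math
import random
from collections import defaultdict


def max_within_stratum_gap(units, widths, variable):
    """Largest treated-control difference on `variable` inside any single stratum."""
    strata = defaultdict(list)
    for unit in units:
        key = tuple(math.floor(unit[name] / widths[name]) for name in sorted(widths))
        strata[key].append(unit)

    worst = 0.0
    for members in strata.values():
        treated = [u[variable] for u in members if u["treated"]]
        control = [u[variable] for u in members if not u["treated"]]
        # Strata lacking either group contribute nothing: they would be pruned.
        for t in treated:
            for c in control:
                worst = max(worst, abs(t - c))
    return worst


# Synthetic data generated for illustration only (fixed seed for repeatability).
rng = random.Random(0)
units = [
    {
        "treated": rng.random() < 0.5,
        "age": rng.uniform(20, 60),
        "income": rng.uniform(10, 100),
    }
    for _ in range(200)
]

for income_width in (10, 40):
    widths = {"age": 5, "income": income_width}
    age_gap = max_within_stratum_gap(units, widths, "age")
    income_gap = max_within_stratum_gap(units, widths, "income")
    # The age bound stays at 5 whichever income width is used.
    print(income_width, age_gap <= 5, income_gap <= income_width)
```

Loosening the income coarsening keeps more units in the matched sample, but the worst-case age discrepancy remains bounded by the age bin width that the analyst chose in advance.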
