Three Essays on the Conditional Inference Approach for Binary Panel Data Models
This thesis is a collection of three essays concerning the Conditional Inference approach applied to the estimation of binary panel data models with fixed effects. The work is organised in three chapters. Chapter 1 provides a detailed review of the main theoretical proposals for estimating these models: the Conditional Inference estimators and the "Bias-Corrected" estimators are described, and their finite-sample performance is evaluated through a Monte Carlo experiment.
The second and third chapters address practical problems that affect Conditional Inference estimators: (i) testing for endogenous self-selection and sample-selection mechanisms and (ii) the computational burden of the conditional likelihood functions involved in parameter estimation. The first issue is addressed with a methodology that relies on an approximation of a fixed-effects logit model, estimated by conditional maximum likelihood in a two-step procedure, which admits a very simple variable-addition test. The test can identify idiosyncratic endogeneity, because the Conditional Inference approach handles endogeneity due to time-constant unobserved heterogeneity and overcomes the incidental parameters problem at the same time. The test is applied to SHARE data to study the impact of retirement on health. Finally, Conditional Inference estimators require the maximisation of peculiar likelihood functions whose computational burden limits their applicability when the number of time occasions (T) in the panel becomes moderately large: parameter estimation is no longer feasible when the likelihood function is computed with standard matrix algebra. This work proposes a novel way to recursively compute the conditional likelihood function for a class of dynamic models, focusing on the Quadratic Exponential model. A Monte Carlo simulation shows that the recursive algorithm removes the computational limitation that arises with large T.
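The thesis' own recursion targets dynamic models such as the Quadratic Exponential model and is not reproduced here. As a hedged illustration of why a recursion removes the large-T bottleneck, the sketch below swaps in the simpler static fixed-effects logit: for a unit with covariate values x_t and total score s = Σ_t y_t, the conditional likelihood denominator is the sum of exp(β Σ_t d_t x_t) over all C(T, s) binary sequences d with Σ_t d_t = s. That sum is an elementary symmetric polynomial of the terms exp(β x_t), which a well-known O(T·s) recursion computes without enumeration. All function names are hypothetical.

```python
import itertools
import math

def denominator_recursive(eta, s):
    """Sum over binary sequences d with sum(d) == s of exp(sum_t d_t * eta_t).

    Elementary symmetric polynomial recursion: after processing t terms,
    e[k] holds the sum over length-t sequences containing exactly k ones.
    """
    e = [1.0] + [0.0] * s  # e[0] = 1 corresponds to the all-zero sequence
    for value in eta:
        w = math.exp(value)
        # Update k in decreasing order so each term enters each product at most once.
        for k in range(s, 0, -1):
            e[k] += w * e[k - 1]
    return e[s]

def denominator_brute_force(eta, s):
    """Same quantity by explicit enumeration of the C(T, s) sequences (exponential cost)."""
    total = 0.0
    for ones in itertools.combinations(range(len(eta)), s):
        total += math.exp(sum(eta[t] for t in ones))
    return total

def conditional_loglik_unit(y, x, beta):
    """Conditional log-likelihood contribution of one unit in a static logit with one covariate."""
    eta = [xt * beta for xt in x]
    s = sum(y)
    numerator = sum(yt * et for yt, et in zip(y, eta))
    return numerator - math.log(denominator_recursive(eta, s))
```

The recursive version needs O(T·s) operations, while enumeration grows combinatorially in T; units with s = 0 or s = T contribute zero, as expected in conditional logit.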
Testing for positive association in contingency tables with fixed margins
An exact conditional approach is developed to test for certain forms of positive association
between two ordinal variables (e.g. positive quadrant dependence, total positivity of order 2).
The approach is based on the use of a test statistic measuring the goodness-of-fit of the model
formulated according to the type of positive association of interest. The nuisance parameters, corresponding
to the marginal distributions of the two variables, are eliminated by conditioning the
inference on the observed margins. This, in turn, removes the uncertainty about the conclusion
of the test that typically arises in the unconditional context, where the null distribution of
the test statistic depends on such parameters. Since the multivariate generalized hypergeometric
distribution that results from conditioning is usually intractable, Markov chain Monte Carlo
methods are used to obtain maximum likelihood estimates of the parameters of the constrained
model. Pearson’s chi-squared statistic is used as the test statistic; a p-value for this statistic
is computed through simulation when the data are sparse, or by exploiting the asymptotic theory
based on the chi-bar squared distribution. The extension of the present approach to
bivariate contingency tables stratified according to one or more discrete explanatory variables
is also outlined. Finally, three applications based on real data are presented.
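The paper's null hypothesis is a positive-association model whose constrained estimates require MCMC; that machinery is not reproduced here. As a simplified, hedged sketch of the simulation-based conditional p-value, the code below tests plain independence instead: under independence, tables with the observed margins can be drawn exactly by randomly re-pairing the row and column labels of the N observations, and Pearson's X² is recomputed on each draw. The function names and the add-one correction are illustrative choices.

```python
import numpy as np

def pearson_x2(table):
    """Pearson's chi-squared statistic against the independence fit, expected = r_i * c_j / N."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    mask = expected > 0  # skip cells in empty rows or columns
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())

def conditional_pvalue(table, n_sim=5000, seed=0):
    """Monte Carlo p-value of X^2 conditional on both margins, under independence."""
    table = np.asarray(table, dtype=int)
    n_rows, n_cols = table.shape
    rng = np.random.default_rng(seed)
    # One row label and one column label per observation; margins are preserved by construction.
    rows = np.repeat(np.arange(n_rows), table.sum(axis=1))
    cols = np.repeat(np.arange(n_cols), table.sum(axis=0))
    observed = pearson_x2(table)
    exceed = 0
    for _ in range(n_sim):
        shuffled = rng.permutation(cols)  # uniform re-pairing = hypergeometric draw
        sim = np.zeros_like(table)
        np.add.at(sim, (rows, shuffled), 1)
        if pearson_x2(sim) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the Monte Carlo estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)
```

For example, `conditional_pvalue([[10, 2], [3, 9]], n_sim=2000)` returns a small p-value, since that table departs strongly from independence given its margins.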
The multilevel latent Markov model
We introduce a multilevel version of the latent Markov model with
covariates which is suitable for the analysis of binary longitudinal data when subjects
are grouped in a large number of clusters. For the maximum likelihood estimation
of this model we introduce an EM algorithm which can be implemented
by means of certain recursions well known in the hidden Markov literature. The
approach is illustrated through an application to a dataset derived from the
administration of a set of items to a sample of cancer patients who
were admitted to different hospitals.
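The EM algorithm above rests on recursions from the hidden Markov literature. A minimal sketch of the standard scaled forward-backward recursions for a single subject follows, assuming a basic latent Markov model with k states, binary items that are conditionally independent given the state, and no covariates or cluster level (the multilevel structure of the paper is omitted). It returns the log-likelihood and the posterior state probabilities used in the E-step; function and argument names are hypothetical.

```python
import numpy as np

def emission_probs(y, phi):
    """P(y_t | state u) for binary items, conditionally independent given the state.

    y: (T, J) array of 0/1 responses; phi: (k, J) success probabilities per state.
    Returns a (T, k) array.
    """
    y = np.asarray(y)[:, None, :]  # (T, 1, J), broadcasts against phi (k, J)
    return np.prod(np.where(y == 1, phi, 1.0 - phi), axis=2)

def forward_backward(y, pi, A, phi):
    """Scaled forward-backward recursions for one subject.

    pi: (k,) initial probabilities; A: (k, k) transitions, A[u, v] = P(v at t+1 | u at t).
    Returns (log-likelihood, (T, k) posterior probabilities of the latent states).
    """
    B = emission_probs(y, phi)
    T, k = B.shape
    alpha = np.zeros((T, k))
    scale = np.zeros(T)
    # Forward pass: alpha[t] is normalized to sum to one, scale[t] stores the normalizer.
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass, rescaled with the same normalizers.
    beta = np.ones((T, k))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / scale[t + 1]
    posterior = alpha * beta  # rows sum to one
    return float(np.log(scale).sum()), posterior
```

Rescaling at each occasion avoids numerical underflow for long sequences, and the log-likelihood is recovered as the sum of the log normalizers.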
The use of mixtures for dealing with non-normal regression errors
In many situations, the distribution of the error terms of a linear regression model departs significantly from normality. A simulation study shows that an effective strategy to deal with these situations is to fit a regression model based on the assumption that the error terms follow a mixture of normal distributions. The main advantage over the usual least-squares approach is greater precision of the parameter estimates and confidence intervals. The parameters are estimated with the EM algorithm, while confidence intervals are constructed through a bootstrap method.
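A minimal sketch of the EM idea, under assumptions that are not taken from the paper: the errors follow a normal mixture with component means μ_k (which absorb the intercept, so X has no constant column), and β and μ_k are updated by successive conditional maximizations (an ECM variant) rather than a joint M-step. The function name, the OLS starting values, and the variance floor are hypothetical choices.

```python
import numpy as np

def fit_mixture_regression(X, y, n_components=2, n_iter=500, tol=1e-8, seed=0):
    """ECM fit of y = X @ beta + e, where e ~ sum_k w_k N(mu_k, sigma2_k).

    X must not contain an intercept column: the component means mu_k play that role.
    Returns (beta, weights, mu, sigma2, loglik), with loglik from the last E-step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    # Start from OLS with an intercept; seed the component means with random residuals.
    coef = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
    beta = coef[1:]
    resid = y - X @ beta
    mu = rng.choice(resid, size=n_components, replace=False)
    sigma2 = np.full(n_components, resid.var())
    weights = np.full(n_components, 1.0 / n_components)
    floor = 1e-6 * resid.var()  # keeps component variances away from zero
    loglik = -np.inf
    for _ in range(n_iter):
        # E-step: log densities, log-sum-exp for stability, responsibilities r[i, k].
        e = (y - X @ beta)[:, None] - mu
        log_d = np.log(weights) - 0.5 * np.log(2 * np.pi * sigma2) - 0.5 * e**2 / sigma2
        m = log_d.max(axis=1, keepdims=True)
        log_tot = m[:, 0] + np.log(np.exp(log_d - m).sum(axis=1))
        r = np.exp(log_d - log_tot[:, None])
        new_loglik = log_tot.sum()
        # CM-steps: mixing weights, mu given beta, beta given mu (weighted LS), variances.
        nk = r.sum(axis=0)
        weights = nk / n
        mu = (r * (y - X @ beta)[:, None]).sum(axis=0) / nk
        w = r / sigma2
        lhs = X.T @ (X * w.sum(axis=1)[:, None])
        rhs = X.T @ (w * (y[:, None] - mu)).sum(axis=1)
        beta = np.linalg.solve(lhs, rhs)
        e = (y - X @ beta)[:, None] - mu
        sigma2 = np.maximum((r * e**2).sum(axis=0) / nk, floor)
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    return beta, weights, mu, sigma2, loglik
```

Bootstrap intervals of the kind mentioned in the abstract could be sketched by refitting this function on resampled (X, y) pairs and taking percentiles of the resulting β estimates.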
Information matrix for hidden Markov models with covariates
For a general class of hidden Markov models that may include time-varying covariates, we illustrate how to compute the observed information matrix, which may be used to obtain standard errors for the parameter estimates and to check model identifiability. The proposed method is based on Oakes’ identity and, as such, allows the exact computation of the information matrix from the output of the expectation-maximization (EM) algorithm for maximum likelihood estimation. In addition to this output, the method requires the first derivative of the posterior probabilities computed by the forward-backward recursions introduced by Baum and Welch. Alternative methods for computing the observed information matrix exactly instead require differentiating the forward recursion used to compute the model likelihood twice, which involves more additional effort on top of the EM algorithm. The proposed method is illustrated by a series of simulations and an application based on a longitudinal dataset in Health Economics.
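For reference, Oakes’ identity in its standard form from the general EM literature (the notation here is generic, not the paper's own): with ℓ(θ) the observed-data log-likelihood and Q(θ′ | θ) the expected complete-data log-likelihood computed in the E-step at θ, the observed information J(θ) can be written as

```latex
\frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta^\top}
  = \left\{
      \frac{\partial^2 Q(\theta' \mid \theta)}{\partial\theta'\,\partial\theta'^\top}
    + \frac{\partial^2 Q(\theta' \mid \theta)}{\partial\theta'\,\partial\theta^\top}
    \right\}_{\theta' = \theta},
\qquad
J(\theta) = -\frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta^\top}.
```

The first term involves only the complete-data log-likelihood with the E-step weights held fixed; the second captures how those weights change with θ, which for hidden Markov models amounts to differentiating the forward-backward posterior probabilities once. Standard errors then follow from the square roots of the diagonal of J(θ̂)⁻¹.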
A discrete time event-history approach to informative drop-out in mixed latent Markov models with covariates
Mixed latent Markov (MLM) models are an important tool for the analysis of longitudinal data when response
variables are affected by time-fixed and time-varying unobserved heterogeneity, the latter being accounted for by a hidden
Markov chain. To avoid bias when using a model of this type in the presence of informative drop-out, we propose an
event-history (EH) extension of the latent Markov approach that may be used with multivariate longitudinal data, in which
one or more outcomes of different types are observed at each time occasion. The EH component of the resulting model
refers to the interval-censored drop-out, and bias in MLM modeling is avoided through correlated random effects, included in the
different model components, which follow common latent distributions. To perform maximum likelihood estimation of
the proposed model by the expectation-maximization algorithm, we extend the usual forward-backward recursions of Baum and Welch. The algorithm has the same complexity as the one adopted in cases of non-informative drop-out. We illustrate
the proposed approach through simulations and an application based on data from a medical study on primary
biliary cirrhosis with two outcomes of interest, one continuous and the other binary.
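A hedged sketch of how a drop-out term can enter the forward recursion without changing its cost, under simplifying assumptions that are not taken from the paper: one subject, k latent states, a probability h[u] of dropping out before the next occasion that depends only on the current latent state, and per-occasion response densities supplied as a precomputed array. The correlated random effects, covariates and multivariate outcomes of the paper are not modeled, and all names are hypothetical.

```python
import numpy as np

def loglik_with_dropout(dens, pi, A, h, dropped_out):
    """Scaled forward recursion with a discrete-time drop-out hazard depending on the latent state.

    dens: (T_i, k) array, dens[t, u] = f(y_t | state u) at the occasions actually observed.
    pi: (k,) initial probabilities; A: (k, k) transition matrix.
    h: (k,) probability of dropping out before the next occasion, given the current state.
    dropped_out: True if the subject left after its last observed occasion,
                 False if it completed the study (no drop-out factor at the last occasion).
    """
    T, k = dens.shape
    log_lik = 0.0
    alpha = pi * dens[0]
    for t in range(T):
        if t > 0:
            alpha = (alpha @ A) * dens[t]
        if t < T - 1:
            alpha = alpha * (1.0 - h)  # remained in the study until the next occasion
        elif dropped_out:
            alpha = alpha * h          # dropped out after the last observed occasion
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = alpha / c              # rescale to avoid underflow
    return float(log_lik)
```

Each occasion only multiplies the forward vector by one extra k-vector, so the recursion keeps the O(T k²) cost of the non-informative case.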
Exact Conditional Testing of Certain Forms of Positive Association for Bivariate Ordinal Data
We describe an exact conditional approach to test for certain forms of positive association
between two ordinal variables. The approach is based on maximizing a conditional version of the multinomial
likelihood for the observed table given the row and column margins. This allows us to remove
the uncertainty that typically arises in testing hypotheses on the association between two categorical
variables due to the presence of nuisance parameters corresponding to the marginal distributions of
the two variables. Conditional maximum likelihood estimates of the parameters are obtained through
Markov chain Monte Carlo methods. Pearson’s chi-squared statistic is used as the test statistic. A p-value for
this statistic is computed by simulation, when data are sparse, or by exploiting the asymptotic theory
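The abstract mentions Markov chain Monte Carlo methods for tables with fixed margins. A hedged sketch of one common building block, not necessarily the authors' scheme: a Metropolis chain whose moves add +d and -d in a 2x2 pattern of cells (which leaves every row and column total unchanged), targeting the conditional distribution under independence, proportional to Π 1/n_ij!. Function names are hypothetical.

```python
import numpy as np
from math import lgamma

def log_weight(cells):
    """log of prod 1/n! over the given cell counts (the hypergeometric kernel)."""
    return -sum(lgamma(n + 1) for n in cells)

def sample_fixed_margins(table, n_steps=20000, seed=0):
    """Metropolis chain on tables sharing the margins of `table`.

    Targets the multivariate hypergeometric distribution (independence, margins fixed).
    Yields a copy of the current table after each step.
    """
    rng = np.random.default_rng(seed)
    current = np.array(table, dtype=int)
    n_rows, n_cols = current.shape
    for _ in range(n_steps):
        i, i2 = rng.choice(n_rows, size=2, replace=False)
        j, j2 = rng.choice(n_cols, size=2, replace=False)
        d = rng.choice([-1, 1])
        # +d on (i, j), (i2, j2) and -d on (i, j2), (i2, j): margins are unchanged.
        idx = [(i, j), (i2, j2), (i, j2), (i2, j)]
        delta = [d, d, -d, -d]
        new = [current[c] + dc for c, dc in zip(idx, delta)]
        if min(new) >= 0:  # reject moves leaving the support
            old = [current[c] for c in idx]
            # Symmetric proposal, so the acceptance ratio is the ratio of target weights.
            if np.log(rng.random()) < log_weight(new) - log_weight(old):
                for c, value in zip(idx, new):
                    current[c] = value
        yield current.copy()
```

Because the proposal that undoes a move has the same probability as the move itself, plain Metropolis acceptance suffices; for two-way tables these 2x2 moves connect all tables with the given margins.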
Marginal models and pruning of association rules
Association rules are a well-established tool in data mining software and are nowadays used to describe statistical associations in many fields. Classical association rules (also called Boolean rules), introduced by Agrawal, Imielinski and Swami in the context of market basket analysis, are statements that the presence of a set of items, called the "antecedent", is likely to imply the presence of another set of items, called the "consequent". In market basket analysis, for instance, there is a transaction (that is, a nonempty set of items) for each customer (actually, for each single bill), each consisting of a selection from the set of the K products (items) available in the store.
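A minimal sketch of the standard quantities behind a classical rule A ⇒ B, computed from transactions represented as Python sets: support (share of transactions containing A ∪ B), confidence (support of A ∪ B divided by support of A) and lift (confidence divided by support of B). These are the usual textbook definitions; the function names and the toy baskets are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent => consequent."""
    s_ab = support(transactions, set(antecedent) | set(consequent))
    s_a = support(transactions, antecedent)
    s_b = support(transactions, consequent)
    confidence = s_ab / s_a
    return {"support": s_ab, "confidence": confidence, "lift": confidence / s_b}

# Hypothetical baskets, one set of items per bill.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
print(rule_measures(baskets, {"bread", "butter"}, {"milk"}))
```

Here every basket with bread and butter also contains milk, so the rule has confidence 1; a lift above 1 indicates that milk is more frequent among such baskets than overall.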
To reduce the mass of discovered rules to a manageable number of patterns, a number of selection and pruning methods have been proposed. Statistical measures and statistical tests have also been proposed to assess the "interestingness" of an association rule. Here, to test the "interestingness" of a rule, or of a set of rules, we outline how recent developments in the analysis of frequency data, in particular in the theory of marginal models, can be applied to this context.
Marginal models are a rather recent extension of log-linear models intended to analyze several marginal distributions of interest simultaneously. As such, this approach is particularly suitable for investigating association rules, where we are mostly interested in low-dimensional marginal distributions because they provide a simple way of summarizing the most tangible and easily accessible structures in the data. In the following, we outline this general approach and indicate how it could be applied to solve a few specific problems related to the pruning of association rules.
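The marginal-model machinery itself (simultaneous log-linear constraints on several marginals) is beyond a short sketch. As a deliberately simplified stand-in, the code below examines a single two-way marginal table, "antecedent present" by "consequent present", and computes its log odds ratio with a Wald statistic based on Woolf's standard error; under this assumption, a rule whose log odds ratio is not clearly positive would be a pruning candidate. The function name and the 0.5 correction for empty cells are illustrative choices.

```python
import math

def rule_log_odds_ratio(transactions, antecedent, consequent):
    """Log odds ratio and Wald z for the 2x2 marginal table (A present?) x (B present?)."""
    a_set, b_set = set(antecedent), set(consequent)
    n11 = n10 = n01 = n00 = 0
    for t in transactions:
        has_a, has_b = a_set <= t, b_set <= t
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    cells = [n11, n10, n01, n00]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]  # Haldane-Anscombe correction for empty cells
    c11, c10, c01, c00 = cells
    log_or = math.log(c11 * c00 / (c10 * c01))
    se = math.sqrt(sum(1.0 / c for c in cells))  # Woolf's standard error
    return log_or, log_or / se
```

A large positive z suggests that the antecedent and consequent are positively associated in that marginal, which is the kind of low-dimensional structure the marginal-model approach examines more systematically.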
Latent Markov models for evaluating nursing home performances
We illustrate how latent Markov models may be used for the analysis of a longitudinal
dataset arising from the administration of a set of dichotomously-scored items to a sample of elderly
people admitted to different nursing homes. These models are aimed at describing individual changes
in terms of quality of life, in particular considering: (i) how it changes over time and (ii) how it
depends on the nursing home in which the person is hosted. For maximum likelihood estimation, we apply an
EM algorithm implemented by means of certain recursions taken from the literature on hidden
Markov models.
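A hedged sketch of two M-step updates in such an EM algorithm, assuming the E-step recursions have already produced posterior probabilities post[i, t, u] of latent state u for person i at occasion t, and that items are binary and conditionally independent given the state. The conditional probability of a positive response to item j in state u is then a posterior-weighted proportion; nursing-home effects and covariates are ignored, and the array layout and function names are hypothetical.

```python
import numpy as np

def update_item_probabilities(y, post):
    """M-step for binary items: phi[u, j] = posterior-weighted share of positive answers.

    y: (n, T, J) array of 0/1 responses; post: (n, T, k) posterior state probabilities.
    Returns a (k, J) array.
    """
    y = np.asarray(y, dtype=float)
    post = np.asarray(post, dtype=float)
    numerator = np.einsum("itu,itj->uj", post, y)  # sums over persons i and occasions t
    denominator = post.sum(axis=(0, 1))[:, None]
    return numerator / denominator

def update_initial_probabilities(post):
    """M-step for the initial distribution: average posterior at the first occasion."""
    return np.asarray(post)[:, 0, :].mean(axis=0)
```

With posteriors that are exactly 0 or 1, these updates reduce to the empirical response rates within each latent state, which is a convenient sanity check.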
