Journal of Statistical Software
hmmTMB: Hidden Markov Models with Flexible Covariate Effects in R
Hidden Markov models (HMMs) are widely applied in studies where a discrete-valued process of interest is observed indirectly. They have, for example, been used to model behavior from human and animal tracking data, disease status from medical data, and financial market volatility from stock prices. The model has two main sets of parameters: transition probabilities, which drive the latent state process, and observation parameters, which characterize the state-dependent distributions of observed variables. One particularly useful extension of HMMs is the inclusion of covariates on those parameters, to investigate the drivers of state transitions or to implement Markov-switching regression models. We present the new R package hmmTMB for HMM analyses, with flexible covariate models in both the hidden state and observation parameters. In particular, non-linear effects are implemented using penalized splines, including multiple univariate and multivariate splines, with automatic smoothness selection. The package allows for various random effect formulations (including random intercepts and slopes) to capture between-group heterogeneity. hmmTMB can be applied to multivariate observations, and it accommodates various types of response data, including continuous (bounded or not), discrete, and binary variables. Parameter constraints can be used to implement non-standard dependence structures, such as semi-Markov, higher-order Markov, and autoregressive models. Here, we summarize the relevant statistical methodology, describe the structure of the package, and present an example analysis of animal tracking data to showcase the workflow of the package.
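To make the two parameter sets concrete, the following sketch evaluates an HMM log-likelihood with the scaled forward algorithm. It assumes Gaussian state-dependent distributions, a known initial distribution `delta`, and no covariates. It is written in Python for illustration only: the function names are hypothetical and do not reflect hmmTMB's R interface.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def hmm_loglik(obs, delta, gamma, means, sds):
    """Log-likelihood of a Gaussian HMM via the scaled forward algorithm.

    delta: initial state distribution (length N)
    gamma: transition matrix, gamma[i][j] = P(state j at t+1 | state i at t)
    means, sds: parameters of the state-dependent normal distributions
    """
    n_states = len(delta)
    # Forward probabilities at the first time step: initial distribution times densities.
    alpha = [delta[j] * normal_pdf(obs[0], means[j], sds[j]) for j in range(n_states)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid numerical underflow
    for x in obs[1:]:
        # Propagate through the transition matrix, then weight by the observation densities.
        alpha = [
            sum(alpha[i] * gamma[i][j] for i in range(n_states))
            * normal_pdf(x, means[j], sds[j])
            for j in range(n_states)
        ]
        total = sum(alpha)
        loglik += math.log(total)  # log-likelihood is the sum of log scaling constants
        alpha = [a / total for a in alpha]
    return loglik

# Hypothetical two-state example: state 1 centred at 0, state 2 centred at 3.
obs = [0.1, -0.3, 2.9, 3.2, 0.2]
ll = hmm_loglik(obs, delta=[0.5, 0.5], gamma=[[0.9, 0.1], [0.2, 0.8]],
                means=[0.0, 3.0], sds=[1.0, 1.0])
print(round(ll, 3))
```

Rescaling the forward vector at each step and summing the logarithms of the scaling constants keeps the computation stable for long series. In a covariate model, `gamma` (or the observation parameters) would vary over time as a function of the covariates.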
Quantile Regression under Limited Dependent Variable in Stata
This article develops a Stata command, ldvqreg, to estimate quantile regression models for censored (with lower and/or upper censoring) and binary dependent variables. The estimator is implemented using a smoothed version of the quantile regression objective function. Simulation exercises show that it correctly estimates the parameters and that it should be used instead of the available quantile regression methods when censoring is present. Several empirical applications illustrate these methods.
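For intuition about what "smoothing the objective" means, the sketch below shows the standard quantile regression check function, a softplus-smoothed surrogate with bandwidth `h`, and a Powell-type censored objective in which the fitted quantile is x'β clamped to the censoring bounds. The abstract does not say which smoothing ldvqreg uses. The softplus kernel, the bandwidth, and the Powell formulation are therefore assumptions chosen for illustration, written in Python rather than Stata.

```python
import math

def check_loss(u, tau):
    """Quantile regression check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def smoothed_check_loss(u, tau, h):
    """Smooth surrogate tau*u + h*log(1 + exp(-u/h)).

    It lies above rho_tau(u) by at most h*log(2) and converges to it as h -> 0.
    """
    return tau * u + h * softplus(-u / h)

def censored_objective(beta, X, y, tau, h, lower=-math.inf, upper=math.inf):
    """Powell-type censored objective: fitted quantile is x'beta clamped to [lower, upper]."""
    total = 0.0
    for xi, yi in zip(X, y):
        linear = sum(b * x for b, x in zip(beta, xi))
        fitted = min(upper, max(lower, linear))
        total += smoothed_check_loss(yi - fitted, tau, h)
    return total
```

Only the check function is smoothed here. The clamp to the censoring bounds still introduces kinks in β, which a full estimator would also have to handle.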
sdmTMB: An R Package for Fast, Flexible, and User-Friendly Generalized Linear Mixed Effects Models with Spatial and Spatiotemporal Random Fields
Geostatistical spatial or spatiotemporal data are common across scientific fields. However, appropriate models to analyze these data, such as generalized linear mixed effects models (GLMMs) with Gaussian Markov random fields (GMRFs), are computationally intensive and challenging for many users to implement. Here, we introduce the R package sdmTMB, which extends the flexible interface familiar to users of lme4, glmmTMB, and mgcv to include spatial and spatiotemporal latent GMRFs using the stochastic partial differential equation (SPDE) approach. SPDE matrices are constructed with fmesher, and estimation is conducted via maximum marginal likelihood with TMB or via Bayesian inference with tmbstan and rstan. We describe the model and explore case studies that illustrate sdmTMB's flexibility in implementing penalized smoothers, non-stationary processes (time-varying and spatially varying coefficients), hurdle models, cross-validation, and anisotropy (directionally dependent spatial correlation). Finally, we compare the functionality, speed, and interfaces of related software, demonstrating that sdmTMB can be an order of magnitude faster than R-INLA. We hope sdmTMB will help open this useful class of models to more geostatistical analysts.
skewlmm: An R Package for Fitting Skewed and Heavy-Tailed Linear Mixed Models
Longitudinal data are commonly analyzed using linear mixed models, which, for mathematical convenience, usually assume that both the random effects and the errors follow normal distributions. However, these restrictive assumptions may result in a lack of robustness against departures from normality and in invalid statistical inferences. Schumacher, Lachos, and Matos (2021) developed, from a frequentist point of view, a flexible extension of linear mixed models based on the scale mixture of skew-normal class of distributions, accommodating skewness and heavy tails; the robust model formulation also accounts for possible within-subject serial dependence through several useful dependence structures. This paper presents the R package skewlmm, which implements the method proposed by Schumacher et al. (2021) and provides a user-friendly tool to fit robust linear mixed models to longitudinal data, including model-fit tests, residual analyses, and plot functions to support model selection and evaluation. Two data sets and a synthetic example are analyzed to illustrate the methodology and software implementation.
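The scale mixture of skew-normal class builds on the skew-normal distribution. As a minimal illustration, assuming Azzalini's standard parameterization with location `xi`, scale `omega`, and shape `alpha`, the skew-normal density is (2/ω) φ(z) Φ(αz) with z = (x − ξ)/ω. The sketch below evaluates it using only the Python standard library. It is not skewlmm's code, and heavier-tailed members of the class (for example, the skew-t) would additionally mix over the scale.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_normal_pdf(x, xi=0.0, omega=1.0, alpha=0.0):
    """Skew-normal density; alpha = 0 recovers the normal N(xi, omega^2)."""
    z = (x - xi) / omega
    return 2.0 / omega * norm_pdf(z) * norm_cdf(alpha * z)
```

A positive `alpha` shifts mass to the right. With δ = α/√(1 + α²), the mean of the standard skew-normal is δ√(2/π).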
dynamite: An R Package for Dynamic Multivariate Panel Models
dynamite is an R package for Bayesian inference on intensive panel (time series) data comprising multiple measurements on multiple individuals over time. The package supports joint modeling of multiple response variables, time-varying and time-invariant effects, a wide range of discrete and continuous distributions, group-specific random effects, latent factors, and customization of prior distributions of the model parameters. Models in the package are defined via a user-friendly formula interface, and estimation of the posterior distribution of the model parameters takes advantage of state-of-the-art Markov chain Monte Carlo methods. The package enables efficient computation of both individual-level and aggregated predictions and offers a comprehensive suite of tools for visualization and model diagnostics.
equateMultiple: An R Package to Equate Multiple Forms
Item response theory (IRT) provides a framework for modeling the responses given to a test or questionnaire, which are assumed to depend on an underlying latent variable and on some item parameters. Due to identifiability issues, when the parameters are estimated separately on different datasets, the estimates of the item parameters and the predicted values of the latent variable are not directly comparable. Equating is a statistical procedure that can be used to convert these values to a common metric and to obtain comparable test scores. The R package equateMultiple implements methods to link the parameters estimated on many different datasets. After briefly reviewing the IRT models and the equating methods, this article illustrates the use of the package.
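To illustrate what linking two forms involves, the sketch below applies the classical mean-sigma method to the difficulty estimates of common (anchor) items that were estimated separately on two forms. It then rescales a 2PL item and an ability value onto the target metric. This is a two-form illustration that assumes a common-item design. equateMultiple's methods for many forms are not reproduced here, and the function names are hypothetical.

```python
import statistics

def mean_sigma_coefficients(b_from, b_to):
    """Equating coefficients (A, B) mapping the 'from' metric onto the 'to' metric,
    using difficulty estimates of the same anchor items on both forms."""
    A = statistics.pstdev(b_to) / statistics.pstdev(b_from)
    B = statistics.mean(b_to) - A * statistics.mean(b_from)
    return A, B

def convert_item(a, b, A, B):
    """Rescale a 2PL item (discrimination a, difficulty b) to the target metric."""
    return a / A, A * b + B

def convert_ability(theta, A, B):
    """Rescale a latent trait value to the target metric."""
    return A * theta + B
```

The transformation leaves a(θ − b) unchanged, so 2PL response probabilities are the same on either metric. Only the scale on which they are reported differs.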
singleRcapture: An R Package for Single-Source Capture-Recapture Models
Population size estimation is a major challenge in official statistics, social sciences, and natural sciences. The problem can be tackled by applying capture-recapture methods, which vary depending on the number of sources used, in particular on whether a single source or multiple sources are involved. This paper focuses on the first group of methods and introduces a novel R package: singleRcapture. The package implements state-of-the-art single-source capture-recapture (SSCR) models (e.g., zero-truncated one-inflated regression) together with new developments proposed by the authors, and provides a user-friendly application programming interface (API). This self-contained package can be used to produce point estimates and their variance, and it implements several bootstrap variance estimators and diagnostics to assess quality and conduct sensitivity analysis. It is intended for users interested in estimating the size of populations, particularly those that are difficult to reach or measure, for which information is available from only one source and dual/multiple system estimation is not applicable. Our package serves to bridge a significant gap, as SSCR methods are either not available at all or only partially implemented in existing R packages and other open-source software.
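As a minimal illustration of single-source estimation, the sketch below works only from each observed unit's capture count (how many times it was seen), without covariates. It computes three classic estimators:

- Chao's lower bound, n + f1²/(2 f2).
- Zelterman's estimator, λ = 2 f2/f1 followed by n/(1 − e^(−λ)).
- A zero-truncated Poisson maximum-likelihood estimate with the Horvitz-Thompson correction n/(1 − e^(−λ)).

Here f1 and f2 are the numbers of units seen once and twice. These estimators were chosen for illustration and are not singleRcapture's interface, which fits regression-based models.

```python
import math

def chao(counts):
    """Chao's lower-bound estimator n + f1^2 / (2 f2) (requires f2 > 0)."""
    n = len(counts)
    f1 = sum(1 for y in counts if y == 1)
    f2 = sum(1 for y in counts if y == 2)
    return n + f1 * f1 / (2.0 * f2)

def zelterman(counts):
    """Zelterman's estimator: lambda = 2 f2 / f1, then N = n / (1 - exp(-lambda))."""
    n = len(counts)
    f1 = sum(1 for y in counts if y == 1)
    f2 = sum(1 for y in counts if y == 2)
    lam = 2.0 * f2 / f1
    return n / (1.0 - math.exp(-lam))

def ztp_mle_lambda(counts, tol=1e-12):
    """Zero-truncated Poisson MLE: solve mean(y) = lambda / (1 - exp(-lambda)).

    Uses bisection; requires mean(y) > 1. The root lies below mean(y) because
    lambda / (1 - exp(-lambda)) > lambda for lambda > 0.
    """
    ybar = sum(counts) / len(counts)
    lo, hi = 1e-10, ybar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - math.exp(-mid)) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ztp_horvitz_thompson(counts):
    """Population size N = n / P(Y > 0) under the fitted zero-truncated Poisson."""
    lam = ztp_mle_lambda(counts)
    return len(counts) / (1.0 - math.exp(-lam))

# Hypothetical data: 50 units seen once, 20 seen twice, 5 seen three times.
counts = [1] * 50 + [2] * 20 + [3] * 5
print(chao(counts), zelterman(counts), ztp_horvitz_thompson(counts))
```

Chao's and Zelterman's estimators use only f1 and f2. This makes them less sensitive than the full Poisson fit to heterogeneity in the upper tail of the count distribution.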
mdendro: An R Package for Extended Agglomerative Hierarchical Clustering
mdendro is an R package that provides a comprehensive collection of linkage methods for agglomerative hierarchical clustering on a matrix of proximity data (distances or similarities), returning a multifurcated dendrogram, or multidendrogram. Multidendrograms can group more than two clusters at the same time, solving the nonuniqueness problem that arises when there are ties in the data. Because of such ties, different binary dendrograms are possible depending on both the order of the input data and the criterion used to break ties. Weighted and unweighted versions of the most common linkage methods are included in the package, which also implements two parametric linkage methods. In addition, package mdendro provides five descriptive measures to analyze the resulting dendrograms: cophenetic correlation coefficient, space distortion ratio, agglomerative coefficient, chaining coefficient, and tree balance.
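The first of these measures, the cophenetic correlation coefficient, is the Pearson correlation between the original pairwise distances and the cophenetic distances (the heights at which each pair of objects is first joined in the dendrogram). As a minimal sketch, the code assumes single linkage. For single linkage, the cophenetic distance between two objects equals the smallest achievable largest step along any path connecting them, so it can be computed with a Floyd-Warshall-style minimax pass. Other linkage methods and mdendro's own implementation are not reproduced, and the function names are hypothetical.

```python
import math

def single_linkage_cophenetic(dist):
    """Cophenetic matrix of single linkage: minimax path distances (Floyd-Warshall variant)."""
    n = len(dist)
    coph = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Going through k costs the larger of the two hops; keep it if it is smaller.
                via_k = max(coph[i][k], coph[k][j])
                if via_k < coph[i][j]:
                    coph[i][j] = via_k
    return coph

def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances over pairs i < j."""
    n = len(dist)
    xs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [coph[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example: four points on a line.
pts = [0.0, 1.0, 3.0, 7.0]
D = [[abs(a - b) for b in pts] for a in pts]
C = single_linkage_cophenetic(D)
print(cophenetic_correlation(D, C))
```

Values close to 1 indicate that the dendrogram preserves the original distances well.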
Parsimoniously Fitting Large Multivariate Random Effects in glmmTMB
Multivariate random effects with unstructured variance-covariance matrices of large dimension q can be a major challenge to estimate. In this paper, we introduce a new implementation of a reduced-rank approach that fits large-dimensional multivariate random effects by writing them as a linear combination of d < q latent variables. By adding reduced-rank functionality to the package glmmTMB, we extend the available mixed models to include random effects of dimensions that previously could not be fitted. We apply the reduced-rank random effect to two examples: a generalized latent variable model for multivariate abundance data and a random-slopes model.
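The reduced-rank idea can be sketched as a factor-analytic covariance. With a q × d loading matrix Λ, the q random effects are b = Λu for d independent standard-normal latent variables u, so Var(b) = ΛΛᵀ has rank at most d. The code below builds that covariance and compares parameter counts, assuming a loading matrix with zeros above the diagonal for identifiability. This is a generic illustration, not glmmTMB's internal parameterization.

```python
def reduced_rank_cov(loadings):
    """Sigma = L L^T for a q x d loading matrix L given as a list of rows."""
    q = len(loadings)
    return [[sum(li * lj for li, lj in zip(loadings[i], loadings[j])) for j in range(q)]
            for i in range(q)]

def n_params_unstructured(q):
    """Free parameters in an unstructured q x q covariance matrix."""
    return q * (q + 1) // 2

def n_params_reduced_rank(q, d):
    """Free loadings when L is q x d with zeros above the diagonal."""
    return q * d - d * (d - 1) // 2
```

For q = 30 random effects and d = 2 latent variables, the unstructured matrix needs 465 covariance parameters, while the reduced-rank form needs only 59. This gap is what makes large q tractable.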
Exploring Data Subsets with vtree
Variable trees are a new method for the exploration of discrete multivariate data. They display nested subsets and corresponding frequencies and percentages. Manual calculation of these quantities can be laborious, especially when there are many multi-level factors and missing data. Here we introduce variable trees and their implementation in the vtree R package, draw comparisons with existing methods (contingency tables, mosaic plots, Venn/Euler diagrams, and UpSet), and illustrate their utility using two case studies. Variable trees can be used to (1) reveal patterns in nested subsets, (2) explore missing data, and (3) generate study flow diagrams (e.g., CONSORT diagrams) directly from data frames, to support reproducible research and open science.
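The core computation behind a variable tree can be sketched in a few lines: nested subset counts, with each percentage taken relative to the parent node. The function name `variable_tree` and the nested-dictionary output are illustrative assumptions. vtree itself is an R package that also draws the tree, and its handling of missing values may differ from this sketch, where a missing value (`None`) simply becomes its own branch labeled "NA" and counts toward the parent's total.

```python
from collections import Counter

def _label(record, var):
    """Level of `var` in `record`, with missing values mapped to "NA"."""
    value = record.get(var)
    return "NA" if value is None else value

def variable_tree(records, variables):
    """Recursively split records by each variable in turn.

    Returns {level: {"n": count, "pct": percent of parent, "children": {...}}}.
    """
    if not variables:
        return {}
    var, rest = variables[0], variables[1:]
    counts = Counter(_label(r, var) for r in records)
    tree = {}
    for level, n in counts.items():
        subset = [r for r in records if _label(r, var) == level]
        tree[level] = {
            "n": n,
            "pct": 100.0 * n / len(records),  # relative to the parent subset
            "children": variable_tree(subset, rest),
        }
    return tree

# Hypothetical records with one missing value.
records = [
    {"sex": "F", "smoker": "yes"},
    {"sex": "F", "smoker": "no"},
    {"sex": "M", "smoker": None},
    {"sex": "F", "smoker": "no"},
]
tree = variable_tree(records, ["sex", "smoker"])
print(tree["F"]["n"], tree["F"]["children"]["no"]["n"])
```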