Journal of Statistical Software
    1629 research outputs found

    BayesMortalityPlus: A Package in R for Bayesian Mortality Modeling

    The BayesMortalityPlus package provides a framework for modeling and predicting mortality data. The package includes tools for the construction of life tables based on Heligman-Pollard laws, and also on dynamic linear smoothers. The modeling is flexible: the response variable may be modeled as Poisson, binomial, or Gaussian. If temporal data are available, the package provides a Bayesian implementation of the well-known Lee-Carter model that allows for estimation, projection of mortality over time, and assessment of uncertainty of any linear or nonlinear function of parameters, such as life expectancy. Illustrations show the capability of the package to model mortality data.
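    As a point of reference for the Heligman-Pollard law mentioned above, the sketch below evaluates the classical eight-parameter form, q_x / (1 - q_x) = A^((x + B)^C) + D exp(-E (ln x - ln F)^2) + G H^x. It is not the package's API: the function name and the parameter values are hypothetical, chosen only for illustration, and the package's own parameterization may differ.

```python
import math


def heligman_pollard_qx(x, A, B, C, D, E, F, G, H):
    """Death probability q_x at age x under the classical eight-parameter
    Heligman-Pollard law, which is stated on the odds scale q_x / (1 - q_x)."""
    # Childhood mortality: declines quickly with age.
    child = A ** ((x + B) ** C)
    # Accident hump around age F; the term tends to 0 as x -> 0, so use 0 at x == 0.
    hump = D * math.exp(-E * (math.log(x) - math.log(F)) ** 2) if x > 0 else 0.0
    # Senescent mortality: geometric (Gompertz-like) growth with age.
    senescent = G * H ** x
    odds = child + hump + senescent
    # Convert the odds back to a probability.
    return odds / (1.0 + odds)


# Illustrative (hypothetical) parameter values, not estimates from any data set.
params = dict(A=0.0005, B=0.01, C=0.10, D=0.001, E=10.0, F=20.0, G=0.00005, H=1.10)
qx = {age: heligman_pollard_qx(age, **params) for age in (0, 10, 20, 40, 80)}
```

    With these values the curve has the familiar shape: mortality is elevated at birth, lowest in late childhood, and rises steeply at old ages.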

    SMLE: An R Package for Joint Feature Screening in Ultrahigh-Dimensional GLMs

    Sparsity-restricted maximum likelihood estimation (SMLE) has received considerable attention for feature screening in ultrahigh-dimensional regression. SMLE is a computationally convenient method that naturally incorporates the joint effects among features in the screening process. We develop a publicly available R package SMLE, which provides a user-friendly environment to carry out the SMLE method in generalized linear models. In particular, the package includes functions to conduct SMLE-screening and the related post-screening selection with popular selection criteria such as AIC and (extended) BIC. The package gives users flexibility in controlling a series of screening parameters and accommodates both numerical and categorical feature input. The usage of SMLE is illustrated on extensive numerical examples, which demonstrate the promising performance of the package.

    Benchpress: A Versatile Platform for Structure Learning in Causal and Probabilistic Graphical Models

    Describing the relationship between the variables in a study domain and modeling the data generating mechanism is a fundamental problem in many empirical sciences. Probabilistic graphical models are one common approach to tackle the problem. Learning the graphical structure for such models is computationally challenging and an active area of current research, with a plethora of algorithms being developed. To facilitate the benchmarking of different methods, we present a novel Snakemake workflow, called Benchpress, for producing scalable, reproducible, and platform-independent benchmarks of structure learning algorithms for probabilistic graphical models. Benchpress is interfaced via a simple JSON file, which makes it accessible for all users, while the code is designed in a fully modular fashion to enable researchers to contribute additional methodologies. Benchpress currently provides an interface to a large number of state-of-the-art algorithms from libraries such as BDgraph, BiDAG, bnlearn, causal-learn, gCastle, GOBNILP, pcalg, scikit-learn, TETRAD, and trilearn, as well as a variety of methods for data generating models and performance evaluation. Alongside user-defined models and randomly generated datasets, the workflow also includes a number of standard datasets and graphical models from the literature, which may be included in a benchmarking study. We demonstrate the applicability of this workflow for learning Bayesian networks in five typical data scenarios. The source code and documentation are publicly available from https://benchpressdocs.readthedocs.io/

    dbnR: Gaussian Dynamic Bayesian Network Learning and Inference in R

    Dynamic Bayesian networks are a type of multivariate time series forecasting model that offers a degree of interpretability thanks to its graphical representation. They have been reported extensively in the literature in a variety of areas, but their application has usually involved an ad hoc implementation or adaptation of existing Bayesian network software to a dynamic case. In this paper, we present dbnR, an R package that encapsulates the whole process of learning the model and parameters from data and performing inference. The package provides three different structure learning algorithms, exact and approximate inference, and a visualization tool that allows inspection of the graphical structure of the networks. The aim of dbnR is to provide a tool that enables fast deployment of dynamic Bayesian network models and to make them readily available as general purpose forecasting models.

    MixtureMissing: An R Package for Robust and Flexible Model-Based Clustering with Incomplete Data

    The R package MixtureMissing performs model-based clustering on data sets with values missing at random, aiming to identify homogeneous groups of observations. In model-based clustering, the data within each cluster follow a specific distribution. In the package, 13 distributions are available, including the contaminated normal distribution, the generalized hyperbolic distribution (GHD), and 11 special or limiting cases of GHD. Notably, eight out of these 11 cases have not been formulated at the time of writing. Given a list of candidate distributions, the package can recommend the optimal distribution to employ based on a specified information criterion. In this paper, the methodological foundations and computational aspects of the package are discussed. Furthermore, important features of model fitting, model summary, and available visualization tools are thoroughly illustrated using real data sets.

    Split-Apply-Combine with Dynamic Grouping

    Partitioning a data set by one or more of its attributes and computing an aggregate for each part is one of the most common operations in data analysis. There are use cases where the partitioning is determined dynamically by collapsing smaller subsets into larger ones, to ensure sufficient support for the computed aggregate. These use cases are not supported by software implementing split-apply-combine types of operations. This paper presents the R package accumulate, which offers convenient interfaces for defining grouped aggregation where the grouping itself is dynamically determined, based on user-defined conditions on subsets and a user-defined subset collapsing scheme. The formal underlying algorithm is described and analyzed as well.
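    The idea of dynamically determined grouping can be sketched in a few lines of Python. This is only an illustration of the general technique under simple assumptions (a fixed sequence of coarser and coarser groupings, and a minimum-record-count condition); it is not the accumulate package's interface, and the function name, field names, and data are hypothetical.

```python
from statistics import mean


def dynamic_group_mean(records, levels, value_key, min_records):
    """For each finest-level group, average `value_key` over the first
    grouping level (finest to coarsest) whose subset holds at least
    `min_records` rows. Returns {group: (level_index, mean)}."""
    finest = levels[0]
    # Every distinct combination of the finest-level keys is a target group.
    targets = sorted({tuple(r[k] for k in finest) for r in records})
    results = {}
    for target in targets:
        target_map = dict(zip(finest, target))
        for depth, keys in enumerate(levels):
            # Subset of records that match the target on the keys of this level.
            subset = [r[value_key] for r in records
                      if all(r[k] == target_map[k] for k in keys)]
            if len(subset) >= min_records:
                results[target] = (depth, mean(subset))
                break
        else:
            # No level satisfied the condition.
            results[target] = (None, None)
    return results


# Hypothetical data: collapse region x sector -> region -> whole data set.
records = [
    {"region": "N", "sector": "A", "turnover": 10.0},
    {"region": "N", "sector": "A", "turnover": 12.0},
    {"region": "N", "sector": "A", "turnover": 14.0},
    {"region": "N", "sector": "B", "turnover": 20.0},
    {"region": "S", "sector": "A", "turnover": 30.0},
]
levels = [("region", "sector"), ("region",), ()]
result = dynamic_group_mean(records, levels, "turnover", min_records=3)
```

    Here the group (N, A) has enough records on its own, (N, B) falls back to all of region N, and (S, A) falls back to the whole data set; the level index in each result records how far the group had to be collapsed.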

    Learning Permutation Symmetry of a Gaussian Vector with gips in R

    The study of hidden structures in data presents challenges in modern statistics and machine learning. We introduce the gips package in R, which identifies permutation subgroup symmetries in Gaussian vectors. gips serves two main purposes: exploratory analysis to discover hidden permutation symmetries, and estimation of the covariance matrix under permutation symmetry. It is competitive with canonical methods in dimensionality reduction while providing a new interpretation of the results. gips implements a novel Bayesian model selection procedure within Gaussian vectors invariant under the permutation subgroup introduced in Graczyk, Ishi, Kołodziejek, and Massam (2022b, The Annals of Statistics).

    Optimum Allocation for Adaptive Multi-Wave Sampling in R: The R Package optimall

    The R package optimall offers a collection of functions that efficiently streamline the design process of sampling in surveys ranging from simple to complex. The package's main functions allow users to interactively define and adjust strata cut points based on values or quantiles of auxiliary covariates, adaptively calculate the optimum number of samples to allocate to each stratum using Neyman or Wright allocation, and select specific units to sample based on a stratified sampling design. Using real-life epidemiological study examples, we demonstrate how optimall facilitates an efficient workflow for the design and implementation of surveys in R. Although tailored towards multi-wave sampling under two- or three-phase designs, the R package optimall may be useful for any sampling survey.
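    For readers unfamiliar with Neyman allocation, the sketch below shows the textbook rule: a total sample of size n is split across strata in proportion to N_h * S_h, the stratum size times the stratum standard deviation. It is a minimal illustration, not optimall's API; the function name and the rounding scheme (largest remainder, an assumption made here so that the integer allocations sum to n) are hypothetical, and the sketch does not cap allocations at the stratum sizes.

```python
import math


def neyman_allocation(n, sizes, sds):
    """Split a total sample of `n` units across strata in proportion to
    N_h * S_h, rounding with the largest-remainder method so that the
    integer allocations sum exactly to `n`."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    # Real-valued Neyman quotas: n * N_h * S_h / sum_k(N_k * S_k).
    quotas = [n * w / total for w in weights]
    alloc = [math.floor(q) for q in quotas]
    # Hand the remaining units to the strata with the largest fractional parts.
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(quotas)),
                          key=lambda h: quotas[h] - alloc[h],
                          reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical strata: population sizes and outcome standard deviations.
sizes = [100, 200, 700]
sds = [10.0, 5.0, 1.0]
exact = neyman_allocation(27, sizes, sds)
rounded = neyman_allocation(10, sizes, sds)
```

    Note how the small but highly variable first stratum receives as many units as the second, and more than the large, homogeneous third stratum.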

    scpi: Uncertainty Quantification for Synthetic Control Methods

    The synthetic control method offers a way to quantify the effect of an intervention using weighted averages of untreated units to approximate the counterfactual outcome that the treated unit(s) would have experienced in the absence of the intervention. This method is useful for program evaluation and causal inference in observational studies. We introduce the software package scpi for prediction and inference using synthetic controls, implemented in Python, R, and Stata. For point estimation or prediction of treatment effects, the package offers an array of (possibly penalized) approaches leveraging the latest optimization methods. For uncertainty quantification, the package offers the prediction interval methods introduced by Cattaneo, Feng, and Titiunik (2021) and Cattaneo, Feng, Palomba, and Titiunik (2025b). The paper includes numerical illustrations and a comparison with other synthetic control software.

    SURVEYHLM: A SAS Macro for Multilevel Analysis with Large-Scale Educational Assessment Data

    Special techniques must be considered during analysis of large-scale educational assessment (LSA) data. In this regard, many software packages are available to support researchers conducting secondary analyses. However, the software packages available for multilevel analyses are somewhat limited and usually contain only a few of the required techniques. In this article, we review the technical details of LSA studies and compare software for multilevel analyses by examining the extent to which these packages take these technical details into account. In accordance with our findings from this comparison, we developed a SAS macro for multilevel analyses of LSA data that meets all technical requirements. The macro SURVEYHLM fits multilevel models with LSA datasets. SURVEYHLM can handle up to three levels. It can fit different correlation structures for the random components and use plausible values as response variables, and the responses do not necessarily need to be normally distributed. Weights can be specified on levels 1, 2, and 3. Scaling of the level-specific weights is possible, and standard errors can be based on a sandwich estimator or calculated with either the jackknife replication technique or through user-supplied replication weights. Examples of applications are given.
