
    Bayesian dimensionality reduction

    We are currently witnessing an explosion in the amount of available data. Such growth involves not only the number of data points but also their dimensionality. This poses new challenges to statistical modeling and computations, thus making dimensionality reduction more central than ever. In the present thesis, we provide methodological, computational and theoretical advancements in Bayesian dimensionality reduction via novel structured priors. Namely, we develop a new increasing shrinkage prior and illustrate how it can be employed to discard redundant dimensions in Gaussian factor models. In order to make it usable for larger datasets, we also investigate variational methods for posterior inference under this proposed prior. Beyond traditional models and parameter spaces, we also provide a different take on dimensionality reduction, focusing on community detection in networks. For this purpose, we define a general class of Bayesian nonparametric priors that encompasses existing stochastic block models as special cases and includes promising unexplored options. Our Bayesian approach allows for a natural incorporation of node attributes and facilitates uncertainty quantification as well as model selection.

    A new class of nonparametric tests for second-order stochastic dominance based on the Lorenz P–P plot

    Given samples from two non-negative random variables, we propose a family of tests for the null hypothesis that one random variable stochastically dominates the other at the second order. Test statistics are obtained as functionals of the difference between the identity and the Lorenz P–P plot, defined as the composition between the inverse unscaled Lorenz curve of one distribution and the unscaled Lorenz curve of the other. We determine upper bounds for such test statistics under the null hypothesis and derive their limit distribution, to be approximated via bootstrap procedures. We then establish the asymptotic validity of the tests under relatively mild conditions and investigate finite-sample properties through simulations. The results show that our testing approach can be a valid alternative to classic methods based on the difference in the integrals of the cumulative distribution functions, which require bounded support and struggle to detect departures from the null in some cases. The same approach can be extended to a family of fractional-degree stochastic orders, including the first order as a limiting case.
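One way to write down the objects named in this abstract is sketched below. The labels F and G for the two distributions, the symbol Z for the Lorenz P–P plot, and the choice of which curve is inverted are assumptions made for illustration; the abstract does not fix them.

```latex
% Unscaled Lorenz curve of a non-negative distribution F (quantile function F^{-1})
L_F(p) = \int_0^p F^{-1}(u)\,\mathrm{d}u, \qquad p \in [0,1].

% Lorenz P--P plot (assumed ordering: invert L_F, evaluate at L_G), where defined
Z(p) = L_F^{-1}\bigl(L_G(p)\bigr).

% Test statistics are then functionals of the difference p - Z(p).
```

Under this labelling, if L_F(p) is at least L_G(p) for every p, then Z(p) is at most p because L_F is nondecreasing, so the difference p - Z(p) is non-negative throughout.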

    Learning and forecasting of age-specific period mortality via B-spline processes with locally-adaptive dynamic coefficients

    Although the analysis of human mortality has a well-established history, the attempt to accurately forecast future death-rate patterns for different age groups and time horizons still attracts active research. Such a predictive focus has motivated an increasing shift toward more flexible representations of age-specific period mortality trajectories at the cost of reduced interpretability. Although this perspective has led to successful predictive strategies, the inclusion of interpretable structures in modeling of human mortality can be, in fact, beneficial for improving forecasts. We pursue this direction via a novel B-spline process with locally-adaptive dynamic coefficients. Such a process outperforms state-of-the-art forecasting strategies by explicitly incorporating the core structures of period mortality within an interpretable formulation which enables inference on age-specific mortality trends and the corresponding rates of change across time. This is obtained by modeling the age-specific death counts via a Poisson log-normal model parameterized through a linear combination of B-spline bases with dynamic coefficients that characterize time changes in mortality rates via suitably defined stochastic differential equations. While flexible, the resulting formulation can be accurately approximated by a Gaussian state-space model that facilitates closed-form Kalman filtering, smoothing and forecasting, for both the trends of the spline coefficients and the corresponding first derivatives, which measure rates of change in mortality for different age groups. As illustrated in applications to mortality data from different countries, the proposed model outperforms state-of-the-art methods, both in point forecasts and in calibration of predictive intervals. Moreover, it unveils substantial differences in mortality patterns across countries and ages, both in the past decades and during the COVID-19 pandemic.
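To give a feel for closed-form Kalman filtering of a trend together with its first derivative, here is a minimal sketch for a generic local linear trend model. It is not the model of the paper: the two-dimensional state (level, slope), the noise variances `q` and `r`, the diffuse prior, and the function name `kalman_local_trend` are all assumptions chosen for illustration.

```python
import numpy as np

def kalman_local_trend(y, dt=1.0, q=1e-6, r=1e-4, prior_var=1e4):
    """Filter a scalar series with state x = (level, slope); all settings are illustrative."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # level advances by slope * dt
    H = np.array([[1.0, 0.0]])              # only the level is observed
    Q = q * np.eye(2)                       # assumed process-noise covariance
    x = np.zeros(2)                         # prior mean
    P = prior_var * np.eye(2)               # vague prior covariance
    filtered = []
    for obs in y:
        # Update step: condition the current state on the new observation.
        S = H @ P @ H.T + r                 # innovation variance, shape (1, 1)
        K = (P @ H.T) / S                   # Kalman gain, shape (2, 1)
        x = x + (K * (obs - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
        filtered.append(x.copy())
        # Predict step: propagate the state to the next time point.
        x = F @ x
        P = F @ P @ F.T + Q
    return np.array(filtered)

# Noise-free linear series: the filtered slope should settle near 2.
estimates = kalman_local_trend(2.0 * np.arange(20))
```

The second column of `estimates` plays the role of the filtered rate of change, which is the kind of quantity the abstract describes for the spline coefficients.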

    Bayesian cumulative shrinkage for infinite factorizations

    The dimension of the parameter space is typically unknown in a variety of models that rely on factorizations. For example, in factor analysis the number of latent factors is not known and has to be inferred from the data. Although classical shrinkage priors are useful in such contexts, increasing shrinkage priors can provide a more effective approach that progressively penalizes expansions with growing complexity. In this article, we propose a novel increasing shrinkage prior, called the cumulative shrinkage process, for the parameters that control the dimension in overcomplete formulations. Our construction has broad applicability and is based on an interpretable sequence of spike-and-slab distributions which assign increasing mass to the spike as the model complexity grows. Using factor analysis as an illustrative example, we show that this formulation has theoretical and practical advantages relative to current competitors, including an improved ability to recover the model dimension. An adaptive Markov chain Monte Carlo algorithm is proposed, and the performance gains are outlined in simulations and in an application to personality data.
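The idea of assigning increasing mass to the spike can be illustrated with a short sketch. Here the spike probabilities are built as cumulative sums of stick-breaking weights. The stick-breaking construction, the Beta(1, alpha) sticks, and the names `spike_probabilities` and `alpha` are assumptions for illustration; the abstract only states that the spike mass grows with model complexity.

```python
import numpy as np

def spike_probabilities(n_components, alpha=5.0, seed=0):
    """Return a nondecreasing sequence of spike probabilities (illustrative construction)."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=n_components)            # assumed Beta(1, alpha) sticks
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    weights = v * remaining                                 # stick-breaking weights
    return np.cumsum(weights)                               # spike mass for component h

pi = spike_probabilities(10)
```

Each entry of `pi` adds a non-negative weight to the previous one, so later components are more likely to be drawn from the spike and effectively switched off.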

    Concentration of discrepancy-based approximate Bayesian computation via Rademacher complexity

    There has been increasing interest in summary-free solutions for approximate Bayesian computation (ABC) that replace distances among summaries with discrepancies between the empirical distributions of the observed data and the synthetic samples generated under the proposed parameter values. The success of these strategies has motivated theoretical studies on the limiting properties of the induced posteriors. However, there is still a lack of a theoretical framework for summary-free ABC that (i) is unified, instead of discrepancy-specific, (ii) does not necessarily require constraining the analysis to data-generating processes and statistical models meeting specific regularity conditions, but rather facilitates the derivation of limiting properties that hold uniformly, and (iii) relies on verifiable assumptions that provide more explicit concentration bounds clarifying which factors govern the limiting behavior of the ABC posterior. We address this gap via a novel theoretical framework that introduces the concept of Rademacher complexity in the analysis of the limiting properties for discrepancy-based ABC posteriors, including in non-i.i.d. and misspecified settings. This yields a unified theory that relies on constructive arguments and provides more informative asymptotic results and uniform concentration bounds, even in those settings not covered by current studies. These key advancements are obtained by relating the asymptotic properties of summary-free ABC posteriors to the behavior of the Rademacher complexity associated with the chosen discrepancy within the family of integral probability semimetrics (IPS). The IPS class extends summary-based distances, and also includes the widely implemented Wasserstein distance and maximum mean discrepancy (MMD), among others. As clarified in specialized theoretical analyses of popular IPS discrepancies and via illustrative simulations, this new perspective improves the understanding of summary-free ABC.
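A minimal sketch of discrepancy-based rejection sampling with the maximum mean discrepancy may help fix ideas. The Gaussian location model, the N(0, 5^2) prior, the Gaussian kernel with unit bandwidth, the threshold `eps`, and the function names are all assumptions for illustration, not choices taken from the paper.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD with a Gaussian kernel."""
    def gram(a, b):
        d2 = (a[:, None] - b[None, :]) ** 2
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def abc_rejection(observed, n_draws=2000, eps=0.05, seed=0):
    """Keep prior draws whose synthetic sample is within eps of the data in squared MMD."""
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.normal(0.0, 5.0)                          # assumed prior
        synthetic = rng.normal(theta, 1.0, size=observed.size)  # assumed simulator
        if mmd2(observed, synthetic) <= eps:                  # no summary statistics used
            accepted.append(theta)
    return np.array(accepted)

data_rng = np.random.default_rng(1)
observed = data_rng.normal(2.0, 1.0, size=100)
posterior_draws = abc_rejection(observed)
```

The accepted values in `posterior_draws` approximate the posterior induced by the chosen discrepancy and threshold; swapping `mmd2` for another discrepancy in the same family changes only the acceptance rule.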

    Bayesian Testing for Exogenous Partition Structures in Stochastic Block Models

    Network data often exhibit block structures characterized by clusters of nodes with similar patterns of edge formation. When such relational data are complemented by additional information on exogenous node partitions, these sources of knowledge are typically included in the model to supervise the cluster assignment mechanism or to improve inference on edge probabilities. Although these solutions are routinely implemented, there is a lack of formal approaches to test whether a given external node partition is in line with the endogenous clustering structure encoding stochastic equivalence patterns among the nodes in the network. To fill this gap, we develop a formal Bayesian testing procedure which relies on the calculation of the Bayes factor between a stochastic block model with known grouping structure defined by the exogenous node partition and an infinite relational model that allows the endogenous clustering configurations to be unknown, random and fully revealed by the block-connectivity patterns in the network. A simple Markov chain Monte Carlo method for computing the Bayes factor and quantifying uncertainty in the endogenous groups is proposed. This strategy is evaluated in simulations and in applications studying brain networks of Alzheimer's patients.
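One ingredient of such a Bayes factor, the marginal likelihood of a network under a stochastic block model with a known partition, has a closed form when block probabilities receive conjugate Beta priors. The sketch below computes it for an undirected binary network without self-loops. The Beta(a, b) prior, the function names, and the restriction to this ingredient are assumptions for illustration; the other side of the comparison, which averages over unknown partitions, is not shown.

```python
import math
import numpy as np

def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def sbm_log_marginal(adj, labels, a=1.0, b=1.0):
    """Log marginal likelihood of `adj` given `labels`, block probabilities integrated out."""
    adj = np.asarray(adj)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    total = 0.0
    for i, g in enumerate(groups):
        rows = np.where(labels == g)[0]
        for h in groups[i:]:
            cols = np.where(labels == h)[0]
            block = adj[np.ix_(rows, cols)]
            if g == h:
                edges = np.triu(block, k=1).sum()          # count each pair once
                pairs = len(rows) * (len(rows) - 1) // 2
            else:
                edges = block.sum()
                pairs = len(rows) * len(cols)
            # Beta-Bernoulli marginal for this pair of blocks.
            total += log_beta(a + edges, b + pairs - edges) - log_beta(a, b)
    return total

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
value = sbm_log_marginal(triangle, labels=[0, 0, 0])
```

With a uniform prior, a fully connected triangle in a single block has marginal probability 1/4, so `value` equals log(0.25).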

    Bayesian Analysis of Privacy Attacks on GPS Trajectories

    The success of applications for sharing GPS trajectories raises serious privacy concerns, in particular about users’ home addresses. In this paper, we show that a Bayesian approach is natural and effective for a rigorous analysis of home-identification attacks and their countermeasures, in terms of privacy. We focus on a family of countermeasures named “privacy-region strategies”, which consist of publishing each trajectory from the first exit from a privacy region to the last entrance into it. Their performance is studied through simulations on Brownian motions.
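The trimming step of a privacy-region strategy can be sketched as follows, under the assumptions that the region is a disc centred at the home location and that the trajectory starts and ends inside it. The disc shape, the function name `publish_outside_region`, and the toy path are illustrative choices, not details from the paper.

```python
import numpy as np

def publish_outside_region(path, home, radius):
    """Keep the points from the first exit from the disc up to the last re-entry into it."""
    path = np.asarray(path, dtype=float)
    dist = np.linalg.norm(path - np.asarray(home, dtype=float), axis=1)
    outside = np.flatnonzero(dist > radius)
    if outside.size == 0:
        return path[:0]                      # never left the region: publish nothing
    # First point outside marks the first exit; last point outside precedes the last entrance.
    return path[outside[0]: outside[-1] + 1]

toy_path = [[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [0.2, 0.0], [3.0, 0.0], [0.1, 0.0]]
published = publish_outside_region(toy_path, home=(0.0, 0.0), radius=1.0)
```

Note that an intermediate visit to the region, such as the fourth point of `toy_path`, is still published, because only the portions before the first exit and after the last entrance are withheld.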

    A Spatial Product Partition Model for PM10 Data

    This work illustrates a model-based clustering method for analyzing PM10 measurements over time. In particular, we develop a Bayesian dynamic linear model coupled with a spatial product partition model for clustering monitoring stations that exhibit similar persistence and variability of the PM10 concentrations over time. The model integrates spatial information (the locations of the considered monitoring stations) into the clustering process in order to increase the probability that neighboring stations will be assigned to the same cluster. This methodology is applied to the time series of daily PM10 measurements recorded by 110 monitoring stations in Austria. Our analysis reveals three spatially cohesive clusters characterized by different levels of persistence and variability of the PM10 concentrations. These results may provide helpful insights for understanding air pollution dynamics and support policymakers in identifying intervention areas.