Fast PCA in 1-D Wasserstein Spaces via B-splines Representation and Metric Projection

Author: Pegoraro Matteo
Beraha Mario
Publication date: 01/01/2021

We address the problem of performing Principal Component Analysis over a family of probability measures on the real line, using the Wasserstein geometry. We present a novel representation of the 2-Wasserstein space, based on a well-known isometric bijection and a B-spline expansion. Thanks to this representation, we are able to reinterpret previous work and derive more efficient optimization routines for existing approaches. As shown in our simulations, solving these optimization problems can be costly in practice, which limits their use. We propose a novel definition of Principal Component Analysis in the Wasserstein space that, when used in combination with the B-spline representation, yields a straightforward optimization problem that is extremely fast to compute. Through extensive simulation studies, we show that our PCA performs similarly to the ones already proposed in the literature at a much smaller computational cost. We apply our method to a real dataset of mortality rates due to Covid-19 in the US, concluding that our analyses are consistent with the current scientific consensus on the disease.
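The isometric bijection mentioned in the abstract is, for measures on the real line, commonly taken to be the map from a measure to its quantile function, under which the 2-Wasserstein distance becomes an L2 distance between quantile functions. The following is a minimal sketch of that identity for empirical samples, not the authors' B-spline implementation. The function names and the grid size `n_grid` are assumptions made for illustration.

```python
import numpy as np

def quantile_on_grid(sample, grid):
    """Empirical quantile function of `sample` evaluated at the levels in `grid`."""
    return np.quantile(np.asarray(sample, dtype=float), grid)

def wasserstein2_1d(sample_a, sample_b, n_grid=1000):
    """Approximate 2-Wasserstein distance between two 1-D samples.

    Uses W_2(mu, nu)^2 = int_0^1 (F_mu^{-1}(t) - F_nu^{-1}(t))^2 dt,
    approximated by a midpoint rule on a uniform grid of quantile levels.
    """
    grid = (np.arange(n_grid) + 0.5) / n_grid  # midpoints of (0, 1)
    qa = quantile_on_grid(sample_a, grid)
    qb = quantile_on_grid(sample_b, grid)
    return float(np.sqrt(np.mean((qa - qb) ** 2)))
```

A quick sanity check under these assumptions: translating a sample by a constant `c` shifts every quantile by `c`, so the computed distance between the sample and its translate is `|c|`.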

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Association for the Advancement of Artificial Intelligence: AAAI Publications

Bayesian Nonparametric Model-based Clustering with Intractable Distributions: An ABC Approach

Author: Beraha Mario
Corradin Riccardo
Publication date: 01/01/2024

Bayesian nonparametric mixture models offer a rich framework for model-based clustering. We consider the situation where the kernel of the mixture is available only up to an intractable normalizing constant. In this case, the most commonly used Markov chain Monte Carlo (MCMC) methods are unsuitable. We propose an approximate Bayesian computational (ABC) strategy, whereby we approximate the posterior to avoid the intractability of the kernel. We derive an ABC-MCMC algorithm which combines (i) the use of the predictive distribution induced by the nonparametric prior as proposal and (ii) the use of the Wasserstein distance and its connection to optimal matching problems. To reduce the sensitivity of our algorithm to its tuning parameters, we further propose an adaptive strategy. We illustrate the use of the proposed algorithm with several simulation studies and an application to real data, where we cluster a population of networks, comparing its performance with standard MCMC algorithms and validating the adaptive strategy.
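To illustrate the role of the Wasserstein distance as an ABC discrepancy, here is a minimal sketch of plain ABC rejection sampling, a simpler technique than the ABC-MCMC algorithm the abstract describes. For two equal-size samples on the real line, the optimal matching pairs order statistics, so the distance reduces to sorting. All names (`abc_rejection`, `prior_sampler`, `simulator`, `tolerance`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def wasserstein1_sorted(x, y):
    """1-Wasserstein distance between two equal-size 1-D samples.

    The optimal matching pairs the sorted values, so the distance is the
    mean absolute difference between order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return float(np.mean(np.abs(x - y)))

def abc_rejection(observed, prior_sampler, simulator, tolerance, n_draws, rng):
    """Keep prior draws whose synthetic data lie within `tolerance` of `observed`."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                        # draw a parameter from the prior
        synthetic = simulator(theta, len(observed), rng)  # simulate data given theta
        if wasserstein1_sorted(observed, synthetic) <= tolerance:
            accepted.append(theta)
    return np.array(accepted)
```

The accepted draws approximate the posterior only as the tolerance shrinks. Choosing that tolerance is the kind of tuning that the adaptive strategy in the abstract is meant to address.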

Archivio istituzionale della ricerca - Politecnico di Milano

Normalised latent measure factor models

Author: Griffin Jim E
Beraha Mario
Publication date: 01/01/2023

We propose a methodology for modelling and comparing probability distributions within a Bayesian nonparametric framework. Building on dependent normalised random measures, we consider a prior distribution for a collection of discrete random measures where each measure is a linear combination of a set of latent measures, interpretable as characteristic traits shared by different distributions, with positive random weights. The model is nonidentified, and a method for postprocessing posterior samples to achieve identified inference is developed. This uses Riemannian optimisation to solve a nontrivial optimisation problem over a Lie group of matrices. The effectiveness of our approach is validated on simulated data and in applications to two real-world data sets: school student test scores and personal incomes in California. Our approach leads to interesting insights for populations and easily interpretable posterior inference.

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Projected Statistical Methods for Distributional Data on the Real Line with the Wasserstein Metric

Author: Pegoraro Matteo
Beraha Mario
Publication date: 01/01/2022

We present a novel class of projected methods to perform statistical analysis on a data set of probability distributions on the real line, with the 2-Wasserstein metric. We focus in particular on Principal Component Analysis (PCA) and regression. To define these models, we exploit a representation of the Wasserstein space closely related to its weak Riemannian structure by mapping the data to a suitable linear space and using a metric projection operator to constrain the results in the Wasserstein space. By carefully choosing the tangent point, we are able to derive fast empirical methods, exploiting a constrained B-spline approximation. As a byproduct of our approach, we are also able to derive faster routines for previous work on PCA for distributions. By means of simulation studies, we compare our approaches to previously proposed methods, showing that our projected PCA has similar performance for a fraction of the computational cost and that the projected regression is extremely flexible even under misspecification. Several theoretical properties of the models are investigated, and asymptotic consistency is proven. Two real-world applications, to Covid-19 mortality in the US and to wind speed forecasting, are discussed.
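The "map to a linear space, then project back" idea can be sketched as follows, under simplifying assumptions: distributions are represented by quantile functions sampled on a common uniform grid, ordinary PCA is run on those vectors, and each reconstruction is projected onto nondecreasing vectors (valid quantile functions) by isotonic regression. This swaps the constrained B-spline approximation named in the abstract for a plain grid and the pool-adjacent-violators algorithm, so it is an illustration rather than the authors' method. The function names are hypothetical.

```python
import numpy as np

def project_monotone(y):
    """L2 projection of a vector onto nondecreasing vectors (pool adjacent violators)."""
    values, weights = [], []
    for v in np.asarray(y, dtype=float):
        values.append(v)
        weights.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(values) > 1 and values[-2] > values[-1]:
            w = weights[-2] + weights[-1]
            m = (values[-2] * weights[-2] + values[-1] * weights[-1]) / w
            values[-2:] = [m]
            weights[-2:] = [w]
    return np.repeat(values, weights)

def projected_pca_reconstruct(quantiles, n_components):
    """PCA on quantile functions followed by a monotone projection.

    quantiles: array of shape (n_samples, n_grid); each row is a quantile
    function evaluated on a common uniform grid of levels in (0, 1).
    """
    q = np.asarray(quantiles, dtype=float)
    mean = q.mean(axis=0)
    centered = q - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    linear = mean + scores @ components  # may fail to be monotone
    return np.vstack([project_monotone(row) for row in linear])
```

On a uniform grid the discrete L2 projection coincides with isotonic regression, which is why the pool-adjacent-violators step is used here. Other representations call for a different projection routine.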

Archivio istituzionale della ricerca - Politecnico di Milano

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Spatially dependent mixture models via the logistic multivariate CAR prior

Author: Peli Riccardo
Pegoraro Matteo
Beraha Mario
Guglielmi Alessandra
Publication date: 01/01/2021

We consider the problem of spatially dependent areal data, where for each area independent observations are available, and propose to model the density of each area through a finite mixture of Gaussian distributions. The spatial dependence is introduced via a novel joint distribution for a collection of vectors in the simplex, which we term logisticMCAR. We show that salient features of the logisticMCAR distribution can be described analytically, and that a suitable augmentation scheme based on the Pólya-Gamma identity allows us to derive an efficient Markov Chain Monte Carlo algorithm. When compared to competitors, our model better estimates densities in different (disconnected) areal locations when they have different characteristics. We discuss an application to a real dataset of Airbnb listings in the city of Amsterdam, also showing how to easily incorporate additional covariate information into the model.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

BayesMix: Bayesian Mixture Models in C++

Author: Beraha Mario
Guglielmi Alessandra
Guindani Bruno
Gianella Matteo
Publication date: 01/01/2025

We describe BayesMix, a C++ library for MCMC posterior simulation for general Bayesian mixture models. The goal of BayesMix is to provide computer scientists, statisticians and practitioners with a self-contained ecosystem for performing inference on mixture models. The key idea of this library is extensibility, as we wish users to easily adapt our software to their specific Bayesian mixture models. In addition to the several models and MCMC algorithms for posterior inference included in the library, new users with little familiarity with mixture models and the related MCMC algorithms can extend our library with minimal coding effort. Our library is computationally very efficient when compared to competitor software. Examples show that typical code runtimes are from two to 25 times faster than competitors for data dimension from one to ten. We also provide Python (bayesmixpy) and R (bayesmixr) interfaces. Our library is publicly available on GitHub at https://github.com/bayesmix-dev/bayesmix/

Archivio istituzionale della ricerca - Politecnico di Milano

Journal of Statistical Software

The Semi-Hierarchical Dirichlet Process and Its Application to Clustering Homogeneous Distributions

Author: Guglielmi Alessandra
Quintana Fernando A.
Beraha Mario
Publication date: 01/01/2021

Assessing homogeneity of distributions is an old problem that has received considerable attention, especially in the nonparametric Bayesian literature. To this effect, we propose the semi-hierarchical Dirichlet process, a novel hierarchical prior that extends the hierarchical Dirichlet process of Teh et al. (2006) and that avoids the degeneracy issues of nested processes recently described by Camerlenghi et al. (2019a). We go beyond the simple yes/no answer to the homogeneity question and embed the proposed prior in a random partition model; this procedure allows us to give a more comprehensive response to the above question and in fact find groups of populations that are internally homogeneous when I ≥ 2 such populations are considered. We study theoretical properties of the semi-hierarchical Dirichlet process and of the Bayes factor for the homogeneity test when I = 2. Extensive simulation studies and applications to educational data are also discussed.

Pontificia Universidad Católica de Chile: Repositorio UC

Archivio istituzionale della ricerca - Politecnico di Milano

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

MCMC Computations for Bayesian Mixture Models Using Repulsive Point Processes

Author: Argiento Raffaele
Guglielmi Alessandra
Beraha Mario
Møller Jesper
Publication date: 01/01/2022

Repulsive mixture models have recently gained popularity for Bayesian cluster detection. Compared to more traditional mixture models, repulsive mixture models produce a smaller number of well-separated clusters. The most commonly used methods for posterior inference either require fixing the number of components a priori or are based on reversible jump MCMC computation. We present a general framework for mixture models, when the prior of the "cluster centers" is a finite repulsive point process depending on a hyperparameter, specified by a density which may depend on an intractable normalizing constant. By investigating the posterior characterization of this class of mixture models, we derive an MCMC algorithm which avoids the well-known difficulties associated with reversible jump MCMC computation. In particular, we use an ancillary variable method, which eliminates the problem of having intractable normalizing constants in the Hastings ratio. The ancillary variable method relies on a perfect simulation algorithm, and we demonstrate this is fast because the number of components is typically small. In several simulation studies and an application to sociological data, we illustrate the advantage of our new methodology over existing methods, and we compare the use of a determinantal or a repulsive Gibbs point process prior model. Supplementary files for this article are available online.

Archivio istituzionale della ricerca - Politecnico di Milano

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A Bayesian model for network flow data: an application to BikeMi trips

Author: Beraha Mario
Bissoli G.
Guglielmi A.
Rinaldi G. M.
Principi C.
Publication date: 01/01/2019

Archivio istituzionale della ricerca - Politecnico di Milano

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)