Scalable and accurate variational Bayes for high-dimensional binary regression models
Modern methods for Bayesian regression beyond the Gaussian response setting are often computationally impractical or inaccurate in high dimensions. In fact, as discussed in recent literature, bypassing such a trade-off is still an open problem even in routine binary regression models, and there is limited theory on the quality of variational approximations in high-dimensional settings. To address this gap, we study the approximation accuracy of routinely used mean-field variational Bayes solutions in high-dimensional probit regression with Gaussian priors, obtaining novel and practically relevant results on the pathological behaviour of such strategies in uncertainty quantification, point estimation and prediction. Motivated by these results, we further develop a new partially factorized variational approximation for the posterior distribution of the probit coefficients that leverages a representation with global and local variables but, unlike for classical mean-field assumptions, it avoids a fully factorized approximation, and instead assumes a factorization only for the local variables. We prove that the resulting approximation belongs to a tractable class of unified skew-normal densities that crucially incorporates skewness and, unlike for state-of-the-art mean-field solutions, converges to the exact posterior density as p → ∞. To solve the variational optimization problem, we derive a tractable coordinate ascent variational inference algorithm that easily scales to p in the tens of thousands, and provably requires a number of iterations converging to 1 as p → ∞. Such findings are also illustrated in extensive empirical studies where our novel solution is shown to improve the approximation accuracy of mean-field variational Bayes for any n and p, with the magnitude of these gains being remarkable in those high-dimensional p>n settings where state-of-the-art methods are computationally impractical
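For orientation, the classical mean-field baseline that this abstract critiques can be sketched in a few lines. This is a minimal sketch of fully factorized coordinate ascent for probit regression under data augmentation, not the authors' partially factorized method. The prior β ~ N(0, prior_var · I), the function names and the toy data are all assumptions made for illustration.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def mean_field_probit(X, y, prior_var=25.0, n_iter=200, tol=1e-10):
    """Mean-field CAVI with q(beta) q(z); returns the mean and precision of q(beta).

    Assumed model (hypothetical setup): z_i ~ N(x_i' beta, 1), y_i = 1(z_i > 0),
    beta ~ N(0, prior_var * I).
    """
    n, p = len(X), len(X[0])
    # Precision of q(beta) is X'X + I / prior_var and does not change across iterations.
    P = [[sum(X[i][j] * X[i][k] for i in range(n)) + (1.0 / prior_var if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    mu = [0.0] * p
    for _ in range(n_iter):
        # Update q(z_i): unit-variance truncated normal, so only its mean is needed.
        ez = []
        for xi, yi in zip(X, y):
            eta = sum(a * b for a, b in zip(xi, mu))
            s = 2 * yi - 1
            ez.append(eta + s * norm_pdf(eta) / max(norm_cdf(s * eta), 1e-300))
        # Update q(beta): the mean solves P mu = X' E[z].
        rhs = [sum(X[i][j] * ez[i] for i in range(n)) for j in range(p)]
        new_mu = solve(P, rhs)
        delta = max(abs(a - b) for a, b in zip(new_mu, mu))
        mu = new_mu
        if delta < tol:
            break
    return mu, P

# Toy data (made up): an intercept column plus one predictor.
X_demo = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y_demo = [0, 0, 1, 0, 1, 1]
mu_demo, precision_demo = mean_field_probit(X_demo, y_demo)
```

In this scheme the covariance of q(β) is fixed by the design matrix and the prior alone, and q(β) is Gaussian, so it cannot express the skewness that the abstract attributes to the unified skew-normal approximation.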
Demographic characteristics associated with West Nile virus neuroinvasive disease – A retrospective study on the wider European area 2006–2021
Background: With a case-fatality risk ranging from 3.0 to >20.0% and life-long sequelae, West Nile neuroinvasive disease (WNND) is the most dangerous outcome of West Nile virus (WNV) infection in humans. As no specific prophylaxis or therapy is available for these infections, the focus is on preventive strategies. We aimed to find variables associated with WNND diagnosis, hospitalisation or death, to identify high-risk sub-groups of the population on whom to concentrate these strategies. Methods: We used data from The European Surveillance System (TESSy), provided by National Public Health Authorities and released by the European Centre for Disease Prevention and Control (ECDC). In two Firth-penalised logistic regression models, we considered age, sex, clinical criteria, epidemiological link to other cases (epi-link), calendar year, and season as potentially associated variables. In one model we also considered the rural/urban classification of the place of infection (RUC), while in the other we considered the specific reporting country. Findings: Among confirmed West Nile virus cases, 2,916 WNND cases were registered, of which 2,081 (71.4%) and 383 (13.1%) resulted in the hospitalisation and death of the patient, respectively. Calendar year, RUC/country, age, sex, clinical criteria, and epi-link were associated with WNND diagnosis. Hospitalisation was associated with calendar year and RUC/country, whereas death was associated with age, sex and country. Interpretation: Our results support previous findings on WNND-associated variables (most notably age and sex). By observing the whole population of WNND cases in the considered area and period, they also allow for stronger generalizations, in contrast to the majority of previous studies, which used sample populations
Bayesian Conjugacy in Probit, Tobit, Multinomial Probit and Extensions: A Review and New Results
A broad class of models that routinely appear in several fields can be expressed as partially or fully discretized Gaussian linear regressions. Besides including classical Gaussian response settings, this class also encompasses probit, multinomial probit and tobit regression, among others, thereby yielding one of the most widely-implemented families of models in routine applications. The relevance of such representations has stimulated decades of research in the Bayesian field, mostly motivated by the fact that, unlike for Gaussian linear regression, the posterior distribution induced by such models does not seem to belong to a known class, under the commonly assumed Gaussian priors for the coefficients. This has motivated several solutions for posterior inference relying either on sampling-based strategies or on deterministic approximations that, however, still experience computational and accuracy issues, especially in high dimensions. The scope of this article is to review, unify and extend recent advances in Bayesian inference and computation for this core class of models. To address such a goal, we prove that the likelihoods induced by these formulations share a common analytical structure implying conjugacy with a broad class of distributions, namely the unified skew-normal (SUN), that generalize Gaussians to include skewness. This result unifies and extends recent conjugacy properties for specific models within the class analyzed, and opens new avenues for improved posterior inference, under a broader class of formulations and priors, via novel closed-form expressions, iid samplers from the exact SUN posteriors, and more accurate and scalable approximations from variational Bayes and expectation-propagation. Such advantages are illustrated in simulations and are expected to facilitate the routine-use of these core Bayesian models, while providing novel frameworks for studying theoretical properties and developing future extensions. 
Supplementary materials for this article are available online
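For the binary probit member of this class, the structure behind the conjugacy can be written out explicitly. This is a sketch under assumed notation that does not appear in the abstract: responses $y_i \in \{0,1\}$, predictors $x_i \in \mathbb{R}^p$, a Gaussian prior $\beta \sim N_p(\xi, \Omega)$, and $D$ the $n \times p$ matrix whose $i$-th row is $(2y_i - 1)\,x_i^\top$.

```latex
p(y \mid \beta)
  = \prod_{i=1}^{n} \Phi\{(2y_i - 1)\, x_i^\top \beta\}
  = \Phi_n(D\beta;\, I_n),
\qquad
\pi(\beta \mid y) \;\propto\; \phi_p(\beta - \xi;\, \Omega)\, \Phi_n(D\beta;\, I_n).
```

Here $\phi_p(\cdot\,;\Omega)$ is the centred $p$-variate Gaussian density with covariance $\Omega$, and $\Phi_n(\cdot\,;I_n)$ is the $n$-variate Gaussian cumulative distribution function with identity covariance. A Gaussian density multiplied by a multivariate Gaussian cumulative distribution function evaluated at a linear function of $\beta$ is the kernel of a unified skew-normal, which is the common structure the review refers to.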
Advances in Bayesian Inference for Binary and Categorical Data
Bayesian binary probit regression and its extensions to time-dependent observations and multi-class responses are popular tools in binary and categorical data regression due to their high interpretability and non-restrictive assumptions.
Although the theory is well established in the frequentist literature, such models are still the subject of active research in the Bayesian framework. This is mostly because state-of-the-art methods for Bayesian inference in such settings are either computationally impractical or inaccurate in high dimensions, and in many cases a closed-form expression for the posterior distribution of the model parameters is apparently lacking. The development of improved computational methods and theoretical results to perform inference with this vast class of models is therefore of utmost importance.
In order to overcome the above-mentioned computational issues, we develop a novel variational approximation for the posterior of the coefficients in high-dimensional probit regression with binary responses and Gaussian priors, resulting in a unified skew-normal (SUN) approximating distribution that converges to the exact posterior as the number of predictors p increases.
Moreover, we show that closed-form expressions are actually available for posterior distributions arising from models that account for correlated binary time-series and multi-class responses.
In the former case, we prove that the filtering, predictive and smoothing distributions in dynamic probit models with Gaussian state variables are, in fact, available and belong to a class of SUN distributions whose parameters can be updated recursively in time via analytical expressions, which allows the development of an i.i.d. sampler together with an optimal sequential Monte Carlo procedure.
As for the latter case, i.e. multi-class probit models, we show that many different formulations developed in the literature in separate ways admit a unified view and a closed-form SUN posterior distribution under a SUN prior distribution (thus including the Gaussian case).
This makes it possible to implement computational methods that outperform state-of-the-art routines in high-dimensional settings by leveraging SUN properties and the variational methods introduced for the binary probit.
Finally, motivated also by the possible link between some of the above-mentioned models and the Bayesian nonparametrics literature, a novel species-sampling model for partially exchangeable observations is introduced, with the dual goal of predicting the class (or species) of future observations and testing for homogeneity among the different available populations.
Such a model arises from a combination of Pitman-Yor processes and leverages the appealing features of both hierarchical and nested structures developed in the Bayesian nonparametrics literature.
Posterior inference is feasible thanks to the implementation of a marginal Gibbs sampler, whose pseudo-code is given in full detail
Multivariate Gaussian cumulative distribution functions as the marginal likelihood of their dual Bayesian probit models
A class of conjugate priors for multinomial probit models which includes the multivariate normal one
Multinomial probit models are routinely-implemented representations for learning how the class probabilities of categorical response data change with p observed predictors. Although several frequentist methods have been developed for estimation, inference and classification within such a class of models, Bayesian inference is still lagging behind. This is due to the apparent absence of a tractable class of conjugate priors, that may facilitate posterior inference on the multinomial probit coefficients. Such an issue has motivated increasing efforts toward the development of effective Markov chain Monte Carlo methods, but state-of-the-art solutions still face severe computational bottlenecks, especially in high dimensions. In this article, we show that the entire class of unified skew-normal (SUN) distributions is conjugate to several multinomial probit models. Leveraging this result and the SUN properties, we improve upon state-of-the-art solutions for posterior inference and classification both in terms of closed-form results for several functionals of interest, and also by developing novel computational methods relying either on independent and identically distributed samples from the exact posterior or on scalable and accurate variational approximations based on blocked partially-factorized representations. As illustrated in simulations and in a gastrointestinal lesions application, the magnitude of the improvements relative to current methods is particularly evident, in practice, when the focus is on high-dimensional studies
Effects of climatic and environmental factors on mosquito population inferred from West Nile virus surveillance in Greece
The impact of mosquito-borne diseases on human health is among the most prominent of all communicable diseases. With a limited pool of tools to counter these diseases, the public health focus remains on preventing mosquito-human contact. Applying a hierarchical spatio-temporal Bayesian model to West Nile virus (WNV) surveillance data from Greece, we aimed to investigate the impact of climatic and environmental factors on Culex mosquito populations. Our spatio-temporal analysis confirmed climatic factors as major drivers of the population dynamics of WNV-transmitting Culex mosquitoes, with temperature and long periods of moderate-to-warm climate having the strongest positive effect on mosquito abundance. Conversely, rainfall, high humidity, and wind showed a negative impact. The results suggest the presence of statistically significant differences in the effect of regional and seasonal characteristics, highlighting the complex interplay between climatic, geographical and environmental factors in the dynamics of mosquito populations. This study may represent a relevant tool to inform public health policymakers in planning preventive measures
A closed-form filter for binary time series
Non-Gaussian state-space models arise in several applications, and within this framework the binary time series setting provides a relevant example. However, unlike for Gaussian state-space models — where filtering, predictive and smoothing distributions are available in closed form — binary state-space models require approximations or sequential Monte Carlo strategies for inference and prediction. This is due to the apparent absence of conjugacy between the Gaussian states and the likelihood induced by the observation equation for the binary data. In this article we prove that the filtering, predictive and smoothing distributions in dynamic probit models with Gaussian state variables are, in fact, available and belong to a class of unified skew-normals (SUN) whose parameters can be updated recursively in time via analytical expressions. Also the key functionals of these distributions are, in principle, available, but their calculation requires the evaluation of multivariate Gaussian cumulative distribution functions. Leveraging SUN properties, we address this issue via novel Monte Carlo methods based on independent samples from the smoothing distribution, that can easily be adapted to the filtering and predictive case, thus improving state-of-the-art approximate and sequential Monte Carlo inference in small-to-moderate dimensional studies. Novel sequential Monte Carlo procedures that exploit the SUN properties are also developed to deal with online inference in high dimensions. Performance gains over competitors are outlined in a financial application
Host selection and forage ratio in West Nile virus–transmitting Culex mosquitoes: Challenges and knowledge gaps
To date, no specific therapy or vaccination is available for West Nile virus (WNV) infections in humans; preventive strategies represent the only possibility to control transmission. To focus these strategies, detailed knowledge of the virus dynamics is of paramount importance. However, several aspects of WNV transmission are still unclear, especially regarding the role of potential vertebrate host species. Whereas mosquitoes’ intrinsic characteristics cause them to favour certain hosts (host preference), absolute selection is impossible in natural settings. Conversely, the selection carried out among available hosts and influenced by hosts’ availability and other ecological/environmental factors is defined as host selection. Methodology/Principal findings: In July 2022, we searched the PubMed database for original articles exploring host selection among WNV-transmitting Culex mosquitoes, the main WNV vector. We considered only original field studies estimating and reporting forage ratio. This index results from the ratio between the proportion of blood meals taken by mosquitoes on potential host species and the hosts’ relative abundance. From the originally retrieved 585 articles, 9 matched the inclusion criteria and were included in this review. All but one of the included studies were conducted in the Americas: six in the United States, and one each in Mexico and Colombia. The remaining study was conducted in Italy. American Robin, Northern Cardinal, and House Finch were the most significantly preferred birds in the Americas, Common Blackbird in Italy. Conclusions/Significance: Although ornithophilic, all observed WNV-transmitting mosquitoes presented opportunistic feeding behaviour. All the observed species showed potential to act as bridges for zoonotic diseases, feeding also on humans. All the observed mosquitoes presented host selection patterns and did not feed on hosts as expected by chance alone.
The articles observe different species of mosquitoes in different environments. In addition, the way the relative host abundance was determined differed. Finally, this review is not systematic. Therefore, the translation of our results to different settings should be conducted cautiously
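The forage ratio defined in the abstract above can be computed directly from two sets of counts. The sketch below is illustrative only: the function name, the host labels and all counts are made up, and the reviewed studies may have determined host abundance in different ways, as the abstract notes.

```python
def forage_ratios(blood_meals, host_abundance):
    """Forage ratio per host: share of blood meals divided by share of host abundance.

    blood_meals: dict mapping host -> number of identified blood meals
    host_abundance: dict mapping host -> number of individuals counted
    Hosts with zero recorded abundance are skipped because the ratio is undefined.
    """
    total_meals = sum(blood_meals.values())
    total_hosts = sum(host_abundance.values())
    ratios = {}
    for host, count in host_abundance.items():
        if count == 0:
            continue
        meal_share = blood_meals.get(host, 0) / total_meals
        host_share = count / total_hosts
        ratios[host] = meal_share / host_share
    return ratios

# Hypothetical counts, not taken from any of the reviewed studies.
meals = {"host_a": 30, "host_b": 10, "host_c": 10}
abundance = {"host_a": 20, "host_b": 60, "host_c": 20}
ratios = forage_ratios(meals, abundance)
```

A ratio above 1 means a host appears in blood meals more often than its relative abundance alone would predict, a ratio near 1 is consistent with feeding in proportion to availability, and a ratio below 1 suggests the host is under-used.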
An epidemiological model for mosquito host selection and temperature-dependent transmission of West Nile virus
We extend a previously developed epidemiological model for West Nile virus (WNV) infection in humans in Greece, employing laboratory-confirmed WNV cases and mosquito-specific characteristics of transmission, such as host selection and temperature-dependent transmission of the virus. Host selection was defined by bird host selection and human host selection, the latter accounting only for the fraction of humans that develop symptoms after the virus is acquired. To model the role of temperature on virus transmission, we considered five temperature intervals (≤ 19.25 °C; > 19.25 and < 21.75 °C; ≥ 21.75 and < 24.25 °C; ≥ 24.25 and < 26.75 °C; and > 26.75 °C). The capacity of the new model to fit human cases and the week of first case occurrence was compared with the original model and showed improved performance. The model was also used to infer further quantities of interest, such as the force of infection for different temperatures as well as mosquito and bird abundances. Our results indicate that the inclusion of mosquito-specific characteristics in epidemiological models of mosquito-borne diseases leads to improved modelling capacity
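The five temperature intervals listed in the abstract above can be encoded as a simple lookup. This is a minimal sketch with a hypothetical function name. As written in the abstract, the intervals do not cover exactly 26.75 °C, so the sketch places that value in the warmest interval as an explicit assumption.

```python
def temperature_interval(temp_c):
    """Return the index (0 to 4) of the temperature interval containing temp_c, in degrees C.

    0: <= 19.25
    1: > 19.25 and < 21.75
    2: >= 21.75 and < 24.25
    3: >= 24.25 and < 26.75
    4: > 26.75 (exactly 26.75 is assigned here by assumption)
    """
    if temp_c <= 19.25:
        return 0
    if temp_c < 21.75:
        return 1
    if temp_c < 24.25:
        return 2
    if temp_c < 26.75:
        return 3
    return 4

# Hypothetical weekly mean temperatures mapped to their intervals.
weekly_temps = [18.0, 20.5, 23.0, 25.1, 28.4]
weekly_bins = [temperature_interval(t) for t in weekly_temps]
```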
