Effective sample size for a mixture prior
Mixture prior distributions are widely used in statistical applications, such as clinical trials, especially to avoid prior-data conflicts. We explicitly prove that the effective sample size (ESS) of a mixture prior rarely exceeds the ESS of any individual mixture component density of the prior.
A Bayesian quest for finding a unified model for predicting volleyball games
Volleyball is a team sport with unique and specific characteristics. We introduce a new two-level hierarchical Bayesian model which accounts for these volleyball-specific characteristics. In the first level, we model the set outcome with a simple logistic regression model. In the second level, conditionally on the winner of the set, we use a truncated negative binomial distribution for the points earned by the losing team. An additional Poisson-distributed inflation component models the extra points played when the point difference between the two teams is less than two. The number of points of the winner within each set is deterministically specified by the winner of the set and the points of the inflation component. The team-specific abilities and the home effect are used as covariates on all layers of the model (set, point and extra inflated points). The implementation of the proposed model on the Italian SuperLega 2017–2018 data shows exceptional reproducibility of the final league table and satisfactory predictive ability.
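The layered structure described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model: the function names, every parameter value (`r`, `p`, `p_extra`, `extra_rate`), and the way extra points are triggered (a fixed probability followed by one plus a Poisson draw) are hypothetical simplifications. Team abilities and the home effect enter only the set-winner layer here, whereas the abstract uses them on all layers.

```python
import math
import random


def truncated_negbin_pmf(r, p, upper):
    """Negative binomial pmf on 0..upper, renormalised after right truncation."""
    log_w = [
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
        for k in range(upper + 1)
    ]
    w = [math.exp(v) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_set(ability_home, ability_away, home_effect, rng,
                 target=25, r=10.0, p=0.35, p_extra=0.15, extra_rate=1.0):
    """Simulate one set; returns (home_wins, winner_points, loser_points).

    All default parameter values are hypothetical illustrations.
    """
    # Level 1: logistic model for the set winner.
    eta = home_effect + ability_home - ability_away
    home_wins = rng.random() < 1.0 / (1.0 + math.exp(-eta))

    if rng.random() < p_extra:
        # Inflation component (assumed form): extra points beyond the target.
        # The winner's score is then fixed by the extra points, and the set
        # ends with a two-point margin.
        extra = 1 + sample_poisson(extra_rate, rng)
        winner_pts = target + extra
        loser_pts = winner_pts - 2
    else:
        # Level 2: loser's points from a negative binomial truncated at
        # target - 2, so the winner reaches the target with a margin >= 2.
        pmf = truncated_negbin_pmf(r, p, target - 2)
        loser_pts = rng.choices(range(target - 1), weights=pmf)[0]
        winner_pts = target
    return home_wins, winner_pts, loser_pts
```

Calling `simulate_set(0.3, -0.1, 0.2, random.Random(1))` returns one simulated set. Repeating the call over a fixture list would give a crude simulated league table, which is the kind of reproducibility check the abstract mentions.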
Lightweight merging of compressed indices based on BWT variants
In this paper we propose a flexible and lightweight technique for merging compressed indices based on variants of the Burrows-Wheeler transform (BWT), thus addressing the need for algorithms that compute compressed indices over large collections using a limited amount of working memory. Merge procedures make it possible to use an incremental strategy for building large indices based on merging indices for progressively larger subcollections. Starting with a known lightweight algorithm for merging BWTs [Holt and McMillan, Bioinformatics 2014], we show how to modify it in order to merge, or compute from scratch, also the Longest Common Prefix (LCP) array. We then extend our technique to merging compressed tries and circular/permuterm compressed indices, two compressed data structures for which there were hitherto no known merging algorithms.
Comparing Goal-Based and Result-Based Approaches in Modelling Football Outcomes
Two main approaches are considered when building statistical models for football outcomes: (1) the goal-based approach, where the number of goals scored by two competing teams is modelled, and (2) the result-based approach, where a three-category outcome (win–draw–loss) is modelled. The debate about which approach is preferable is still ongoing, although the general agreement is that any direct comparison between the forecasting abilities of the two approaches should be based on the quality of the forecasts. Alternatively, a greater emphasis can be given to diagnostic measures in order to judge the quality of model specifications, as is more customary in statistical modelling. In this paper, we develop a broad comparison of four possible Bayesian models, focusing on model checking and calibration and then using Markov chain Monte Carlo replications to explore the predictive performance over future matches. Although the results are inconclusive, we believe our set of comparison tools may help future scholars differentiate the two approaches.
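One way to see how the two approaches relate is that any goal-based model implies win–draw–loss probabilities, which can then be compared with those of a result-based model. The sketch below assumes independent Poisson goal counts, a common goal-based choice that is not necessarily one of the four models compared in the paper. The function names and the scoring rates in the usage note are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(lam_home, lam_away, max_goals=15):
    """Win/draw/loss probabilities (home perspective) implied by two
    independent Poisson goal counts, summed over scores up to max_goals."""
    win = draw = loss = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            prob = poisson_pmf(x, lam_home) * poisson_pmf(y, lam_away)
            if x > y:
                win += prob
            elif x == y:
                draw += prob
            else:
                loss += prob
    return win, draw, loss
```

For example, `outcome_probs(1.5, 1.1)` gives the three-category forecast implied by hypothetical scoring rates of 1.5 and 1.1 goals per match. The same scoring rule can then be applied to forecasts from either approach.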
Prediction is not everything, but everything is prediction
Prediction is an unavoidable task for data scientists, and over the last decades statistics and machine learning have become the most popular ‘prediction weapons’ in many fields. However, prediction should always be associated with a measure of uncertainty, because only from it can we reconstruct and falsify the model/algorithm decisions. Machine learning methods offer many point predictions, but they rarely yield a measure of uncertainty, whereas statistical models usually do a bad job of communicating predictive results. According to Popper’s falsificationism, natural and physical sciences can be falsified on the grounds of wrong predictions; for social sciences, though, this is not always true. We then move to a weak instrumentalist philosophy: predictive accuracy is not always constitutive of scientific success, especially in social sciences.
Space Efficient Merging of de Bruijn Graphs and Wheeler Graphs
The merging of succinct data structures is a well-established technique for the space-efficient construction of large succinct indexes. In the first part of the paper we propose a new algorithm for merging succinct representations of de Bruijn graphs. Our algorithm has the same asymptotic cost as the state-of-the-art algorithm for the same problem but uses less than half of its working space. A novel and important feature of our algorithm, not found in any of the existing tools, is that it can compute the Variable Order succinct representation of the union graph within the same asymptotic time/space bounds. In the second part of the paper we consider the more general problem of merging succinct representations of Wheeler graphs, a recently introduced graph family which includes as special cases de Bruijn graphs and many other known succinct indexes based on the BWT or one of its variants. We provide a space-efficient algorithm for Wheeler graph merging; our algorithm works under the assumption that the union of the input Wheeler graphs has an ordering that satisfies the Wheeler conditions and is compatible with the ordering of the original graphs.
Multiparty-session-types Coordination for Core Erlang
In this paper, we present a formalization of multiparty-session-type coordination for a core subset of Erlang and provide a tool for checking the correctness of a system against the specification of its protocol. In Erlang, actors are primitive entities that communicate only through explicit asynchronous message passing. Our tool ensures that if an Erlang system is well typed, then it does not incur deadlocks or have actors getting stuck waiting for messages that never arrive; moreover, any message that is sent will eventually be read. The tool is based on multiparty session types, a formalism introduced to specify the structure of interactions and to ensure safety properties.
Avoiding prior–data conflict in regression models via mixture priors
The Bayesian model consists of the prior–likelihood pair. A prior–data conflict arises whenever the prior allocates most of its mass to regions of the parameter space where the likelihood is relatively low. Once a prior–data conflict is diagnosed, what to do next is a hard question to answer. We propose an automatic prior elicitation that involves a two-component mixture of a diffuse and an informative prior distribution that favours the first component if a conflict emerges. Using various examples, we show that these mixture priors can be useful in regression models as a device for regularizing the estimates and retrieving useful inferential conclusions.
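The way a two-component mixture shifts toward its diffuse component under conflict can be shown with the standard conjugate normal update. This is a minimal sketch, assuming a normal mean with known sampling variance; it is not the elicitation procedure proposed in the paper. The function names and all numerical values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_mixture(ybar, se2, components):
    """Update a normal mixture prior given a sample mean.

    ybar: observed sample mean; se2: its sampling variance (sigma^2 / n).
    components: list of (weight, prior_mean, prior_var).
    Returns a list of (posterior_weight, posterior_mean, posterior_var).
    """
    updated = []
    for weight, mean, var in components:
        # Marginal (prior predictive) density of ybar under this component.
        marginal = normal_pdf(ybar, mean, var + se2)
        post_var = 1.0 / (1.0 / var + 1.0 / se2)
        post_mean = post_var * (mean / var + ybar / se2)
        updated.append((weight * marginal, post_mean, post_var))
    total = sum(u[0] for u in updated)
    return [(u / total, m, v) for u, m, v in updated]
```

With a diffuse component `(0.5, 0.0, 100.0)` and an informative one `(0.5, 0.0, 1.0)`, a sample mean close to 0 leaves most posterior weight on the informative component. A sample mean far from 0 moves nearly all of the weight to the diffuse component, which is the behaviour the abstract describes.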
The "emergency pact" and the "Republic of Tarascon": the Settimana Rossa (Red Week) in Pergola and Sassoferrato
