A comparison of Gaussian processes and polynomial chaos emulators in the context of haemodynamic pulse-wave propagation modelling
Computational modelling of the cardiovascular system is a promising future direction for patient-specific healthcare. However, the computational cost of these simulators is a bottleneck for their practical use in the clinic, for example as real-time digital twins. Emulation can overcome this, yet an extensive investigation into cardiovascular emulators is still warranted. In this study, we emulate two one-dimensional haemodynamics models of the pulmonary circulation and compare two common emulation strategies: Gaussian processes (GPs) and polynomial chaos expansions (PCEs). We start by reducing the parameter space of the models through global sensitivity analysis, and then compare both emulation strategies using a multivariate, time-series output quantity of interest and a reduced representation obtained with principal component analysis. We compare the emulators both in forward emulation on test data and in their ability to infer parameters in the inverse problem. Our results indicate that GPs slightly but consistently outperform PCEs across every comparison, and that similar performance is obtained for the emulators of the time-dependent output and of the reduced output.
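The emulation workflow described above (reduce the time-series output with PCA, emulate the component scores, then map back) can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions: `toy_simulator` is a hypothetical two-parameter damped wave standing in for an expensive haemodynamics simulator; the GP uses a squared-exponential kernel with fixed rather than estimated hyperparameters; and the PCE is fitted by least-squares regression on total-degree Legendre polynomials (one common choice for uniform inputs). It is not the study's implementation and makes no claim about which emulator wins in general.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)             # time grid of the output wave

def toy_simulator(theta):
    """Hypothetical stand-in for an expensive 1-D haemodynamics simulator."""
    a, b = theta                           # two parameters scaled to [-1, 1]
    return (1.0 + 0.3 * a) * np.exp(-(1.5 + 0.5 * b) * t) * np.sin(2 * np.pi * t)

def pca_basis(Y, k):
    """Mean and first k principal directions of the training outputs."""
    mean = Y.mean(axis=0)
    _, _, vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, vt[:k]

def sq_exp(A, B, length):
    """Squared-exponential kernel matrix between input sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_mean(X, Z, Xs, length=0.5, jitter=1e-6):
    """GP posterior mean with fixed (not optimised) kernel hyperparameters."""
    K = sq_exp(X, X, length) + jitter * np.eye(len(X))
    return sq_exp(Xs, X, length) @ np.linalg.solve(K, Z)

def pce_features(X, degree):
    """Products of Legendre polynomials with total degree <= degree."""
    cols = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            p_i = legendre.Legendre.basis(i)
            p_j = legendre.Legendre.basis(j)
            cols.append(p_i(X[:, 0]) * p_j(X[:, 1]))
    return np.column_stack(cols)

X_train = rng.uniform(-1.0, 1.0, size=(40, 2))
X_test = rng.uniform(-1.0, 1.0, size=(200, 2))
Y_train = np.array([toy_simulator(x) for x in X_train])
Y_test = np.array([toy_simulator(x) for x in X_test])

# Reduced output: scores on the leading principal components.
mean, V = pca_basis(Y_train, k=3)
Z_train = (Y_train - mean) @ V.T

Z_gp = gp_mean(X_train, Z_train, X_test)
coef, *_ = np.linalg.lstsq(pce_features(X_train, 4), Z_train, rcond=None)
Z_pce = pce_features(X_test, 4) @ coef

def rmse(Z_pred):
    """Error after mapping predicted scores back to the full time series."""
    return np.sqrt(np.mean((mean + Z_pred @ V - Y_test) ** 2))

print(f"GP RMSE:  {rmse(Z_gp):.2e}")
print(f"PCE RMSE: {rmse(Z_pce):.2e}")
```

Emulating the PCA scores rather than all 50 time points keeps the number of emulated outputs small; the full time series is recovered by the linear map `mean + Z @ V`.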
This article is part of the theme issue ‘Uncertainty quantification for healthcare and biological systems (Part 1)’
Parameter inference in differential equation models of biopathways using time warped gradient matching
Inference in a partial differential equations model of pulmonary arterial and venous blood circulation using statistical emulation
Physics-informed Gaussian Processes for nonlinear partial differential equations in a fluid-dynamics application
Statistical inference in mechanistic models: time warping for improved gradient matching
Improved Bayesian methods for detecting recombination and rate heterogeneity in DNA sequence alignments
DNA sequence alignments are usually not homogeneous. Mosaic structures may result as a consequence of recombination or rate heterogeneity. Interspecific recombination, in which DNA subsequences are transferred between different (typically viral or bacterial) strains, may result in a change of the topology of the underlying phylogenetic tree. Rate heterogeneity corresponds to a change of the nucleotide substitution rate. Various methods for simultaneously detecting recombination and rate heterogeneity in DNA sequence alignments have recently been proposed, based on complex probabilistic models that combine phylogenetic trees with factorial hidden Markov models or multiple changepoint processes. The objective of my thesis is to identify potential shortcomings of these models and explore ways to improve them. One shortcoming that I have identified is related to an approximation made in various recently proposed Bayesian models. The Bayesian paradigm requires the solution of an integral over the space of parameters. To render this integration analytically tractable, these models assume that the vectors of branch lengths of the phylogenetic tree are independent among sites. While this approximation reduces the computational complexity considerably, I show that it leads to the systematic prediction of spurious topology changes in the Felsenstein zone, that is, the area in the branch-length configuration space where maximum parsimony consistently infers the wrong topology due to long-branch attraction. I demonstrate these failures using two Bayesian hypothesis tests, based on an inter-model and an intra-model approach to estimating the marginal likelihood. I then propose a revised model that addresses these shortcomings, and demonstrate its improved performance on a set of synthetic DNA sequence alignments systematically generated around the Felsenstein zone.
The core model explored in my thesis is a phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structure in DNA sequence alignments, related to recombination and rate heterogeneity. The focus of my work is on improving the modelling of the latter. Earlier research by other authors has modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This work fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in selective pressure, and short-term periodic patterns within codons, which merely capture the signature of the genetic code. I have improved these earlier phylogenetic FHMMs in two respects. Firstly, by sampling the rate vector from the posterior distribution with reversible-jump Markov chain Monte Carlo (RJMCMC), I have made the modelling of regional rate heterogeneity more flexible, and I infer the number of different degrees of divergence directly from the DNA sequence alignment, thereby dispensing with the need to select this quantity arbitrarily in advance. Secondly, I explicitly model within-codon rate heterogeneity via a separate rate modification vector. In this way, the within-codon effect of rate heterogeneity is imposed on the model a priori, which facilitates learning the biologically more interesting effect of regional rate heterogeneity a posteriori. I have carried out simulations on synthetic DNA sequence alignments, which have borne out this conjecture. The existing model, which does not explicitly include within-codon rate variation, has to model both effects with the same mechanism and, as expected, was found to fail to disentangle them. By contrast, my new model clearly separates within-codon rate variation from regional rate heterogeneity, resulting in more accurate predictions.
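To make the separation between regional and within-codon rate heterogeneity concrete, the sketch below combines a per-region rate with a length-3 codon-position modifier and feeds the resulting site rate into the Jukes-Cantor (JC69) change probability. Several assumptions apply: the multiplicative combination, the JC69 model (a deliberate simplification), and all numerical values (`regional_rates`, `codon_modifier`, the branch length) are hypothetical, and the sketch does not reproduce the thesis's FHMM or its RJMCMC sampler.

```python
import numpy as np

def jc69_change_prob(branch_length, rate):
    """JC69 probability that a site differs at the end of a branch."""
    return 0.75 * (1.0 - np.exp(-4.0 / 3.0 * rate * branch_length))

def site_rates(regional_rates, region_of_site, codon_modifier):
    """Effective rate per site = regional rate x within-codon modifier.

    regional_rates : rate for each regional state (hypothetical values)
    region_of_site : index of the regional state at each site
    codon_modifier : multipliers for codon positions 1, 2 and 3
    """
    codon_pos = np.arange(len(region_of_site)) % 3  # assumes the alignment starts at codon position 1
    return regional_rates[region_of_site] * codon_modifier[codon_pos]

# Hypothetical example: a slowly and a fast evolving region of 6 sites each.
regional_rates = np.array([0.5, 2.0])
region_of_site = np.array([0] * 6 + [1] * 6)
codon_modifier = np.array([0.8, 0.4, 1.8])          # normalised to mean 1; third positions fastest

rates = site_rates(regional_rates, region_of_site, codon_modifier)
probs = jc69_change_prob(0.1, rates)
print(np.round(rates, 2))
print(np.round(probs, 3))
```

Because the codon-position pattern is fixed by `codon_modifier`, any remaining variation in `rates` along the alignment must come from the regional states, which is the intuition behind imposing the within-codon effect a priori.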
Machine learning in systems biology at different scales : from molecular biology to ecology
Machine learning has been a source of continuous methodological advances in the field of computational learning from data. Systems biology has profited in various ways from machine learning techniques, in particular from network inference, i.e. the learning of interactions given observed quantities of the involved components or data that stem from interventional experiments. Originally, this domain of systems biology was confined to the inference of gene regulatory networks, but it has recently expanded to other levels of organization of biological and ecological systems. The application to species interaction networks in a varying environment is of mounting importance for improving our understanding of the dynamics of species extinctions, invasions, and population behaviour in general.
The aim of this thesis is to present an extensive study of various state-of-the-art machine learning techniques applied to a genetic regulation system in plants, and to extend and modify some of these methods to infer species interaction networks in an ecological setting. The first study attempts to improve knowledge about circadian regulation in the plant Arabidopsis thaliana from the viewpoint of machine learning, and gives suggestions on which methods are best suited for inference, how the data should be processed and modelled mathematically, and what quality of network learning can be expected as a result. To achieve this, I generate a rich and realistic synthetic data set that is used in various studies considering different effects and method setups. The best method and setup is then applied to real transcriptional data, which leads to a new hypothesis about the structure of the circadian clock network.
The ecological study focuses on the development of two novel inference methods that exploit a common principle from transcriptional time series: expression profiles over time can be temporally heterogeneous. The corresponding concept in a two-dimensional spatial domain is that species interaction dynamics can be spatially heterogeneous, i.e. they can change across space depending on the environment and other factors. I demonstrate the extension from the one-dimensional time domain to the two-dimensional spatial domain, introduce two distinct space segmentation schemes, and account for species dispersion effects through spatial autocorrelation. The two novel methods show a significant improvement in species interaction inference compared to competing methods and learn the spatial structure of different species neighbourhoods or environments with high confidence.
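The idea of spatially heterogeneous interactions can be illustrated with a deliberately simple sketch: on a simulated grid, the effect of one species on another flips sign between two halves, so a single global regression averages the effect away, while separate fits per segment recover it. The assumptions here are substantial: the segmentation is a hypothetical fixed west/east split (the thesis's methods infer the segmentation, using schemes not detailed here), ordinary least squares is swapped in for the thesis's inference methods, and the neighbour mean of the response is only a crude stand-in for a spatial-autocorrelation term.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20                                    # n x n grid of survey cells
col = np.tile(np.arange(n), (n, 1))       # column index of every cell
segment = (col >= n // 2).astype(int)     # hypothetical fixed split: 0 = west, 1 = east

# Simulated ground truth: the effect of species x on species y flips sign between the halves.
x = rng.normal(size=(n, n))
beta_true = np.where(segment == 0, 1.0, -1.0)
y = beta_true * x + 0.1 * rng.normal(size=(n, n))

def neighbour_mean(z):
    """Mean over the four adjacent cells (edges replicated): a crude autocorrelation covariate."""
    p = np.pad(z, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

y_nb = neighbour_mean(y)

def fit(mask):
    """Least-squares coefficients [effect of x, autocorrelation term, intercept] in a region."""
    design = np.column_stack([x[mask], y_nb[mask], np.ones(mask.sum())])
    coef, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
    return coef

global_coef = fit(np.ones_like(segment, dtype=bool))
segment_coefs = [fit(segment == s) for s in (0, 1)]
print("global effect of x:      ", round(global_coef[0], 2))
print("per-segment effects of x:", [round(c[0], 2) for c in segment_coefs])
```

The global estimate lands near zero even though x strongly affects y everywhere, which is the failure mode that segment-specific interaction parameters are meant to avoid.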
Reconstruction of gene regulatory networks from postgenomic data
Institute for Adaptive and Neural Computation
An important problem in systems biology is the inference of biochemical pathways and regulatory networks from postgenomic data. The recent substantial increase in the availability of such data has stimulated interest in inferring networks and pathways from the data themselves. The main interests of this thesis are the application, evaluation and improvement of machine learning methods for the reverse engineering of biochemical pathways and networks. The thesis starts by applying an established method to newly available gene expression data related to the interferon pathway of the human immune system, in order to identify active subpathways under different experimental conditions. It continues with a comparative evaluation of various machine learning methods (relevance networks, Graphical Gaussian Models, Bayesian networks) using observational and interventional data from cytometry experiments, as well as simulated data from a gold-standard network. The thesis also extends and improves existing methods to include biological prior knowledge under the Bayesian approach, in order to increase the accuracy of the predicted networks, and quantifies to what extent the reconstruction accuracy can be improved in this way.
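The difference between two of the compared methods, relevance networks and Graphical Gaussian Models, can be seen on a minimal simulated example. This sketch assumes a hypothetical three-gene linear Gaussian chain A → B → C and a hypothetical edge threshold of 0.2; it omits Bayesian networks and prior knowledge entirely. A relevance network thresholds marginal correlations, whereas a GGM thresholds partial correlations derived from the inverse covariance (precision) matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000                                   # number of simulated samples

# Hypothetical chain A -> B -> C with unit-variance variables.
a = rng.normal(size=n)
b = 0.8 * a + 0.6 * rng.normal(size=n)
c = 0.8 * b + 0.6 * rng.normal(size=n)
data = np.column_stack([a, b, c])

# Relevance network: marginal (Pearson) correlations.
corr = np.corrcoef(data, rowvar=False)

# Graphical Gaussian Model: partial correlations from the precision matrix.
precision = np.linalg.inv(np.cov(data, rowvar=False))
d = np.sqrt(np.diag(precision))
pcorr = -precision / np.outer(d, d)
np.fill_diagonal(pcorr, 1.0)

threshold = 0.2                            # hypothetical edge cut-off
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
relevance_edges = {p for p in pairs if abs(corr[p]) > threshold}
ggm_edges = {p for p in pairs if abs(pcorr[p]) > threshold}
print("relevance network edges:", sorted(relevance_edges))
print("GGM edges:              ", sorted(ggm_edges))
```

The relevance network reports a spurious direct edge between A and C (indices 0 and 2), because their marginal correlation is induced by B; conditioning on B in the GGM removes it and recovers only the two chain edges.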
- …
