1,721,022 research outputs found

    Advanced Algorithms for Genomic Data Analysis

    Full text link
    The object of this thesis is to develop new algorithmic techniques for the inference of causal relations among the genes of an organism from DNA microarray experiments. Cause-effect relations between genes can be inferred from microarray data (Reverse Engineering) and summarized in a Gene Regulatory Network, a graph in which nodes represent genes and edges represent causal relations among genes: this thesis presents three novel Reverse Engineering algorithms, tailored to tackle differend kinds of DNA microarray experiments and for different levels of detail in the description of the biological systems, and two studies on the difficulty of inferring Gene Regulatory Networks. The first original contribution of the thesis is the application of the Qualitative Reasoning approach to steady state measurements of systematic gene perturbation experiments, i.e. experiments in which the expression of each gene is altered in turn and one sample of the expression is taken each time the system reaches a steady state. The second proposed algorithm, CNET, is based on a heuristic scoring function designed to identify causal relations from time course experiments, i.e. repeated observations of the same biological system at subsequent temporal instants. The algorithm is tailored to recognize causal relations even in the presence of noise and variable regulatory delays. We then present two original in-depth studies, the first on the relations between the performance of two network inference algorithms and the topological and structural properties of oriented Gene Regulatory Networks and the second on the fitness landscape around the optimal parameters configuration, when a class of nonlinear differential equations systems, known as Dynamic Recurrent Neural Networks, are fit to time course data. Both studies provide original and useful knowledge on the difficulty of inferring Gene Regulatory Networks from DNA microarray data. Finally, we present a novel discrete/continuous optimization algorithm for fitting systems of nonlinear differential equations to small scale time course experiments, composed of two interacting modules: an Iterated Local Search procedure to explore the discrete space of network structures and a continuous optimization procedure to identify optimal system parameters. The performance of the three proposed algorithms is assessed both on simulated data and, in some cases, on real DNA microarray data: the methods proved to be competitive with the state of the art of Reverse Engineering algorithms.Obiettivo del presente lavoro di tesi è lo sviluppo di tecniche algoritmiche innovative per l’identificazione di relazioni causali fra i geni di un organismo a partire da esperimenti di DNA microarray. Le relazioni causa-effetto fra i geni possono essere apprese a partire dai dati di microarray (Reverse Engineering) e riassunte in una Rete di Regolazione Genica, un grafo i cui nodi rappresentano i geni e i cui archi rappresentano le relazioni causali fra i geni: questa tesi presenta tre algoritmi innovativi di Reverse Engineering, progettati per elaborare diversi tipi di esperimenti di microarray e con diversi livelli di dettaglio nella descrizione dei sistemi biologici, e due studi sulla difficoltà nell'inferire le Reti di Regolazione Genica. Il primo contributo originale della tesi è l'applicazione del Ragionamento Qualitativo all’elaborazione di misurazioni in stato stazionario di esperimenti di perturbazione sistematica dei geni, vale a dire esperimenti nei quali l’espressione di ogni gene a turno viene alterata e un solo campione dell’espressione genica viene misurato ogni volta che il sistema raggiunge lo stato stazionario. Il secondo algoritmo proposto, CNET, è basato su una funzione euristica progettata per identificare relazioni causali a partire da serie temporali di espressione genica, cioè osservazioni ripetute dello stesso sistema biologico in istanti temporali consecutivi. L'algoritmo è costruito in modo tale da riconoscere le relazioni causali anche in presenza di rumore e di ritardi variabili nella regolazione. Successivamente vengono presentati due studi approfonditi, il primo sulle relazioni fra la performance di due algoritmi di Reverse Engineering e le proprietà strutturali e topologiche della Rete di Regolazione Genica da inferire e il secondo sul panorama di fitness attorno alla configurazione ottima dei parametri di una particolare classe di sistemi dinamici non lineari, le Reti Neurali Dinamiche Ricorsive, che descriva un insieme di serie temporali di espressione genica. Entrambi gli studi hanno consentito di ottenere informazioni utili e originali sulla difficoltà nell'inferire Reti di Regolazione Genica a partire da dati di DNA microarray. Infine, viene presentato un algoritmo innovativo di ottimizzazione mista (continua e discreta) per il fit di sistemi di equazioni differenziali non lineari a esperimenti contenenti serie temporali di espressione genica su piccola scala, composto di due moduli interagenti: una procedura di ricerca locale per esplorare lo spazio discreto delle strutture di rete e una procedura di ottimizzazione continua per l’idenficazione dei parametri ottimi del sistema. La performance dei tre algoritmi proposti viene analizzata sia su dati simulati sia, in certi casi, su dati reali di DNA microarray: i metodi si dimostrano competitivi con lo stato dell’arte degli algoritmi di Reverse Engineering

    Qualitative Reasoning for Biological Network Inference from Systematic Perturbation Experiments.

    No full text
    The systematic perturbation of the components of a biological system has been proven among the most informative experimental setups for the identification of causal relations between the components. In this paper, we present SPQR (Systematic Perturbation - Qualitative Reasoning), a novel Qualitative Reasoning approach to automate the interpretation of the results of systematic perturbation experiments. Our method is based on a qualitative abstraction of the experimental data: for each perturbation experiment, measured values of the observed variables are modelled as lower, equal or higher than the measurements in the wild type condition, when no perturbation is applied. The algorithm exploits a set of IF-THEN rules to infer causal relations between the variables, analyzing the patterns of propagation of the perturbation signals through the biological network, and is specifically designed to minimize the rate of false positives among the inferred relations. Tested on both simulated and real perturbation data, SPQR indeed exhibits a significantly higher precision than the state of the art

    Bayesian Network structure learning: Hybridizing complete search with independence tests

    No full text
    Bayesian Networks (BN) are probabilistic graphical models used to encode in a compact way a joint probability distribution over a set of random variables. The NP-complete problem of finding the most probable BN structure given the observed data has been largely studied in recent years. In the literature, several complete algorithms have been proposed for the problem; in parallel, several tests for statistical independence between the random variables have been proposed, in order to reduce the size of the search space. In this work, we study how to hybridize the algorithm representing the state-of-the-art in complete search with two types of independence tests, and assess the performance of the two hybrid algorithms in terms of both solution quality and computational time. Experimental results show that hybridization with both types of independence test results in a substantial gain in computational time, against a limited loss in solution quality, and allow us to provide some guidelines on the choice of the test type, given the number of nodes in the network and the sample size

    bnstruct: an R package for Bayesian Network structure learning in the presence of missing data

    No full text
    A Bayesian Network is a probabilistic graphical model that encodes probabilistic dependencies between a set of random variables. We introduce bnstruct, an open source R package to (i) learn the structure and the parameters of a Bayesian Network from data in the presence of missing values and (ii) perform reasoning and inference on the learned Bayesian Networks. To the best of our knowledge, there is no other open source software that provides methods for all of these tasks, particularly the manipulation of missing data, which is a common situation in practice

    Qualitative Reasoning on systematic gene perturbation experiments.

    No full text
    Observations of systematic gene perturbation experiments have been proven the most informative for the identification of regulatory relations between genes. For this purpose, we present a novel Qualitative Reasoning approach, based on a qualitative abstraction of DNA-microarray data and on a set of IF-THEN inference rules. Our algorithm exhibits an extremely low rate of false positives, competitive with the state-of-the-art, on both noise-free and noisy simulated data. This, together with the polynomial running time, makes our algorithm an useful tool for systematic gene perturbation experiments, able to identify a subset of the oriented regulatory relations with high reliability and to provide valuable insights on the amount of information conveyed by a set of experiments
    corecore