
    Multimodal regularized linear models with flux balance analysis for mechanistic integration of omics data

    Motivation: Thanks to technological advances, high-throughput biological data have become cheaper to collect, leading to the availability of vast amounts of omic data of different types. In parallel, the in silico reconstruction and modelling of metabolic systems is now acknowledged as a key tool to complement experimental data on a large scale. The integration of this model- and data-driven information is therefore emerging as a new challenge in systems biology, with no clear guidance on how best to exploit the inherent multi-source and multi-omic nature of these data types while preserving mechanistic interpretation. Results: Here we investigate different regularisation techniques for high-dimensional data derived from the integration of gene expression profiles with metabolic flux data, extracted from strain-specific metabolic models, to improve cellular growth rate predictions. To this end, we propose ad hoc extensions of previous regularisation frameworks, including group, view-specific and principal component regularisation, and compare them experimentally using data from 1,143 Saccharomyces cerevisiae strains. We observe that the methods diverge in regression accuracy and integration effectiveness depending on the type of regularisation employed. In multi-omic regression tasks, when learning from experimental and model-generated omic data, our results demonstrate the competitiveness and ease of interpretation of multimodal regularised linear models compared to data-hungry methods based on neural networks. Availability: All data, models, and code produced in this work are available on GitHub at https://github.com/Angione-Lab/HybridGroupIPFLasso_pc2Lasso
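
To make the idea of group regularisation concrete, the following is a minimal sketch of a plain group lasso fitted by proximal gradient descent, in which the expression features and the flux features form two separate groups, so that each view can be retained or dropped as a whole. This is an illustrative assumption, not the paper's method: the ad hoc extensions proposed in this work are not reproduced, and the function names, penalty weight, step size and toy data are all hypothetical.

```python
import math


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrinks a whole group, zeroing it if ||v|| <= t."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def group_lasso(X, y, groups, lam, step=0.1, n_iter=500):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2.

    X is a list of n rows with p features; groups is a list of column-index lists.
    For convergence, step should not exceed 1 / (largest eigenvalue of X^T X / n).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        # Residuals of the current fit: r = Xb - y
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the least-squares term: X^T r / n
        grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(p)]
        z = [b[j] - step * grad[j] for j in range(p)]
        # Group-wise shrinkage; sqrt(|g|) keeps groups of different sizes comparable
        for g in groups:
            shrunk = group_soft_threshold([z[j] for j in g], step * lam * math.sqrt(len(g)))
            for j, val in zip(g, shrunk):
                z[j] = val
        b = z
    return b


# Hypothetical toy data: columns 0-1 play the role of gene expression,
# columns 2-3 the role of simulated fluxes. The columns are orthogonal,
# so the exact solution can be checked by hand.
X = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]
y = [1.1, 3.1, 0.9, 2.9]
b = group_lasso(X, y, groups=[[0, 1], [2, 3]], lam=0.1)
print([round(v, 4) for v in b])  # [1.8735, -0.9368, 0.0, 0.0]
```

With real expression and flux matrices, one would typically standardise the columns first and choose lam by cross-validation; a larger lam removes entire views, which is what makes group penalties easy to interpret in multi-omic integration.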

    Multi-dimensional experimental and computational exploration of metabolism pinpoints complex probiotic interactions

    Multi-strain probiotics are widely regarded as effective products for improving gut microbiota stability and host health, offering advantages over single-strain probiotics. However, it is generally unclear to what extent different strains cooperate or compete for resources, and how the establishment of a common biofilm microenvironment influences their interactions. In this work, we develop an integrative experimental and computational approach to comprehensively assess the metabolic functionality and interactions of probiotics across growth conditions. Our approach combines co-culture assays with genome-scale modelling of metabolism and multivariate data analysis, thus exploiting complementary data- and knowledge-driven systems biology techniques. To show the advantages of the proposed approach, we apply it to the study of the interactions between two widely used probiotic strains of Lactobacillus reuteri and Saccharomyces boulardii, characterising their potential to produce compounds that can be beneficial to human health. Our results show that these strains can establish a mixed cooperative-antagonistic interaction best explained by competition for shared resources, with increased individual exchange but often decreased net production of amino acids and short-chain fatty acids. Overall, our work provides a strategy for exploring microbial metabolic fingerprints of biotechnological interest, capable of capturing multifaceted equilibria even in simple microbial consortia.

    Clinical stratification improves the diagnostic accuracy of small omics datasets within machine learning and genome-scale metabolic modelling methods

    Background: Multi-omic machine learning architectures have recently been proposed for the early detection of cancer. However, for rare cancers and their associated small datasets, it is still unclear how to use the limited multi-omics data available to achieve a mechanistic prediction of cancer onset and progression. Hepatoblastoma is the most frequent liver cancer in infancy and childhood, and its incidence has recently been increasing in several developed countries. Although some studies have investigated the causes of its onset and identified potential biomarkers, the role of metabolic rewiring has not yet been investigated in depth. Methods: Here, we propose and implement an interpretable multi-omics pipeline that combines mechanistic knowledge from genome-scale metabolic models with machine learning algorithms, and we use it to characterise the underlying mechanisms controlling hepatoblastoma. Results and Conclusions: While the resulting machine learning models generally achieve high diagnostic classification accuracy, our results show that the combination of omics types used as input to the models strongly affects the detection of important genes, reactions and metabolic pathways linked to hepatoblastoma. Our method also suggests that, in computer-aided cancer diagnosis, optimal diagnostic accuracy can be achieved by adopting a combination of omics that depends on the patient’s clinical characteristics.

    Machine and deep learning meet genome-scale metabolic modeling

    Omic data analysis is steadily growing as a driver of basic and applied molecular biology research. Core to the interpretation of complex and heterogeneous biological phenotypes are computational approaches in the fields of statistics and machine learning. In parallel, constraint-based metabolic modeling has established itself as the main tool for investigating large-scale relationships between genotype, phenotype, and environment. These methodological frameworks have largely been developed and applied independently, and the potential of their integration for biological, biomedical, and biotechnological research remains less well understood. Here, we describe how machine learning and constraint-based modeling can be combined, reviewing recent works at the intersection of both domains and discussing the mathematical and practical aspects involved. We bring together systematic classifications from both frameworks, making them accessible to nonexperts. Finally, we delineate potential future scenarios, propose new joint theoretical frameworks, and suggest concrete points of investigation for this joint subfield. A multiview approach that merges experimental and knowledge-driven omic data through machine learning methods can incorporate key mechanistic information into an otherwise biologically agnostic learning process.

    A mechanism-aware and multiomic machine-learning pipeline characterizes yeast cell growth

    Metabolic modeling and machine learning are key components in the emerging next generation of systems and synthetic biology tools, targeting the genotype–phenotype–environment relationship. It is becoming clear that their value is maximized when they are combined rather than used in isolation. However, the potential of integrating these two frameworks for omic data augmentation and integration is largely unexplored. We propose, rigorously assess, and compare machine-learning–based data integration techniques, combining gene expression profiles with computationally generated metabolic flux data to predict yeast cell growth. To this end, we create strain-specific metabolic models for 1,143 Saccharomyces cerevisiae mutants and test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, thereby verifying its robustness on an additional independent dataset. Finally, we show that introducing mechanistic flux features also improves the predictions for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models based on known biochemistry can contribute disjoint information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic biological insights.

    Metatranscriptomics-guided genome-scale metabolic modeling of microbial communities

    Multi-omics data integration via mechanistic models of metabolism is a scalable and flexible framework for exploring biological hypotheses in microbial systems. However, although most microorganisms are unculturable, such multi-omics modeling has so far been limited to isolated microbes or simple synthetic communities. Here, we developed an approach for modeling microbial activity and interactions that leverages the reconstruction of metagenome-assembled genomes and their associated genome-centric metatranscriptomes. At its core, we designed a method for condition-specific metabolic modeling of microbial communities through the integration of metatranscriptomic data. Using this approach, we explored the behavior of anaerobic digestion consortia driven by hydrogen availability and of human gut microbiota dysbiosis associated with Crohn’s disease, identifying condition-dependent amino acid requirements in archaeal species and a reduced short-chain fatty acid exchange network associated with disease, respectively. Our approach can be applied to complex microbial communities, allowing a mechanistic contextualization of multi-omics data on a metagenome scale.
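
A common baseline for making a metabolic model condition-specific from transcript data is to evaluate each reaction's gene-protein-reaction (GPR) rule on expression levels and use the result to constrain that reaction's flux bounds, as in E-Flux-style methods. The sketch below shows only that step, under stated assumptions: min for AND and max for OR (other variants sum isozyme levels), bounds normalised to a hypothetical max_flux of 1000, and rules given as nested tuples rather than parsed strings. The community method designed in this work is not described here in enough detail to reproduce, so this should not be read as its implementation; in a community setting, one would presumably repeat the step per metagenome-assembled genome using that genome's own metatranscriptome. All gene and reaction identifiers are hypothetical.

```python
def gpr_score(rule, expr):
    """Score a gene-protein-reaction (GPR) rule from gene expression levels.

    A rule is a gene id (str) or a tuple ("and" | "or", sub_rule, ...).
    AND (enzyme complex) -> min: the scarcest subunit limits the complex.
    OR (isozymes) -> max: the most expressed isozyme is assumed to dominate.
    """
    if isinstance(rule, str):
        return expr[rule]
    op, *args = rule
    scores = [gpr_score(arg, expr) for arg in args]
    if op == "and":
        return min(scores)
    if op == "or":
        return max(scores)
    raise ValueError(f"unknown GPR operator: {op!r}")


def expression_scaled_bounds(reactions, expr, max_flux=1000.0):
    """Map each reaction's GPR score to an upper flux bound, scaled so the
    highest-scoring reaction keeps max_flux (an E-Flux-like normalisation)."""
    scores = {rid: gpr_score(rule, expr) for rid, rule in reactions.items()}
    top = max(scores.values())
    return {rid: max_flux * s / top for rid, s in scores.items()}


# Hypothetical genome-centric transcript levels for one community member
expr = {"g1": 10.0, "g2": 2.0, "g3": 5.0, "g4": 8.0}
reactions = {
    "R1": ("and", "g1", ("or", "g2", "g3")),  # complex of g1 with isozymes g2/g3
    "R2": "g4",
    "R3": ("or", "g1", "g4"),
}
print(expression_scaled_bounds(reactions, expr))
# {'R1': 500.0, 'R2': 800.0, 'R3': 1000.0}
```

The resulting bounds would then be applied to the corresponding reactions before running flux balance analysis; the min/max convention is a modelling choice, not a universal rule.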

    Modeling Customer Experience in a Contact Center through Process Log Mining

    The use of data mining and modeling methods in the service industry is a promising avenue for optimizing current processes in a targeted manner, ultimately reducing costs and improving customer experience. However, tools introduced into already established pipelines often must adapt to the way data are sampled and to their content. In this study, we tackle the challenge of characterizing and predicting customer experience using only timestamped process log data, without any ground-truth feedback from customers. As a case study, we consider a contact center managed by TeleWare and analyze phone call logs spanning two months. We develop an approach to interpret the phone call process events recorded in the logs and infer concrete points of improvement in service management. Our approach is based on latent tree modeling and multi-class Naïve Bayes classification, which together allow us to infer a spectrum of customer experiences and test their predictability under the current data sampling strategy. Moreover, such an approach can overcome limitations in collecting and sharing customer feedback across organizations, thus having wide applicability and complementing tools that rely on more heavily constrained data.
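
The classification half of this approach can be illustrated with a multi-class Naïve Bayes over categorical call-log features. The sketch below assumes that experience labels have already been assigned upstream (in the study, via latent tree modeling, which is not reproduced here) and uses Laplace smoothing; the feature encodings (number of transfers, waiting-time bucket) and the class names are invented purely for illustration, since the abstract does not state which Naïve Bayes variant or which features were used.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Multi-class Naive Bayes for categorical features with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, X, y):
        n_features = len(X[0])
        self.class_counts = Counter(y)
        self.classes = sorted(self.class_counts)
        # counts[c][j][v]: training samples of class c whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.classes}
        # Distinct values seen for each feature (used in the smoothing denominator)
        self.values = [set() for _ in range(n_features)]
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_joint(self, x):
        """Return log P(c) + sum_j log P(x_j | c) for every class c."""
        n = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / n)
            for j, v in enumerate(x):
                k = len(self.values[j])
                num = self.counts[c][j][v] + self.alpha
                den = self.class_counts[c] + self.alpha * k
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, x):
        scores = self.log_joint(x)
        return max(scores, key=scores.get)


# Hypothetical per-call features: (number of transfers, waiting-time bucket),
# with experience labels assumed to come from an upstream latent model.
X = [("0", "short"), ("0", "short"), ("1", "short"),
     ("2+", "long"), ("2+", "long"), ("1", "long")]
y = ["good", "good", "good", "poor", "poor", "poor"]
nb = CategoricalNaiveBayes().fit(X, y)
print(nb.predict(("0", "short")), nb.predict(("2+", "long")))  # good poor
```

Working in log space avoids numerical underflow when many features are multiplied, and smoothing keeps an unseen feature value from zeroing out a class.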

    Genome-scale metabolic modelling of SARS-CoV-2 in cancer cells reveals an increased shift to glycolytic energy production

    Cancer is considered a high-risk condition for severe illness resulting from Covid-19. The interaction between severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and human metabolism is key to elucidating the risk posed by Covid-19 for cancer patients and identifying effective treatments, yet it remains largely uncharacterised at a mechanistic level. We present a genome-scale map of short-term metabolic alterations triggered by SARS-CoV-2 infection of cancer cells. Through transcriptomics- and proteomics-informed genome-scale metabolic modelling, we characterise the role of RNA and fatty acid biosynthesis in conjunction with a rewiring of energy production pathways and enhanced cytokine secretion. These findings link together complementary aspects of viral invasion of cancer cells, while providing mechanistic insights that can inform the development of treatment strategies.

    Integrating genome-scale metabolic modelling and transfer learning for human gene regulatory network reconstruction

    Motivation: Gene regulation is responsible for controlling numerous physiological functions and dynamically responding to environmental fluctuations. Reconstructing the human network of gene regulatory interactions is thus paramount to understanding the functional organization of cells across cell types, as well as to elucidating pathogenic processes and identifying molecular drug targets. Although significant effort has been devoted in this direction, existing computational methods mainly rely on gene expression levels, possibly ignoring the information conveyed by mechanistic biochemical knowledge. Moreover, apart from a few recent attempts, most existing approaches only consider the information of the organism under analysis, without exploiting the information of related model organisms. Results: We propose a novel method for the reconstruction of the human gene regulatory network, based on a transfer learning strategy that synergistically exploits information from human and mouse, conveyed by gene-related metabolic features generated in silico from gene expression data. Specifically, we learn a predictive model from metabolic activity inferred via tissue-specific metabolic modelling of artificial gene knockouts. Our experiments show that combining our transfer learning approach with the constructed metabolic features provides a significant advantage in reconstruction accuracy, as well as additional clues on the contribution of each constructed metabolic feature. Availability and implementation: The method, the datasets and all the results obtained in this study are available at: https://doi.org/10.6084/m9.figshare.c.5237687. Supplementary information: Supplementary data are available at Bioinformatics online.