    Network Infusion to Infer Information Sources in Networks

    Several models exist for diffusion of signals across biological, social, or engineered networks. However, the inverse problem of identifying the source of such propagated information appears more difficult even in the presence of multiple network snapshots, and especially for the single-snapshot case, given the many alternative, often similar, progressions of diffusion that may lead to the same observed snapshots. Mathematically, this problem can be undertaken using a diffusion kernel that represents diffusion processes in a given network, but computing this kernel is computationally challenging in general. Here, we propose a path-based network diffusion kernel which considers edge-disjoint shortest paths among pairs of nodes in the network and can be computed efficiently for both homogeneous and heterogeneous continuous-time diffusion models. We use this network diffusion kernel to solve the inverse diffusion problem, which we term Network Infusion (NI), using both likelihood maximization and error minimization. The minimum error NI algorithm is based on an asymmetric Hamming premetric function and can balance between false positive and false negative error types. We apply this framework for both single-source and multi-source diffusion, for both single-snapshot and multi-snapshot observations, and using both uninformative and informative prior probabilities for candidate source nodes. We also provide proofs that under a standard susceptible-infected diffusion model, (1) the maximum-likelihood NI is mean-field optimal for tree structures or sufficiently sparse Erdos-Renyi graphs, (2) the minimum-error algorithm is mean-field optimal for regular tree structures, and (3) for sufficiently-distant sources, the multi-source solution is mean-field optimal in the regular tree structure. Moreover, we provide techniques to learn diffusion model parameters such as observation times.
We apply NI to several synthetic networks and compare its performance to centrality-based and distance-based methods for Erdos-Renyi graphs, power-law networks, symmetric and asymmetric grids. Moreover, we use NI in two real-world applications. First, we identify the news sources for 3,553 stories in the Digg social news network, and validate our results based on annotated information that was not provided to our algorithm. Second, we use NI to identify infusion hubs of human diseases, defined as gene candidates that can explain the connectivity pattern of disease-related genes in the human regulatory network. NI identifies infusion hubs of several human diseases including T1D, Parkinson, MS, SLE, Psoriasis and Schizophrenia. We show that the inferred infusion hubs are biologically relevant and often not identifiable using the raw p-values.
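
To illustrate how a minimum-error criterion can trade off false positives against false negatives, the following is a minimal sketch of an asymmetric Hamming-style premetric. The function name, the 0/1 encoding of node states, and the weighting by a single parameter `alpha` are assumptions made for illustration; the abstract does not specify the exact form used by NI.

```python
def asymmetric_hamming(predicted, observed, alpha=0.5):
    """Weighted count of disagreements between two 0/1 infection patterns.

    Assumed weighting (hypothetical, not taken from the paper):
    false positives cost (1 - alpha), false negatives cost alpha.
    """
    # Predicted infected but observed healthy.
    false_pos = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 0)
    # Predicted healthy but observed infected.
    false_neg = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 1)
    return (1 - alpha) * false_pos + alpha * false_neg


# Swapping the arguments changes the value, so this is a premetric rather
# than a metric: it is non-negative and zero on identical patterns, but
# it is not symmetric.
d_forward = asymmetric_hamming([1, 1, 0], [1, 0, 0], alpha=0.25)
d_backward = asymmetric_hamming([1, 0, 0], [1, 1, 0], alpha=0.25)
```

With `alpha` below 0.5 a false positive is penalized more than a false negative, and the reverse holds above 0.5.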

    Abstract B20: Discovery of combination therapies in a pan-cancer context through functional complementarity and convergence analysis of oncogenic drivers

    Inherent genetic alterations and tissue-specific variations in cancer present a range of unique vulnerabilities which can be targeted by precision cancer therapies. An understanding of these alterations is a crucial first step in developing novel therapeutic hypotheses in a personalized context. An unbiased method to correlate responses to treatment with small molecules is cancer cell line (CCL) sensitivity profiling. This allows the understanding of single-agent therapies and associated mechanisms of resistance by employing unbiased combination screening. However, performing such studies in a principled manner to understand multiple potential combinations is limited by the large scale of the required experiments. The area under the concentration-response curve can be used as a measure of sensitivity and the relationship between the sensitivity profiles for a large number of drugs helps build a network based on similarity of activity. Hence, it would be possible to identify ‘modules’ or ‘clusters’ that correspond to small molecules with highly correlated response across a number of cell lines. However, these clusters are often non-informative in predicting or determining potential combinatorial treatments. By leveraging recent whole-exome and RNA sequencing efforts across a diverse panel of human cancer cell lines coupled with small-molecule sensitivity information, this study aims at applying a pan-cancer exome-wide approach to identify potentially synergistic drug combinations. We define a ‘feature’ as an alteration resulting from a single nucleotide variant, genomic amplification or deletion. We first perform feature selection to extract a functionally coupled set of genomic alterations using drug sensitivity as the phenotypic readout.
This was achieved through an existing information-theoretic framework which iteratively maximizes the conditional information coefficient of each potential feature with the target phenotype conditioned on prior selected features. We then integrate gene expression profiles into the model through a regression-based approach. Incorporating sensitivity measurements across a set of 545 small molecules allows us to derive functionally complementary genomic alterations unique to each drug. We find that our model is capable of identifying distinct features even for sets of small molecules that are known to have the same oncogenic target, thereby revealing the mechanistic intricacies that underlie drug activity. This knowledge transforms the drug similarity network. We notice that different small molecules functionally correspond to partially overlapping sets of genomic alterations which belong to the same signalling pathway. Therefore, this enables targeted identification of small molecules or their combinations which are specifically effective against a spectrum of genomic or transcriptomic alterations. We further expand the model to discover features that could confer resistance to therapy. For this case, we systematically identify a number of genomic deletions in tumor suppressors, epigenetic modifiers and genes linked to cell death. Interestingly, these events do not converge onto a single oncogenic pathway, thereby indicating potentially distinct and drug-dependent modes of therapeutic resistance. We believe that the proposed framework presents an unbiased method towards revealing crucial relationships and prospective synergies between different classes of targeted therapeutics. We anticipate that this approach will serve as a template for future efforts focusing on discovery of predictive biomarkers of small molecule sensitivity. Citation Format: Karthik Murugadoss, Manolis Kellis.
Discovery of combination therapies in a pan-cancer context through functional complementarity and convergence analysis of oncogenic drivers [abstract]. In: Proceedings of the AACR Precision Medicine Series: Opportunities and Challenges of Exploiting Synthetic Lethality in Cancer; Jan 4-7, 2017; San Diego, CA. Philadelphia (PA): AACR; Mol Cancer Ther 2017;16(10 Suppl):Abstract nr B20.

    Spectral Alignment of Networks

    Network alignment refers to the problem of finding a bijective mapping across vertices of two or more graphs to maximize the number of overlapping edges and/or to minimize the number of mismatched interactions across networks. This paper introduces a network alignment algorithm inspired by eigenvector analysis which creates a simple relaxation for the underlying quadratic assignment problem. Our method relaxes binary assignment constraints along the leading eigenvector of an alignment matrix which captures the structure of matched and mismatched interactions across networks. Our proposed algorithm, denoted EigenAlign, has two steps. First, it computes the Perron-Frobenius eigenvector of the alignment matrix. Second, it uses this eigenvector in a linear optimization framework of maximum weight bipartite matching to infer bijective mappings across vertices of two graphs. Unlike existing network alignment methods, EigenAlign considers both matched and mismatched interactions in its optimization and therefore, it is effective in aligning networks even with low similarity. We show that, when certain technical conditions hold, the relaxation given by EigenAlign is asymptotically exact over Erdos-Renyi graphs with high probability. Moreover, for modular network structures, we show that EigenAlign can be used to split the large quadratic assignment optimization into small subproblems, enabling the use of computationally expensive, but tight semidefinite relaxations over each subproblem. Through simulations, we show the effectiveness of the EigenAlign algorithm in aligning various network structures including Erdos-Renyi, power law, and stochastic block models, under different noise models. Finally, we apply EigenAlign to compare gene regulatory networks across human, fly and worm species which we infer by integrating genome-wide functional and physical genomics datasets from ENCODE and modENCODE consortia.
EigenAlign infers conserved regulatory interactions across these species despite large evolutionary distances spanned. We find strong conservation of centrally-connected genes and some biological pathways, especially for human-fly comparisons.
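
The two steps described in this abstract (leading eigenvector of an alignment matrix, then maximum weight bipartite matching) can be sketched on a toy example. This is an illustrative sketch under stated assumptions, not the authors' implementation: the score values `s_match`, `s_neutral` and `s_mismatch`, all function names, the power iteration, and the brute-force matching over permutations (only feasible for very small graphs) are choices made here for illustration.

```python
from itertools import permutations


def alignment_matrix(g1, g2, s_match=3.0, s_neutral=2.0, s_mismatch=1.0):
    """Build a positive matrix indexed by candidate pairs (i, j), where
    (i, j) means 'node i of g1 maps to node j of g2'.

    g1, g2: symmetric 0/1 adjacency matrices of equal size.
    The three scores are hypothetical values chosen for this sketch.
    """
    n = len(g1)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    matrix = []
    for i, j in pairs:
        row = []
        for k, l in pairs:
            e1, e2 = g1[i][k], g2[j][l]
            if e1 and e2:
                row.append(s_match)      # edge present in both graphs
            elif not e1 and not e2:
                row.append(s_neutral)    # edge absent in both graphs
            else:
                row.append(s_mismatch)   # edge present in only one graph
        matrix.append(row)
    return pairs, matrix


def leading_eigenvector(matrix, iters=200):
    """Power iteration; for a positive matrix this approaches the
    Perron-Frobenius eigenvector, whose entries are all positive."""
    m = len(matrix)
    v = [1.0 / m] * m
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def eigen_align_sketch(g1, g2):
    """Return a tuple p where node i of g1 is mapped to node p[i] of g2."""
    n = len(g1)
    pairs, matrix = alignment_matrix(g1, g2)
    weight = dict(zip(pairs, leading_eigenvector(matrix)))
    # Brute-force maximum weight bipartite matching (small n only).
    return max(permutations(range(n)),
               key=lambda p: sum(weight[(i, p[i])] for i in range(n)))


# Two star graphs on four nodes with differently labelled centers.
star_a = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]  # center 0
star_b = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]  # center 2
mapping = eigen_align_sketch(star_a, star_b)
print(mapping)  # the first entry is 2: center is mapped to center
```

For graphs of realistic size the permutation search would be replaced by a polynomial-time assignment solver; the brute-force version is used here only to keep the sketch free of dependencies.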

    Loose ends: almost one in five human genes still have unresolved coding status

    The authors have accidentally omitted one co-author. Part of the work described in this study was performed in the laboratory of Dr Manolis Kellis, Computer Science and Electrical Engineering Department, Massachusetts Institute of Technology, Cambridge, MA, USA and The Broad Institute of MIT and Harvard, Cambridge, MA, USA. Dr Kellis’ name has been added to the authorship and the published article has been updated.

    ND code and scripts

    Code and scripts for: Network Deconvolution as a General Method to Distinguish Direct Dependencies over Networks. By: Soheil Feizi, Daniel Marbach, Muriel Medard and Manolis Kellis. Nature Biotechnology.

    Network Deconvolution Code

    Network Deconvolution: A General Method to Distinguish Direct Dependencies over Networks. By: Soheil Feizi, Daniel Marbach, Muriel Medard and Manolis Kellis. Nature Biotechnology.

    Abstract A15: Deconvolution of diverse cell types in the tumor microenvironment by jointly modeling transcriptomic and epigenomic information

    Deconvolution of individual reference cell type profiles from their mixture in a bulk sample can yield biological insights on the cellular heterogeneity of tumors and their microenvironment. Deconvolution can reveal both the composition of individual cell types, and the activity levels for each gene and regulatory region in the constituent cell types, thus reducing the need for single-cell profiling, which can be prohibitively costly, and is still not systematically applicable to epigenomic profiles. Current methods for mixture deconvolution employ regression-based methods to calculate the composition from a predefined set of reference expression signatures, which presents several shortcomings, including: (1) incorporation of only one data type (either epigenomic or transcriptomic, but not both); (2) inability to incorporate prior knowledge regarding the mixture composition; (3) lack of accounting for the variability in the data sets that are used to generate the reference signatures; and (4) variability in prediction due to the choice of genes used to populate the signatures matrix. Here, we present two computational approaches that overcome these limitations, by deconvolving cell type mixture profiles jointly across both transcriptomic and epigenomic datasets. The first method, bDeconvolve, is a hierarchical Bayesian model that jointly models the epigenomic and transcriptomic composition of a cellular mixture. The model also allows for: incorporation of empirical priors regarding the composition of the mixture, incorporation of variability in the data used to generate the signatures, and the ability to infer the signatures of unknown cell types. We foresee this model being used when the signatures matrix is generally well established. The second method, DC-NMF, jointly learns the reference signatures and the composition of the bulk sample.
By using sparse non-negative matrix factorization of previous reference datasets, we are able to perform deconvolution on a reduced rank representation of key transcriptomic and epigenomic signatures. We envision that this approach will be used when the signatures matrix is unknown. We apply these methods to both simulated mixtures and complex mixtures from TCGA melanoma datasets. We demonstrate that both of our approaches are efficient and accurate in joint transcriptomic and epigenomic deconvolution, and we show that the deconvolved profiles can be used to yield informative clusters and highlight important signatures at the tumor-immune interface. Overall, these methods can play a key role in the development of scalable and personalized approaches to understand tumor immunology in rapid and cost-effective ways. Citation Format: Alvin H. Shi, Yue Li, Karthik Murugadoss, Manolis Kellis. Deconvolution of diverse cell types in the tumor microenvironment by jointly modeling transcriptomic and epigenomic information. [abstract]. In: Proceedings of the AACR Special Conference on Tumor Immunology and Immunotherapy; 2016 Oct 20-23; Boston, MA. Philadelphia (PA): AACR; Cancer Immunol Res 2017;5(3 Suppl):Abstract nr A15.

    Network Maximal Correlation

    Identifying nonlinear relationships in large datasets is a daunting task, particularly when the form of the nonlinearity is unknown. Here, we introduce Network Maximal Correlation (NMC) as a fundamental measure to capture nonlinear associations in networks without the knowledge of underlying nonlinearity shapes. NMC infers (possibly nonlinear) transformations of variables with zero means and unit variances by maximizing total nonlinear correlation over the underlying network. For the case of having two variables, NMC is equivalent to the standard Maximal Correlation. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for both discrete and jointly Gaussian variables. For discrete random variables, we show that the NMC optimization is an instance of the Maximum Correlation Problem and provide necessary conditions for its global optimal solution. Moreover, we propose an efficient algorithm based on Alternating Conditional Expectation (ACE) which converges to a local NMC optimum. For this algorithm, we provide guidelines for choosing appropriate starting points to jump out of local maximizers. We also propose a distributed algorithm to compute a (1-ε) approximation of the NMC value for large and dense graphs using graph partitioning. For jointly Gaussian variables, under some conditions, we show that the NMC optimization can be simplified to a Max-Cut problem, where we provide conditions under which an NMC solution can be computed exactly. Under some general conditions, we show that NMC can infer the underlying graphical model for functions of latent jointly Gaussian variables. These functions are unknown, bijective, and can be nonlinear. This result broadens the family of continuous distributions whose graphical models can be characterized efficiently. We illustrate the robustness of NMC in real world applications by showing its continuity with respect to small perturbations of joint distributions.
We also show that sample NMC (NMC computed using empirical distributions) converges exponentially fast to the true NMC value. Finally, we apply NMC to different cancer datasets including breast, kidney and liver cancers, and show that NMC infers gene modules that are significantly associated with survival times of individuals while they are not detected using linear association measures.
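
For the two-variable case, where the abstract states that NMC reduces to standard maximal correlation, an Alternating Conditional Expectation iteration on empirical samples of discrete variables can be sketched as follows. This is a minimal sketch rather than the paper's algorithm: the function names, the choice of the standardized y values as starting point, and the fixed iteration count are assumptions, and, as the abstract notes for ACE in general, the result may depend on the starting point.

```python
from collections import defaultdict


def standardize(values):
    """Shift to zero mean and scale to unit variance.
    Returns None when the values are (numerically) constant."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var < 1e-12:
        return None
    sd = var ** 0.5
    return [(v - mean) / sd for v in values]


def conditional_mean(target, given):
    """Empirical E[target | given], evaluated at every sample."""
    sums, counts = defaultdict(float), defaultdict(int)
    for t, g in zip(target, given):
        sums[g] += t
        counts[g] += 1
    return [sums[g] / counts[g] for g in given]


def ace_maximal_correlation(xs, ys, iters=100):
    """Alternate f(x) = E[g(Y) | X = x] and g(y) = E[f(X) | Y = y],
    standardizing after each step, and return the empirical E[f(X) g(Y)]."""
    g = standardize([float(y) for y in ys])  # assumed starting point
    if g is None:
        return 0.0
    rho = 0.0
    for _ in range(iters):
        f = standardize(conditional_mean(g, xs))
        if f is None:  # g(Y) carries no information about X
            return 0.0
        g = standardize(conditional_mean(f, ys))
        if g is None:
            return 0.0
        rho = sum(a * b for a, b in zip(f, g)) / len(xs)
    return rho


# y = x**2 has zero linear correlation with x on this symmetric sample,
# yet y is a deterministic function of x.
rho_square = ace_maximal_correlation([-1, 0, 1], [1, 0, 1])
# All four combinations appear equally often, so x and y are independent.
rho_indep = ace_maximal_correlation([0, 0, 1, 1], [0, 1, 0, 1])
```

In the first example the learned transformations recover the dependence that a linear measure misses, which is the behaviour the abstract attributes to NMC on the cancer datasets.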

    Abstract A14: Convergence analysis of regulatory mutations into immuno-modulatory pathways across 14 tumor types

    Evasion of immune system surveillance is a hallmark of cancer. Cancer cells may secrete cytokines like TGF-β, which hamper cytotoxic T-cells and natural killer cells, in addition to recruiting tumor-infiltrating regulatory T-cells endowed with immunosuppressive potential. Little is known about the genomic basis for immune evasion, especially in the context of dysregulation and rewiring of the immune-related circuitry of tumor cells. A vast majority of mutations in cancer frequently occur in non-coding regions. The functional impact of these mutations in mediating interactions with the tumor microenvironment has largely been unexplored. Given that recent efforts have implicated non-coding elements in various disease association studies, it can be expected that a significant number of recurrent non-coding mutations in cancer have a regulatory effect. Pan-Cancer analysis has improved the discovery and analysis of these regulatory mutations while avoiding the type I and type II errors made in several tissue-specific cancer projects. By leveraging recent pan-cancer whole-genome sequencing efforts, we have been able to characterize the non-coding mutational profiles of 505 samples, spread across 14 tumor types. These methods amplify the power to detect heterogeneous signals of positive selection, thereby enhancing our ability to distinguish ‘driver’ from ‘passenger’ alterations. This study aims at applying a pan-cancer genome-wide approach towards identifying regulatory mutations that potentially impact immune modulation and evasion. We first infer the genome-wide position- and sample-specific probabilities of mutation from somatic mutation calls. A multinomial logistic regression model describes the relationship between the mutation rate and a set of explanatory variables such as the sample ID, replication timing, the genomic context and the local mutation rate.
The predicted site-specific probabilities, when overlaid with tissue-specific annotations from the Roadmap Epigenomics consortium and GENCODE, allow us to derive a test statistic for each cis non-coding region that yields information regarding the likelihood that the region of interest is a “driver region”. By utilizing published databases on enhancer-gene links, we extend this framework to comprehensively characterize distal enhancer regions mapped with their target genes. We then systematically identify key immunomodulatory networks enriched for non-coding mutations and evaluate these in a pan-cancer context. A number of immune-related genes are found to harbor an excess of non-coding mutations, suggesting higher-order regulatory convergence. We assign a significance score to this convergence by factoring the proximity to the target gene as well as the genomic instability modeled through local copy number changes. Therefore, we integrate individual low-frequency alterations into high-frequency recurrent events across different tumor types. We believe that the proposed model presents an unbiased method towards characterizing the impact of regulatory mutations on immunomodulation, immune suppression and evasion. We anticipate that this approach will serve as a template for future functional non-coding mutational dissections in tumor-related studies. Citation Format: Karthik Murugadoss, Malene Rasmussen, Alvin Shi, Manolis Kellis. Convergence analysis of regulatory mutations into immuno-modulatory pathways across 14 tumor types. [abstract]. In: Proceedings of the AACR Special Conference on Tumor Immunology and Immunotherapy; 2016 Oct 20-23; Boston, MA. Philadelphia (PA): AACR; Cancer Immunol Res 2017;5(3 Suppl):Abstract nr A14.