
    Deleterious impact of mutational processes on transcription factor binding sites in human cancer

    Somatic mutations occurring in many cancer types are associated with well-understood processes, such as exposure to tobacco smoking or to ultraviolet (UV) light, but also with mutational processes of so far unknown etiology. Mutational processes can be described in terms of so-called mutational signatures, most often represented as vectors of mutation probabilities which indicate what mutation types are preferentially induced by the mutational processes. In this paper we propose a framework to identify which mutational processes are more likely to harm binding sites of a given transcription factor. Our method starts from the binding site motif and assigns to each mutational signature both a hit score, i.e., the likelihood that the mutational process mutates a binding sequence in at least one nucleotide, and a measure of deleteriousness, i.e., the likelihood that a binding site can be disrupted by mutations belonging to the signature. In a final step, the determined scores can be adjusted according to the strengths with which individual mutational signatures have contributed to the observed mutational load of a tumor. We apply the method to CTCF, a transcription factor that is a core architectural protein dictating the three-dimensional structure of the genome. Our analysis concentrates on melanoma (skin cancer), for which we show that our framework predicts the disruption of CTCF binding sites by specific UV-light-associated mutational signatures, confirming our biological expectations.
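
    The "hit score" idea in the abstract above (the likelihood that at least one nucleotide of a binding sequence is mutated) can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula; the version below assumes independent per-position mutation probabilities, and the function name `at_least_one_hit` is hypothetical.

```python
from math import prod


def at_least_one_hit(per_site_probs):
    """Probability that at least one position mutates.

    Assumption (not from the paper): positions mutate independently,
    each with its own probability, so P(hit) = 1 - prod(1 - p_i).
    """
    return 1.0 - prod(1.0 - p for p in per_site_probs)


# Hypothetical per-position probabilities for a 4-nucleotide site.
score = at_least_one_hit([0.01, 0.02, 0.0, 0.05])
```

    Under this independence assumption the score grows with both the length of the site and the per-position probabilities, which is why longer motifs tend to be hit more often.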

    Predicting Drug Synergism by Means of Non-Negative Matrix Tri-Factorization

    Traditional drug experiments to find synergistic drug pairs are time-consuming and expensive due to the numerous possible combinations of drugs that have to be examined. Thus, computational methods that can give suggestions for synergistic drug investigations are of great interest. Here, we propose an NMTF-based approach that leverages the integration of different data types for predicting synergistic drug pairs in multiple specific cell lines. Our computational framework relies on a network-based representation of available data about drug synergism, which also allows integrating genomic information about cell lines. We computationally evaluate the performance of our method in finding missing relationships between synergistic drug pairs and cell lines and in computing synergy scores between drug pairs in a specific cell line, and we estimate the benefit of adding cell line genomic data to the network. Our approach obtains very good performance (Average Precision Score equal to 0.937, Pearson's correlation coefficient equal to 0.760) when cell line genomic data and rich data about synergistic drugs in a cell line are considered. Finally, we systematically searched our top-scored predictions in the available literature and in the NCI ALMANAC, a well-known database of drug combination experiments, confirming the quality of our findings.
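
    For readers unfamiliar with NMTF, the following is a minimal sketch of generic non-negative matrix tri-factorization, R ≈ G1 · S · G2ᵀ, using standard multiplicative updates for the Frobenius-norm objective. It is not the authors' implementation: the abstract does not state their exact variant, constraints, or how the network data are encoded, and the names `nmtf` and `reconstruction_error` are hypothetical.

```python
import numpy as np


def nmtf(R, k1, k2, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix R (n x m) as G1 @ S @ G2.T.

    G1 is n x k1, S is k1 x k2, G2 is m x k2; all stay non-negative
    because each update multiplies by a ratio of non-negative terms.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    G1 = rng.random((n, k1))
    S = rng.random((k1, k2))
    G2 = rng.random((m, k2))
    for _ in range(n_iter):
        # Multiplicative updates for min ||R - G1 S G2^T||_F^2.
        G1 *= (R @ G2 @ S.T) / (G1 @ S @ G2.T @ G2 @ S.T + eps)
        G2 *= (R.T @ G1 @ S) / (G2 @ S.T @ G1.T @ G1 @ S + eps)
        S *= (G1.T @ R @ G2) / (G1.T @ G1 @ S @ G2.T @ G2 + eps)
    return G1, S, G2


def reconstruction_error(R, G1, S, G2):
    """Frobenius norm of the residual R - G1 S G2^T."""
    return float(np.linalg.norm(R - G1 @ S @ G2.T))
```

    In a setting like the one described, the entries of the reconstructed matrix `G1 @ S @ G2.T` would serve as scores for associations that are missing in `R`; how the original work masks and ranks them is not specified in the abstract.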

    PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets

    Background: With the growth of available sequenced datasets, analysis of heterogeneous processed data can answer increasingly relevant biological and clinical questions. Scientists are challenged in performing efficient and reproducible data extraction and analysis pipelines over heterogeneously processed datasets. Available software packages are suitable for analyzing experimental files from such datasets one by one, but do not scale to thousands of experiments. Moreover, they lack proper support for metadata manipulation. Results: We present PyGMQL, a novel software package for the manipulation of region-based genomic files and their relative metadata, built on top of the GMQL genomic big data management system. PyGMQL provides a set of expressive functions for the manipulation of region data and their metadata that can scale to arbitrary clusters and implicitly apply to thousands of files, producing millions of regions. PyGMQL provides data interoperability, distribution transparency and query outsourcing. The PyGMQL package integrates scalable data extraction over the Apache Spark engine underlying the GMQL implementation with native Python support for interactive data analysis and visualization. It supports data interoperability, solving the impedance mismatch between executing set-oriented queries and programming in Python. PyGMQL provides distribution transparency (the ability to address a remote dataset) and query outsourcing (the ability to assign processing to a remote service) in an orthogonal way. Outsourced processing can address cloud-based installations of the GMQL engine. Conclusions: PyGMQL is an effective and innovative tool for supporting tertiary data extraction and analysis pipelines. We demonstrate the expressiveness and performance of PyGMQL through a sequence of biological data analysis scenarios of increasing complexity, which highlight reproducibility, expressive power and scalability.

    Analysis and Visualization of Mutation Enrichments for Selected Genomic Regions and Cancer Types

    Several studies highlight the relevance of somatic mutations in non-coding regions of the genome which exhibit common interesting behaviors. MutViz is a tool for the identification of mutation enrichments on arbitrary sets of user-defined regions; for a variety of cancer types, it contains preloaded mutations from public datasets, organized in an efficient database. MutViz provides a user-friendly interface that helps the user supply sets of regions as input and explore them quickly as output, together with simple statistical testing of novel hypotheses.

    Exploring genomic datasets: From batch to interactive and back

    Genomic data management is focused on achieving high performance over big datasets using batch, cloud-based architectures; this enables the execution of massive pipelines, but hampers the capability of exploring the solution space when it is not well-defined, by choosing different experimental samples or query extraction parameters. We present PyGMQL, a Python-based interoperability software layer that enables testing of experimental pipelines; PyGMQL solves the impedance mismatch between a batch execution environment and the agile programming style of Python, and provides transparency of access when exploration requires integrating local and remote resources. Wrapping PyGMQL and Python primitives within Jupyter notebooks guarantees reproducibility of the pipeline when used in different contexts or by different scientists. The software is freely available at https://github.com/DEIB-GECO/PyGMQL

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first five authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
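
    The two counting schemes compared in the abstract above can be sketched as follows. This is an illustrative simplification, not the study's procedure: the data layout and the function name `cocitation_counts` are hypothetical, and co-authors of a single cited work are treated here as co-cited with each other, which the abstract does not confirm.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how many citing papers co-cite each pair of authors.

    citing_papers: list of reference lists; each reference is a list of
    author names in byline order. max_authors=1 gives first-author
    counting; max_authors=5 considers the first five authors of each
    cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["A"], ["C", "D"]],
]
first_author_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

    With first-author counting only the pair (A, C) appears; including up to five authors per cited work brings B and D into the co-citation matrix as well.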

    A Non-Negative Matrix Tri-Factorization Based Method for Predicting Antitumor Drug Sensitivity

    Large annotated cell line collections have been proven to enable the prediction of drug response in the pre-clinical setting. We present an enhancement of the Non-Negative Matrix Tri-Factorization method, which allows the integration of different data types for the prediction of missing associations. To test our method we retrieved a dataset from the Cancer Cell Line Encyclopedia (CCLE), containing the connections among cell lines and drugs by means of their IC50 values, and we integrated it by linking cell lines to their respective tissue of origin and genomic profile. We performed two different kinds of experiments: a) prediction of missing values in the matrix, and b) prediction of the complete drug profile of a new cell line, demonstrating the validity of the method in both scenarios.

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.