Deleterious impact of mutational processes on transcription factor binding sites in human cancer
Somatic mutations occurring in many cancer types are associated with well-understood processes, such as exposure to tobacco smoking or to ultraviolet (UV) light, but also with mutational processes of so far unknown etiology. Mutational processes can be described in terms of so-called mutational signatures, most often represented as vectors of mutation probabilities which indicate what mutation types are preferentially induced by the mutational processes. In this paper we propose a framework to identify which mutational processes are more likely to harm binding sites of a given transcription factor. Our method starts from the binding site motif and assigns to each mutational signature both a hit score, i.e., the likelihood that the mutational process mutates a binding sequence in at least one nucleotide, and a measure of deleteriousness, i.e., the likelihood that a binding site can be disrupted by mutations belonging to the signature. In a final step, the determined scores can be adjusted according to the strengths with which individual mutational signatures have contributed to the observed mutational load of a tumor. We apply the method to CTCF, a transcription factor that is a core architectural protein dictating the dimensional structure of the genome. Our analysis concentrates on melanoma (skin cancer), for which we show that our framework predicts the disruption of CTCF binding sites by specific UV-light associated mutational signatures, confirming our biological expectations
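The hit score described in this abstract, the likelihood that a mutational process mutates a binding sequence in at least one nucleotide, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hit_score`, the dictionary `context_mutation_prob` (mapping a trinucleotide context to a probability that its central base mutates) and the independence assumption across positions are all hypothetical simplifications introduced here.

```python
from math import prod


def hit_score(sequence, context_mutation_prob):
    """Probability that at least one interior base of `sequence` mutates.

    Assumption (not from the abstract): each interior position mutates
    independently with a probability looked up by its trinucleotide context.
    Contexts missing from the dictionary are treated as never mutating.
    """
    per_position = []
    for i in range(1, len(sequence) - 1):
        context = sequence[i - 1 : i + 2]  # base i with its two neighbours
        per_position.append(context_mutation_prob.get(context, 0.0))
    # P(at least one hit) = 1 - P(no position is hit)
    return 1.0 - prod(1.0 - p for p in per_position)


# Toy, made-up probabilities for two contexts.
toy_signature = {"TCC": 0.5, "CCA": 0.5}
score = hit_score("TCCA", toy_signature)
```

With the toy values above, both interior positions have probability 0.5, so the score is 1 - 0.5 * 0.5 = 0.75.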
Predicting Drug Synergism by Means of Non-Negative Matrix Tri-Factorization
Traditional drug experiments to find synergistic drug pairs are time-consuming and expensive due to the numerous possible combinations of drugs that have to be examined. Thus, computational methods that can give suggestions for synergistic drug investigations are of great interest. Here, we propose an NMTF-based approach that leverages the integration of different data types for predicting synergistic drug pairs in multiple specific cell lines. Our computational framework relies on a network-based representation of available data about drug synergism, which also allows integrating genomic information about cell lines. We computationally evaluate the performance of our method in finding missing relationships between synergistic drug pairs and cell lines and in computing synergy scores between drug pairs in a specific cell line, and we estimate the benefit of adding cell line genomic data to the network. Our approach obtains very good performance (Average Precision Score equal to 0.937, Pearson's correlation coefficient equal to 0.760) when cell line genomic data and rich data about synergistic drugs in a cell line are considered. Finally, we systematically searched our top-scored predictions in the available literature and in the NCI ALMANAC, a well-known database of drug combination experiments, confirming the validity of our findings
PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets
Background: With the growth of available sequenced datasets, analysis of heterogeneous processed data can answer increasingly relevant biological and clinical questions. Scientists are challenged in performing efficient and reproducible data extraction and analysis pipelines over heterogeneously processed datasets. Available software packages are suitable for analyzing experimental files from such datasets one by one, but do not scale to thousands of experiments. Moreover, they lack proper support for metadata manipulation. Results: We present PyGMQL, a novel software for the manipulation of region-based genomic files and their relative metadata, built on top of the GMQL genomic big data management system. PyGMQL provides a set of expressive functions for the manipulation of region data and their metadata that can scale to arbitrary clusters and implicitly apply to thousands of files, producing millions of regions. PyGMQL provides data interoperability, distribution transparency and query outsourcing. The PyGMQL package integrates scalable data extraction over the Apache Spark engine underlying the GMQL implementation with native Python support for interactive data analysis and visualization. It supports data interoperability, solving the impedance mismatch between executing set-oriented queries and programming in Python. PyGMQL provides distribution transparency (the ability to address a remote dataset) and query outsourcing (the ability to assign processing to a remote service) in an orthogonal way. Outsourced processing can address cloud-based installations of the GMQL engine. Conclusions: PyGMQL is an effective and innovative tool for supporting tertiary data extraction and analysis pipelines. We demonstrate the expressiveness and performance of PyGMQL through a sequence of biological data analysis scenarios of increasing complexity, which highlight reproducibility, expressive power and scalability
Analysis and Visualization of Mutation Enrichments for Selected Genomic Regions and Cancer Types
Several studies highlight the relevance of somatic mutations in non-coding regions of the genome which exhibit common interesting behaviors. MutViz is a tool for the identification of mutation enrichments on arbitrary sets of user-defined regions; for a variety of cancer types, it contains preloaded mutations from public datasets, organized in an efficient database. MutViz provides a user-friendly interface that helps the user supply sets of regions as input and explore them quickly as output, together with simple statistical testing of novel hypotheses
Exploring genomic datasets: From batch to interactive and back
Genomic data management is focused on achieving high performance over big datasets using batch, cloud-based architectures; this enables the execution of massive pipelines, but hampers the capability of exploring the solution space when it is not well-defined, by choosing different experimental samples or query extraction parameters. We present PyGMQL, a Python-based interoperability software layer that enables testing of experimental pipelines; PyGMQL solves the impedance mismatch between a batch execution environment and the agile programming style of Python, and provides transparency of access when exploration requires integrating local and remote resources. Wrapping PyGMQL and Python primitives within Jupyter notebooks guarantees reproducibility of the pipeline when used in different contexts or by different scientists. The software is freely available at https://github.com/DEIB-GECO/PyGMQL
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
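The difference between the two counting schemes compared in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function `cocitation_counts`, its `max_authors` parameter and the toy data are hypothetical, and the sketch treats two authors as co-cited whenever both appear (within the counted author positions) in the reference list of the same citing paper, including when they co-authored a single cited work, which the study itself may handle differently.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors=1):
    """Count author pairs that appear together in a citing paper's references.

    reference_lists: one entry per citing paper; each entry is a list of
    cited works, and each cited work is an ordered list of author names.
    max_authors=1 mimics first-author counting; max_authors=5 counts the
    first five authors of every cited work.
    """
    counts = Counter()
    for cited_works in reference_lists:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Two made-up citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["A"], ["B", "C"]],
]
first_author_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data, first-author counting never sees the pair (B, C), while counting up to five authors links all three authors in both papers, which is the kind of denser picture the abstract describes.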
A Non-Negative Matrix Tri-Factorization Based Method for Predicting Antitumor Drug Sensitivity
Large annotated cell line collections have been proven to enable the prediction of drug response in the pre-clinical setting. We present an enhancement of the Non-Negative Matrix Tri-Factorization method, which allows the integration of different data types for the prediction of missing associations. To test our method we retrieved a dataset from the Cancer Cell Line Encyclopedia (CCLE), containing the connections among cell lines and drugs by means of their IC50 values, and we integrated it by linking cell lines to their respective tissue of origin and genomic profile. We performed two different kinds of experiments: a) prediction of missing values in the matrix, b) prediction of the complete drug profile of a new cell line, demonstrating the validity of the method in both scenarios
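For readers unfamiliar with the underlying technique, the following is a minimal sketch of plain Non-Negative Matrix Tri-Factorization, which approximates a non-negative matrix R as F S Gᵀ with all three factors non-negative. It uses the standard unconstrained multiplicative updates for a single matrix and is not the enhanced, multi-data-type method the abstract presents; the function name `nmtf`, the rank parameters `k1` and `k2`, and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def nmtf(R, k1, k2, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix R (n x m) as F (n x k1) S (k1 x k2) G.T (k2 x m).

    Uses multiplicative updates that keep every factor non-negative.
    `eps` only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    for _ in range(n_iter):
        F *= (R @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (R.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ R @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


# Made-up association matrix (rows: cell lines, columns: drugs).
R = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
])
F, S, G = nmtf(R, k1=2, k2=2)
R_hat = F @ S @ G.T  # reconstructed scores, including for zero entries of R
```

In a missing-association setting, the reconstructed values in `R_hat` at positions where `R` is zero can be read as candidate scores; how unobserved entries should be masked or weighted is a design choice this sketch leaves open.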
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourse rather than expressing opinions; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
