    Entropy of a network ensemble: definitions and applications to genomic data

    In this paper we introduce a framework for applying statistical mechanics to network theory, with particular emphasis on the concept of entropy of network ensembles. This formalism provides novel observables and insights for the analysis of high-throughput transcriptomics data, integrated with a priori biological knowledge embedded in publicly available databases of protein-protein interaction and cell signaling.
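
    The abstract does not state the entropy definition it uses. As a minimal sketch, assume the simplest case: an ensemble of undirected simple graphs in which each link (i, j) is present independently with probability p_ij. Under that assumption the Shannon entropy is a sum over node pairs, and its exponential estimates the number of typical network instances. The function names below are hypothetical.

```python
import math


def ensemble_entropy(link_probs):
    """Shannon entropy (in nats) of a graph ensemble with independent links.

    link_probs: iterable with one probability per unordered node pair.
    Assumption: links are independent, so the entropy is additive over pairs.
    """
    total = 0.0
    for p in link_probs:
        # Pairs with p = 0 or p = 1 are deterministic and contribute nothing.
        if 0.0 < p < 1.0:
            total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total


def erdos_renyi_entropy(n_nodes, p):
    """Entropy of the G(N, p) ensemble: every pair shares the same probability."""
    n_pairs = n_nodes * (n_nodes - 1) // 2
    return ensemble_entropy([p] * n_pairs)
```

    For four nodes and p = 0.5 every one of the 2^6 graphs is equally likely, so the entropy equals 6 ln 2. Constraints that push link probabilities towards 0 or 1 lower the entropy, which is the sense in which entropy measures how restrictive a set of constraints is.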

    Network measures for protein folding state discrimination

    Proteins fold via two-state or multi-state kinetic mechanisms, but so far there is no first-principles model that explains this difference in behavior. We exploit the network properties of protein structures by introducing novel observables to address the problem of classifying the different types of folding kinetics. These observables have a plain physical meaning in terms of vibrational modes, possible configurations compatible with the native protein structure, and folding cooperativity. The relevance of these observables is supported by a classification performance of up to 90%, even with simple classifiers such as discriminant analysis.

    Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules

    A relation exists between the network proximity of molecular entities in interaction networks, their functional similarity, and their association with diseases. The identification of network regions associated with biological functions and pathologies is a major goal in systems biology. We describe a network diffusion-based pipeline for the interpretation of different types of omics data in the context of molecular interaction networks. We introduce the network smoothing index, a network-based quantity that jointly quantifies the amount of omics information in genes and in their network neighbourhood, using network diffusion to define network proximity. The approach is applicable to both descriptive and inferential statistics calculated on omics data. We also show that network resampling, applied to gene lists ranked by quantities derived from the network smoothing index, indicates the presence of significantly connected genes. As a proof of principle, we identified gene modules enriched in somatic mutations and transcriptional variations observed in samples of prostate adenocarcinoma (PRAD). In line with the local hypothesis, the network smoothing index and network resampling highlighted the existence of a connected component of genes harbouring molecular alterations in PRAD.
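
    The abstract does not specify the diffusion scheme or the formula of the network smoothing index, so neither is reproduced here. The sketch below only illustrates the general idea of network diffusion, assuming a common iterative scheme: x <- alpha * W x + (1 - alpha) * x0, with W the symmetrically normalized adjacency matrix. The function name and the parameters alpha and n_iter are assumptions made for illustration.

```python
def diffuse(adjacency, x0, alpha=0.7, n_iter=100):
    """Smooth node scores x0 over a network by iterative diffusion.

    adjacency: square list of lists with 0/1 entries (undirected graph).
    x0: initial per-node scores, e.g. an omics-derived statistic per gene.
    alpha: weight of the neighbourhood versus the node's own initial score.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # W_ij = A_ij / sqrt(d_i * d_j); non-edges (and isolated nodes) give 0.
    w = [
        [
            adjacency[i][j] / (degree[i] * degree[j]) ** 0.5 if adjacency[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    x = list(x0)
    for _ in range(n_iter):
        x = [
            alpha * sum(w[i][j] * x[j] for j in range(n)) + (1.0 - alpha) * x0[i]
            for i in range(n)
        ]
    return x
```

    On a four-node path with all of the initial signal on the first node, the smoothed scores decay with distance from that node, which is how diffusion turns a per-gene score into a measure of network proximity.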

    rFBP: Replicated Focusing Belief Propagation algorithm

    The rFBP project implements a scikit-learn compatible binary classifier based on fully connected neural networks. Its learning algorithm, Replicated Focusing Belief Propagation (rFBP), converges quickly and is robust (less prone to brittle overfitting) on ill-posed datasets, that is, datasets with very few samples compared to the number of features. The current implementation works only with binary features, such as one-hot encodings of categorical data. The library has already been widely used to successfully predict source attribution from GWAS (Genome Wide Association Studies) data. That study aimed to predict the animal origin of an infectious bacterial disease within the H2020 European project COMPARE (Grant agreement ID: 643476). A full description of the pipeline used in that study is available in the abstract and slides provided in the publications folder of the project. Application of the algorithm to real data: "Classification of Genome Wide Association data by Belief Propagation Neural network", CCS Italy 2019 (conference paper and conference slides).

    Dynamics of social media behavior before and after SARS-CoV-2 infection

    Introduction: Online social media have been both a field of research and a source of data for research since the beginning of the COVID-19 pandemic. In this study, we aimed to determine whether and how the content of tweets by Twitter users reporting SARS-CoV-2 infections changed over time. Methods: We built a regular expression to detect users reporting being infected, and we applied several Natural Language Processing methods to assess the emotions, topics, and self-reports of symptoms present in the timelines of the users. Results: A total of 12,121 Twitter users matched the regular expression and were considered in the study. We found that the proportions of health-related, symptom-containing, and emotionally non-neutral tweets increased after users had reported their SARS-CoV-2 infection on Twitter. Our results also show that the number of weeks accounting for the increased proportion of symptoms was consistent with the duration of the symptoms in clinically confirmed COVID-19 cases. Furthermore, we observed a high temporal correlation between self-reports of SARS-CoV-2 infection and officially reported cases of the disease in the largest English-speaking countries. Discussion: This study confirms that automated methods can be used to find users publicly sharing information about their health status on social media, and that the associated data analysis may supplement clinical assessments made in the early phases of the spread of emerging diseases. Such automated methods may prove particularly useful for newly emerging health conditions that are not rapidly captured in the traditional health systems, such as the long-term sequelae of SARS-CoV-2 infections.
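
    The regular expression used in the study is not given in the abstract. The pattern below is a hypothetical, much simplified illustration of how first-person infection reports might be detected; it is not the authors' expression and would need extensive validation before real use.

```python
import re

# Hypothetical pattern (an assumption, not the study's regular expression):
# a first-person subject, an optional auxiliary, a verb phrase that signals
# infection, and a name of the disease or virus.
SELF_REPORT = re.compile(
    r"\bI(?:['’]ve| have)?\s+(?:just\s+)?"
    r"(?:tested\s+positive\s+for|got|caught|have)\s+"
    r"(?:covid(?:-?19)?|coronavirus|sars-cov-2)\b",
    re.IGNORECASE,
)


def is_self_report(text):
    """Return True if the text contains a first-person infection report."""
    return SELF_REPORT.search(text) is not None
```

    Requiring a first-person subject is what separates self-reports from general discussion of the pandemic: "I have covid and feel awful" matches, while "My friend tested positive for covid" and "Covid cases are rising" do not.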

    A network approach for low dimensional signatures from high throughput data

    One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables (a signature) for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by up/down regulation behavior, for which discriminant-based methods can perform with high accuracy and easy interpretability. To get the most out of these methods, feature selection becomes even more critical, but it is known to be an NP-hard problem, and thus most feature selection approaches focus on one feature at a time (k-best, Sequential Feature Selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best-performing feature pairs. The algorithm is easily scalable, allowing efficient computation for a high number of observables ([Formula: see text]–[Formula: see text]). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is compatible with them but with a smaller number of selected features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures in comparison to nonlinear classification models.
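
    The abstract describes the heuristic only in outline, so the following is a sketch under stated assumptions rather than the DNetPRO implementation. It assumes that (a) every feature pair is scored by a simple classifier restricted to those two features, here a nearest-centroid rule with in-sample accuracy standing in for cross-validated discriminant analysis, and (b) candidate signatures are the connected components of the graph whose edges are the top-scoring pairs. All function names are hypothetical.

```python
from itertools import combinations


def pair_accuracy(samples, labels, i, j):
    """In-sample accuracy of a nearest-centroid rule on features (i, j)."""
    centroids = {}
    for c in set(labels):
        pts = [(s[i], s[j]) for s, lab in zip(samples, labels) if lab == c]
        centroids[c] = (
            sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts),
        )
    hits = 0
    for s, lab in zip(samples, labels):
        # Predict the class whose centroid is closest in the (i, j) plane.
        pred = min(
            centroids,
            key=lambda c: (s[i] - centroids[c][0]) ** 2 + (s[j] - centroids[c][1]) ** 2,
        )
        hits += pred == lab
    return hits / len(samples)


def best_pairs(samples, labels, top_k):
    """Score every feature pair and keep the top_k best-performing ones."""
    n_features = len(samples[0])
    scored = [
        (pair_accuracy(samples, labels, i, j), i, j)
        for i, j in combinations(range(n_features), 2)
    ]
    scored.sort(key=lambda t: -t[0])  # stable sort: ties keep index order
    return [(i, j) for _, i, j in scored[:top_k]]


def signatures(pairs):
    """Connected components of the graph whose edges are the selected pairs."""
    neighbours = {}
    for i, j in pairs:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(neighbours[node] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components
```

    Scoring pairs rather than single features costs a number of evaluations that is quadratic in the number of features, but each evaluation is independent of the others, which is consistent with the scalability the abstract claims.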

    Multiscale characterization of ageing and cancer progression by a novel network entropy measure

    We characterize different cell states, related to cancer and ageing phenotypes, by a measure of entropy of network ensembles that integrates gene expression profiling values and protein interaction network topology. In our case studies, network entropy, which by definition estimates the number of possible network instances satisfying the given constraints, can be interpreted as a measure of the "parameter space" available to the cell. Network entropy was able to characterize specific pathological conditions: normal versus cancer cells, primary tumours that developed metastasis or relapsed, and extreme longevity samples. Moreover, this approach has been applied at different scales, from the whole network to specific subnetworks (biological pathways defined on a priori biological knowledge) and single nodes (genes), allowing a deeper understanding of the cell processes involved.

    Clusters of science and health related Twitter users become more isolated during the COVID-19 pandemic

    COVID-19 represents the most severe global crisis to date whose public conversation can be studied in real time. To do so, we use a data set of over 350 million tweets and retweets posted by over 26 million English-speaking Twitter users from January 13 to June 7, 2020. We characterize the retweet network to identify spontaneous clustering of users and the evolution of their interaction over time in relation to the pandemic's emergence. We identify several stable clusters (super-communities) and are able to link them to international groups mainly involved in science and health topics, national elites, and political actors. The science- and health-related super-community received disproportionate attention early in the pandemic and was leading the discussion at the time. However, as the pandemic unfolded, attention shifted towards both national elites and political actors, paralleled by the introduction of country-specific containment measures and the growing politicization of the debate. The scientific super-community remained present in the discussion, but experienced less reach and became more isolated within the network. Overall, the emerging network communities are characterized by increased self-amplification and polarization. This makes it generally harder for information from international health organizations or scientific authorities to directly reach a broad audience through Twitter for a prolonged time. These results may have implications for information dissemination during the unfolding of long-term events, such as epidemic diseases, on a worldwide scale.