A network approach for low dimensional signatures from high throughput data
One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables, called a signature, for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by an up/down regulation behavior, for which discriminant-based methods can perform with high accuracy and easy interpretability. To get the most out of these methods, feature selection is even more critical, but it is known to be an NP-hard problem, and thus most feature selection approaches focus on one feature at a time (k-best, Sequential Feature Selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best performing feature pairs. The algorithm is easily scalable, allowing efficient computing for a high number of observables ([Formula: see text]–[Formula: see text]). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is compatible with them but with a smaller number of selected features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures in comparison to nonlinear classification models.
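As a rough illustration of the pair-based idea described in this abstract, the sketch below scores every feature pair, keeps the best performing pairs as edges of a feature network, and reports each connected component as a candidate signature. It is not the DNetPRO implementation: the nearest-centroid rule standing in for discriminant analysis, the leave-one-out scoring, the use of connected components, and the names `loo_pair_accuracy` and `pair_signatures` are all assumptions made for this example.

```python
from itertools import combinations


def loo_pair_accuracy(X, y, i, j):
    """Leave-one-out accuracy of a nearest-centroid rule on features i and j.

    Assumes every class keeps at least one sample when one is held out.
    """
    correct = 0
    for held in range(len(X)):
        sums = {}  # label -> [sum of feature i, sum of feature j, count]
        for k, (row, label) in enumerate(zip(X, y)):
            if k == held:
                continue
            s = sums.setdefault(label, [0.0, 0.0, 0])
            s[0] += row[i]
            s[1] += row[j]
            s[2] += 1
        xi, xj = X[held][i], X[held][j]
        # Predict the class whose centroid is closest in the (i, j) plane.
        predicted = min(
            sums,
            key=lambda c: (xi - sums[c][0] / sums[c][2]) ** 2
            + (xj - sums[c][1] / sums[c][2]) ** 2,
        )
        correct += predicted == y[held]
    return correct / len(X)


def pair_signatures(X, y, threshold):
    """Connected components of the network of well-performing feature pairs."""
    n_features = len(X[0])
    neighbours = {}
    for i, j in combinations(range(n_features), 2):
        if loo_pair_accuracy(X, y, i, j) >= threshold:
            neighbours.setdefault(i, set()).add(j)
            neighbours.setdefault(j, set()).add(i)
    signatures, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        signatures.append(sorted(component))
    return signatures
```

Calling `pair_signatures(X, y, threshold)` with `X` as a list of sample rows and `y` as the class labels returns lists of feature indices. The exhaustive pair scan is quadratic in the number of features, so this sketch is only meant for small examples.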
Multiscale characterization of ageing and cancer progression by a novel network entropy measure
We characterize different cell states, related to cancer and ageing phenotypes, by a measure of entropy of network ensembles, integrating gene expression profiling values and protein interaction network topology. In our case studies, network entropy, which by definition estimates the number of possible network instances satisfying the given constraints, can be interpreted as a measure of the "parameter space" available to the cell. Network entropy was able to characterize specific pathological conditions: normal versus cancer cells, primary tumours that developed metastasis or relapsed, and extreme longevity samples. Moreover, this approach has been applied at different scales, from the whole network to specific subnetworks (biological pathways defined on a priori biological knowledge) and single nodes (genes), allowing a deeper understanding of the cell processes involved.
Master Equation and Relative Species Abundance Distribution for Lotka-Volterra Models of Interacting Ecological Communities
The understanding of the factors controlling the dynamics of interacting species is a fundamental problem in ecology. The nature of the interactions among different species is usually not completely understood, but it is assumed that species interaction plays an important role in the properties of ecosystems. However, recent studies point out that a neutral hypothesis of non-interacting species with an external source from the surrounding environment can explain the relative species abundance (RSA) distribution when the community has reached a stationary state. In this paper we use a Lotka-Volterra model to derive the RSA distribution in the case of different communities that interact with each other. We derive a master equation to study the joint RSA distribution of the communities near the stationary state and their correlation. These results suggest a possible explanation of the deviation of empirical RSA distributions from the neutral models.
rFBP: Replicated Focusing Belief Propagation algorithm
The rFBP project implements a scikit-learn compatible machine-learning binary classifier leveraging fully connected neural networks with a learning algorithm (Replicated Focusing Belief Propagation, rFBP) that is quickly converging and robust (less prone to brittle overfitting) for ill-posed datasets (very few samples compared to the number of features). The current implementation works only with binary features such as one-hot encoding for categorical data.
This library has already been widely used to successfully predict source attribution starting from GWAS (Genome Wide Association Studies) data. That study aimed to predict the animal origin of an infectious bacterial disease within the H2020 European project COMPARE (Grant agreement ID: 643476). A full description of the pipeline used in this study is available in the abstract and slides provided in the publications folder of the project.
Algorithm application on real data:
Classification of Genome Wide Association data by Belief Propagation Neural network, CCS Italy 2019, Conference paper
Classification of Genome Wide Association data by Belief Propagation Neural network, CCS Italy 2019, Conference slide
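Since the description above states that the current implementation works only with binary features, categorical inputs would first need to be one-hot encoded. The sketch below shows that preprocessing step in plain Python. It does not use or reproduce the rFBP API; the function name `one_hot_encode` and the sorted category order are assumptions made for this example.

```python
def one_hot_encode(values):
    """Turn a list of categorical values into rows of 0/1 indicators.

    Returns the sorted category list and one binary row per input value,
    with a single 1 in the column of that value's category.
    """
    categories = sorted(set(values))
    column = {category: k for k, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1
        rows.append(row)
    return categories, rows
```

Each categorical column would be encoded this way and the resulting binary columns concatenated before being passed to a classifier that expects binary inputs.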
COVID-19 Lung Segmentation
The COVID-19 Lung Segmentation project provides a novel, unsupervised and fully automated pipeline for the semantic segmentation of ground-glass opacity (GGO) areas in chest Computed Tomography (CT) scans of patients affected by COVID-19. In the project we provide a series of scripts and functions for the automated segmentation of 3D lung areas, segmentation of GGO areas, and estimation of radiomic features.
Impact of blood source and component manufacturing on neurotrophin content and in vitro cell wound healing
Background: We evaluated neurotrophin (NF) levels and their impact on in vitro cell wound healing in eye drops from differently prepared blood sources (cord blood [CB] and peripheral blood [PB]) in the same donor, to avoid intrasubject biological variability. Materials and methods: Twenty healthy adult donor PB samples and twenty CB samples acquired at the time of delivery were processed to obtain serum (S), platelet-rich plasma (PRP), platelet-poor plasma (PPP), and S retrieved from PRP after activation with Ca-gluconate (PRP-R). The levels of brain-derived neurotrophic factor (BDNF), nerve growth factor (NGF), glial-derived neurotrophic factor (GDNF), fibroblast growth factor (FGF), and epidermal growth factor (EGF) were assessed with a Luminex xMAP (Luminex Corporation) using multikine kits from R&D Systems, and were statistically analysed in the eight different preparations. The impact of S, PRP, PPP, and PRP-R from both sources on a cell line responding to NF supplementation (MIO-M1, UCL Institute of Ophthalmology, London, UK) was tested with a scratch wound assay and analysed with IncuCyte S3 equipment. Results: All the preparations from CB showed higher NF levels, except for BDNF, where no difference was found as compared to PB. PRP showed higher NF levels with respect to S, PPP and PRP-R, in this decreasing order. Younger PB donors contributed higher NF levels. The scratch assay showed different cell migration results, with complete wound closure recorded only with the supplementation of CB-S, and a progressive reduction with PRP, PRP-R, and PPP from both sources. Discussion: Protocols of preparation and choice of blood source determine different NF levels in the final products. The therapeutic use of a natural neurotrophin pool from blood sources might have a clinical impact in several different settings. Efforts are needed to standardise the manufacturing and the product content in order to establish and modulate the posology of the final supplementation.
Impact of concurrency on the performance of a whole exome sequencing pipeline
Background: Current high-throughput technologies (e.g. whole genome sequencing, RNA-Seq, ChIP-Seq) generate huge amounts of data, and their usage becomes more widespread with each passing year. Complex analysis pipelines involving several computationally intensive steps have to be applied to an increasing number of samples. Workflow management systems allow parallelization and a more efficient usage of computational power. Nevertheless, this mostly happens by assigning the available cores to the pipeline of a single sample, or of a few samples, at a time. We refer to this approach as the naive parallel strategy (NPS). Here, we discuss an alternative approach, which we refer to as the concurrent execution strategy (CES), which equally distributes the available processors across every sample's pipeline.
Results: Theoretically, we show that the CES results, under loose conditions, in a substantial speedup, with an ideal gain range spanning from 1 to the number of samples. Also, we observe that the CES yields even faster executions since parallelly computable tasks scale sub-linearly. Practically, we tested both strategies on a whole exome sequencing pipeline applied to three publicly available matched tumour-normal sample pairs of gastrointestinal stromal tumour. The CES achieved speedups in latency up to 2–2.4 compared to the NPS.
Conclusions: Our results hint that if resource distribution is further tailored to fit specific situations, an even greater gain in the performance of multi-sample pipeline execution could be achieved. For this to be feasible, a benchmarking of the tools included in the pipeline would be necessary. It is our opinion that these benchmarks should be consistently performed by the tools' developers. Finally, these results suggest that concurrent strategies might also lead to energy and cost savings by making the usage of low-power machine clusters feasible.
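The stated gain range, from 1 to the number of samples, can be illustrated with a toy timing model. The sketch below is not the model used in the paper: it assumes an Amdahl-style run time in which a fixed fraction of each pipeline does not parallelize, that the core count is a multiple of the sample count, and that all samples cost the same. The function names are hypothetical.

```python
def pipeline_time(cores, serial_fraction, single_core_time=1.0):
    """Amdahl-style run time of one sample's pipeline on `cores` cores."""
    return single_core_time * (serial_fraction + (1.0 - serial_fraction) / cores)


def nps_time(n_samples, total_cores, serial_fraction):
    """Naive parallel strategy: samples run one after another on all cores."""
    return n_samples * pipeline_time(total_cores, serial_fraction)


def ces_time(n_samples, total_cores, serial_fraction):
    """Concurrent execution strategy: all samples run at once, cores split evenly."""
    if total_cores % n_samples != 0:
        raise ValueError("this sketch assumes cores divide evenly across samples")
    return pipeline_time(total_cores // n_samples, serial_fraction)


def ces_speedup(n_samples, total_cores, serial_fraction):
    """Ratio of NPS to CES total time under the toy model."""
    return nps_time(n_samples, total_cores, serial_fraction) / ces_time(
        n_samples, total_cores, serial_fraction
    )
```

Under this model the two strategies tie when tasks scale perfectly linearly (`serial_fraction = 0`), and the gain approaches the number of samples as the pipeline becomes fully serial (`serial_fraction = 1`). Any sub-linear scaling in between favours the CES.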
