Search CORE

Simultaneous inference for RNA-Seq data

Author: Risso Davide
Publication date: 17/01/2012

In the last few years, RNA-Seq has become a popular choice for high-throughput studies of gene expression, showing the potential to supersede microarrays as the new standard for transcriptional profiling. At the gene level, RNA-Seq yields counts rather than continuous measures of expression, which calls for novel methods to handle count data in high-dimensional problems. In this thesis, we aim to shed light on the problems related to the exploration and modeling of RNA-Seq data. In particular, we introduce simple and effective ways to summarize and visualize the data, we define a novel algorithm for clustering RNA-Seq data, and we implement simple normalization strategies to deal with technology-related biases. Finally, we present a hierarchical Bayesian approach to modeling RNA-Seq data. The model accounts for differences in sequencing depth as well as for overdispersion, automatically accounting for different types of normalization.

In recent years, massively parallel RNA sequencing (RNA-Seq) has become a frequent choice for gene expression studies. This technique has the potential to replace microarrays as the standard technique for studying transcriptional profiles. At the gene level, RNA-Seq data come in the form of counts, unlike microarrays, which estimate expression on a continuous scale. This creates the need for new methods and models for the analysis of count data in high-dimensional problems. This thesis addresses several problems related to the exploration and modeling of RNA-Seq data. In particular, it introduces methods for visualizing and numerically summarizing the data. It also defines a new clustering algorithm and several normalization strategies aimed at removing the biases specific to this technology. Finally, it defines a hierarchical Bayesian model of RNA-Seq expression that is used to test for possible differences across experimental conditions. The model takes sequencing depth and overdispersion into account and automatically carries out different types of normalization.
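The two ingredients the abstract names, a sequencing-depth effect and overdispersion, can be illustrated with a minimal sketch of a count likelihood. It assumes a negative binomial distribution whose mean is a per-sample depth factor times a gene-specific rate. The hierarchical priors, the normalization terms and the test between conditions are not reproduced, and all function names are hypothetical.

```python
import math

def nb_logpmf(k: int, mu: float, size: float) -> float:
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion parameter `size` (variance = mu + mu**2 / size)."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))

def expected_count(depth: float, rate: float) -> float:
    """Mean count of one gene in one sample: a per-sample depth factor times
    a gene-specific relative expression rate (hypothetical parametrization)."""
    return depth * rate

def nb_variance(mu: float, size: float) -> float:
    """Negative binomial variance; it exceeds the Poisson variance mu (overdispersion)."""
    return mu + mu ** 2 / size

# Same gene (same rate) in two libraries of different depth:
# the expected counts differ only because of sequencing depth.
shallow = expected_count(1e6, 5e-5)
deep = expected_count(4e6, 5e-5)
print(shallow, deep, nb_variance(shallow, 2.0))
```

Putting the depth factor inside the mean, instead of rescaling the counts beforehand, keeps the data on the count scale. The likelihood can then still describe them.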

Archivio istituzionale della ricerca - Università di Padova

A novel approach to the clustering of microarray data via nonparametric density estimation

Author: De Bin Riccardo
Risso Davide
Publication date: 01/01/2011

Background: Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables can be much higher than the number of observations.
Results: Here, we present a general framework for clustering microarray data, based on a three-step procedure: (i) gene filtering; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Using a nonparametric model-based clustering approach, we obtain promising results on both simulated and real data.
Conclusions: The proposed algorithm is a simple and effective tool for clustering microarray data in an unsupervised setting.
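Steps (i) and (ii) of the procedure can be sketched as follows. The sketch assumes that gene filtering keeps the highest-variance genes and that dimensionality reduction means projecting the samples onto the first principal component, computed here by power iteration. Both choices, and the function names, are illustrative assumptions and are not necessarily the paper's criteria. Step (iii), the nonparametric clustering, is not included.

```python
import math
import random

def filter_genes(x, n_keep):
    """Step (i): keep the n_keep genes (rows of a genes x samples matrix)
    with the largest sample variance, preserving their original order."""
    def variance(row):
        m = sum(row) / len(row)
        return sum((v - m) ** 2 for v in row) / (len(row) - 1)
    top = sorted(range(len(x)), key=lambda g: variance(x[g]), reverse=True)[:n_keep]
    return [x[g] for g in sorted(top)]

def first_pc_scores(x, n_iter=200, seed=0):
    """Step (ii): scores of the samples (columns) on the first principal
    component, via power iteration on the sample-by-sample Gram matrix."""
    n = len(x[0])
    centered = []
    for row in x:  # center each gene across samples
        m = sum(row) / n
        centered.append([v - m for v in row])
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    u = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in w))
        u = [v / norm for v in w]
    # Rayleigh quotient gives the leading eigenvalue; scores = u * sqrt(eigenvalue).
    lam = sum(u[i] * gram[i][j] * u[j] for i in range(n) for j in range(n))
    return [v * math.sqrt(lam) for v in u]
```

Working with the samples-by-samples Gram matrix rather than the genes-by-genes covariance keeps the eigenproblem small when there are far more genes than samples. This is the dimensionality issue the abstract describes.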

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Padova

From Data-Driven to Expert-Guided: Combining Unsupervised and Semi-supervised Clustering in Spatial Transcriptomics

Author: Risso Davide
Sottosanti Andrea
Castiglioni Sara A.
Publication date: 01/01/2025

One of the challenges in spatial transcriptomics experiments is identifying clusters of genes that exhibit similar expression patterns within specific regions of a tissue sample. The SpaRTaCo model, proposed by A. Sottosanti and D. Risso in 2023, offers a fully data-driven approach to the spatial classification of a tissue based on gene expression levels. Pathologist annotations of tissue samples are also often available, although they can differ substantially from the results of the data-driven analysis. In this work, we present a pivotal study focusing on a prostate cancer tissue sample. We demonstrate the integration of SpaRTaCo with two semi-supervised variants of the model, which incorporate external biological knowledge. This integration aims to uncover meaningful biological insights and specific gene expression patterns that may not be apparent when using only one of the two approaches.

Archivio istituzionale della ricerca - Università di Padova

Designing spatial transcriptomic experiments

Author: Risso Davide
Sottosanti Andrea
Righelli Dario
Publication date: 01/01/2023

Archivio della ricerca - Università degli studi di Napoli Federico II

Archivio istituzionale della ricerca - Università di Padova

Co-clustering of Spatially Resolved Transcriptomic Data

Author: Risso Davide
Sottosanti Andrea
Publication date: 14/09/2022

Spatial transcriptomics is a modern sequencing technology that measures the activity of thousands of genes in a tissue sample and maps where that activity occurs. This technology has enabled the study of so-called spatially expressed genes, i.e., genes that exhibit spatial variation across the tissue. Understanding their functions and their interactions in different areas of the tissue is of great scientific interest, as it might lead to a deeper understanding of several key biological mechanisms. However, adequate statistical tools that exploit the newly available spatial mapping information to reach more specific conclusions are still lacking. In this work, we introduce SpaRTaCo, a new statistical model that clusters the spatial expression profiles of the genes according to the areas of the tissue. This is accomplished by co-clustering, i.e., by inferring the latent block structure of the data. This induces two clusterings: one of the genes, based on their expression across the tissue, and one of the image areas, based on the gene expression in the spots where the RNA is collected. Our proposed methodology is validated with a series of simulation experiments, and its usefulness in answering specific biological questions is illustrated with an application to a human brain tissue sample processed with the 10X-Visium protocol. Comment: Supplementary material attached.
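The idea of inferring a latent block structure that clusters genes and spots at the same time can be illustrated with a generic alternating block-means co-clustering. This is a swapped-in, much simpler technique than SpaRTaCo: it has no statistical model and no spatial covariance. It only shows how row and column partitions are updated against shared block means. All names are hypothetical.

```python
def block_means(x, row_labels, col_labels, k_rows, k_cols):
    """Mean of the entries of x in each (gene cluster, spot cluster) block;
    empty blocks get 0.0."""
    sums = [[0.0] * k_cols for _ in range(k_rows)]
    counts = [[0] * k_cols for _ in range(k_rows)]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            sums[row_labels[i]][col_labels[j]] += v
            counts[row_labels[i]][col_labels[j]] += 1
    return [[s / c if c else 0.0 for s, c in zip(srow, crow)]
            for srow, crow in zip(sums, counts)]

def row_cost(row, mu_r, col_labels):
    # Squared error of one gene's profile against one gene cluster's block means.
    return sum((v - mu_r[col_labels[j]]) ** 2 for j, v in enumerate(row))

def col_cost(x, j, c, mu, row_labels):
    # Squared error of one spot's column against spot cluster c's block means.
    return sum((x[i][j] - mu[row_labels[i]][c]) ** 2 for i in range(len(x)))

def cocluster(x, row_labels, col_labels, k_rows, k_cols, n_iter=20):
    """Alternately reassign genes (rows) and spots (columns) to the cluster
    whose block means fit them best in squared error."""
    row_labels, col_labels = list(row_labels), list(col_labels)
    for _ in range(n_iter):
        mu = block_means(x, row_labels, col_labels, k_rows, k_cols)
        row_labels = [min(range(k_rows), key=lambda r: row_cost(row, mu[r], col_labels))
                      for row in x]
        mu = block_means(x, row_labels, col_labels, k_rows, k_cols)
        col_labels = [min(range(k_cols), key=lambda c: col_cost(x, j, c, mu, row_labels))
                      for j in range(len(x[0]))]
    return row_labels, col_labels
```

For a genes x spots matrix with two high-expression blocks, the gene and spot partitions recover the blocks from a partly wrong starting assignment.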

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Clustering via nonparametric density estimation: an application to microarray data.

Author: Risso Davide
De Bin Riccardo
Publication date: 01/01/2010

Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables is much higher than the number of observations. Here, we present a novel approach to clustering microarray data via nonparametric density estimation, based on the following steps: (i) selection of relevant variables; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Applications to simulated and real data show promising results compared with two standard approaches, k-means and Mclust. In the simulation studies, our nonparametric approach performs comparably to models based on the normality assumption, even in Gaussian settings. On two benchmark real datasets, it outperforms the existing parametric approaches.
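Step (iii), clustering via nonparametric density estimation, can be illustrated with mean shift. Mean shift moves each observation uphill on a Gaussian kernel density estimate and groups observations that reach the same mode. It is a swapped-in stand-in, not necessarily the density-based method used in the paper. The bandwidth and the mode-merging tolerance are assumptions, and the function names are hypothetical.

```python
import math

def mean_shift_modes(points, bandwidth, n_iter=200, tol=1e-8):
    """Move each point uphill on a Gaussian kernel density estimate until it
    settles on a local mode (mean-shift iterations)."""
    modes = []
    for p in points:
        y = list(p)
        for _ in range(n_iter):
            w = [math.exp(-math.dist(y, q) ** 2 / (2 * bandwidth ** 2)) for q in points]
            total = sum(w)
            new_y = [sum(wi * q[d] for wi, q in zip(w, points)) / total
                     for d in range(len(y))]
            done = math.dist(new_y, y) < tol
            y = new_y
            if done:
                break
        modes.append(y)
    return modes

def density_clusters(points, bandwidth, merge_tol=None):
    """Label points by the density mode they climb to; modes closer than
    merge_tol are treated as the same cluster."""
    merge_tol = bandwidth / 2 if merge_tol is None else merge_tol
    centers, labels = [], []
    for m in mean_shift_modes(points, bandwidth):
        for k, c in enumerate(centers):
            if math.dist(m, c) < merge_tol:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels
```

Unlike k-means, the number of clusters is not fixed in advance here. It follows from the number of modes the density estimate has at the chosen bandwidth.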

Archivio istituzionale della ricerca - Università di Padova

Per-sample standardization and asymmetric winsorization lead to accurate clustering of RNA-seq expression profiles

Author: Risso Davide
Pagnotta Stefano Maria
Publication date: 01/01/2021

MOTIVATION: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformation on the outcome of unsupervised clustering procedures is still unclear.
RESULTS: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters in both bulk and single-cell applications.
AVAILABILITY: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
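The two ingredients named in the AWST acronym can be illustrated with a minimal sketch. First, each sample is standardized on its own; here that means log1p, median centering and MAD scaling. Second, the standardized values are winsorized with different lower and upper bounds. The centering and scaling choices and the thresholds -1 and 3 are assumptions for illustration. They are not the formulas used by AWST; the awst repository holds the actual method. All names are hypothetical.

```python
import math
import statistics

def standardize_sample(counts):
    """Per-sample standardization on the log scale (hypothetical choice:
    log1p, then median centering and MAD scaling)."""
    logs = [math.log1p(c) for c in counts]
    center = statistics.median(logs)
    mad = statistics.median(abs(v - center) for v in logs)
    scale = mad if mad > 0 else 1.0
    return [(v - center) / scale for v in logs]

def asymmetric_winsorize(z, lower=-1.0, upper=3.0):
    """Clip standardized values with different lower and upper thresholds,
    so the noisy low end is squashed harder than the high end
    (hypothetical thresholds)."""
    return [min(max(v, lower), upper) for v in z]

def transform(matrix):
    """Apply both steps to each sample (column) of a genes x samples count matrix."""
    columns = list(zip(*matrix))
    out_cols = [asymmetric_winsorize(standardize_sample(col)) for col in columns]
    return [list(row) for row in zip(*out_cols)]
```

Because every sample is standardized against its own distribution, a sample's values depend only on that sample. This is one way to see why such a transformation can be robust to perturbations elsewhere in the data.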

Crossref

Archivio istituzionale della ricerca - Università di Padova

Spatially Informed Nonnegative Matrix Trifactorization for Coclustering Mass Spectrometry Data

Author: Risso Davide
Sottosanti Andrea
Galimberti Stefania
Capitoli Giulia
Denti Francesco
Publication date: 01/01/2025

Mass spectrometry imaging techniques measure molecular abundance in a tissue sample at cellular resolution while preserving the spatial structure of the tissue. This kind of technology offers a detailed understanding of the role of several molecular factors in biological systems. For this reason, fast and efficient computational methods that can extract relevant signals from massive experiments have become necessary. A key goal in mass spectrometry data analysis is identifying molecules with similar functions in the analyzed biological system. This can be achieved by studying the spatial distribution of the molecules' abundance patterns. To do so, one can perform coclustering, that is, dividing the molecules into groups according to their expression patterns over the tissue and segmenting the tissue according to the molecules' abundance levels. We present TRIFASE, a semi-nonnegative matrix trifactorization technique that performs coclustering while accounting for the spatial correlation of the data. We propose an estimation algorithm that solves the resulting matrix trifactorization problem. Moreover, to improve scalability, we propose two heuristic approximations of the most expensive steps, which help the algorithm converge while significantly reducing the computational cost. We validated our method on a series of simulation experiments, comparing the different estimation strategies discussed in the article. Finally, we analyzed a mouse brain tissue sample processed with MALDI-MSI technology, showing how TRIFASE extracts specific patterns of molecule abundance in localized tissue areas and discovers blocks of proteins whose activation is directly linked to specific biological mechanisms.
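The trifactorization behind this kind of coclustering can be sketched as the fit X ≈ F S Gᵀ. Here F holds hard (0/1) memberships of molecules (rows) and G hard memberships of pixels (columns). With F and G fixed, the least-squares core S = (FᵀF)⁻¹ FᵀXG (GᵀG)⁻¹ reduces to block means, and its entries may be negative. TRIFASE's spatial-correlation term, its exact constraints and its estimation algorithm and heuristics are not reproduced. The function names are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def one_hot(labels, k):
    """Binary membership matrix: one row per item, one column per cluster."""
    return [[1.0 if lab == c else 0.0 for c in range(k)] for lab in labels]

def core_matrix(x, f, g):
    """Least-squares S for fixed binary F, G. F^T F and G^T G are diagonal
    (cluster sizes), so S is the matrix of block means. Empty clusters
    would divide by zero; this sketch does not handle them."""
    ftxg = matmul(matmul(transpose(f), x), g)
    row_sizes = [sum(col) for col in transpose(f)]
    col_sizes = [sum(col) for col in transpose(g)]
    return [[ftxg[a][b] / (row_sizes[a] * col_sizes[b]) for b in range(len(col_sizes))]
            for a in range(len(row_sizes))]

def residual(x, f, s, g):
    """Squared Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((xi - ai) ** 2 for xr, ar in zip(x, approx) for xi, ai in zip(xr, ar))
```

An estimation algorithm would alternate between updating S in this way and updating the memberships to lower the residual. A spatial method would add a term that favors neighboring pixels sharing a cluster.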

Archivio istituzionale della ricerca - Università di Padova

ROC estimation and threshold selection criteria in three-class classification problems for clustered data

Author: ADIMARI GIANFRANCO
CHIOGNA MONICA
TO DUC KHANH
RISSO DAVIDE
Publication date: 01/01/2022

Statistical evaluation of diagnostic tests, and, more generally, of biomarkers, is a constantly developing field, in which the complexity of the assessment increases with the complexity of the design under which data are collected. One particularly prevalent type of data is clustered data, where individual units are naturally nested into clusters. In these cases, bias can arise when cluster-level effects and/or individual covariates are omitted from the evaluation process. Focusing on the three-class case and on continuous-valued diagnostic tests, we investigate how to exploit the clustered structure of the data within a linear mixed-model approach, both when the normality assumption holds and when it does not. We provide a method for estimating covariate-specific ROC surfaces and discuss methods for choosing optimal thresholds, proposing three possible estimators. A proof of consistency and asymptotic normality of the proposed threshold estimators is given. All considered methods are evaluated through extensive simulation experiments. As an application, we study the use of Lysosomal Associated Membrane Protein Family Member 5 (Lamp5) gene expression as a biomarker to distinguish among three types of glutamatergic neurons.
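The paper's covariate-specific, mixed-model ROC surfaces are beyond a short example. The two basic three-class quantities can, however, be sketched empirically, ignoring clustering and covariates. The volume under the ROC surface is estimated as the fraction of correctly ordered triples, P(X1 < X2 < X3). A threshold pair (c1, c2) is then chosen by a generalized Youden criterion that maximizes TCF1 + TCF2 + TCF3. This criterion is one common choice and is not necessarily one of the three estimators the authors propose. The function names are hypothetical.

```python
from itertools import product

def empirical_vus(x1, x2, x3):
    """Empirical volume under the ROC surface: the fraction of triples
    (one value per class) that are strictly ordered x1 < x2 < x3."""
    hits = sum(1 for a, b, c in product(x1, x2, x3) if a < b < c)
    return hits / (len(x1) * len(x2) * len(x3))

def true_class_fractions(x1, x2, x3, c1, c2):
    """Correct-classification fractions for the rule: class 1 if v <= c1,
    class 2 if c1 < v <= c2, class 3 if v > c2."""
    tcf1 = sum(v <= c1 for v in x1) / len(x1)
    tcf2 = sum(c1 < v <= c2 for v in x2) / len(x2)
    tcf3 = sum(v > c2 for v in x3) / len(x3)
    return tcf1, tcf2, tcf3

def youden_thresholds(x1, x2, x3):
    """Grid search over observed values for c1 < c2 maximizing
    TCF1 + TCF2 + TCF3 (a generalized Youden criterion)."""
    grid = sorted(set(x1) | set(x2) | set(x3))
    best_score, best_pair = float("-inf"), None
    for i, c1 in enumerate(grid):
        for c2 in grid[i + 1:]:
            score = sum(true_class_fractions(x1, x2, x3, c1, c2))
            if score > best_score:
                best_score, best_pair = score, (c1, c2)
    return best_pair, best_score
```

With perfectly separated classes, the estimated volume is 1 and the selected thresholds sit at the upper edges of classes 1 and 2. Ties between classes count as misordered in this sketch.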

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

PsiNorm: a scalable normalization for single-cell RNA-seq data

Author: Risso Davide
Romualdi Chiara
Martello Graziano
Borella Matteo
Publication date: 01/01/2022

Motivation: Single-cell RNA sequencing (scRNA-seq) enables transcriptome-wide gene expression measurements at single-cell resolution, providing a comprehensive view of the composition and dynamics of tissue and organism development. The evolution of scRNA-seq protocols has led to a dramatic increase in cell throughput, exacerbating many of the computational and statistical issues that previously arose for bulk sequencing. In particular, with scRNA-seq data, all analysis steps, including normalization, have become computationally intensive in terms of both memory usage and computational time. From this perspective, new accurate methods that scale efficiently are desirable. Results: Here, we propose PsiNorm, a between-sample normalization method based on estimating the parameter of the power-law Pareto distribution. We show that the Pareto distribution closely resembles scRNA-seq data, especially data from platforms that use unique molecular identifiers. Motivated by this result, ..
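The core estimation step can be sketched under the assumption that each cell's counts, shifted by +1, follow a Pareto distribution with scale 1. The shape MLE then has the standard closed form α̂ = n / Σ log(x_i / x_min), which costs a single pass over each cell's counts. How PsiNorm turns α̂ into normalized values is an assumption here: each cell is multiplied by α̂, i.e. divided by a size factor 1/α̂. The function names are hypothetical, and this is not the PsiNorm package code.

```python
import math

def pareto_shape_mle(counts, x_min=1.0):
    """MLE of the Pareto shape alpha given a known scale x_min:
    alpha_hat = n / sum(log(x_i / x_min)). Counts are shifted by +1 so that
    zeros map to x_min (an assumption of this sketch). An all-zero cell
    would divide by zero."""
    logs = [math.log((c + 1.0) / x_min) for c in counts]
    return len(logs) / sum(logs)

def pareto_normalize(matrix):
    """Scale each cell (column) of a genes x cells matrix by its alpha_hat,
    i.e. divide by a size factor 1/alpha_hat (hypothetical choice)."""
    cells = list(zip(*matrix))
    alphas = [pareto_shape_mle(cell) for cell in cells]
    scaled = [[c * a for c in cell] for cell, a in zip(cells, alphas)]
    return [list(row) for row in zip(*scaled)], alphas
```

A more deeply sequenced cell has a heavier tail and therefore a smaller α̂, so its size factor 1/α̂ is larger.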

Archivio istituzionale della ricerca - Università di Padova