Dissecting the transcriptome complexity with bioinformatics tools
Bioinformatics has grown in importance, especially with the advent of genomic approaches. The large amount of data produced by "omics" experiments requires appropriate frameworks to handle, store and mine the information and to derive working hypotheses. The transcriptome is defined as the full set of RNA molecules produced by a cell, which provides the bridge between the genome and proteins. RNA molecules can be divided into two major classes: protein-coding RNAs, or messenger RNAs (mRNAs), and non-coding RNAs (ncRNAs). While the first class has been the most studied in recent decades, ncRNAs were discovered more recently and have proved important in cell regulatory processes. The most important class of ncRNAs comprises the microRNAs (miRNAs), which have been linked to several pathologies, including cancer, because of their ability to regulate oncogenes, tumor suppressors and mRNAs involved in the cell cycle.
Here I present work aimed at providing the appropriate structures for interpreting and storing transcriptomics data.
To this end, I devised a tool that integrates expression levels from microarray experiments with gene annotation data such as genome localization and organization into biological pathways. The tool was devised and tuned using two datasets: the first concerning expression profiles of patients with acute myeloid leukemia (AML), the second regarding muscular dystrophies. Its application to these datasets was very promising, especially for meta-analysis studies (muscular dystrophies). For this reason I applied the tool to public and in-house datasets of expression profiles of patients with inflammatory myopathies. This analysis led to the hypothesis that the JAK-STAT and type I interferon signaling pathways are involved in myopathies. The inferred results were validated using qRT-PCR, and the presence of specific proteins produced by the validated mRNAs was tested by ELISA and proteomic analysis.
To complete and extend knowledge of muscle physiology, I used the pig as a new model organism to develop a framework for integrating miRNA expression with the regulation of their mRNA targets. It was important to develop the appropriate experimental instruments for the expression analyses, so I developed two microarray platforms to profile the expression of both miRNA and mRNA purified from the same sample. Then, with the expression data, I computationally analyzed aspects of miRNA biogenesis and integrated the data to produce regulatory networks specific to the studied tissues, including skeletal muscle. Our miRNA sequences (mature and hairpin) were compared with public data from RNA-seq experiments, showing a substantial overlap between our results and the sequences identified by RNA-seq and confirming the soundness of our approach.
With the advent of genomic approaches, bioinformatics has become increasingly important in the study of biology. "Omics" approaches produce an enormous quantity of data that must be stored in suitable structures (databases). Storing the data entails providing access to it and the means to manipulate it in order to carry out the appropriate studies. Appropriate tools are therefore required to inspect and manipulate the databases so as to formulate hypotheses consistent with the biological problem under study.
The transcriptome is defined as the set of RNA molecules produced by a cell, which represent a necessary step in the process leading from gene to protein. RNA molecules can be divided into two large groups: coding (messenger) RNAs and non-coding RNAs. While the first class has been studied extensively over recent decades, non-coding RNAs were discovered only recently and associated with purely regulatory functions. The most important class involved in regulating messenger RNAs is that of the microRNAs (miRNAs), which have been studied intensively and linked to the development of diseases such as cancer, because they are involved in the fine regulation of the expression of oncogenes, tumor suppressors and cell-cycle genes.
In this thesis I present a series of bioinformatics solutions aimed at providing the appropriate structures for conducting transcriptomics experiments and data analyses.
During my doctorate, I developed a method that integrates gene expression levels obtained from microarray experiments with information on the chromosomal location of the genes or their organization into biological processes. The method was set up and refined using two datasets available in public databases: the first contains gene expression data from microarray experiments on acute myeloid leukemia; the second concerns gene expression in muscular dystrophies, also derived from microarray data. The results of this new method proved very promising, particularly when applied to meta-analysis, which consists of integrating data from different laboratories.
Building on this first result, I applied the method to inspect the deregulated processes in inflammatory myopathies, combining data produced in the Functional Genomics laboratory directed by Prof. G. Lanfranchi with data deposited in public databases. The meta-analysis I implemented made it possible to study these datasets by exploiting, for the first time, the location of genes and grouping them by function, which allowed hypotheses to be generated about the pathological mechanisms. Through this type of analysis I hypothesized the involvement of the JAK/STAT and interferon signaling pathways in inflammatory myopathies. The hypotheses generated from the data were confirmed by validating the genes involved in these signaling pathways using qRT-PCR. In addition, using proteomic approaches, in collaboration with Prof. C. Gelfi (Università di Milano), and ELISA, the presence of the proteins involved in these signaling pathways was also validated in patients with inflammatory myopathies.
In the final part of my doctorate, I worked on completing and extending knowledge of muscle physiology. To do this I moved to the pig, a very important model organism for the study of human diseases and for the production of biological components that can be used to replace degraded ones in humans (aortic valves, for example). Using the pig, I developed a system to integrate miRNA expression with the regulation that miRNAs exert on their target messengers. First, I developed the microarray platforms to analyze gene expression in 14 pig tissues. In particular, I developed two types of platform to analyze the expression of transcripts and of miRNAs purified from the same sample. With these expression data I carried out analyses to elucidate some aspects of miRNA biogenesis. Finally, the completeness of the data allowed me to build regulatory networks specific to each tissue analyzed. To confirm the validity of our approach, I analyzed the degree of overlap between the sequences derived from our study and those produced by various RNA-seq experiments. This analysis confirmed the validity of my approach, as it revealed a substantial overlap between our sequences and those derived from RNA-seq.
I want to win! The 10 values to stop merely taking part and start winning in the management of your savings
Saving is sacrifice for the attainment of important goals. The 10 values: healthy ambition; humility; constant improvement; method and discipline; determination and spirit of sacrifice; self-control; patience; honesty and transparency; teamwork; concreteness
A two-dimensional mathematical model for the study of hydrodynamic and sediment transport in the Venice Lagoon
The Venice Lagoon has long been investigated, but the link between hydrodynamics and sediment transport is still not well understood; in particular, the morphological effects of major past and future interventions are not known exactly.
A two-dimensional hydrodynamic model (depth-averaged and fully non-linear), designed to simulate partially dry areas, has been developed and merged with a 1D network that simulates the minor channels.
The proposed hydrodynamic model is based on an existing framework developed at the IMAGE Department of Padova University in the mid-1990s. The present model follows those contributions but contains a new formulation of convective acceleration and Reynolds stresses, which were not considered in the existing framework. A sediment transport module, considering both suspended sediments and bed load, has been coupled to the hydrodynamics.
The model has been tested on geometric configurations where the flow behaviour is known: a sudden lateral expansion and the formation of free bars in a straight flume. For the Venice Lagoon, discharges and tide levels at some sections in a boundary region have been compared with the available measurements.
We then studied the hydraulic behaviour of the three mouths in the present situation and in the past; in particular, we examined the lagoon in the early 1800s (from Denaix's chart) and in the early 1900s (from the chart of the Ufficio Idrografico del Magistrato alle Acque), pointing out the differences in tide propagation and in sediment dynamics.
Measuring the loss of duplicated genes in plant genomes assembled by means of short reads
Introduction
The assembly of a genome is a complex task whose hardest step is the resolution of repeats. As these regions are usually considered of minor concern for describing the features of a genome, they are often poorly characterized in genome analyses. By definition, any region present at least twice in the genome is a repeat; duplicated genes can therefore fall into this category, leading to an underestimation of duplication events in genomes. This effect could be exacerbated by k-mer-based short-read assembly algorithms.
Methods
During gap closure experiments for the tomato genome, it emerged that some of the unplaced contigs were duplicated genes. To our knowledge, this loss of duplicated genes has never been measured for plant genomes.
For this reason, the Arabidopsis thaliana genome was used as a reference sequence to generate simulated paired-end Illumina reads, which were assembled with De Bruijn graph-based algorithms. Moreover, short-read data from other publicly available Arabidopsis thaliana ecotypes were similarly assembled and compared to the corresponding reference-guided assemblies.
Results
The comparison between the already published genome assemblies and the De Bruijn graph-based assemblies allowed us to investigate duplicated genes in terms of: 1) how many genes are missing in the genomes; 2) how k-mer length may affect the loss or presence of duplicated genes in the genomes; 3) how the structure of the duplicated genes can be affected by differing degrees of nucleotide conservation.
Discussion
All eukaryotic genome projects are now carried out by producing and assembling short reads. The impact of the sequencing strategy on the representation of duplicated genes should provide new insight to be considered when studying plant genomes and their evolution.
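As a rough illustration of why k-mer-based assembly can lose duplicated genes, the following minimal sketch counts how many k-mer positions in a toy sequence collapse onto a k-mer already seen, as they would onto a single node or edge of a De Bruijn graph. The sequences, the value of k and the function names are illustrative assumptions and are not taken from the study.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def collapsed_positions(seq, k):
    """Count k-mer positions whose k-mer also occurs elsewhere in seq.

    In a De Bruijn graph these positions share nodes, so the assembler
    cannot tell the copies apart from the k-mers alone.
    """
    total_positions = len(seq) - k + 1
    return total_positions - len(kmer_set(seq, k))


# Toy "gene" present twice, separated by a short unique spacer (hypothetical data).
gene = "ATGGCGTACGTTAGC"
spacer = "CCATTGAC"
identical_copies = gene + spacer + gene

# Same layout, but the second copy carries one substitution (A -> C at index 7).
diverged_copy = gene[:7] + "C" + gene[8:]
diverged_copies = gene + spacer + diverged_copy

k = 5
print("identical copies:", collapsed_positions(identical_copies, k))
print("diverged copies: ", collapsed_positions(diverged_copies, k))
```

With identical copies, every k-mer of the second copy is already present in the first, so the duplicate contributes nothing new to the graph. A single substitution restores some distinct k-mers around the mutated base, which mirrors the point above that nucleotide conservation between copies influences how duplicated genes are represented.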
Allele-specific expression analysis: pipelines, applications, challenges, and unmet needs
In diploid organisms, genes typically exhibit balanced expression of maternal and paternal alleles. However, exceptions exist, such as autosomal genes with allele-specific expression, where genetic and epigenetic variations can lead to the exclusive or preferential expression of a particular allele. In this context, allele-specific expression analysis serves as a powerful tool for understanding gene regulation, with significant functional and clinical implications.
Despite their increasing importance, current analysis pipelines face notable limitations including a lack of end-to-end solutions, restricted options for multi-omics integration, and insufficient support for single-cell sequencing technologies.
This review critically assesses 26 cutting-edge pipelines for allele-specific expression analysis, focusing on their input requirements, capabilities, and applications in the field. Pipelines are categorized based on their ability to handle various data types, support haplotype phasing, employ statistical approaches, and provide graphical outputs. Most pipelines fail to automate preprocessing, integrate multi-omic data, and support high-throughput single-cell sequencing. Future advancements should prioritize the development of automated multi-omic workflows, implementing visualization options, and enhancing compatibility with single-cell technologies. By addressing these gaps, next-generation allele-specific expression pipelines will offer insights into the mechanisms of allele-specific expression regulation, thereby advancing our understanding of its biological and clinical significance
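To make the "statistical approaches" mentioned above concrete, here is a minimal sketch of one simple test for allelic imbalance: an exact two-sided binomial test on reference and alternate read counts at a heterozygous site. The function name, the toy read counts and the null allelic ratio of 0.5 are illustrative assumptions; the abstract does not say which tests the 26 reviewed pipelines use.

```python
from math import comb


def allelic_imbalance_p_value(ref_reads, alt_reads, null_ratio=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference-read count under the null allelic ratio.
    """
    n = ref_reads + alt_reads
    probs = [
        comb(n, k) * null_ratio ** k * (1 - null_ratio) ** (n - k)
        for k in range(n + 1)
    ]
    observed = probs[ref_reads]
    # Small relative tolerance so that ties are not lost to rounding.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-7))
    return min(1.0, tail)


# Hypothetical read counts at two heterozygous sites.
print(allelic_imbalance_p_value(10, 10))  # balanced expression
print(allelic_imbalance_p_value(18, 2))   # skewed towards the reference allele
```

A plain binomial test ignores overdispersion and reference mapping bias, which is one reason dedicated pipelines exist; this sketch is only meant to show the basic question such tools ask of the read counts.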
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
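The difference between the two counting schemes can be sketched as follows. This is a minimal illustration under stated assumptions: the reference lists and author labels are invented, the function name is hypothetical, and two authors are treated as co-cited whenever both appear (within the counted author positions) in the reference list of the same citing paper, including when they are co-authors of the same cited work. The study may define the counts differently.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors):
    """Count author pairs that are cited together by the same paper.

    reference_lists: one entry per citing paper; each entry is a list of
    cited works, and each cited work is an ordered list of author names.
    max_authors: how many leading authors of each cited work are counted
    (1 for first-author counting, 5 for the first five authors).
    """
    counts = Counter()
    for cited_works in reference_lists:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Two hypothetical citing papers, each citing two works.
reference_lists = [
    [["A", "B"], ["C"]],
    [["C", "B"], ["A"]],
]

first_author_only = cocitation_counts(reference_lists, max_authors=1)
first_five = cocitation_counts(reference_lists, max_authors=5)
print(first_author_only)
print(first_five)
```

In this toy data, author B never appears as a first author and so is invisible to first-author counting, while counting the first five authors brings B into the co-citation matrix.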
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
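One property that distinguishes the two measures named above can be checked numerically: appending authors with whom neither profile has any cocitations (extra zeros in both vectors) leaves the cosine unchanged but can change the Pearson correlation substantially. The sketch below uses invented cocitation profiles and hypothetical function names; it illustrates this one mathematical property and is not a reconstruction of the paper's own argument or example.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length numeric profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical cocitation profiles of two authors over three other authors.
profile_x = [5, 1, 3]
profile_y = [1, 5, 3]

# The same profiles after ten more authors are added to the data set,
# none of whom is cocited with either author.
padded_x = profile_x + [0] * 10
padded_y = profile_y + [0] * 10

print("cosine: ", cosine(profile_x, profile_y), cosine(padded_x, padded_y))
print("pearson:", pearson(profile_x, profile_y), pearson(padded_x, padded_y))
```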
