A bioinformatic and computational approach to regulation of genome function: integrated analysis of genome organization, promoter sequences and gene expression.
Although much is known about gene expression regulation in both Prokaryotes and Eukaryotes, this complex and fascinating mechanism still remains to be fully elucidated. The relatively recent advent of high-throughput techniques for studying transcription has made available an invaluable amount of data that can be used for genome-wide analysis using bioinformatics approaches. These computational methods have now become an integral part of biological research. The different topics of this thesis are related to the development and application of computational methodologies to better understand the basis of genomic gene expression regulation at different levels. A first level of investigation regarded the relationships among chromosomal structure, expression profile and functional characteristics, focusing on genomic organization and structure. For this task, REEF (REgionally Enriched Features) software has been developed, designed to identify genomic regions enriched in specific features, such as a class or group of genes homogeneous for expression and/or functional characteristics. REEF can be used to detect density variations of specific features along the genome sequence, for example genomic regions with significant enrichment of genes which are co-expressed, differentially expressed, or related to particular molecular functions. Local feature enrichment is calculated using a test statistic based on the hypergeometric distribution, applied genome-wide with sliding windows, and the false discovery rate is used to control for multiple testing. REEF has been applied to the study of the genomic distribution of tissue-specific genes and to the analysis of genes differentially expressed when comparing different myeloid cell lines. These analyses identified clusters of tissue-specific genes in the human genome and positional enrichment of hemopoietic functional module-related genes. The second level of investigation regarded gene expression regulation at the promoter level.
Unknown transcription factor binding sites might be detected by searching for shared sequence elements in upstream regulatory regions of genes with common biological function and/or similar expression profile. In fact, genes with similar expression are frequently co-regulated and genes with related function are often similarly expressed. New methodologies for the identification of regulatory motifs in human promoters were developed and tested. Since a drawback of this approach is the exceedingly high number of results, the use of biological knowledge both before and after application of automated pattern discovery allowed the definition of a “sheltered environment” enhancing the specificity of the computational analysis. COOP (Clustering of Overlapping Patterns) software for the extraction of sequence motifs was developed and used to analyze genomic sequences of 1 kb upstream of 91 retina-specific genes, identifying a set of putative regulatory motifs that occur frequently in retina promoter sequences. Most of them are localized in the proximal portion of promoters and tend to be less variable in their central region than in their lateral regions, and some are similar to known regulatory sequences. The performance of COOP was further evaluated by simulation approaches and by applying it to a standard positive control dataset, proposed by Tompa and colleagues for systematic evaluation and comparison of pattern discovery software. A webtool for the prediction of functional elements in promoter sequences, MOST (MOtif Searching web Tool), has been applied to different datasets under various testing conditions in order to study the influence of specific search parameters on results.
Two groups of promoter sequences containing known regulatory signals were used as positive control datasets: the public yeast benchmark dataset of Tompa and colleagues and a custom-produced dataset of 37 human promoter sequences, subgroups of which contained some instances of one of nine different signals. Tests of the method on these benchmark datasets gave quite positive results.
Taking the concepts behind COOP to a new level, a more rigorous methodology was developed for the identification of surprising and putatively regulatory motifs, by comparing their frequency in promoter sequences of co-expressed genes with that in a background set of sequences, representative of the whole set of human gene promoters. Promoter sequences are divided into overlapping regions, considered independently, to identify positional bias in the arrangement of transcription factor binding sites along promoters. Due to the genome-wide characteristics of this approach, a new webtool for the automatic identification and retrieval of a high number of promoters in the human genome was also developed. This motif discovery methodology has been adopted to investigate the structure of promoters of genes crucial during myeloid differentiation
REEF: searching REgionally Enriched Features in genomes
Background: In Eukaryotic genomes, different features including genes are not uniformly distributed. The integration of annotation information and genomic position of functional DNA elements in Eukaryotic genomes opened the way to test novel hypotheses of higher order genome organization and regulation of expression. Results: REEF is a new tool aimed at identifying genomic regions enriched in specific features, such as a class or group of genes homogeneous for expression and/or functional characteristics. The method for the calculation of local feature enrichment uses a test statistic based on the hypergeometric distribution, applied genome-wide with a sliding window approach, and adopts the False Discovery Rate to control for multiple testing. REEF software, source code and documentation are freely available at http://telethon.bio.unipd.it/bioinfo/reef/. Conclusion: REEF can help shed light on how the organization of specific genomic regions contributes to determining their functional role.
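The enrichment procedure described in this abstract can be illustrated with a minimal sketch. This is not REEF's code: the function names, the use of gene-count windows (rather than windows measured in base pairs), and the choice of the Benjamini-Hochberg procedure as the FDR method are all assumptions made for illustration.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the feature."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def enriched_windows(flags, window, step, alpha=0.05):
    """flags: 0/1 per gene in chromosomal order (1 = gene has the feature).

    Assumption: windows span a fixed number of genes, not base pairs.
    Returns (start, end, q-value) for windows with q-value <= alpha.
    """
    N, K = len(flags), sum(flags)
    starts = list(range(0, N - window + 1, step))
    pvals = [hypergeom_tail(sum(flags[s:s + window]), N, K, window) for s in starts]
    qvals = benjamini_hochberg(pvals)
    return [(s, s + window, q) for s, q in zip(starts, qvals) if q <= alpha]
```

Calling `enriched_windows(flags, window=10, step=5)` on a list of per-gene flags returns the windows whose feature density is significantly higher than expected from the chromosome-wide frequency. Overlapping windows are not independent tests, so the adjusted values should be read as a rough guide in this sketch.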
MAGIA2: from miRNA and genes expression data integrative analysis to microRNA-transcription factor mixed regulatory circuits (2012 update).
iWhale: a computational pipeline based on Docker and SCons for detection and annotation of somatic variants in cancer WES data
Whole exome sequencing (WES) is a powerful approach for discovering sequence variants in cancer cells, but its time effectiveness is limited by the complexity of WES data analysis. Here we present iWhale, a customizable pipeline based on Docker and SCons, reliably detecting somatic variants by three complementary callers (MuTect2, Strelka2 and VarScan2). The results are combined to obtain a single variant call format file for each sample and variants are annotated by integrating a wide range of information extracted from several reference databases, ultimately allowing variant and gene prioritization according to different criteria. iWhale allows users to conduct a complex series of WES analyses with a powerful yet customizable and easy-to-use tool, running on most operating systems (macOS, GNU/Linux and Windows). iWhale code is freely available at https://github.com/alexcoppe/iWhale and the Docker image is downloadable from https://hub.docker.com/r/alexcoppe/iwhale
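The abstract does not say how iWhale combines the three callers' outputs, so the following is only a hypothetical sketch of one common way to merge call sets: key each variant by (chromosome, position, reference allele, alternate allele) and record which callers reported it. The function names and the in-memory tuple representation are assumptions; real pipelines operate on VCF files.

```python
from collections import defaultdict


def merge_calls(calls_by_caller):
    """Merge per-caller variant lists into one table of supporting callers.

    calls_by_caller: dict mapping a caller name to an iterable of
    (chrom, pos, ref, alt) tuples. Returns a dict keyed by variant,
    in sorted order, whose values are sorted lists of caller names.
    """
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    return {variant: sorted(callers) for variant, callers in sorted(support.items())}


def filter_by_support(merged, min_callers):
    """Keep variants reported by at least min_callers callers."""
    return {v: c for v, c in merged.items() if len(c) >= min_callers}
```

With `min_callers=1` this keeps the union of all calls, and with `min_callers=3` only the variants on which all three callers agree; which threshold suits a given analysis is a judgment call that this sketch leaves to the user.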
Impact of probe annotation on the integration of miRNA-mRNA expression profiles for miRNA target detection.
MicroRNAs (miRNAs) are small non-coding RNAs that mediate gene expression at the post-transcriptional and translational levels by an imperfect binding to target mRNA 3'UTR regions. While the ab-initio computational prediction of miRNA-mRNA interactions still poses significant challenges, it is possible to overcome some of its limitations by carefully integrating into the analysis the paired expression profiles of miRNAs and mRNAs. In this work, we show how the choice of a proper probe annotation for microarray platforms is an essential requirement to achieve good sensitivity in the identification of miRNA-mRNA interactions. We compare the results obtained from the analysis of the same expression profiles using both gene and transcript based custom CDFs that we have developed for a number of different annotations (ENSEMBL, RefSeq, AceView). In all cases, transcript-based annotations clearly improve the effectiveness of data integration and thus provide a more reliable confirmation of computationally predicted miRNA-mRNA interactions
Motif discovery in promoters of genes co-localized and co-expressed during myeloid cells differentiation
Co-expressed genes may be under similar promoter-based and/or position-based regulation. Although data on expression, position and function of human genes are available, their true integration still represents a challenge for computational biology, hampering the identification of regulatory mechanisms. We carried out an integrative analysis of genomic position, functional annotation and promoters of genes expressed in myeloid cells. Promoter analysis was conducted by a novel multi-step method for discovering putative regulatory elements, i.e. over-represented motifs, in a selected set of promoters, as compared with a background model. The combination of transcriptional, structural and functional data allowed the identification of sets of promoters pertaining to groups of genes co-expressed and co-localized in regions of the human genome. The application of motif discovery to 26 groups of genes co-expressed in myeloid cell differentiation and co-localized in the genome showed that there are more over-represented motifs in promoters of co-expressed and co-localized genes than in promoters of simply co-expressed genes (CEG). Motifs that are similar to the binding sequences of known transcription factors, non-uniformly distributed along promoter sequences and/or occurring in highly co-expressed subsets of genes were identified. Co-expressed and co-localized gene sets were grouped in two co-expressed genomic meta-regions, putatively representing functional domains of a high-level expression regulation
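The idea of testing motifs for over-representation against a background can be sketched in a few lines. This is not the multi-step method of the paper: it assumes exact k-mers as motifs, counts each k-mer at most once per promoter, pools the selected and background promoters into one population, and uses a one-sided hypergeometric test without multiple-testing correction. All names are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n sequences from N, of which K contain the k-mer."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def kmers_present(seq, k):
    """Set of distinct k-mers occurring in one sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overrepresented_kmers(foreground, background, k=6, alpha=0.01):
    """Return (kmer, foreground hits, total hits, p-value), sorted by p-value.

    foreground: promoter sequences of the selected gene set.
    background: promoter sequences outside the selected set (assumption:
    the two lists are disjoint and are pooled as the test population).
    """
    N, n = len(foreground) + len(background), len(foreground)
    fg_counts, all_counts = Counter(), Counter()
    for seq in foreground:
        present = kmers_present(seq, k)
        fg_counts.update(present)
        all_counts.update(present)
    for seq in background:
        all_counts.update(kmers_present(seq, k))
    hits = []
    for kmer, x in fg_counts.items():
        p = hypergeom_tail(x, N, all_counts[kmer], n)
        if p <= alpha:
            hits.append((kmer, x, all_counts[kmer], p))
    return sorted(hits, key=lambda h: h[3])
```

Splitting each promoter into overlapping regions and running `overrepresented_kmers` on each region separately would be one simple way to look for positional bias; because many k-mers are tested, the raw p-values would also need a correction such as the false discovery rate before being interpreted.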
MAGIA, a web-based tool for MiRNA and Genes Integrated Analysis.
MAGIA (miRNA and genes integrated analysis) is a novel web tool for the integrative analysis of target predictions, miRNA and gene expression data. MAGIA is divided into two parts. The query section allows the user to retrieve and browse updated miRNA target predictions computed with a number of different algorithms (PITA, miRanda and TargetScan) and Boolean combinations thereof. The analysis section comprises a multistep procedure for (i) direct integration through different functional measures (parametric and non-parametric correlation indexes, a variational Bayesian model, mutual information and a meta-analysis approach based on P-value combination) of mRNA and miRNA expression data, (ii) construction of a bipartite regulatory network of the best miRNA and mRNA putative interactions and (iii) retrieval of information available in several public databases of genes, miRNAs and diseases and via scientific literature text-mining. MAGIA is freely available for Academic users a
A computational framework for the integrated study of the role of promoters similarity and gene clustering in specific regions of the human genome in establishing co-expression of genes: an application to myeloid cells differentiation.
Genome Evolution in the Cold: Antarctic Icefish Muscle Transcriptome Reveals Selective Duplications Increasing Mitochondrial Function
Antarctic notothenioids radiated over millions of years in subzero waters, evolving
peculiar features, such as antifreeze glycoproteins and absence of heat shock response.
Icefish, family Channichthyidae, also lack oxygen-binding proteins and display extreme
modifications, including high mitochondrial densities in aerobic tissues. A genomic
expansion accompanying the evolution of this fish was reported, but paucity of genomic
information limits the understanding of notothenioid cold adaptation. We reconstructed and
annotated the first skeletal muscle transcriptome of the icefish Chionodraco hamatus
providing a new resource for icefish genomics (http://compgen.bio.unipd.it/chamatusbase/).
We exploited deep sequencing of this energy-dependent tissue to test the hypothesis of
selective duplication of genes involved in mitochondrial function. We developed a
bioinformatic approach to univocally assign C. hamatus transcripts to orthology groups
extracted from phylogenetic trees of five model fish species. C. hamatus duplicates were
recorded for each orthology group allowing the identification of duplicated genes specific
to the icefish lineage. Significantly more duplicates were found in the icefish when
transcriptome data were compared with whole genome data of model fish species.
Indeed, duplicated genes were significantly enriched in proteins with mitochondrial
localization, involved in mitochondrial function and biogenesis. In cold conditions and
without oxygen-carrying proteins, energy production is challenging. The combination of
high mitochondrial densities and the maintenance of duplicated genes involved in
mitochondrial biogenesis and aerobic respiration might confer a selective advantage by
improving oxygen diffusion and energy supply to aerobic tissues. Our results provide new
insights into the genomic basis of icefish cold adaptation
