
    Examples of sequence conservation analyses capture a subset of mouse long non-coding RNAs sharing homology with fish conserved genomic elements

    Background: Long non-coding RNAs (lncRNAs) are a major class of non-coding RNAs. They are involved in diverse intracellular mechanisms such as molecular scaffolding, splicing and DNA methylation. Through these mechanisms they are reported to play a role in cellular differentiation and development. They show enriched expression in the brain, where they are implicated in maintaining cellular identity, homeostasis, stress responses and plasticity. Low sequence conservation and lack of functional annotations make it difficult to identify homologs of mammalian lncRNAs in other vertebrates. A computational evaluation of the lncRNAs through systematic conservation analyses of both their sequences and their genomic architecture is required. Results: Our results show that a subset of mouse candidate lncRNAs could be distinguished from random sequences based on their alignment with zebrafish phastCons elements. Using ROC analyses we were able to define a measure to select significantly conserved lncRNAs. Indeed, starting from ~2,800 mouse lncRNAs we could predict that between 4 and 11% present conserved sequence fragments in fish genomes. Gene ontology (GO) enrichment analyses of protein-coding genes proximal to the region of conservation in both organisms highlighted similar GO classes, such as regulation of transcription and central nervous system development. The proximal coding genes in both species show enriched expression in the brain. In summary, we show that interesting genomic regions in zebrafish could be marked based on their sequence homology to a mouse lncRNA, overlap with ESTs and proximity to genes involved in nervous system development. Conclusions: Conservation at the sequence level can identify a subset of putative lncRNA orthologs. The similar protein-coding neighborhood and transcriptional information about the conserved candidates provide support to the hypothesis that they share functional homology. The pipeline presented here represents a proof of principle showing that between 4 and 11% of lncRNAs retain regions of conservation between mammals and fishes. We believe this study will prove useful as a reference to analyze the conservation of lncRNAs in newly sequenced genomes and transcriptomes. © 2013 Basu et al.; licensee BioMed Central Ltd.
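
The ROC-based selection mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the cutoff was defined, so the use of Youden's J (TPR minus FPR) here is an assumption, and the function names and scores are hypothetical illustrations rather than the authors' pipeline.

```python
def roc_points(pos_scores, neg_scores):
    """Return (threshold, TPR, FPR) for each distinct score, highest first.

    pos_scores: alignment scores of candidate lncRNAs (hypothetical input)
    neg_scores: alignment scores of random sequences (hypothetical input)
    """
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = []
    for t in thresholds:
        # A sequence is called "conserved" when its score is >= t.
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((t, tpr, fpr))
    return points


def best_threshold(pos_scores, neg_scores):
    """Pick the threshold maximising Youden's J = TPR - FPR (an assumed criterion)."""
    return max(roc_points(pos_scores, neg_scores), key=lambda p: p[1] - p[2])[0]


if __name__ == "__main__":
    lncrna_scores = [0.9, 0.8, 0.7, 0.6]    # made-up example values
    random_scores = [0.5, 0.2, 0.1, 0.05]   # made-up example values
    print(best_threshold(lncrna_scores, random_scores))
```

Any other criterion on the ROC curve (for example a fixed false-positive rate) could be substituted in `best_threshold` without changing `roc_points`.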

    In silico characterisation of minor wave genes and LINE-1s transcriptional dynamics at murine zygotic genome activation

    Introduction: In mouse, zygotic genome activation (ZGA) is coordinated by MERVL elements, a class of LTR retrotransposons. In addition to MERVL, another class of retrotransposons, LINE-1 elements, recently came under the spotlight as key regulators of murine ZGA. In particular, LINE-1 transcripts seem to be required to switch off the transcriptional program started by MERVL sequences, suggesting an antagonistic interplay between the LINE-1 and MERVL pathways. Methods: To better investigate the activities of LINE-1 and MERVL elements at ZGA, we integrated publicly available transcriptomics (RNA-seq), chromatin accessibility (ATAC-seq) and Pol-II binding (Stacc-seq) datasets and characterised the transcriptional and epigenetic dynamics of these elements during murine ZGA. Results: We identified two likely distinct transcriptional activities characterising the murine zygotic genome at ZGA onset. On the one hand, our results confirmed that ZGA minor wave genes are preferentially transcribed from MERVL-rich and gene-dense genomic compartments, such as gene clusters. On the other hand, we identified a set of evolutionarily young and likely transcriptionally autonomous LINE-1s located in intergenic and gene-poor regions that show, at the same stage, features such as open chromatin and RNA Pol-II binding, suggesting that they are at least poised for transcription. Discussion: These results suggest that, across evolution, transcription of two different classes of transposable elements, MERVLs and LINE-1s, has likely been confined to genic and intergenic regions respectively in order to maintain and regulate two successive transcriptional programs at ZGA.

    oneChannelGUI: a graphical interface to Bioconductor tools, designed for life scientists who are not familiar with R language

    oneChannelGUI is an add-on Bioconductor package providing a new set of functions extending the capability of the affylmGUI package. This library provides a graphical user interface (GUI) for Bioconductor libraries to be used for quality control, normalization, filtering, statistical validation and data mining for single channel microarrays. Affymetrix 3' expression (IVT) arrays as well as the new whole transcript expression arrays, i.e. gene/exon 1.0 ST, are currently implemented. oneChannelGUI is available for most platforms on which R runs, i.e. Windows and Unix-like machines. © The Author 2007. Published by Oxford University Press. All rights reserved.

    Annocript: a flexible pipeline for the annotation of transcriptomes able to identify putative long noncoding RNAs

    The eukaryotic transcriptome is composed of thousands of coding and long non-coding RNAs (lncRNAs). However, we lack a software platform to identify both RNA classes in a given transcriptome. Here we introduce Annocript, a pipeline that combines the annotation of protein-coding transcripts with the prediction of putative lncRNAs in whole transcriptomes. It downloads and indexes the needed databases, runs the analysis and produces human-readable and standard outputs together with summary statistics of the whole analysis.

    The UniTrap resource: Tools for the biologist enabling optimized use of gene trap clones

    We have developed a comprehensive resource devoted to biologists wanting to optimize the use of gene trap clones in their experiments. We have processed 300 602 such clones from both public and private projects to generate 28 199 'UniTraps', i.e. distinct collections of unambiguous insertions at the same subgenic region of annotated genes. The UniTrap resource contains data relative to 9583 trapped genes, which represent 42.3% of the mouse gene content. Among the trapped genes, 7 728 have a counterpart in humans, and 677 are known to be involved in the pathogenesis of human diseases. The aim of this analysis is to provide the wet lab researchers with a comprehensive database and curated tools for (i) identifying and comparing the clones carrying a trap into the genes of interest, (ii) evaluating the severity of the mutation to the protein function in each independent trapping event and (iii) supplying complete information to perform PCR, RT-PCR and restriction experiments to verify the clone and identify the exact point of vector insertion. To share this unique resource with the scientific community, we have designed and implemented a web interface that is freely accessible at http://unitrap.cbm.fvg.it/. © 2007 The Author(s)

    ClusterScan: simple and generalistic identification of genomic clusters

    Studies on gene clusters have proved to be an excellent source of information for understanding genome evolution and identifying specific metabolic pathways or gene families. Improvements in sequencing methods have resulted in a large increase in sequenced genomes for which cluster annotation could be performed and standardized. Currently available programs are developed to search for specific cluster types, and none of them is suitable for a broad range of user-defined choices. We have developed ClusterScan, which allows identifying clusters of any kind of feature simply based on their genomic coordinates and user-defined categorical annotations.
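
The idea of finding clusters from genomic coordinates plus a categorical annotation can be sketched as follows. This is an illustration only, not ClusterScan's actual algorithm or interface: the `Feature` type, the `max_dist` and `min_features` parameters, and the chaining rule (same chromosome, same category, gap no larger than `max_dist`) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int
    end: int
    category: str  # user-defined categorical annotation


def find_clusters(features, max_dist=1000, min_features=2):
    """Chain same-category features on one chromosome whose gaps are <= max_dist.

    Returns tuples (chrom, category, cluster_start, cluster_end, n_features).
    """
    groups = defaultdict(list)
    for f in features:
        groups[(f.chrom, f.category)].append(f)

    clusters = []
    for (chrom, category), feats in sorted(groups.items()):
        feats.sort(key=lambda f: f.start)
        first, cur_end, n = feats[0].start, feats[0].end, 1
        for f in feats[1:]:
            if f.start - cur_end <= max_dist:
                # Close enough: extend the running cluster.
                cur_end = max(cur_end, f.end)
                n += 1
            else:
                # Gap too large: emit the running cluster if it is big enough.
                if n >= min_features:
                    clusters.append((chrom, category, first, cur_end, n))
                first, cur_end, n = f.start, f.end, 1
        if n >= min_features:
            clusters.append((chrom, category, first, cur_end, n))
    return clusters


if __name__ == "__main__":
    feats = [
        Feature("chr1", 100, 200, "kinase"),
        Feature("chr1", 500, 600, "kinase"),
        Feature("chr1", 5000, 5100, "kinase"),
        Feature("chr1", 550, 700, "zinc_finger"),
        Feature("chr2", 100, 200, "kinase"),
    ]
    print(find_clusters(feats))
```

Because the grouping key is just the category string, the same routine applies to any feature type that carries coordinates and a label.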

    Diatom flagellar genes and their expression during sexual reproduction in Leptocylindrus danicus

    Background: Flagella have been lost in the vegetative phase of the diatom life cycle, but they are still present in male gametes of centric species, thereby representing a hallmark of sexual reproduction. This process, besides maintaining and creating new genetic diversity, is also fundamental in diatoms to restore the maximum cell size following its reduction during vegetative division. Nevertheless, sexual reproduction has been demonstrated in a limited number of diatom species, while our understanding of its different phases and of their genetic control is scarce. Results: In the transcriptome of Leptocylindrus danicus, a centric diatom widespread in the world's seas, we identified 22 transcripts related to flagellar development and confirmed synchronous overexpression of 6 flagellum-related genes during the male gamete formation process. These transcripts were mostly absent in the closely related species L. aporus, which does not have sexual reproduction. Among the 22 transcripts, L. danicus showed proteins that belong to the intraflagellar transport (IFT) subcomplex B as well as IFT-A proteins, the latter previously thought to be absent in diatoms. The presence of flagellum-related proteins was also traced in the transcriptomes of several other centric species. Finally, phylogenetic reconstruction of the IFT172 and IFT88 proteins showed that their sequences are conserved across protist species and have evolved similarly to other phylogenetic marker genes. Conclusion: Our analysis describes for the first time the diatom flagellar gene set, which appears to be more complete and functional than previously reported based on the genome sequence of the model centric diatom, Thalassiosira pseudonana. This first recognition of the whole set of diatom flagellar genes and of their activation pattern paves the way to a wider recognition of the relevance of sexual reproduction in individual species and in the natural environment.

    Characterization of the intronic portion of cadherin superfamily members, common cancer orchestrators

    Cadherins are cell-cell adhesion proteins essential for the maintenance of tissue architecture and integrity, and their impairment is often associated with human cancer. Knowledge regarding regulatory mechanisms associated with cadherin misexpression in cancer is scarce. Specific features of the intronic structure and intron-based regulatory mechanisms in the cadherin superfamily are unidentified. This study aims at systematically characterizing the intronic portion of cadherin superfamily members and identifying intronic regions constituting putative targets/triggers of regulation, using a bioinformatic approach and biological data mining. Our study demonstrates that the cadherin superfamily genes harbour specific characteristics in comparison to all non-cadherin genes, both from the genomic and transcriptional standpoints. Cadherin superfamily genes display higher average total intron number and significantly longer introns than other genes and across the entire vertebrate lineage. Moreover, in the human genome, we observed an uncommonly high frequency of MIR (mammalian-wide interspersed repeats) and MaLR (mammalian apparent LTR retrotransposons) regulatory-associated repetitive elements at 5′-located introns, concomitantly with increased de novo intronic transcription. Using this approach, we identified cadherin intronic-specific sites that may constitute novel targets/triggers of cadherin superfamily expression regulation. These findings pinpoint the need to identify mechanisms affecting particularly MIR and MaLR elements located in introns 2 and 3 of human cadherin genes, possibly important in the expression modulation of this superfamily in homeostasis and cancer. © 2012 Macmillan Publishers Limited. All rights reserved.

    Insights into the transcriptome of the marine copepod Calanus helgolandicus feeding on the oxylipin-producing diatom Skeletonema marinoi

    Diatoms dominate productive regions in the oceans and have traditionally been regarded as sustaining the marine food chain to top consumers and fisheries. However, many of these unicellular algae produce cytotoxic oxylipins that impair reproductive and developmental processes in their main grazers, crustacean copepods. The molecular mode of action of diatoms and diatom oxylipins on copepods is still unclear. In the present study we generated two Expressed Sequence Tag (EST) libraries of the copepod Calanus helgolandicus feeding on the oxylipin-producing diatom Skeletonema marinoi and the cryptophyte Rhodomonas baltica as a control, using suppression subtractive hybridization (SSH). Our aim was to investigate differences in the transcriptome between females fed toxic and non-toxic food and identify differentially expressed genes and biological processes targeted by this diatom. We produced 947 high quality ESTs from both libraries, 475 of which were functionally annotated and deposited in GenBank. Clustering and assembling of ESTs resulted in 376 unique transcripts, 200 of which were functionally annotated. Functional enrichment analysis between the two SSH libraries showed that ESTs belonging to biological processes such as response to stimuli, signal transduction, and protein folding were significantly over-expressed in the S. marinoi-fed C. helgolandicus library compared to the R. baltica-fed C. helgolandicus library. These findings were confirmed by RT-qPCR analysis. In summary, 2 days of feeding on S. marinoi activated a generalized Cellular Stress Response (CSR) in C. helgolandicus, by over-expressing genes of molecular chaperones and signal transduction pathways that protect the copepod from the immediate effects of the diatom diet. Our results provide insights into the response of copepods to a harmful diatom diet at the transcriptome level, supporting the hypothesis that diatom oxylipins elicit a stress response in the receiving organism. They also increase the genomic resources for this copepod species, whose importance could become ever more relevant for pelagic ecosystem functioning in European waters due to global warming. © 2013 Elsevier B.V.

    TEspeX: consensus-specific quantification of transposable element expression preventing biases from exonized fragments

    SUMMARY: Transposable elements (TEs) play key roles in crucial biological pathways. Therefore, several tools enabling the quantification of their expression were recently developed. However, many of the existing tools lack the capability to distinguish between the transcription of autonomously expressed TEs and TE fragments embedded in canonical coding/non-coding non-TE transcripts. Consequently, an apparent change in the expression of a given TE may simply reflect the variation in the expression of the transcripts containing TE-derived sequences. To overcome this issue, we have developed TEspeX, a pipeline for the quantification of TE expression at the consensus level. TEspeX uses Illumina RNA-seq short reads to quantify TE expression while avoiding counting reads deriving from inactive TE fragments embedded in canonical transcripts. AVAILABILITY AND IMPLEMENTATION: The tool is implemented in Python 3, distributed under the GNU General Public License (GPL) and available on GitHub at https://github.com/fansalon/TEspeX (Zenodo URL: https://doi.org/10.5281/zenodo.6800331). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
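
The filtering principle described in this abstract, counting only reads that cannot be explained by a canonical transcript, might be sketched as below. This is not TEspeX's code or command-line interface. The input format (a mapping from read ID to the reference names it aligns to), the function name, and the rule that a read must hit exactly one TE consensus are assumptions chosen to keep the example small.

```python
from collections import Counter


def count_te_specific_reads(alignments, te_names):
    """Count reads whose alignments all fall on a single TE consensus.

    alignments: dict mapping read ID -> list of reference names hit by that read
                (hypothetical, pre-parsed input)
    te_names:   set of reference names that are TE consensus sequences
    """
    counts = Counter()
    for read_id, refs in alignments.items():
        hits = set(refs)
        # Discard reads that also align to a non-TE transcript (possible
        # embedded TE fragment) or to more than one TE consensus (ambiguous).
        if len(hits) == 1 and hits <= te_names:
            counts[next(iter(hits))] += 1
    return counts


if __name__ == "__main__":
    te_names = {"L1Md_A", "MERVL"}           # example consensus names
    alignments = {
        "r1": ["L1Md_A"],                    # TE-specific: counted
        "r2": ["L1Md_A", "Gene1_mRNA"],      # also in a transcript: dropped
        "r3": ["MERVL", "MERVL"],            # two hits, same TE: counted
        "r4": ["Gene2_mRNA"],                # no TE hit: dropped
        "r5": ["L1Md_A", "MERVL"],           # ambiguous between TEs: dropped
    }
    print(count_te_specific_reads(alignments, te_names))
```

In a real workflow the `alignments` mapping would be built from an aligner's output; that parsing step is left out here on purpose.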