
    Gene prediction and functional annotation in the Vitis vinifera genome

    Full text link
    In recent years, the growing number of sequencing projects and the availability of completely sequenced genomes have posed the problem of finding gene sequences rapidly and reliably. Bioinformatics plays a fundamental role in this research field: many bioinformatic tools that consider multiple, heterogeneous sources of evidence have been developed to improve genome annotation. Genome annotation can be divided into two distinct phases: gene prediction and functional annotation. Gene prediction is the process of identifying the exact gene structure, delimiting the exon-intron boundaries and locating genes on the genome. Functional annotation, by contrast, characterizes the predicted genes by assigning them a biological function or a metabolic role, or by describing their structural features. This PhD project focuses on the development of computational methods for managing data from a genome sequencing project. The work consists of the implementation of a bioinformatic platform for gene prediction and functional annotation of the Vitis vinifera genome. It was carried out in collaboration with the CRIBI bioinformatics group, which is a member of the international grape genome sequencing project. The annotation platform consists of two distinct modules. The first module concerns gene prediction. Several computational methods have proved highly reliable at discovering molecular signals and reconstructing gene boundaries, and have become fundamental to genome-level annotation. These methods include ab initio predictors, genome alignments of ESTs or proteins, and comparative genomics. In the second module of the annotation platform, the predicted genes are functionally characterized, mainly by a similarity approach. This approach rests on the assumption that highly conserved regions maintain their original functions or roles across species. The project also includes the development of databases and tools to store and retrieve genome data. In particular, the PhD work focused on the implementation of an XML-based query system that permits information retrieval through web pages and, in the near future, also through web-service workflows.

    A web-based platform to retrieve user-ranked data from human exome/genome sequencing projects.

    No full text
    Genome and exome sequencing projects produce huge amounts of data, which in turn can yield extensive catalogues of human genetic variation. However, identifying which genetic variations are implicated in the onset and progression of human diseases remains a difficult task. New bioinformatic tools are required to efficiently extract a small number of candidate variants from the large amounts of DNA sequencing data produced. Here we present the development of a platform designed to manage and retrieve data from human exome/genome sequencing projects. The platform integrates heterogeneous information to help associate variations with the pathology/phenotype under study. The information can relate to gene features (Gene Ontology, Disease Ontology, OMIM, InterPro annotations) or to genomic context, or it can describe the effects of variants on the CDS (dbSNP, degree of deleteriousness) and their confidence in terms of depth of sequence coverage and calling score. The platform is accessible through a web interface where the user can upload one or more files containing the variants in VCF format. SNPs and microindels are automatically mapped onto the genome and stored in a relational database together with their possible effects on the corresponding transcripts and proteins. A powerful and flexible query system then allows the user to explore the data by applying different criteria related to the heterogeneous information stored in the database. The results of the processed query are displayed in a ranked list ordered according to how many of the imposed criteria are satisfied. The query and ranking systems therefore allow the user to filter the information at different levels and to directly assess the significance of the results. The web platform and the query system are based on a scalable and easily configurable XML-based language. This makes it easy to cope with the continuous increase in data volume and heterogeneity, and with the resulting updates to the database structure, without any modification of the software code.
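
The ranking described in this abstract, where results are ordered by how many of the imposed criteria they satisfy, can be sketched as follows. This is only a minimal illustration and not the platform's implementation: the function name `rank_variants`, the variant fields (`depth`, `in_dbsnp`, `deleterious`) and the three example criteria are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Variant = Dict[str, object]
Criterion = Callable[[Variant], bool]


def rank_variants(variants: List[Variant],
                  criteria: List[Criterion]) -> List[Tuple[int, Variant]]:
    """Return (score, variant) pairs, highest score first.

    The score is the number of criteria a variant satisfies. Python's sort
    is stable, so variants with equal scores keep their input order.
    """
    scored = [(sum(1 for c in criteria if c(v)), v) for v in variants]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Hypothetical records and criteria, for illustration only.
    demo_variants = [
        {"id": "var1", "depth": 12, "in_dbsnp": True, "deleterious": False},
        {"id": "var2", "depth": 45, "in_dbsnp": False, "deleterious": True},
        {"id": "var3", "depth": 30, "in_dbsnp": True, "deleterious": True},
    ]
    demo_criteria = [
        lambda v: v["depth"] >= 20,   # sufficient sequence coverage
        lambda v: not v["in_dbsnp"],  # not a known polymorphism
        lambda v: v["deleterious"],   # predicted to be deleterious
    ]
    for score, variant in rank_variants(demo_variants, demo_criteria):
        print(score, variant["id"])
```

Counting satisfied criteria, rather than requiring all of them, keeps partially matching variants visible lower in the list instead of discarding them.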

    PASS-bis: a bisulfite aligner suitable for whole methylome analysis of Illumina and SOLiD reads

    No full text
    The sequencing of bisulfite-treated DNA (Bi-Seq) is becoming a gold standard for methylation studies. The mapping of Bi-Seq reads is complex and requires special alignment algorithms. This problem is particularly relevant for SOLiD color space, where the bisulfite conversion C/T changes two adjacent colors into 16 possible combinations. Here, we present an algorithm that efficiently aligns Bi-Seq reads obtained either from SOLiD or Illumina. An accompanying methylation-caller program creates a genomic view of methylated and unmethylated Cs on both DNA strands. Availability and implementation: The algorithm has been implemented as an option of the program PASS, freely available at http://pass.cribi.unipd.it
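
To make the mapping problem concrete, here is a minimal sketch of a generic strategy for base-space (Illumina-style) reads: convert every C to T in both read and reference before comparing them, then call methylation from the original bases. It is an assumption-laden illustration and not the algorithm implemented in PASS. It ignores the reverse strand, mismatches and SOLiD color space, and the helper names `c_to_t` and `call_methylation` are hypothetical.

```python
from typing import List, Tuple


def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion: every C becomes T."""
    return seq.upper().replace("C", "T")


def call_methylation(ref: str, read: str, offset: int) -> List[Tuple[int, str]]:
    """Call the methylation state of reference Cs covered by an aligned read.

    `offset` is the 0-based reference position where the read starts.
    A reference C read as C was protected from conversion (methylated);
    a reference C read as T was converted (unmethylated).
    """
    calls = []
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if ref[pos].upper() != "C":
            continue
        if base == "C":
            calls.append((pos, "methylated"))
        elif base == "T":
            calls.append((pos, "unmethylated"))
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"
    read = "TCTG"  # bisulfite read originating from reference[3:7] == "TCCG"
    # After conversion the read and its reference window are identical.
    print(c_to_t(read) == c_to_t(reference[3:7]))
    print(call_methylation(reference, read, offset=3))
```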

    ScaMPI: a program for genome Scaffolding using Mate Paired Information

    No full text
    Motivation: the revolution in sequencing technologies referred to as "Next Generation Sequencing" has enabled rapid genome sequencing at reduced cost. While it has become easier to obtain a “draft” of a genome (usually highly fragmented into small contigs), producing a high-quality genome assembly, with scaffolds spanning entire chromosomes, still presents hurdles and suffers from a lack of dedicated tools. Methods: ScaMPI is a comprehensive suite of programs for genome scaffolding using mate-paired reads (in particular SOLiD color-space encoded reads). ScaMPI provides a greedy algorithm for scaffolding with mate-paired reads, a web-based interface to assist manual scaffolding and refinement of the assembly, and a set of tools for complementary tasks such as contig consistency validation via physical coverage check, primer design, gap closure, BAC-end validation of the assembly and de novo telomere identification (TRAP, Telomeric Repeat Analysis Program). Results: the ScaMPI suite has been used to scaffold the contigs of a genome project of an oil-producing microalga (N. gaditana) sequenced with the 454 platform (N50: 40 kbp). ScaMPI automatically produced a set of scaffolds (N50: 600 kbp) using two libraries of SOLiD mate pairs. The web interface was then used for manual refinement to produce a set of 58 scaffolds (N50: 1 Mbp). The telomere-identification module was used to find telomeres, revealing that 21 scaffolds were complete chromosomes (out of an estimated 30). By sequencing a set of 528 BAC ends, we found that 97% of them confirmed the assembly of 32 large scaffolds (accounting for 20 Mbp), while the remaining 3% did not disprove it.
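
The N50 values quoted in this abstract use the standard assembly statistic: the length L such that sequences of length L or longer together cover at least half of the total assembly. A minimal sketch of that calculation follows. The function name `n50` is illustrative and is not part of ScaMPI.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```

Comparing `2 * running` with `total` keeps the arithmetic in integers and avoids rounding issues when the total length is odd.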

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
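
The two counting schemes compared in this abstract can be sketched with one function that keeps only the first `max_authors` authors of each cited work. This is a simplified illustration under stated assumptions, not the study's procedure: each pair is counted once per citing paper, and co-authors of the same cited work are treated as co-cited. The input layout and the name `cocitation_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List

# A citing paper is a list of cited works; a cited work is its ordered author list.
CitingPaper = List[List[str]]


def cocitation_counts(citing_papers: List[CitingPaper], max_authors: int) -> Counter:
    """Count, for each author pair, the citing papers in which both appear.

    max_authors=1 reproduces first-author counting; max_authors=5 takes the
    first five authors of every cited work into account.
    """
    counts: Counter = Counter()
    for cited_works in citing_papers:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        # Sort so each unordered pair always has the same key.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    papers = [
        [["A", "B"], ["C"]],
        [["A"], ["C", "D"]],
    ]
    print(cocitation_counts(papers, max_authors=1))
    print(cocitation_counts(papers, max_authors=5))
```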

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
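
For reference, the two measures this abstract names explicitly, the Pearson correlation and the cosine, can be computed for a pair of cocitation profiles as sketched below. The profiles are made-up numbers, and the paper's two other measures are not reproduced because the abstract does not define them. The demo shows one mathematical property of the measures: appending authors with whom neither profile is cocited (zero entries) leaves the cosine unchanged but changes the Pearson correlation.

```python
import math
from typing import Sequence


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine is undefined for an all-zero profile")
    return dot / (norm_x * norm_y)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, i.e. the cosine of the mean-centred profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


if __name__ == "__main__":
    a, b = [1, 2], [2, 1]  # made-up cocitation profiles
    a_padded, b_padded = a + [0, 0], b + [0, 0]
    print(cosine(a, b), cosine(a_padded, b_padded))
    print(pearson(a, b), pearson(a_padded, b_padded))
```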

    Dispelling the Myths Behind First-author Citation Counts

    Full text link
    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods

    Author Index

    No full text
    Not provided