1,720,958 research outputs found

    Integration of heterogeneous data sources for gene function prediction using Decision Templates and ensembles of learning machines

    No full text
    Several solutions have been proposed to exploit the availability of heterogeneous sources of biomolecular data for gene function prediction, but few attention has been dedicated to the evaluation of the potential improvement in functional classification results that could be achieved through data fusion realized by means of ensemble-based techniques. In this contribution we test the performance of several ensembles of Support Vector Machine (SVM) classifiers, in which each component learner has been trained on different types of bio-molecular data, and then combined to obtain a consensus prediction using different aggregation techniques. Experimental results using data obtained with different high-throughput biotechnologies show that simple ensemble methods outperform both learning machines trained on single homogeneous types of bio-molecular data, and vector space integration methods

    Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction

    Full text link
    Several works showed that biomolecular data integration is a key issue to improve the prediction of gene functions. Quite surprisingly only little attention has been devoted to data integration for gene function prediction through ensemble methods. In this work we show that relatively simple ensemble methods are competitive and in some cases are also able to outperform state-of-the-art data integration techniques for gene function prediction

    Noise tolerance of Multiple Classifier Systems in data integration-based gene function prediction

    No full text
    The availability of various high-throughput experimental and computational methods developed in the last decade allowed molecular biologists to investigate the functions of genes at system level opening unprecedented research opportunities. Despite the automated prediction of genes functions could be included in the most difficult problems in bioinformatics, several recently published works showed that consistent improvements in prediction performances can be obtained by integrating heterogeneous data sources. Nevertheless, very few works have been dedicated to the investigation of the impact of noisy data on the prediction performances achievable by using data integration approaches. In this contribution we investigated the tolerance of multiple classifier systems (MCS) to noisy data in gene function prediction experiments based on data integration methods. The experimental results show that performances of MCS do not undergo a significant decay when noisy data sets are added. In addition, we show that in this task MCS are competitive with kernel fusion, one of the most widely applied technique for data integration in gene function prediction problems

    Towards an integrated pipeline for the in-silico prediction of plant microRNAs and their precursors

    Full text link
    MicroRNAs (miRNAs) are endogenous, small (~ 20 nt), single-stranded, non-coding RNAs that result from the processing of transcribed precursor hairpin structures. They are increasingly recognized as playing crucial roles as post-transcriptional antisense regulators of gene expression through regulation of mRNA stability or translational efficiency. The detection of homologs of known miRNAs through comparative genomic approaches has proved relatively tractable. However, the ab-initio prediction of potentially lineage-specific miRNA precursors through computational methods poses several additional difficulties, not least the fact that not all thermodynamically plausible transcribed hairpins are processed to yield mature miRNAs. We have developed a Support Vector Machine that considers up to 78 features associated with the primary and secondary structures and thermodynamic characteristics of candidate hairpin structures. Our SVM is highly specific in the discrimination of true miRNA precursors from “spurious” hairpins with levels of false positive predictions that are low relative to comparable methods. We also show how our SVM functions as part of an in-silico pipeline for the prediction of novel miRNA precursors in plant genomes

    Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

    Full text link
    Abstract Background The conservation of sequences between related genomes has long been recognised as an indication of functional significance and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. In the context of recent findings that the number non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it should be considered desirable to discriminate between coding and non-coding conserved sequences without recourse to the use of sequence similarity searches of protein databases as such approaches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by the presence in databases of sequences which are erroneously annotated as coding. Results Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine which designates the alignment coding or non-coding with an associated probability score. Conclusion We show that our approach is both sensitive and accurate with respect to comparable methods and illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.</p

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis

    Dispelling the Myths Behind First-author Citation Counts

    Full text link
    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods
    corecore