Increased expression of CD40 ligand in activated CD4+ T lymphocytes of systemic sclerosis patients.
*G. Valentini and M.F. Romano contributed equally to this work.
A neural model for the prediction of pathogenic genomic variants in Mendelian diseases
The detection of pathogenic genomic variants associated with genetic diseases or cancer is an open problem in genomic medicine. Detecting mutations in the non-coding regions of the human genome is a particularly challenging machine learning problem: neutral variants vastly outnumber pathogenic ones, which results in highly imbalanced classification problems. We applied neural networks to the detection of pathogenic regulatory genomic variants in Mendelian diseases and showed that, by combining imbalance-aware techniques with deep learning algorithms, we can obtain state-of-the-art results with a less complex model than those proposed in the literature for this prediction task.
Prediction of gene function using ensembles of SVMs and heterogeneous data sources
The ever-increasing amount of biomolecular data available in public domain databases for a broad range of organisms, coupled with recent advances in machine learning research, has stimulated interest in computational approaches to gene function prediction. In this context, data integration from heterogeneous biomolecular data sources plays a key role. In this contribution we test the performance of several ensembles of SVM classifiers, in which each component learner is trained on a different type of data and the learners are then combined using different aggregation techniques. The combination methods compared are the widely adopted linear weighted combination, the logarithmic weighted combination, and the similarity-based decision templates approach. The results show that heterogeneous data integration through ensemble methods is a valuable line of research in gene function prediction.
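The two weighted combination rules named in the abstract above can be illustrated with a minimal sketch. It assumes that each SVM outputs a probability-like score for a functional class and that each data source carries a non-negative weight (for instance a validation score); the abstract does not state the weighting scheme, and reading the logarithmic rule as a weighted geometric mean is an assumption. The function names and numbers are hypothetical.

```python
import math

def linear_weighted(probs, weights):
    """Weighted arithmetic mean of the per-classifier scores."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total

def log_weighted(probs, weights, eps=1e-12):
    """Weighted geometric mean: average the log-scores, then exponentiate.

    Assumption: this is one common reading of a 'logarithmic' combination.
    Scores are clipped at eps to avoid log(0).
    """
    total = sum(weights)
    log_sum = sum(w * math.log(max(p, eps)) for w, p in zip(weights, probs))
    return math.exp(log_sum / total)

# Hypothetical scores from three SVMs trained on three different data sources.
probs = [0.9, 0.6, 0.3]
weights = [0.8, 0.7, 0.6]  # illustrative weights, not taken from the paper

linear_score = linear_weighted(probs, weights)
log_score = log_weighted(probs, weights)
```

Under this reading the logarithmic rule never exceeds the linear one, so a single low score pulls the combined score down more strongly.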
DDAG K-TIPCAC: an ensemble method for protein subcellular localization
Protein subcellular location prediction is one of the most difficult multiclass prediction problems in modern computational biology. Many methods have been proposed in the literature to solve this problem, but all existing approaches have limitations. In this contribution we propose a novel method for protein subcellular location prediction that performs multiclass classification by combining kernel classifiers through a DDAG. Each base classifier, called K-TIPCAC, projects the points onto a Fisher subspace estimated on the training data by means of a novel technique. Experimental results indicate that DDAG K-TIPCAC performs as well as, if not better than, state-of-the-art ensemble methods for protein subcellular location.
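How a DDAG turns pairwise classifiers into a multiclass decision can be sketched as follows, assuming DDAG refers to the usual decision directed acyclic graph scheme, in which each node tests two candidate classes and eliminates the loser. K-TIPCAC itself is not reproduced here: the pairwise classifier below is a toy nearest-centroid rule on one-dimensional inputs, and all names and values are hypothetical.

```python
def ddag_predict(x, classes, pairwise):
    """Evaluate a decision DAG over the given classes.

    pairwise(a, b, x) must return whichever of a or b it prefers for x.
    Each step compares the first and last remaining candidates and drops
    the loser, so K classes need exactly K - 1 pairwise evaluations.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        if pairwise(a, b, x) == a:
            candidates.pop()       # b is eliminated
        else:
            candidates.pop(0)      # a is eliminated
    return candidates[0]

# Toy stand-in for the kernel base classifiers: nearest centroid in 1-D.
centroids = {"nucleus": 0.0, "cytoplasm": 5.0, "membrane": 10.0}

def nearest_centroid(a, b, x):
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

classes = ["nucleus", "cytoplasm", "membrane"]
prediction = ddag_predict(4.2, classes, nearest_centroid)
```

Only the pairwise rule would change if a different base learner were plugged in; the elimination loop stays the same.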
Clusterv: a tool for assessing the reliability of clusters discovered in DNA microarray data
We present a new R package for assessing the reliability of clusters discovered in high-dimensional DNA microarray data. The package implements methods based on random projections that approximately preserve distances between examples in the projected subspaces.
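The distance-preserving property mentioned above can be checked numerically with a small sketch. It is written in Python rather than R and does not use the Clusterv package; the Gaussian projection matrix, the dimensions, and the synthetic data are all assumptions chosen for illustration.

```python
import math
import random

def random_projection_matrix(d, k, rng):
    """k x d matrix with N(0, 1) entries scaled by 1/sqrt(k).

    With this scaling, squared Euclidean distances are preserved in expectation.
    """
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, x):
    """Map a d-dimensional point to k dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

rng = random.Random(0)
d, k = 1000, 200                      # original and projected dimensions
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(5)]

R = random_projection_matrix(d, k, rng)
projected = [project(R, p) for p in points]

# Ratio of projected to original distance for every pair of points.
ratios = [
    euclidean(projected[i], projected[j]) / euclidean(points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
]
```

Ratios close to 1 indicate that the pairwise distances, and hence the input to a distance-based clustering algorithm, are roughly the same before and after projection.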
Hierarchical ensemble methods for protein function prediction
Protein function prediction is a complex multiclass, multilabel classification problem, characterized by several issues: the incompleteness of the available annotations, the integration of multiple sources of high-dimensional biomolecular data, the imbalance of several functional classes, and the difficulty of unambiguously determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods, which have shown significantly better performance than hierarchy-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. In this general approach, a separate learning machine is trained to learn a specific functional term, and the resulting predictions are then assembled into a “consensus” ensemble decision that takes into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Finally, open problems in this area of computational biology are considered, outlining novel perspectives for future research.
An experimental bias-variance analysis of SVM ensembles based on resampling techniques
Recently, the bias-variance decomposition of error has been used as a tool to study the behavior of learning algorithms and to develop new ensemble methods well suited to the bias-variance characteristics of base learners. We propose methods and procedures, based on Domingos' unified bias-variance theory, to evaluate and quantitatively measure the bias-variance decomposition of error in ensembles of learning machines. We apply these methods to study and compare the bias-variance characteristics of single support vector machines (SVMs) and ensembles of SVMs based on resampling techniques, and their relationships with the cardinality of the training samples. In particular, we present an experimental bias-variance analysis of bagged and random aggregated ensembles of SVMs in order to verify their theoretical variance reduction properties. The experimental bias-variance analysis quantitatively characterizes the relationships between bagging and random aggregating, and explains why ensembles built on small subsamples of the data work with large databases. Our analysis also suggests new directions for research to improve on classical bagging.
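A bias-variance decomposition for the 0/1 loss, in the spirit of the unified theory cited above, can be sketched as follows. The sketch assumes a noise-free, two-class problem and assumes that predictions are available from several models, each trained on a different training set. The function names and the small prediction table are hypothetical and do not reproduce the authors' procedures.

```python
from collections import Counter

def decompose_point(predictions, target):
    """Bias and variance at one test point under 0/1 loss.

    The main prediction is the most frequent label across the models;
    bias is 1 if it differs from the target, and variance is the fraction
    of models that disagree with the main prediction.
    """
    main = Counter(predictions).most_common(1)[0][0]
    bias = int(main != target)
    variance = sum(p != main for p in predictions) / len(predictions)
    return bias, variance

def decompose(all_predictions, targets):
    """Average bias, unbiased and biased variance, and the implied error."""
    n = len(targets)
    bias = unbiased = biased = 0.0
    for preds, target in zip(all_predictions, targets):
        b, v = decompose_point(preds, target)
        bias += b
        if b:
            biased += v      # variance on biased points reduces the error
        else:
            unbiased += v    # variance on unbiased points increases the error
    net_variance = (unbiased - biased) / n
    return {
        "bias": bias / n,
        "unbiased_variance": unbiased / n,
        "biased_variance": biased / n,
        "net_variance": net_variance,
        "error": bias / n + net_variance,
    }

# Rows: test points; columns: predictions of five models (hypothetical data).
all_predictions = [[1, 1, 1, 0, 1], [0, 0, 1, 0, 0], [1, 1, 1, 1, 1]]
targets = [1, 1, 1]
result = decompose(all_predictions, targets)
```

Under these assumptions the reconstructed error equals the average misclassification rate computed directly from the table, which gives a simple sanity check for any implementation.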
True path rule hierarchical ensembles for genome-wide gene function prediction
Gene function prediction is a complex computational problem, characterized by several items: the number of functional classes is large, and a gene may belong to multiple classes; functional classes are structured according to a hierarchy; classes are usually unbalanced, with more negative than positive examples; class labels can be uncertain and the annotations largely incomplete; to improve the predictions, multiple sources of data need to be properly integrated. In this contribution we focus on the first three items, and in particular on the development of a new method for the hierarchical genome-wide and ontology-wide gene function prediction.
The proposed algorithm is inspired by the “true path rule” that governs both the Gene Ontology and FunCat taxonomies. Following this rule, the proposed True Path Rule (TPR) ensemble method is characterized by a two-way asymmetric flow of information that traverses the graph-structured ensemble: positive predictions for a node recursively influence its ancestors, while negative predictions influence its descendants. Cross-validated results with the model organism S. cerevisiae, using 7 different sources of biomolecular data, and a theoretical analysis of the TPR algorithm show the effectiveness and the drawbacks of the proposed approach.
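The two-way flow described above can be sketched on a small tree-structured taxonomy. This is only an illustration under stated assumptions: the averaging rule in the bottom-up step, the 0.5 threshold, the node names, and the local probabilities are all hypothetical, and the sketch should not be taken as the authors' exact algorithm.

```python
def tpr_consensus(children, local_prob, root, threshold=0.5):
    """Combine per-node classifier outputs along a tree hierarchy.

    Bottom-up: a node's consensus score averages its local score with the
    consensus scores of its children that are predicted positive, so
    positive predictions influence ancestors.
    Top-down: once a node is predicted negative, all its descendants are
    labeled negative as well.
    """
    consensus = {}
    labels = {}

    def bottom_up(node):
        child_scores = [bottom_up(c) for c in children.get(node, [])]
        positives = [s for s in child_scores if s > threshold]
        consensus[node] = (local_prob[node] + sum(positives)) / (1 + len(positives))
        return consensus[node]

    def top_down(node, parent_positive):
        labels[node] = parent_positive and consensus[node] > threshold
        for child in children.get(node, []):
            top_down(child, labels[node])

    bottom_up(root)
    top_down(root, True)
    return consensus, labels

# Hypothetical taxonomy: A is the root, B and C its children, D a child of B.
children = {"A": ["B", "C"], "B": ["D"]}

# Case 1: strong positive evidence below A lifts A's weak local score.
scores_up, labels_up = tpr_consensus(
    children, {"A": 0.4, "B": 0.9, "C": 0.2, "D": 0.8}, "A")

# Case 2: B is negative, so D is labeled negative despite its own score.
scores_down, labels_down = tpr_consensus(
    children, {"A": 0.9, "B": 0.1, "C": 0.7, "D": 0.6}, "A")
```

The asymmetry shows up in the two cases: in the first, A ends up positive only because of its positive descendants; in the second, D is overruled by its negative parent B.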
Gene expression-based prediction of malignancies
Molecular classification of malignancies can potentially stratify patients into distinct subclasses not detectable using traditional classification of tumors, opening new perspectives on the diagnosis and personalized therapy of polygenic diseases. In this paper we present a brief overview of our work on gene expression-based prediction of malignancies, from the dichotomic classification of normal versus tumoral tissues, to multiclass cancer diagnosis, to functional class discovery and gene selection problems. The last part of this work presents preliminary results on the application of ensembles of SVMs based on the bias-variance decomposition of the error to the analysis of gene expression data of malignant tissues.
