Single-linkage clustering for optimal classification in piecewise affine regression
When performing regression with piecewise affine maps, the most challenging task is to classify the data points, i.e. to correctly attribute each data point to the affine submodel that most likely generated it. In this paper, we consider a regression scheme similar to the one proposed in (Ferrari-Trecate et al., 2001, 2003) that reduces the classification step to a clustering problem in the presence of outliers. However, instead of the K-means procedure adopted in (Ferrari-Trecate et al., 2001, 2003), we propose the use of single-linkage clustering, which automatically estimates the number of submodels composing the piecewise affine map. Moreover, we prove that, under mild assumptions on the data set, single-linkage clustering can guarantee optimal classification in the presence of bounded noise.
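As a rough illustration of why single-linkage clustering does not need the number of clusters in advance, the sketch below cuts the single-linkage hierarchy at a distance threshold: two points end up in the same cluster whenever a chain of pairwise distances no larger than the threshold connects them. The function name `single_linkage`, the `threshold` parameter, and the toy points are assumptions made for this example; the feature vectors actually clustered in the regression scheme are not specified here.

```python
import math
from itertools import combinations


def single_linkage(points, threshold):
    """Label points by cutting the single-linkage hierarchy at `threshold`.

    Equivalent to taking connected components of the graph that links
    every pair of points at distance <= threshold (union-find below).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    labels = []
    for i in range(len(points)):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
labels = single_linkage(points, threshold=0.5)
n_clusters = len(set(labels))  # the cluster count falls out of the threshold
```

The number of clusters is a by-product of the threshold, which is the property the abstract contrasts with K-means.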
Bagged ensembles of Support Vector Machines for gene expression data analysis
Extracting information from gene expression data is a difficult task, as these data are characterized by very high-dimensional, small-sized samples and a large degree of biological variability. However, a possible way of dealing with the curse of dimensionality is offered by feature selection algorithms, while variance problems arising from small samples and biological variability can be addressed through ensemble methods based on resampling techniques.
These two approaches have been combined to improve the accuracy of Support Vector Machines (SVMs) in the classification of malignant tissues from DNA microarray data. Proper measures have been introduced to assess the accuracy and the confidence of the predictions. The presented results show that bagged ensembles of SVMs are more reliable and achieve equal or better classification accuracy than single SVMs, whereas feature selection methods can further enhance classification accuracy.
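The resampling idea behind bagging can be sketched in a few lines. To stay dependency-free, the base learner below is a nearest-centroid classifier standing in for an SVM, which is a deliberate substitution and not the method of the abstract. The fraction of agreeing votes is shown as one conceivable confidence measure; the measures actually introduced by the authors are not described here. All function names are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Stand-in base learner (NOT an SVM): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for k, v in enumerate(xi):
            acc[k] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroids(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def bagged_fit(X, y, n_models=25, seed=0):
    """Train one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Majority vote; the vote fraction serves as an assumed confidence."""
    votes = Counter(predict_centroids(m, x) for m in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
ensemble = bagged_fit(X, y)
label, confidence = bagged_predict(ensemble, [0.2, 0.2])
```

Averaging over bootstrap replicates is what reduces the variance of the individual learners; swapping in a real SVM would only change `fit_centroids` and `predict_centroids`.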
Cancer recognition with bagged ensembles of Support Vector Machines
Expression-based classification of tumors requires stable, reliable, variance-reducing methods, as DNA microarray data are characterized by small sample size, high dimensionality, noise and large biological variability. In order to address the variance and curse of dimensionality problems arising from this difficult task, we propose to apply bagged ensembles of Support Vector Machines (SVMs) and feature selection algorithms to the recognition of malignant tissues. The presented results show that bagged ensembles of SVMs are more reliable and achieve equal or better classification accuracy than single SVMs, whereas feature selection methods can further enhance classification accuracy.
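The abstract does not say which feature selection algorithm is used. As a hedged illustration only, the sketch below ranks genes with a simple filter score, the absolute difference of class means divided by the sum of class standard deviations, which is an assumed criterion chosen for this example. The function name `rank_features` and the toy matrix are likewise hypothetical.

```python
from statistics import mean, pstdev


def rank_features(X, y, top_k):
    """Return indices of the top_k columns of X by a signal-to-noise score.

    Assumes a two-class problem with labels 0 and 1.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(a) + pstdev(b)
        # Small constant avoids division by zero for constant features.
        scores.append((abs(mean(a) - mean(b)) / (spread + 1e-12), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:top_k]]


# Rows are samples, columns are genes (toy values).
X = [[1.0, 5.0, 0.1],
     [1.2, 4.0, 0.2],
     [0.9, 6.0, 3.0],
     [1.1, 5.5, 3.1]]
y = [0, 0, 1, 1]
selected = rank_features(X, y, top_k=2)
```

Reducing the data to the selected columns before training is one way to combine such a filter with an ensemble.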
The central role of receiver operating characteristic (ROC) curves in evaluating tests for the early detection of cancer
Modeling gene expression data via positive Boolean functions
In this work we propose an artificial model for the generation of biologically plausible gene expression data, to be used in evaluating the performance of gene selection and clustering methods. The model allows the set of relevant genes and the functional classes involved in the problem to be fixed in advance; the input-output relationship is constructed by synthesizing a positive Boolean function. Despite its simplicity, it is sufficiently rich to account for the specific peculiarities of gene expression data, including biological variability.
Java code has been developed to allow the user to choose the model parameters according to the characteristics of the experiment they want to simulate. This makes it possible to insert the artificial model into a distributed system for microarray analysis, in particular one based on a Grid infrastructure.
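A positive (monotone) Boolean function can be written as an OR of ANDs with no negated variables. The sketch below uses that form to label synthetic samples from a set of relevant genes fixed in advance. The way Boolean gene states are turned into expression levels (a base level of 0 or 1 plus Gaussian noise), the parameter names, and the Python rendering are all assumptions made for illustration and do not describe the Java code mentioned above.

```python
import random


def positive_boolean(terms, active):
    """Monotone DNF: OR over terms, each an AND over gene indices."""
    return any(all(active[g] for g in term) for term in terms)


def generate_sample(n_genes, terms, rng, noise=0.3):
    """Draw one synthetic sample: (expression levels, class label)."""
    active = [rng.random() < 0.5 for _ in range(n_genes)]
    label = int(positive_boolean(terms, active))
    # Assumed expression model: 1.0 if the gene is active, 0.0 otherwise,
    # with Gaussian noise standing in for biological variability.
    expr = [(1.0 if a else 0.0) + rng.gauss(0.0, noise) for a in active]
    return expr, label


# Relevant genes are 0, 1 and 2; the remaining genes are irrelevant.
terms = [(0, 1), (2,)]  # (gene0 AND gene1) OR gene2
rng = random.Random(0)
dataset = [generate_sample(10, terms, rng) for _ in range(20)]
```

Because the relevant genes are known by construction, a gene selection method can be scored by how many of them it recovers.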
Evaluation of gene selection methods through artificial and real-world data concerning DNA microarray experiments
A New Learning Method for Piecewise Linear Regression
A new connectionist model for the solution of piecewise linear regression problems is introduced; it is able to reconstruct both continuous and non-continuous real-valued mappings starting from a finite set of possibly noisy samples. The approximating function can assume a different linear behavior in each region of an unknown polyhedral partition of the input domain. The proposed learning technique combines local estimation, clustering in weight space, multicategory classification and linear regression in order to achieve the desired result. Through this approach, piecewise affine solutions for general nonlinear regression problems can also be found.
Proceedings Sixth Annual Meeting of the Bioinformatics Italian Society BITS 2009, Genova, 10-13 March 2009
Approximate Dynamic Programming with Bounds on Model Complexity and Sample Complexity: An Application to an Inventory Forecasting Problem
