DisRFC: a dissimilarity-based Random Forest Clustering approach
In this paper we present a novel Random Forest Clustering approach, called Dissimilarity Random Forest Clustering (DisRFC), which requires only pairwise dissimilarities as input. Thanks to this characteristic, the proposed approach is applicable to all those problems which involve non-vectorial representations, such as strings, sequences, graphs or 3D structures. In the proposed approach, we first train an Unsupervised Dissimilarity Random Forest (UD-RF), a novel variant of Random Forest which is completely unsupervised and based on dissimilarities. Then, we exploit the trained UD-RF to project the patterns to be clustered into a binary vectorial space, where the clustering is finally derived using fast and effective K-means procedures. In the paper we introduce different variants of DisRFC, which are thoroughly and positively evaluated on 12 different problems, also in comparison with alternative state-of-the-art approaches. (c) 2022 Elsevier Ltd. All rights reserved.
Dissimilarity Random Forest Clustering
In this paper we present DisRFC (Dissimilarity Random Forest Clustering), a novel Random Forest Clustering approach which, unlike current methods that require a vectorial representation as input, works only with dissimilarities. It is therefore applicable also to all those problems where a vectorial representation is not available but a descriptive dissimilarity measure can be computed. In the DisRFC approach, objects to be clustered are first modelled with a novel RF variant called Unsupervised Dissimilarity Random Forest (UD-RF), whose functioning mechanisms are both unsupervised and based on dissimilarities. The trained UD-RF is then used to project objects into a binary vectorial space, where effective K-means procedures can be used to obtain the final clustering. In the paper we present different variants of DisRFC, thoroughly and positively evaluated using 10 different problems.
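As a rough illustration of the three stages named in this abstract (a forest built from dissimilarities only, a binary embedding, then K-means), the following Python sketch may help. It is not the authors' UD-RF: the split rule (two random prototypes, each object sent to the closer one), the one-hot leaf encoding, the toy string dissimilarity `dissim`, and every function name are assumptions made for illustration.

```python
import random

def dissim(s, t):
    # Toy string dissimilarity (assumption): mismatches plus length gap.
    return sum(a != b for a, b in zip(s, t)) + abs(len(s) - len(t))

def grow_tree(D, rng, max_depth=3, min_size=2):
    """Grow one random tree using only the dissimilarity matrix D.

    Returns (leaf id of every object, number of leaves).
    """
    n = len(D)
    leaves, n_leaves = [0] * n, 0
    stack = [(list(range(n)), 0)]
    while stack:
        idx, depth = stack.pop()
        if depth < max_depth and len(idx) > min_size:
            a, b = rng.sample(idx, 2)  # two random prototypes (assumed rule)
            left = [i for i in idx if D[i][a] <= D[i][b]]
            right = [i for i in idx if D[i][a] > D[i][b]]
            if left and right:
                stack.append((left, depth + 1))
                stack.append((right, depth + 1))
                continue
        for i in idx:  # this node becomes a leaf
            leaves[i] = n_leaves
        n_leaves += 1
    return leaves, n_leaves

def binary_embedding(D, n_trees=20, seed=0):
    """Concatenate, per tree, a one-hot vector marking each object's leaf."""
    rng = random.Random(seed)
    rows = [[] for _ in D]
    for _ in range(n_trees):
        leaves, n_leaves = grow_tree(D, rng)
        for i, leaf in enumerate(leaves):
            one_hot = [0] * n_leaves
            one_hot[leaf] = 1
            rows[i].extend(one_hot)
    return rows

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd iterations with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(n_iter):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            for x in X
        ]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Usage on non-vectorial objects (strings): only D is passed to the forest.
words = ["aaaa", "aaab", "aaba", "zzzz", "zzzy", "zzyz"]
D = [[dissim(s, t) for t in words] for s in words]
E = binary_embedding(D, n_trees=20)
labels = kmeans(E, k=2)
```

The point of the sketch is that the strings themselves never reach the tree or the clustering code; everything downstream of `D` works on indices and the binary rows of `E`.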
On the importance of local and global analysis in the judgment of similarity and dissimilarity of faces
Distance-Based Random Forest Clustering with Missing Data
In recent years there has been an increased interest in clustering methods based on Random Forests, due to their flexibility and their capability in describing data. One problem of current RF-clustering approaches is that they are not able to directly deal with missing data, a common scenario in many application fields (e.g. Bioinformatics): the usual solution in this case is to pre-impute incomplete data before running standard clustering methods. In this paper we present the first Random Forest clustering approach able to directly deal with missing data. We start from the very recent RatioRF distance for clustering [3], which has been shown to outperform all other distance-based RF clustering schemes, and extend the framework in two directions that allow the integration of missing data mechanisms directly inside the clustering pipeline. Experimental results, based on 6 standard UCI ML datasets, are promising, also in comparison with some literature alternatives.
Probabilistic face authentication using Hidden Markov Models
In this paper a novel approach for face authentication is proposed, based on the Hidden Markov Model (HMM) tool. While this technique has been largely and successfully employed in face recognition systems, its use in the authentication context has been poorly investigated. The method proposed in this paper extracts from the image a sequence of partially overlapping images, from which different kinds of simple and quickly computable features are extracted. The face template is obtained by modelling the sequence with a continuous Gaussian Hidden Markov Model. Given an unknown subject, the authentication phase is carried out by thresholding the likelihood of the given face with respect to the HMM template. The proposed approach has been thoroughly tested on the ORL database, also applying different parameter configurations. A comparison with two other state-of-the-art approaches is also reported. The results obtained are very promising, showing the wide applicability of the Hidden Markov Model methodology.
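The authentication step described in this abstract (score a feature sequence against a Gaussian HMM template, then threshold) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: features are one-dimensional scalars, the template parameters are hand-picked rather than trained, all probabilities are strictly positive, and the per-frame length normalisation and the threshold value are illustrative choices.

```python
import math

def log_gauss(x, mean, var):
    # Log-density of a 1-D Gaussian emission.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, means, variances):
    """Forward algorithm in the log domain: returns log p(obs | template)."""
    n = len(start)
    alpha = [math.log(start[s]) + log_gauss(obs[0], means[s], variances[s])
             for s in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + math.log(trans[p][s]) for p in range(n)])
            + log_gauss(x, means[s], variances[s])
            for s in range(n)
        ]
    return logsumexp(alpha)

def authenticate(obs, template, threshold):
    # Accept when the per-frame log-likelihood reaches the threshold
    # (length normalisation is an assumption of this sketch).
    return log_likelihood(obs, **template) / len(obs) >= threshold

# Hypothetical two-state template; values are illustrative only.
template = {
    "start": [0.5, 0.5],
    "trans": [[0.9, 0.1], [0.1, 0.9]],
    "means": [0.0, 5.0],
    "variances": [1.0, 1.0],
}
genuine = [0.1, -0.2, 0.3, 4.9, 5.2]       # close to the template's states
impostor = [10.0, 11.0, 12.0, 10.5, 11.5]  # far from both states

accepted = authenticate(genuine, template, threshold=-5.0)
rejected = not authenticate(impostor, template, threshold=-5.0)
```

In practice the threshold would be tuned on held-out genuine and impostor attempts; the value used here only separates the two toy sequences.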
On learning Random Forests for Random Forest-clustering
In this paper we study the poorly investigated problem of learning Random Forests for distance-based Random Forest clustering. We study both classic schemes and alternative approaches that are novel in this context. In particular, we investigate the suitability of Gaussian Density Forests [1], Random Forests specifically designed for density estimation. Further, we introduce a novel variant of Random Forest, based on an effective nonparametric by-pass estimator of the Rényi entropy, which can be useful when the parametric assumption is too strict. An empirical evaluation involving different datasets and different RF-clustering strategies confirms that the learning step is crucial for RF-clustering. We also present a set of practical guidelines useful to determine the most suitable variant of RF-clustering according to the problem under examination.
Surfaces engineering approaches for cell culture substrates using biomimetic Human Elastin-Like Polypeptides
Spatially controlled cell adhesion and multicellular organization are critical to many biomedical and tissue-engineering applications. Many efforts have therefore focused on the production of engineered surfaces that can promote, and possibly control, cell adhesion and growth. Here we present the characterization of these spontaneously formed patterns using HELP-fluoresceinated derivatives. The ability of the HELP-based substrates to support cell adhesion and growth is also explored.
Using Random Forest Distances for Outlier Detection
In recent years, a great variety of outlier detectors have been proposed in the literature, many of which are based on pairwise distances or derived concepts. However, in such methods, most of the effort has been devoted to the outlier detection mechanisms, with little attention paid to the distance measure - in most cases the basic Euclidean distance is used. In the clustering field, by contrast, data-dependent measures have been shown to be very useful, especially those based on Random Forests: indeed, Random Forests are partitioners of the space able to naturally encode the relation between two objects. In the outlier detection field, these informative distances have received little attention. This manuscript aims to fill this gap by studying the suitability of these measures for the identification of outliers. In our scheme, we build an unsupervised Random Forest model, from which we extract pairwise distances; these distances are then input to an outlier detector. In particular, we study the impact of several Random Forest-based distances, including advanced and recent ones, on different outlier detectors. We thoroughly evaluate our methodology on nine benchmark datasets for outlier detection, focusing on different aspects of the pipeline, such as the parametrization of the forest, the type of distance-based outlier detector, and, most importantly, the impact of the adopted distance.
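The scheme outlined in this abstract (unsupervised forest, pairwise distances extracted from it, distances fed to an outlier detector) might look roughly like the sketch below. The specific choices are assumptions for illustration and are not taken from the paper: trees use a random feature and a uniformly random threshold, the distance is one minus the fraction of trees in which two points share a leaf, and the detector scores each point by the distance to its k-th nearest neighbour. All names are hypothetical.

```python
import random

def leaf_ids(X, rng, max_depth=2, min_size=2):
    """Grow one completely random tree; return the leaf id of every point."""
    leaves, n_leaves = [0] * len(X), 0
    stack = [(list(range(len(X))), 0)]
    while stack:
        idx, depth = stack.pop()
        if depth < max_depth and len(idx) > min_size:
            f = rng.randrange(len(X[0]))          # random feature
            lo = min(X[i][f] for i in idx)
            hi = max(X[i][f] for i in idx)
            if lo < hi:
                t = rng.uniform(lo, hi)           # random threshold
                left = [i for i in idx if X[i][f] <= t]
                right = [i for i in idx if X[i][f] > t]
                if left and right:
                    stack.append((left, depth + 1))
                    stack.append((right, depth + 1))
                    continue
        for i in idx:  # this node becomes a leaf
            leaves[i] = n_leaves
        n_leaves += 1
    return leaves

def forest_distances(X, n_trees=100, seed=0):
    """Distance = 1 - fraction of trees where two points share a leaf."""
    rng = random.Random(seed)
    n = len(X)
    shared = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        leaves = leaf_ids(X, rng)
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    shared[i][j] += 1
    return [[1 - shared[i][j] / n_trees for j in range(n)] for i in range(n)]

def knn_outlier_scores(D, k=2):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, row in enumerate(D):
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(others[k - 1])
    return scores

# Eight clustered points plus one far-away point (index 8).
X = [[0.1 * i, 0.1 * i] for i in range(8)] + [[10.0, 10.0]]
D = forest_distances(X)
scores = knn_outlier_scores(D, k=2)
most_outlying = scores.index(max(scores))
```

Because the detector only consumes `D`, swapping in a different forest-based distance or a different distance-based detector changes one function each, which mirrors the kind of comparison the abstract describes.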