An analysis of boosted ensembles of binary fuzzy decision trees
Classification plays a central role in the development of modern expert systems across a wide variety of application fields: accurate, efficient, and compact classification models are often a prime requirement. Boosting (and AdaBoost in particular) is a well-known technique for obtaining robust classifiers from properly learned weak classifiers, which makes it particularly attractive in many practical settings. Although the use of traditional classifiers as base learners in AdaBoost has already been widely studied, the adoption of fuzzy weak learners still requires further investigation. In this paper we describe FDT-Boost, a boosting approach shaped according to the SAMME-AdaBoost scheme, which leverages fuzzy binary decision trees as multi-class base classifiers. Such trees are kept compact by constraining their depth, without lowering the classification accuracy. The experimental evaluation of FDT-Boost has been carried out using a benchmark containing eighteen classification datasets. Comparing our approach with FURIA, one of the most popular fuzzy classifiers, with a fuzzy binary decision tree, and with a fuzzy multi-way decision tree, we show that FDT-Boost is accurate, achieving results that are statistically better than those achieved by the other approaches. Moreover, compared to a crisp SAMME-AdaBoost implementation, FDT-Boost shows similar performance, but the resulting models are significantly less complex, which opens up further opportunities for use in memory-constrained systems.
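The abstract names the SAMME-AdaBoost scheme without spelling it out. As a rough illustration only, the standard SAMME classifier weight and sample reweighting step can be sketched as follows. This is the textbook multi-class rule, not the FDT-Boost implementation; the function names `samme_alpha` and `samme_reweight` are hypothetical, and the fuzzy trees themselves are not modelled.

```python
import math


def samme_alpha(err, n_classes):
    """Weight of one base classifier under the standard SAMME rule.

    err is the weighted training error of the base classifier and
    n_classes the number of classes K. The extra log(K - 1) term is
    what distinguishes SAMME from two-class AdaBoost.
    """
    if not 0.0 < err < 1.0:
        raise ValueError("err must lie strictly between 0 and 1")
    return math.log((1.0 - err) / err) + math.log(n_classes - 1.0)


def samme_reweight(weights, misclassified, alpha):
    """Boost the weights of misclassified samples, then renormalise."""
    updated = [
        w * math.exp(alpha) if miss else w
        for w, miss in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated]


if __name__ == "__main__":
    # Hypothetical round: 3 classes, weighted error 0.25.
    alpha = samme_alpha(0.25, 3)
    print(samme_reweight([0.25] * 4, [True, False, False, False], alpha))
```

With K classes, a base classifier only needs a weighted error below (K - 1) / K to receive a positive weight, which is why shallow trees can still serve as weak learners in the multi-class setting.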
Integration of web-scraped data in CPM tools: The case of project Sibilla
Modern corporate performance management (CPM) systems are crucial tools for enterprises, but they typically lack seamless integration with solutions in the Industry 4.0 domain for exploiting the large amounts of data that originate outside the enterprise boundaries. In this paper, we propose a solution to this problem, based on lessons learned in the development of project “Sibilla,” aimed at devising innovative tools in the business intelligence area. A dedicated software module is introduced with the purpose of enriching existing predictive analysis models with knowledge extracted from the Web and social networks. In particular, we describe how to support two functionalities: identification of planned real-world events and monitoring of public opinion on topics of interest to the company. The effectiveness of the proposed solution has been evaluated by means of a long-term experimental campaign.
Comparing ensemble strategies for deep learning: An application to facial expression recognition
Recent works have shown that Convolutional Neural Networks (CNNs), because of their effectiveness in feature extraction and classification tasks, are suitable tools to address the Facial Expression Recognition (FER) problem. Further, it has been pointed out that ensembles of CNNs improve classification accuracy. Nevertheless, a detailed experimental analysis of how ensembles of CNNs can be effectively generated in the FER context has not been performed yet, although it would have considerable value for improving the results obtained in the FER task. This paper presents an extensive investigation of different aspects of ensemble generation, focusing on the factors that influence the classification accuracy in the FER context. In particular, we evaluate several strategies for ensemble generation, different aggregation schemes, and the dependence upon the number of base classifiers in the ensemble. The final objective is to provide some indications for building effective ensembles of CNNs. Specifically, we observed that exploiting different sources of variability is crucial for improving the overall accuracy. To this end, pre-processing and pre-training procedures provide satisfactory variability across the base classifiers, while the use of different seeds does not appear to be an effective solution. Bagging ensures a high ensemble gain, but the overall accuracy is limited by poorly performing base classifiers. The impact of increasing the ensemble size depends on the adopted strategy, but even in the best case the performance gain obtained by adding base classifiers becomes insignificant beyond a certain size, which suggests avoiding very large ensembles. Finally, classic averaging voting proves to be an appropriate aggregation scheme, achieving accuracy values comparable to or slightly better than those of the other operators tested.
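The averaging voting mentioned at the end of the abstract can be illustrated with a minimal sketch, assuming each base classifier outputs a vector of class probabilities for one input. The function name `average_vote` and the example values are hypothetical and do not come from the paper.

```python
def average_vote(prob_vectors):
    """Average the class-probability vectors of the base classifiers.

    prob_vectors holds one list of class probabilities per base
    classifier, all of the same length. Returns the index of the
    winning class together with the averaged probabilities.
    """
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    mean = [
        sum(p[c] for p in prob_vectors) / n_models
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=mean.__getitem__)
    return label, mean


if __name__ == "__main__":
    # Three hypothetical base classifiers scoring two classes.
    outputs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
    print(average_vote(outputs))
```

Averaging lets a confident base classifier outweigh several uncertain ones, which a plain majority vote over hard labels cannot do.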
Addressing Event-Driven Concept Drift in Twitter Stream: A Stance Detection Application
The content posted by users on Social Networks represents an important source of information for a myriad of applications in the wide field known as 'social sensing'. The Twitter platform in particular hosts the thoughts, opinions and comments of its users, expressed in the form of tweets: as a consequence, tweets are often analyzed with text mining and natural language processing techniques for relevant tasks, ranging from brand reputation and sentiment analysis to stance detection. In most cases the intelligent systems designed to accomplish these tasks are based on a classification model that, once trained, is deployed into the data flow for online monitoring. In this work we show that this approach is inadequate for the task of stance detection from tweets. In fact, the sequence of tweets collected every day represents a data stream. As is well known in the literature on data stream mining, classification models may suffer from concept drift, i.e. a change in the data distribution that can degrade their performance. We present a broad experimental campaign for the case study of the online monitoring of the stance expressed on Twitter about the vaccination topic in Italy. We compare different learning schemes and propose a novel one, aimed at addressing event-driven concept drift.
A Fuzzy Density-based Clustering Algorithm for Streaming Data
The exploitation of data streams, nowadays provided nonstop by a myriad of diverse applications, calls for specific analysis methods. In this paper, we propose SF-DBSCAN, a fuzzy version of the DBSCAN algorithm, aimed at performing unsupervised analysis of streaming data. Fuzziness is introduced through fuzzy borders of density-based clusters. We describe and discuss the proposed algorithm, which evolves the clusters at each occurrence of a new object. Three synthetic datasets are used to show the ability of SF-DBSCAN to successfully track changes in data distribution, thus properly addressing concept drift. SF-DBSCAN is compared with a basic, crisp streaming version of DBSCAN with regard to modelling effectiveness.
Document Management for Collaborative E-business: Integrating ebXML Environment and Legacy DMS Barcelona, Spain,
FDBSCAN-APT: A fuzzy density-based clustering algorithm with automatic parameter tuning
Density-based clustering algorithms represent a convenient approach when the number of clusters is not known in advance and their shapes are arbitrary. Nevertheless, they are highly sensitive to the input parameter setting, especially when cluster borders are close to each other, or even overlap. In this paper we propose FDBSCAN-APT, a fuzzy extension of the DBSCAN algorithm. FDBSCAN-APT is able to discover clusters with fuzzy overlapping borders and sets its input parameters automatically through a novel heuristic based on the statistical modelling of the density distribution of objects. An extensive experimental analysis carried out on synthetic datasets shows that FDBSCAN-APT always finds reasonable parameter configurations and produces good clustering results in a variety of challenging scenarios.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
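The difference between the two counting types can be made concrete with a small sketch. It assumes each citing paper is given as a list of cited works and each cited work as an ordered list of author names; the function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical. The sketch also assumes that a pair is counted once per citing paper and that co-authors of the same cited work count as co-cited, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often pairs of authors are cited by the same paper.

    max_authors=1 reproduces first-author counting; max_authors=5
    considers the first five authors of every cited work.
    """
    counts = Counter()
    for reference_list in citing_papers:
        cited_authors = set()
        for authors in reference_list:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Two hypothetical citing papers, each citing two works.
    papers = [
        [["A", "B"], ["C"]],
        [["A"], ["C", "D"]],
    ]
    print(cocitation_counts(papers, max_authors=1))
    print(cocitation_counts(papers, max_authors=5))
```

In this toy data, first-author counting sees only the pair (A, C), while counting up to five authors also brings in B and D, which shows how the second scheme adds author pairs that the first one cannot observe.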
Enabling the E@syCare Telemedicine Platform with Push Notification with End-to-end Acknowledgment
Telemedicine has become increasingly important in recent years, especially in the treatment of chronic diseases. Thanks to telemedicine platforms, patients can be assisted remotely and continuously monitored according to personalized care plans as if they were at the hospital. The characteristics of telemedicine platforms proved extremely useful when the Covid-19 pandemic broke out. Intense monitoring of patients at home, social distancing, and resource rationalization provided great help to medical personnel and healthcare systems. This novel disease, however, has posed new challenges, because a patient's clinical status evolves more quickly than in chronic diseases. In particular, the updates of the care plan performed remotely by doctors need to be immediately delivered to monitoring kits located at the patient's home, in order to adjust the monitoring plan of the target patient. Since this update is a critical operation, acknowledgment strategies are required to guarantee feedback upon delivery. Immediate updates can be achieved via a push notification system. In this paper, we present a push notification system based on the HiveMQ Community Edition message broker that provides end-to-end positive (ACK) and negative (NACK) acknowledgments, strict authentication and authorization of users and messages, security, data consistency, and privacy. The realized system has been integrated and tested in the E@syCare telemedicine platform, which is certified as a medical device, but it can be easily adopted by any other telemedicine solution, as long as it can perform web service requests to the authentication server and integrate the HiveMQ client library in its software components.
