On the use of summarization and transformer architectures for profiling résumés
Profiling professional figures is becoming more and more crucial, as companies and recruiters face the challenges of Industry 4.0. On the one hand, demand for specific knowledge in professional figures is rising. On the other hand, workers try to broaden the spectrum of their skills in order to remain appealing in the job market. Therefore, research related to these topics is receiving more and more attention. In this paper, we propose a methodology to profile résumés based on summarization and transformer architectures for generating résumé embeddings and on hierarchical clustering algorithms for grouping these embeddings. We evaluate different strategies and show that our approach achieves promising results on a public domain dataset containing 1202 résumés
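The pipeline this abstract describes (embed each résumé, then group the embeddings with hierarchical clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy 2-D vectors stand in for transformer-generated résumé embeddings, and the cosine distance, the average-linkage criterion, and all function names are assumptions made for the example.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(cosine_distance(vectors[p], vectors[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair of clusters and drop the absorbed one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical "embeddings": two résumés point one way, two another.
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
groups = agglomerative(embeddings, n_clusters=2)
```

In practice the vectors would come from a summarization step followed by a transformer encoder, and a library implementation of hierarchical clustering would replace the quadratic loop above.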
Exploiting Categorization of Online News for Profiling City Areas
Profiling city areas, in terms of citizens' behaviour and commercial and social activities, is an interesting issue in the context of smart cities, especially considering a real-time streaming context. Several methods have been proposed in the literature, exploiting different data sources. In this paper, we propose an approach to profiling city areas based on articles from local online newspapers, exploiting information from the text as well as metadata such as geo-localization and tags. In particular, we use the tags associated with each article to identify macro-categories through clustering analysis on tag embeddings. Further, we employ a text categorization model based on SVM to label each new article, represented as Bag-of-Words, online with one of these categories. The categorization approach has been integrated into a framework recently proposed by the authors for profiling city areas exploiting different web sources of data: the online newspapers are monitored continuously, thus producing a news stream to be analysed. We show experiments performed on the city of Rome, considering data from 2014 to 2018. We discuss the results obtained by adopting different classifiers and show that the best classifier, namely an SVM, can achieve an accuracy and an F1-score of up to 93% and 79%, respectively
A System to Support Readers in Automatically Acquiring Complete Summarized Information on an Event from Different Sources
Today, most newspapers utilize social media to disseminate news. On the one hand, this results in an overload of related articles for social media users. On the other hand, since social media tend to form echo chambers around their users, different opinions and information may be hidden. Enabling users to access different information (possibly outside of their echo chambers, without the burden of reading entire articles, often containing redundant information) may be a step forward in allowing them to form their own opinions. To address this challenge, we propose a system that integrates Transformer neural models and text summarization models along with decision rules. Given a reference article already read by the user, our system first collects articles related to the same topic from a configurable number of different sources. Then, it identifies and summarizes the information that differs from the reference article and outputs the summary to the user. The core of the system is the sentence classification algorithm, which classifies sentences in the collected articles into three classes based on similarity with the reference article: sentences classified as dissimilar are summarized by using a pre-trained abstractive summarization model. We evaluated the proposed system in two steps. First, we assessed its effectiveness in identifying content differences between the reference article and the related articles by using human judgments obtained through crowdsourcing as ground truth. We obtained an average F1 score of 0.772 against average F1 scores of 0.797 and 0.676 achieved by two state-of-the-art approaches based, respectively, on model tuning and prompt tuning, which require an appropriate tuning phase and, therefore, greater computational effort. Second, we asked a sample of people to evaluate how well the summary generated by the system represents the information that is not present in the article read by the user. The results are extremely encouraging. Finally, we present a use case
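The three-way sentence classification step mentioned in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: a bag-of-words cosine similarity stands in for the Transformer-based similarity, the thresholds `low` and `high` are hypothetical, and the class names other than "dissimilar" are invented for the example.

```python
import math
import re
from collections import Counter

def bow(text):
    """Lower-cased bag-of-words vector as a Counter."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(sentence, reference_sentences, low=0.3, high=0.7):
    """Label a sentence by its best match against the reference article.

    `low` and `high` are assumed thresholds, not values from the paper.
    """
    best = max(cosine(bow(sentence), bow(r)) for r in reference_sentences)
    if best >= high:
        return "similar"
    if best <= low:
        return "dissimilar"
    return "intermediate"

reference = ["The council approved the new budget on Monday."]
candidates = [
    "The council approved the new budget on Monday.",
    "The council approved a budget after long debate.",
    "Local schools will close for two weeks.",
]
labels = [classify(s, reference) for s in candidates]
# Only sentences judged dissimilar would be passed on to the summarizer.
to_summarize = [s for s, lab in zip(candidates, labels) if lab == "dissimilar"]
```

The sentences collected in `to_summarize` are the ones that would be handed to a pre-trained abstractive summarization model in the system described above.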
An analysis of boosted ensembles of binary fuzzy decision trees
Classification is a functionality that plays a central role in the development of modern expert systems, across a wide variety of application fields: using accurate, efficient, and compact classification models is often a prime requirement. Boosting (and AdaBoost in particular) is a well-known technique for obtaining robust classifiers from properly learned weak classifiers, which makes it particularly attractive in many practical settings. Although the use of traditional classifiers as base learners in AdaBoost has already been widely studied, the adoption of fuzzy weak learners still requires further investigation. In this paper we describe FDT-Boost, a boosting approach shaped according to the SAMME-AdaBoost scheme, which leverages fuzzy binary decision trees as multi-class base classifiers. Such trees are kept compact by constraining their depth, without lowering the classification accuracy. The experimental evaluation of FDT-Boost has been carried out using a benchmark containing eighteen classification datasets. Comparing our approach with FURIA, one of the most popular fuzzy classifiers, with a fuzzy binary decision tree, and with a fuzzy multi-way decision tree, we show that FDT-Boost is accurate, achieving results that are statistically better than those of the other approaches. Moreover, compared to a crisp SAMME-AdaBoost implementation, FDT-Boost shows similar performance, but the models it produces are significantly less complex, thus opening up further opportunities for use in memory-constrained systems
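For readers unfamiliar with the SAMME-AdaBoost scheme named in this abstract, one boosting round can be sketched as below. This shows only the standard SAMME weight update for a K-class problem; the fuzzy decision trees used as weak learners in FDT-Boost are not modelled, and the function name and toy predictions are assumptions.

```python
import math

def samme_round(weights, y_true, y_pred, n_classes):
    """One SAMME round: return (alpha, updated normalized sample weights).

    Assumes the weighted error satisfies 0 < err < 1.
    """
    miss = [int(t != p) for t, p in zip(y_true, y_pred)]
    err = sum(w * m for w, m in zip(weights, miss)) / sum(weights)
    # The log(K - 1) term distinguishes SAMME from binary AdaBoost: alpha
    # stays positive as long as the learner beats random guessing
    # (err < 1 - 1/K).
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Misclassified samples are up-weighted; the rest are left unchanged.
    new_w = [w * math.exp(alpha * m) for w, m in zip(weights, miss)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

# Toy round: four equally weighted samples, three classes, one mistake.
weights = [0.25, 0.25, 0.25, 0.25]
alpha, weights = samme_round(weights, [0, 1, 2, 2], [0, 1, 2, 0], n_classes=3)
```

After this round the single misclassified sample carries most of the weight, so the next weak learner concentrates on it.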
A survey on fake news and rumour detection techniques
False or unverified information spreads just like accurate information on the web, thus possibly going viral and influencing public opinion and its decisions. Fake news and rumours represent the most popular forms of false and unverified information, respectively, and should be detected as soon as possible to avoid their dramatic effects. Interest in effective detection techniques has therefore been growing very fast in recent years. In this paper we survey the different approaches to automatic detection of fake news and rumours proposed in the recent literature. In particular, we focus on five main aspects. First, we report and discuss the various definitions of fake news and rumours that have been considered in the literature. Second, we highlight how the collection of relevant data for performing fake news and rumour detection is problematic, and we present the various approaches that have been adopted to gather these data, as well as the publicly available datasets. Third, we describe the features that have been considered in fake news and rumour detection approaches. Fourth, we provide a comprehensive analysis of the various techniques used to perform rumour and fake news detection. Finally, we identify and discuss future directions
An overview of recent distributed algorithms for learning fuzzy models in Big Data classification
Nowadays, a huge amount of data is generated, often in very short time intervals and in various formats, by a number of heterogeneous sources such as social networks and media, mobile devices, internet transactions, networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular Vs features, such as Value, Veracity, Variety, Velocity and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented by using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models are currently playing a significant role, thanks to their capability of handling vague and imprecise data and their inherent interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first show some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we discuss their scalability
Fuzzy hoeffding decision tree for data stream classification
Data stream mining has recently grown in popularity, thanks to an increasing number of applications which need continuous and fast analysis of streaming data. Such data are generally produced in application domains that require immediate reactions with strict temporal constraints. These characteristics make the use of classical machine learning algorithms for mining knowledge from fast data streams problematic and call for appropriate techniques. In this paper, based on the well-known Hoeffding Decision Tree (HDT) for streaming data classification, we introduce FHDT, a fuzzy HDT that extends HDT with fuzziness, thus making HDT more robust to noisy and vague data. We tested FHDT on three synthetic datasets, usually adopted for analyzing concept drifts in data stream classification, and two real-world datasets, already exploited in recent research on fuzzy systems for streaming data. We show that FHDT outperforms HDT, especially in the presence of concept drift. Furthermore, FHDT is characterized by a high level of interpretability, thanks to the linguistic rules that can be extracted from it
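The Hoeffding Decision Tree that FHDT builds on decides when to split a leaf using the Hoeffding bound. The sketch below shows that classical criterion only; it does not include the fuzzy extension proposed in the paper, and the function names, the `delta` value, and the gain figures are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean lies
    within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n_classes, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    value_range = math.log2(n_classes)  # range of information gain
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# Same observed gains, different numbers of instances seen at the leaf.
early = should_split(0.30, 0.20, n_classes=2, delta=1e-7, n=200)
later = should_split(0.30, 0.20, n_classes=2, delta=1e-7, n=2000)
```

With few instances the bound is too loose to justify a split; as more instances arrive the bound shrinks and the same gain difference becomes sufficient. Practical implementations usually add a tie-breaking threshold, which is omitted here.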
A data-driven approach to automatic extraction of professional figure profiles from Résumés
The process of selecting and interviewing suitable candidates for a job position is time-consuming and labour-intensive. Despite the existence of software applications aimed at helping professional recruiters in the process, only recently, with Industry 4.0, has there been real interest in implementing autonomous and data-driven approaches that can provide insights and practical assistance to recruiters. In this paper, we propose a framework aimed at improving the performance of an Applicant Tracking System. More specifically, we exploit advanced Natural Language Processing and Text Mining techniques to automatically profile resources (i.e. candidates for a job) and offers by extracting relevant keywords and building a semantic representation of résumés and job opportunities
SK-MOEFS: A Library in Python for Designing Accurate and Explainable Fuzzy Models
Recently, the explainability of Artificial Intelligence (AI) models and algorithms has become an important requirement in real-world applications. Indeed, although AI allows us to address and solve very difficult and complicated problems, AI-based tools act as a black box and, usually, do not explain how, why, or when a specific decision has been taken. Among AI models, Fuzzy Rule-Based Systems (FRBSs) are recognized worldwide as transparent and interpretable tools: they can provide explanations in terms of linguistic rules. Moreover, FRBSs may achieve accuracy comparable to that achieved by less transparent models, such as neural networks and statistical models. In this work, we introduce SK-MOEFS (acronym of SciKit-Multi Objective Evolutionary Fuzzy System), a new Python library that allows the user to easily and quickly design FRBSs, employing Multi-Objective Evolutionary Algorithms. Indeed, a set of FRBSs, characterized by different trade-offs between their accuracy and their explainability, can be generated by SK-MOEFS. The user will then be able to select the most suitable model for their specific application
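The idea of a set of models with different accuracy/explainability trade-offs can be illustrated with a plain Pareto-front filter. This sketch does not use or reproduce the SK-MOEFS API: the dictionary layout, the use of a `complexity` score as a stand-in for (lack of) explainability, and the candidate values are all hypothetical.

```python
def dominates(a, b):
    """True if model a is no worse than b on both objectives and strictly
    better on at least one (maximize accuracy, minimize complexity)."""
    no_worse = a["accuracy"] >= b["accuracy"] and a["complexity"] <= b["complexity"]
    better = a["accuracy"] > b["accuracy"] or a["complexity"] < b["complexity"]
    return no_worse and better

def pareto_front(models):
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]

# Hypothetical candidate models; complexity could be, e.g., a rule count.
candidates = [
    {"name": "A", "accuracy": 0.90, "complexity": 40},
    {"name": "B", "accuracy": 0.85, "complexity": 12},
    {"name": "C", "accuracy": 0.84, "complexity": 20},  # dominated by B
    {"name": "D", "accuracy": 0.78, "complexity": 5},
]
front = [m["name"] for m in pareto_front(candidates)]
```

A multi-objective evolutionary algorithm searches for such a front directly; the user then picks the point on it whose balance of accuracy and simplicity suits the application.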
A Multiobjective Evolutionary Approach to Concurrently Learn Rule and Data Bases of Linguistic Fuzzy-Rule-Based Systems
- …
