Statistical Learning Theory and ELM for Big Social Data Analysis
The science of opinion analysis based on data from social networks and other forms of mass media has garnered the interest of both the scientific community and the business world. Dealing with the increasing amount of information on the Web is a critical task that requires efficient models, such as those developed in the emerging field of sentiment analysis. To this end, current research proposes an efficient approach to support emotion recognition and polarity detection in natural language text. In this paper, we show how to exploit the most recent technological tools and advances in Statistical Learning Theory (SLT) to efficiently build an Extreme Learning Machine (ELM) and assess the resulting model's performance when applied to big social data analysis. ELM is a powerful learning tool, developed to overcome some issues in back-propagation networks. The main problem with ELMs lies in training them when a large number of samples is available, since the generalization performance has to be carefully assessed. For this reason, we propose an ELM implementation that exploits Spark, a distributed in-memory technology, and show how to take advantage of the most recent advances in SLT to address the issue of selecting the ELM hyperparameters that give the best generalization performance.
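As a minimal sketch, assuming a single sigmoid hidden layer with Gaussian random input weights and a ridge-regularized least-squares readout (a common ELM formulation, not necessarily the authors' exact one), ELM training reduces to one linear solve. The hyperparameter search uses plain repeated hold-out as a simple stand-in for the SLT-based selection strategies mentioned in the abstract; the helper names (`elm_fit`, `select_hyperparameters`), the grids, and the synthetic data are hypothetical.

```python
import numpy as np


def elm_hidden(X, W, b):
    """Random-feature hidden layer: sigmoid(X W + b)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def elm_fit(X, y, n_hidden, lam, rng):
    """Basic ELM: random (untrained) input weights, ridge-regularized output weights."""
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))  # drawn once, never trained
    b = rng.standard_normal(n_hidden)
    H = elm_hidden(X, W, b)
    # beta = (H^T H + lam I)^{-1} H^T y; approaches the pseudo-inverse solution as lam -> 0
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return elm_hidden(X, W, b) @ beta


def select_hyperparameters(X, y, grid_hidden, grid_lam, n_repeats=5, seed=0):
    """Return the (n_hidden, lam) pair with the lowest mean hold-out error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # The same random splits are reused for every configuration, for a fair comparison.
    splits = [rng.permutation(n) for _ in range(n_repeats)]
    cut = int(0.7 * n)
    best, best_err = None, np.inf
    for h in grid_hidden:
        for lam in grid_lam:
            errs = []
            for idx in splits:
                tr, va = idx[:cut], idx[cut:]
                model = elm_fit(X[tr], y[tr], h, lam, rng)
                pred = np.sign(elm_predict(model, X[va]))
                errs.append(np.mean(pred != y[va]))
            err = float(np.mean(errs))
            if err < best_err:
                best, best_err = (h, lam), err
    return best, best_err


# Toy usage on synthetic binary data with labels in {-1, +1}.
rng = np.random.default_rng(42)
X = rng.standard_normal((300, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
params, err = select_hyperparameters(X, y, grid_hidden=[10, 50], grid_lam=[1e-3, 1.0])
print(params, err)
```

Reusing the same splits for every grid point keeps the comparison between configurations from being dominated by split-to-split noise.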
SIM-ELM: Connecting the ELM model with similarity-function learning
This paper starts from the affinities between two well-known learning schemes that apply randomization in the training process, namely, Extreme Learning Machines (ELMs) and the learning framework using similarity functions. These paradigms share a common approach involving data remapping and linear separators, but differ in the role of randomization within the respective learning algorithms. The paper presents an integrated approach connecting the two models, which ultimately yields a new variant of the basic ELM. The resulting learning scheme is characterized by an analytical relationship between the dimensionality of the remapped space and the learning abilities of the eventual predictor. Experimental results confirm that the new learning scheme can improve over conventional ELM in terms of the trade-off between classification accuracy and predictor complexity (i.e., the dimensionality of the remapped space).
Machine Learning Techniques applied to Twitter Spammers Detection
Every minute, more than 320 new accounts are created on Twitter and more than 98,000 tweets are posted. Among the multitude of Twitter users, spammers and cybercriminals aim to pervade and strike legitimate users' accounts with a large amount of troublesome messages. Hence, propagation through social networks opens new avenues for cybercrime, while the spamming phenomenon exploits specific mechanisms of the messaging process. This research shows that Machine Learning (ML) may provide a powerful tool to support spammer detection on Twitter. The present paper compares the performance of three different ML algorithms in tackling this task. The experimental session involves a publicly available dataset.
An ELM-based model for affective analogical reasoning
From the dawn of the Internet through 2003, there were just a few dozen exabytes of information on the Web. Today, that much information is created weekly. The opportunity to capture the opinions of the general public about social events, political movements, company strategies, marketing campaigns, and product preferences has raised increasing interest both in the scientific community, for the exciting open challenges, and in the business world, for the remarkable fallouts in marketing and financial prediction. Keeping up with the ever-growing amount of unstructured information on the Web, however, is a formidable task and requires fast and efficient models for opinion mining. In this paper, we explore how the high generalization performance, low computational complexity, and fast learning speed of extreme learning machines can be exploited to perform analogical reasoning in a vector space model of affective common-sense knowledge. In particular, by enabling a fast reconfiguration of such a vector space, extreme learning machines allow the polarity associated with natural language concepts to be calculated in a more dynamic and accurate way and, hence, enable better concept-level sentiment analysis.
SLT-Based ELM for Big Social Data Analysis
Recently, social networks and other forms of media communication have been gathering the interest of both the scientific and the business world, leading to the increasing development of the science of opinion and sentiment analysis. Facing the huge amount of information present on the Web is a crucial task that calls for the study and creation of efficient models able to tackle it. To this end, current research proposes an efficient approach to support emotion recognition and polarity detection in natural language text. In this paper, we show how the most recent advances in statistical learning theory (SLT) can support the development of an efficient extreme learning machine (ELM) and the assessment of the resulting model's performance when applied to big social data analysis. ELM, developed to overcome some issues in back-propagation networks, is a powerful learning tool. However, the main problem is the need to cope with a large number of available samples, for which the generalization performance has to be carefully assessed. For this reason, we propose an ELM implementation that exploits Spark, a distributed in-memory technology, and show how to take advantage of SLT results to select the ELM hyperparameters able to provide the best generalization performance.
A learning scheme based on similarity functions for affective common-sense reasoning
This paper explores the theory of learning with similarity functions in the context of common-sense reasoning and natural language processing. Based on this theory, the proposed approach (called Sim-Predictor) remaps the input space into a new space that conveys the similarity between the input pattern and a number of landmarks, i.e., a subset of patterns randomly extracted from the training set. The new learning scheme exhibits the interesting property of formally relating the dimensionality of the remapped space to the learning abilities of the eventual predictor. The evaluation phase shows that Sim-Predictor compares favorably with ELM and SVM when addressing the problem of polarity detection in the sentic computing framework, a novel approach to big social data analysis based on the interpretation of the cognitive and affective information associated with natural language (affective common-sense reasoning).
Inductive bias for semi-supervised extreme learning machine
This research shows that inductive bias provides a valuable method to effectively tackle semi-supervised classification problems. In the learning theory framework, inductive bias is a powerful tool that allows one to shape the generalization properties of a learning machine. The paper formalizes semi-supervised learning as a supervised learning problem biased by an unsupervised reference solution. The resulting semi-supervised classification framework can apply any clustering algorithm to derive the reference function, thus ensuring maximum flexibility. In this context, the paper derives the biased version of the Extreme Learning Machine (br-ELM). The experimental session involves several real-world problems and demonstrates the reliability of the semi-supervised classification scheme.
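One way to read "a supervised learning problem biased by an unsupervised reference solution" is as biased regularization: instead of shrinking the ELM output weights toward zero, shrink them toward a reference vector beta_ref obtained from a clustering of all (labeled and unlabeled) data. Setting the gradient of ||H beta - y||^2 + lam ||beta - beta_ref||^2 to zero gives (H^T H + lam I) beta = H^T y + lam beta_ref. The sketch below assumes this formulation, 2-means as the clustering algorithm, and a majority-vote mapping from clusters to labels; it illustrates the idea and is not presented as the br-ELM algorithm from the paper. All helper names and data are hypothetical.

```python
import numpy as np


def kmeans2(X, rng, n_iter=20):
    """Plain 2-means (Lloyd's algorithm); any clustering algorithm could be used instead."""
    C = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        C = np.array([X[assign == k].mean(0) if np.any(assign == k) else C[k] for k in range(2)])
    return assign


def hidden(X, W, b):
    return np.tanh(X @ W + b)


def biased_elm(H_lab, y_lab, beta_ref, lam):
    """Closed-form minimizer of ||H beta - y||^2 + lam * ||beta - beta_ref||^2."""
    n_h = H_lab.shape[1]
    A = H_lab.T @ H_lab + lam * np.eye(n_h)
    return np.linalg.solve(A, H_lab.T @ y_lab + lam * beta_ref)


rng = np.random.default_rng(1)
# Two Gaussian blobs; only 10 points carry labels, the rest are treated as unlabeled.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y_true = np.r_[-np.ones(200), np.ones(200)]
lab = rng.choice(400, size=10, replace=False)

W, b = rng.standard_normal((2, 30)), rng.standard_normal(30)
H = hidden(X, W, b)

# Reference solution: cluster all data, label each cluster by the majority vote of
# the labeled points it contains (ties or empty clusters default to +1), then fit
# an ELM readout to those pseudo-labels.
assign = kmeans2(X, rng)
pseudo = np.empty(400)
for k in range(2):
    vote = np.sign(y_true[lab[assign[lab] == k]].sum())
    pseudo[assign == k] = vote if vote != 0 else 1.0
beta_ref = np.linalg.solve(H.T @ H + 1e-2 * np.eye(30), H.T @ pseudo)

# Supervised fit on the labeled points only, biased toward the reference solution.
beta = biased_elm(H[lab], y_true[lab], beta_ref, lam=1.0)
acc = np.mean(np.sign(H @ beta) == y_true)
print(acc)
```

With lam = 0 the labeled data alone determine beta; larger lam values trust the cluster-derived reference more, which is the knob the inductive bias provides.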
Semi-supervised Learning for Affective Common-Sense Reasoning
Background: Big social data analysis is the area of research focusing on collecting, examining, and processing large multi-modal and multi-source datasets in order to discover patterns/correlations and extract information from the Social Web. This is usually accomplished through supervised and unsupervised machine learning algorithms that learn from the available data. However, these algorithms are often highly computationally expensive, in either the training or the prediction phase, and are frequently unable to handle current data volumes. Parallel approaches have been proposed to boost processing speed, but they require technologies that support distributed computation. Methods: Extreme learning machines (ELMs) are an emerging learning paradigm, presenting an efficient unified solution to generalized feed-forward neural networks. ELM offers significant advantages such as fast learning speed, ease of implementation, and minimal human intervention. However, ELM cannot be easily parallelized, due to the presence of a pseudo-inverse calculation. Therefore, this paper aims to find a reliable method to realize a parallel implementation of ELM that can be applied to the large datasets typical of Big Data problems, employing the most recent technology for parallel in-memory computation, i.e., Spark, which is designed to efficiently handle iterative procedures that repeatedly perform operations over the same data. Moreover, this paper shows how to take advantage of the most recent advances in statistical learning theory (SLT) to address the issue of selecting the ELM hyperparameters that give the best generalization performance. This involves assessing the performance of such algorithms through resampling and in-sample methods, exploiting the most recent results in SLT and adapting them to the Big Data framework. The proposed approach has been tested on two affective analogical reasoning datasets.
Affective analogical reasoning can be defined as the intrinsically human capacity to interpret the cognitive and affective information associated with natural language. In particular, we employed two benchmarks, each composed of 21,743 common-sense concepts; each concept is represented according to two models of a semantic network in which common-sense concepts are linked to a hierarchy of affective domain labels. Results: The labeled data have been split into two sets: the first 20,000 samples have been used to build the model with ELM under the different SLT strategies, while the remaining 1,743 labeled samples have been kept apart as a reference set to test the performance of the learned model. The splitting process has been repeated 30 times in order to obtain statistically relevant results. We ran the experiments on the Google Cloud Platform, in particular the Google Compute Engine, using N_M = 4 machines with two cores and 1.8 GB of RAM each (machine type n1-highcpu-2) and a 30 GB HDD, equipped with Spark. Results on the affective datasets show both the effectiveness of the proposed parallel approach and the most suitable SLT strategies for the specific Big Data problem. Conclusion: In this paper, we showed how to build an ELM model with a novel scalable approach and how to carefully assess its performance, using the most recent results from SLT, for a sentiment analysis problem. Thanks to recent technologies and methods, the computational requirements of these methods have been reduced, allowing them to scale to large datasets, which are typical of Big Data applications.
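The pseudo-inverse obstacle mentioned above can be sidestepped when the output weights are computed from the regularized normal equations, because H^T H and H^T T are sums of per-sample contributions. A minimal sketch, assuming this formulation (not necessarily the one used in the paper): each partition computes its local products, a reduce step adds them, and the driver solves only an n_hidden x n_hidden system. Plain Python lists stand in for Spark partitions here; in PySpark the same pattern could be expressed, for example, with `RDD.mapPartitions` followed by `reduce` or `treeAggregate`. The helper names and toy data are hypothetical.

```python
from functools import reduce

import numpy as np


def partition_stats(X_part, y_part, W, b):
    """Map step: each partition emits its local H^T H and H^T y."""
    H = 1.0 / (1.0 + np.exp(-(X_part @ W + b)))
    return H.T @ H, H.T @ y_part


def combine(s1, s2):
    """Reduce step: partial Gram matrices and cross-products simply add up."""
    return s1[0] + s2[0], s1[1] + s2[1]


rng = np.random.default_rng(7)
X = rng.standard_normal((1000, 4))
y = np.sign(X[:, 0] - X[:, 2])
n_hidden, lam = 20, 1e-2
# The random hidden layer (W, b) must be shared by all workers.
W, b = rng.standard_normal((4, n_hidden)), rng.standard_normal(n_hidden)

# Simulate 4 workers, each holding one contiguous block of rows.
parts = np.array_split(np.arange(len(X)), 4)
stats = [partition_stats(X[p], y[p], W, b) for p in parts]
HtH, Hty = reduce(combine, stats)

# Driver: only a small n_hidden x n_hidden system is solved centrally.
beta_dist = np.linalg.solve(HtH + lam * np.eye(n_hidden), Hty)

# Compare with the single-machine solution.
H_full = 1.0 / (1.0 + np.exp(-(X @ W + b)))
beta_full = np.linalg.solve(H_full.T @ H_full + lam * np.eye(n_hidden), H_full.T @ y)
print(np.allclose(beta_dist, beta_full))
```

The final check confirms that the distributed and single-machine solutions agree up to floating-point error; only the n_hidden x n_hidden and n_hidden-sized partial results ever leave a worker.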
Machine Learning-Based System for Detecting Unseen Malicious Software
In the Internet age, malicious software (malware) represents a serious threat to the security of information systems. Malware-detection systems that protect computers must perform real-time analysis of executable files. The paper shows that machine-learning methods can support the challenging, yet critical, task of unseen malware recognition, i.e., the classification of malware variants that were not included in the training set. The experimental verification involved a publicly available dataset and confirmed the effectiveness of the overall approach.
