Statistical Learning Theory and ELM for Big Social Data Analysis

Author: Bisio Federica
Cambria Erik
Anguita Davide
Oneto Luca
Publication date: 01/01/2016

The science of opinion analysis based on data from social networks and other forms of mass media has garnered the interest of the scientific community and the business world. Dealing with the increasing amount of information present on the Web is a critical task and requires efficient models developed by the emerging field of sentiment analysis. To this end, current research proposes an efficient approach to support emotion recognition and polarity detection in natural language text. In this paper, we show how to exploit the most recent technological tools and advances in Statistical Learning Theory (SLT) in order to efficiently build an Extreme Learning Machine (ELM) and assess the resultant model's performance when applied to big social data analysis. ELM represents a powerful learning tool, developed to overcome some issues in back-propagation networks. The main difficulty with ELMs lies in training them when a large number of samples is available, a setting in which the generalization performance has to be carefully assessed. For this reason, we propose an ELM implementation that exploits the Spark distributed in-memory technology and show how to take advantage of the most recent advances in SLT in order to address the issue of selecting ELM hyperparameters that give the best generalization performance.

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Genova

SIM-ELM: Connecting the ELM model with similarity-function learning

Author: GASTALDO PAOLO
DECHERCHI SERGIO
BISIO FEDERICA
ZUNINO RODOLFO
Publication date: 01/01/2016

This paper starts from the affinities between two well-known learning schemes that apply randomization in the training process, namely, Extreme Learning Machines (ELMs) and the learning framework using similarity functions. These paradigms share a common approach involving data remapping and linear separators, but differ in the role of randomization within the respective learning algorithms. The paper presents an integrated approach connecting the two models, which ultimately yields a new variant of the basic ELM. The resulting learning scheme is characterized by an analytical relationship between the dimensionality of the remapped space and the learning abilities of the eventual predictor. Experimental results confirm that the new learning scheme can improve over conventional ELM in terms of the trade-off between classification accuracy and predictor complexity (i.e., the dimensionality of the remapped space).

Crossref

Archivio istituzionale della ricerca - Università di Genova

Machine Learning Techniques applied to Twitter Spammers Detection

Author: GASTALDO PAOLO
MEDA CLAUDIA
BISIO FEDERICA
ZUNINO RODOLFO
Publication date: 01/01/2014

Every minute more than 320 new accounts are created on Twitter and more than 98,000 tweets are posted. Among the multitude of Twitter users, spammers and cybercriminals aim to pervade and strike legitimate users' accounts with a large number of troublesome messages. Hence, propagation through the social network opens new modalities for cyber-crime perpetration, while the spamming phenomenon exploits specific mechanisms of the messaging process. This research shows that Machine Learning (ML) may provide a powerful tool to support spammer detection on Twitter. The present paper compares the performance of three different ML algorithms in tackling this task. The experimental session involves a publicly available dataset.

Archivio istituzionale della ricerca - Università di Genova

An ELM-based model for affective analogical reasoning

Author: GASTALDO PAOLO
Cambria Erik
BISIO FEDERICA
ZUNINO RODOLFO
Publication date: 01/01/2015

Between the dawn of the Internet and the year 2003, there were just a few dozen exabytes of information on the Web. Today, that much information is created weekly. The opportunity to capture the opinions of the general public about social events, political movements, company strategies, marketing campaigns, and product preferences has raised increasing interest both in the scientific community, for the exciting open challenges, and in the business world, for the remarkable impact on marketing and financial prediction. Keeping up with the ever-growing amount of unstructured information on the Web, however, is a formidable task and requires fast and efficient models for opinion mining. In this paper, we explore how the high generalization performance, low computational complexity, and fast learning speed of extreme learning machines can be exploited to perform analogical reasoning in a vector space model of affective common-sense knowledge. In particular, by enabling a fast reconfiguration of such a vector space, extreme learning machines allow the polarity associated with natural language concepts to be calculated in a more dynamic and accurate way and, hence, perform better concept-level sentiment analysis.

Crossref

Archivio istituzionale della ricerca - Università di Genova

SLT-Based ELM for Big Social Data Analysis

Author: Anguita Davide
Bisio Federica
Oneto Luca
Cambria Erik
Publication date: 01/01/2017

Recently, social networks and other forms of media communication have been gathering the interest of both the scientific and the business world, leading to the increasing development of the science of opinion and sentiment analysis. Coping with the huge amount of information present on the Web is a crucial task and motivates the study and creation of efficient models able to tackle it. To this end, current research proposes an efficient approach to support emotion recognition and polarity detection in natural language text. In this paper, we show how the most recent advances in statistical learning theory (SLT) can support the development of an efficient extreme learning machine (ELM) and the assessment of the resultant model's performance when applied to big social data analysis. ELM, developed to overcome some issues in back-propagation networks, represents a powerful learning tool. However, the main problem is the need to cope with a large number of available samples, and the generalization performance has to be carefully assessed. For this reason, we propose an ELM implementation that exploits the Spark distributed in-memory technology and show how to take advantage of SLT results in order to select ELM hyperparameters able to provide the best generalization performance.

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Genova

A learning scheme based on similarity functions for affective common-sense reasoning

Author: Rodolfo Zunino
Paolo Gastaldo
Erik Cambria
Federica Bisio
Publication date: 01/01/2015

This paper explores the theory of learning with similarity functions in the context of common-sense reasoning and natural language processing. Based on this theory, the proposed approach (called Sim-Predictor) is characterized by the process of remapping the input space into a new space which is able to convey the similarity between the input pattern and a number of landmarks, i.e., a subset of patterns randomly extracted from the training set. The new learning scheme exhibits the interesting property of relating the dimensionality of the remapped space to the learning abilities of the eventual predictor in a formal fashion. The evaluation phase shows that Sim-Predictor compares favorably with ELM and SVM when addressing the problem of polarity detection in the sentic computing framework, a novel approach to big social data analysis based on the interpretation of the cognitive and affective information associated with natural language (affective common-sense reasoning).
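
The remapping step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Gaussian similarity function, the `gamma` parameter, the toy data, and all function names are assumptions introduced here. Each pattern is mapped to the vector of its similarities to landmarks drawn at random from the training set, so the dimensionality of the remapped space equals the number of landmarks.

```python
import math
import random


def gaussian_similarity(a, b, gamma=1.0):
    """Similarity in (0, 1], equal to 1.0 when a == b.

    The Gaussian form is an assumption; the abstract does not name
    a specific similarity function.
    """
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def pick_landmarks(training_set, n_landmarks, seed=0):
    """Randomly extract a subset of training patterns to act as landmarks."""
    rng = random.Random(seed)
    return rng.sample(training_set, n_landmarks)


def remap(pattern, landmarks, gamma=1.0):
    """Map a pattern to the vector of its similarities to each landmark."""
    return [gaussian_similarity(pattern, lm, gamma) for lm in landmarks]


# Hypothetical toy data: five 2-D patterns, three landmarks.
training_set = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [2.0, 0.0]]
landmarks = pick_landmarks(training_set, n_landmarks=3)
remapped = [remap(p, landmarks) for p in training_set]
```

A linear separator would then be trained on `remapped` rather than on the original inputs.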

Crossref

Archivio istituzionale della ricerca - Università di Genova

Inductive bias for semi-supervised extreme learning machine

Author: Rodolfo Zunino
Paolo Gastaldo
Sergio Decherchi
Federica Bisio
Publication date: 01/01/2016

This research shows that inductive bias provides a valuable method to effectively tackle semi-supervised classification problems. In the learning theory framework, inductive bias provides a powerful tool, and allows one to shape the generalization properties of a learning machine. The paper formalizes semi-supervised learning as a supervised learning problem biased by an unsupervised reference solution. The resulting semi-supervised classification framework can apply any clustering algorithm to derive the reference function, thus ensuring maximum flexibility. In this context, the paper derives the biased version of Extreme Learning Machine (br-ELM). The experimental session involves several real-world problems and proves the reliability of the semi-supervised classification scheme.

Crossref

Archivio istituzionale della ricerca - Università di Genova

Semi-supervised Learning for Affective Common-Sense Reasoning

Author: Federica Bisio
Luca Oneto
Erik Cambria
Davide Anguita
Publication date: 01/01/2016

Background: Big social data analysis is the area of research focusing on collecting, examining, and processing large multi-modal and multi-source datasets in order to discover patterns/correlations and extract information from the Social Web. This is usually accomplished through the use of supervised and unsupervised machine learning algorithms that learn from the available data. However, these are usually highly computationally expensive, either in the training or in the prediction phase, as they are often not able to handle current data volumes. Parallel approaches have been proposed in order to boost processing speeds, but this clearly requires technologies that support distributed computations. Methods: Extreme learning machines (ELMs) are an emerging learning paradigm, presenting an efficient unified solution to generalized feed-forward neural networks. ELM offers significant advantages such as fast learning speed, ease of implementation, and minimal human intervention. However, ELM cannot be easily parallelized, due to the presence of a pseudo-inverse calculation. Therefore, this paper aims to find a reliable method to realize a parallel implementation of ELM that can be applied to large datasets typical of Big Data problems with the employment of the most recent technology for parallel in-memory computation, i.e., Spark, designed to efficiently deal with iterative procedures that recursively perform operations over the same data. Moreover, this paper shows how to take advantage of the most recent advances in statistical learning theory (SLT) in order to address the issue of selecting ELM hyperparameters that give the best generalization performance. This involves assessing the performance of such algorithms (i.e., resampling methods and in-sample methods) by exploiting the most recent results in SLT and adapting them to the Big Data framework. The proposed approach has been tested on two affective analogical reasoning datasets. 
Affective analogical reasoning can be defined as the intrinsically human capacity to interpret the cognitive and affective information associated with natural language. In particular, we employed two benchmarks, each composed of 21,743 common-sense concepts; each concept is represented according to two models of a semantic network in which common-sense concepts are linked to a hierarchy of affective domain labels. Results: The labeled data have been split into two sets: the first 20,000 samples have been used for building the model with the ELM with the different SLT strategies, while the rest of the labeled samples, numbering 1,743, have been kept apart as a reference set in order to test the performance of the learned model. The splitting process has been repeated 30 times in order to obtain statistically relevant results. We ran the experiments on the Google Cloud Platform, in particular on the Google Compute Engine, with NM = 4 machines with two cores and 1.8 GB of RAM (machine type n1-highcpu-2) and an HDD of 30 GB equipped with Spark. Results on the affective dataset both show the effectiveness of the proposed parallel approach and underline the most suitable SLT strategies for the specific Big Data problem. Conclusion: In this paper we showed how to build an ELM model with a novel scalable approach and to carefully assess its performance, with the use of the most recent results from SLT, for a sentiment analysis problem. Thanks to recent technologies and methods, the computational requirements of these methods have been improved to allow for the scaling to large datasets, which are typical of Big Data applications.
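
The pseudo-inverse bottleneck mentioned in the Methods part of this abstract can be illustrated with a small standard-library Python sketch. It is not the paper's Spark implementation. It assumes a ridge-regularized least-squares solution for the output weights and shows one common workaround: the sums H^T H and H^T t are accumulated chunk by chunk, so each chunk could be processed by a different worker before a single small linear system is solved. The function names, the sigmoid activation, the `ridge` value, and the toy data are all hypothetical.

```python
import math
import random


def hidden_layer(x, weights, biases):
    """Sigmoid activations of one input vector under fixed random weights."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
            for row, b in zip(weights, biases)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def train_elm(chunks, n_inputs, n_hidden, ridge=1e-3, seed=0):
    """Random hidden layer, then output weights from accumulated sums."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    # gram starts as ridge * I (assumed regularizer) and accumulates H^T H.
    gram = [[ridge if i == j else 0.0 for j in range(n_hidden)] for i in range(n_hidden)]
    moment = [0.0] * n_hidden  # accumulates H^T t
    for chunk in chunks:  # each chunk could live on a different worker
        for x, t in chunk:
            h = hidden_layer(x, weights, biases)
            for i in range(n_hidden):
                moment[i] += h[i] * t
                for j in range(n_hidden):
                    gram[i][j] += h[i] * h[j]
    beta = solve(gram, moment)
    return weights, biases, beta


def predict(x, model):
    """Linear readout of the hidden activations."""
    weights, biases, beta = model
    return sum(b * h for b, h in zip(beta, hidden_layer(x, weights, biases)))


# Hypothetical toy problem: label is the sign of a 1-D input.
data = [([x], 1.0 if x > 0 else -1.0)
        for x in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)]
single = train_elm([data], n_inputs=1, n_hidden=5)
chunked = train_elm([data[:4], data[4:]], n_inputs=1, n_hidden=5)
```

Because only the small `n_hidden` by `n_hidden` matrix and one vector are combined across chunks, the cost of the final solve does not grow with the number of samples.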

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Genova

Machine Learning-Based System for Detecting Unseen Malicious Software

Author: Rodolfo Zunino
Paolo Gastaldo
Claudia Meda
Stefano Nasta
Federica Bisio
Publication date: 01/01/2016

In the Internet age, malicious software (malware) represents a serious threat to the security of information systems. Malware-detection systems that protect computers must perform a real-time analysis of executable files. The paper shows that machine-learning methods can support the challenging, yet critical, task of unseen malware recognition, i.e., the classification of malware variants that were not included in the training set. The experimental verification involved a publicly available dataset, and confirmed the effectiveness of the overall approach.

Crossref

Archivio istituzionale della ricerca - Università di Genova

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)