OPAL at SemEval Task 4: the Challenge of Porting a Sentiment Analysis System to the "Real" World

Author: BALAHUR-DOBRESCU Alexandra
Publication date: 01/01/2016

Sentiment analysis has become a well-established task in Natural Language Processing. As such, a wide variety of methods have been proposed to tackle it, for different types of texts, text levels, languages, domains and formality levels. Although state-of-the-art systems have obtained promising results, a big challenge that still remains is to port the systems to the “real world”, i.e. to implement systems that run around the clock, dealing with information of a heterogeneous nature, from different domains, written in different styles and diverse in formality levels. The present paper describes our efforts to implement such a system, using a variety of strategies to homogenize the input and comparing various approaches to tackle the task. Specifically, we tackle the task using two different approaches: a) one that is unsupervised, based on dictionaries of sentiment-bearing words and heuristics to compute the final polarity of the text considered; b) a second one that is supervised, trained on previously annotated data from different domains. For both approaches, the data is first normalized and the slang is replaced with its expanded version.
JRC.E.1 - Disaster Risk Management
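
As a rough illustration of the unsupervised approach this abstract describes (slang expansion followed by dictionary lookup and a polarity heuristic), a minimal Python sketch might look like the following. It is not the OPAL system: the `SLANG` table, the `LEXICON` scores, the `NEGATORS` set and the "flip the next sentiment word" rule are all hypothetical placeholders chosen for the example.

```python
import re

# Hypothetical toy resources; the paper's actual dictionaries are not listed here.
SLANG = {"gr8": "great", "luv": "love", "u": "you"}
LEXICON = {"great": 1, "love": 1, "good": 1, "bad": -1, "hate": -1, "awful": -1}
NEGATORS = {"not", "no", "never"}


def normalize(text):
    """Lowercase, tokenize, and replace slang tokens with their expanded form."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [SLANG.get(tok, tok) for tok in tokens]


def polarity(text):
    """Sum lexicon scores; a negator flips the next sentiment-bearing word (assumed heuristic)."""
    score = 0
    negate = False
    for tok in normalize(text):
        if tok in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(tok, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `polarity("I luv this, gr8 phone")` first expands the slang to "love" and "great" and only then consults the lexicon, which is the ordering the abstract states for both approaches.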

JRC Publications Repository

WASSA 2012 - Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

Author: BALAHUR DOBRESCU Alexandra
Publication date: 01/01/2012

In the past years, the quantity of content generated by users on the Web, in social networking sites, fora and microblogs, has reached an unprecedented level. All this data adds to the content generated in traditional media, such as newspapers, bringing additional factual information as well as a large quantity of opinionated and subjective information. In the context of the society in which we live, where sifting through immense quantities of information to gather knowledge has become a must, the challenge of processing opinionated and subjective information is increasingly a focus of Natural Language Processing (NLP) research communities worldwide. In the past decade, the NLP community's interest in proposing computational methods to deal with subjectivity and sentiment in text has grown constantly. However, although the subjectivity and sentiment analysis research fields have been highly dynamic in this period, much remains to be done so that systems dealing with subjectivity, sentiment and, more generally, affect in text can be reliably used in critical decision-making environments. Moreover, the new means of communication and user connection, in microblogs and social networks, are becoming more and more relevant to these two tasks, as the contexts (internal and external) of the information communication process bring about new challenges and applications to be explored. Inspired by the above-mentioned issues and the objectives we aimed at in the first two editions of the Workshop on Computational Approaches to Subjectivity Analysis (WASSA 2010 and WASSA 2011), the purpose of the third edition of the Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2012) was to create a framework for presenting and discussing the challenges related to subjectivity and sentiment analysis in NLP and its applications, in traditional and Social Media contexts, from an interdisciplinary theoretical and practical perspective.
WASSA 2012 was organized in conjunction with the 50th Annual Meeting of the Association for Computational Linguistics, on July 12, 2012, in Jeju, Korea.
JRC.G.2 - Global security and crisis management

JRC Publications Repository

Extending the EmotiNet Knowledge Base to Improve the Automatic Detection of Implicitly Expressed Emotions from Text

Author: BALAHUR DOBRESCU Alexandra
Publication date: 01/01/2012

Sentiment analysis is one of the recent, highly dynamic fields in Natural Language Processing. Most existing approaches are based on word-level analysis of texts and can mostly detect only explicit expressions of sentiment. However, in many cases, emotions are not expressed by using words with an affective meaning (e.g. happy), but by describing real-life situations, which readers (based on their commonsense knowledge) detect as being related to a specific emotion. Given the challenges of detecting emotions from contexts in which no lexical clue is present, in this article we present a comparative analysis between the performance of well-established methods for emotion detection (supervised and lexical knowledge-based) and a method we propose and extend, which is based on commonsense knowledge stored in the EmotiNet knowledge base. Our extensive evaluations show that, in the context of this task, the approach based on EmotiNet is the most appropriate.
JRC.G.2 - Global security and crisis management

JRC Publications Repository

OPTWIMA: Comparing Knowledge-rich and Knowledge-poor Approaches for Sentiment Analysis in Short Informal Texts

Author: BALAHUR DOBRESCU Alexandra
Publication date: 01/01/2013

The fast development of Social Media has made it possible for people to no longer remain mere spectators of the events that happen in the world, but to become part of them, commenting on their developments and the entities involved, sharing their opinions and distributing related content. This phenomenon is of high importance to news monitoring systems, whose aim is to obtain an informative snapshot of media events and related comments. This paper presents the strategies employed in the OPTWIMA participation in SemEval 2013 Task 2, Sentiment Analysis in Twitter. The main goal was to evaluate the best settings for a sentiment analysis component to be added to the online news monitoring system. We describe the approaches used in the competition and the additional experiments performed, which combine different datasets for training, with and without slang replacement, and generalize sentiment-bearing terms by replacing them with unique labels. The results regarding tweet classification are promising and show that sentiment generalization can be an effective approach for tweets and that SMS language is difficult to tackle, even when specific normalization resources are employed.
JRC.G.2 - Global security and crisis management

JRC Publications Repository

The Challenge of Processing Opinions in Online Contents in the Social Web Era

Author: BALAHUR DOBRESCU Alexandra
Publication date: 01/01/2012

In the past years, the NLP community has been increasingly interested in the field of opinion mining (also known as sentiment analysis), whose aim is to retrieve and classify the opinions expressed in text. Online reputation management, as a related task, is more focused on opinions on individuals and other entities. Additionally, the computational task of online reputation management also considers the analysis of facts that influence the status quo of these entities. The problem in this context is much more difficult to solve because entities, as opposed to products, are related to different events and topics, and there is no fixed set of “attributes” that are commented on by persons expressing opinions on these entities. There is only one freely accessible system performing such a task, Lydia (Skiena et al., 2007), which gathers news from portals and blogs and classifies opinions on different entities. However, both this system and the different approaches that have been presented for this problem in the research literature show that entity-centered opinion mining and, additionally, the correlation of its results with facts about the events in which these entities are involved are not trivial (Balahur and Steinberger, 2009; Zhang and Skiena, 2010). The present position paper studies the challenges related to the field of online reputation management and suggests possible solutions.
JRC.G.2 - Global security and crisis management

JRC Publications Repository

Sentiment Analysis in Social Media Texts

Author: BALAHUR DOBRESCU Alexandra
Publication date: 01/01/2013

This paper presents a method for sentiment analysis specifically designed to work with Twitter data (tweets), taking into account their structure, length and specific language. The approach employed makes it easily extendible to other languages and able to process tweets in near real time. The main contributions of this work are: a) the pre-processing of tweets to normalize the language and generalize the vocabulary employed to express sentiment; b) the use of minimal linguistic processing, which makes the approach easily portable to other languages; c) the inclusion of higher-order n-grams to spot modifications in the polarity of the sentiment expressed; d) the use of simple heuristics to select the features to be employed; e) the application of supervised learning using a simple Support Vector Machines linear classifier on a set of realistic data. We show that, using the training models generated with the method described, we can improve the sentiment classification performance, irrespective of the domain and distribution of the test sets.
JRC.G.2 - Global security and crisis management
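
Contributions a) and c) above can be pictured with a small Python sketch, offered only as an assumption-laden illustration and not as the paper's implementation. The `LABELS` mini-lexicon and the label names `POSITIVE` and `NEGATIVE` are hypothetical; the idea shown is that replacing sentiment words with a shared label lets different surface forms yield the same n-gram feature, and that a bigram such as "not POSITIVE" captures a polarity modification that unigrams alone would miss.

```python
import re

# Hypothetical mini-lexicon mapping sentiment-bearing words to generic labels.
LABELS = {
    "good": "POSITIVE", "great": "POSITIVE", "happy": "POSITIVE",
    "bad": "NEGATIVE", "terrible": "NEGATIVE", "sad": "NEGATIVE",
}


def generalize(text):
    """Tokenize and replace each sentiment-bearing word with its generic label."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [LABELS.get(tok, tok) for tok in tokens]


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=2):
    """Collect unigram up to max_n-gram features from the generalized tokens."""
    tokens = generalize(text)
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return feats
```

With this sketch, "This is not good" and "This is not great" both produce the bigram feature "not POSITIVE", so a linear classifier trained on such features would see them as the same evidence.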

JRC Publications Repository

Methods and resources for sentiment analysis in multilingual documents of different text types

Author: Balahur Dobrescu Alexandra
Publication date: 01/01/2011

RUa Repository University of Alicante

Análisis comparativo de métodos para determinar la polaridad de opiniones sobre productos (Comparative analysis of methods for determining the polarity of opinions on products)

Author: Balahur Dobrescu Alexandra
Montoyo Andres
Publication date: 2008

The high volume of user feedback on products, in the form of reviews and forum or blog posts, is helpful both to prospective buyers and to the companies that produce them. However, automatically determining the semantic orientation of the opinions expressed on different products and their features is a complex problem, requiring a series of steps: identifying the product features, extracting the opinion words present in a text and finally classifying them as positive or negative. This article concentrates on three approaches to solving the latter problem. One method determines the polarity of the opinions expressed on the product features using the sentiment-bearing words in WordNet Affect (Strapparava and Valitutti, 2004). Two other methods determine the polarity of opinion holders (feature attributes) using Support Vector Machines Sequential Minimal Optimization (Platt, 1998) machine learning with the Normalized Google Distance (Cilibrasi and Vitanyi, 2006) and, respectively, with Latent Semantic Analysis (Deerwester et al., 1990) on a specialized versus a non-specialized corpus of user reviews. We comparatively analyze the methods, show the advantages and disadvantages of using each of them, and present the results obtained by performing an evaluation on our opinion mining and summarization system.

RUa Repository University of Alicante

Detecting Entity-Related Events and Sentiments from Tweets Using Multilingual Resources

Author: BALAHUR DOBRESCU Alexandra
TANEV Hristo
Publication date: 01/01/2012

This article presents the details of the participation of the OPTAH team in the CLEF 2012 RepLab profiling (polarity classification) and monitoring tasks. Specifically, we present the manner in which the OPAL system has been modified to deal with opinions in tweets and how rules about language use in social media can help to achieve good results in polarity classification, even in a language for which we have only a small polarity lexicon. Additionally, we show how we can employ the values computed for sentiment intensity (especially the negative ones) to classify the importance of event-related clusters of tweets. Our methods, although quite simple, obtained promising results in the RepLab evaluations.
JRC.G.2 - Global security and crisis management

JRC Publications Repository

Sentiment analysis meets social media – Challenges and solutions of the field in view of the current information sharing context

Author: JACQUET Guillaume
BALAHUR-DOBRESCU Alexandra
Publication date: 01/01/2015

In this introductory article, we briefly define the key concepts in Sentiment Analysis and describe the present challenges faced by research in the task. Subsequently, we introduce each of the papers in this volume (chosen from an open call for papers and extended versions of the best papers presented at the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA 2013) and we describe their contribution to the advancement of current research in Sentiment Analysis. Finally, we conclude on the issues that have been tackled and those that remain open, and reflect on the possible future developments of the field.
JRC.E.1 - Disaster Risk Management

JRC Publications Repository
