Thematic analysis on online education issues during COVID-19

Author: Misuraca Michelangelo
Spano Maria
Basile Valerio
Publication date: 01/01/2022

Archivio della ricerca - Università degli studi di Napoli Federico II

Long-term social media data collection at the University of Turin

Author: Lai Mirko
Sanguinetti Manuela
Basile Valerio
Publication date: 01/01/2019

We report on the collection of Italian-language social media messages, from Twitter in particular, that has been ongoing continuously since 2012 at the University of Turin. A number of smaller datasets have been extracted from the main collection and enriched with different kinds of annotations for linguistic purposes. Moreover, a few extra datasets have been collected independently and are now in the process of being merged with the main collection. We aim to make the resource available to the community to the best of our ability, in accordance with the Terms of Service of the platforms from which the data have been gathered.

Archivio istituzionale della ricerca - Università di Cagliari

Resources and benchmark corpora for hate speech detection: a systematic review

Author: Bosco Cristina
Patti Viviana
Sanguinetti Manuela
Poletto Fabio
Basile Valerio
Publication date: 01/01/2021

Hate speech in social media is a complex phenomenon whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica also play an important role in the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and avenues for improvement.

Archivio istituzionale della ricerca - Università di Cagliari

Mapping natural language labels to structured web resources

Author: Nozza Debora
Basile Valerio
Cabrio Elena
Gandon Fabien
Publication date: 01/01/2018

Archivio istituzionale della Ricerca - Bocconi

Lessons Learned from EVALITA 2020 and Thirteen Years of Evaluation of Italian Language Technology

Author: Passaro Lucia C
Di Maro Maria
Basile Valerio
Croce Danilo
Publication date: 01/01/2020

This paper provides a summary of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), which was held online on December 17th due to the 2020 COVID-19 pandemic. The 2020 edition of EVALITA included 14 different tasks belonging to five research areas, namely: (i) Affect, Hate, and Stance, (ii) Creativity and Style, (iii) New Challenges in Long-standing Tasks, (iv) Semantics and Multimodality, (v) Time and Diachrony. This paper provides a description of the tasks and the key findings from the analysis of participant outcomes. Moreover, it provides a detailed analysis of the participants and task organizers, which demonstrates the growing interest in this campaign. Finally, a detailed analysis of the evaluation of tasks across the past seven editions is provided; this allows us to assess how the research carried out by the Italian Computational Linguistics community has evolved in terms of popular tasks and paradigms during the last 13 years.

Archivio della ricerca - Università degli studi di Napoli Federico II

Long-term Social Media Data Collection at the University of Turin

Author: Lai Mirko
Basile Valerio
Sanguinetti Manuela
Publication date: 01/01/2018

We report on the collection of Italian-language social media messages, from Twitter in particular, that has been ongoing continuously since 2012 at the University of Turin. A number of smaller datasets have been extracted from the main collection and enriched with different kinds of annotations for linguistic purposes. Moreover, a few extra datasets have been collected independently and are now in the process of being merged with the main collection. We aim to make the resource available to the community to the best of our ability, in accordance with the Terms of Service of the platforms from which the data have been gathered.

Crossref

OpenEdition

Archivio Istituzionale della Ricerca - Università del Piemonte Orientale

Institutional Research Information System University of Turin

Lessons Learned from EVALITA 2020 and Thirteen Years of Evaluation of Italian Language Technology

Author: Basile Valerio
Passaro Lucia C
Croce Danilo
Di Maro Maria
Publication date: 01/01/2020

Directory of Open Access Journals

Archivio della Ricerca - Università di Pisa

Institutional Research Information System University of Turin

Leveraging Hate Speech Detection to Investigate Immigration-related Phenomena in Italy

Author: Patti Viviana
Lai Mirko
Basile Valerio
Florio Komal
Publication date: 01/01/2019

The presence and integration of immigrants is one of the most controversial issues in our society, and given current worldwide political instabilities, it will likely become ever more prominent in the cultural and political debate. Social media play an increasingly important role in how citizens debate opinions and react to local and global events. However, several studies point out the danger of social media as a breeding ground for online hate speech (or cyberhate). We propose a novel approach to the exploratory analysis of social phenomena based on the integration of automatic detection of cyberhate against immigrants with offline indicators. We gathered data from the Italian Twittersphere and from the main supplier of official statistical data in Italy (ISTAT). We developed a supervised classification model for hate speech detection, trained on a corpus of Italian tweets manually annotated for hate speech against immigrants, and used it to automatically annotate a large sample of geo-tagged tweets over a span of six years. We cross-referenced these data with the ISTAT data, exploring three macro-indicators related to employment, education and crime. We found correlations suggesting an interplay between economic and cultural factors and the expression of hate online.

Crossref

Archivio Istituzionale della Ricerca - Università del Piemonte Orientale

Institutional Research Information System University of Turin

An Italian lexical resource for incivility detection in online discourses

Author: Fontanella Lara
Basile Valerio
Anzani Stefano
Tontodimamma Alice
Publication date: 11/08/2022

The exponential growth of social media has brought an increasing propagation of hostile online communication and vitriolic discourse, and social media have become fertile ground for heated discussions that frequently result in the use of insulting and offensive language. Lexical resources containing specific negative words have been widely employed to detect uncivil communication. This paper describes the development and implementation of an innovative resource, namely the Revised HurtLex Lexicon, in which every headword is annotated with an offensiveness level score. The starting point is HurtLex, a multilingual lexicon of hate words. Concentrating on the Italian entries, we revised the terms in HurtLex and derived an offensiveness score for each lexical item by applying an Item Response Theory model to the ratings provided by a large number of annotators. This resource can be used as part of a lexicon-based approach to track offensive and hateful content. Our work comprises an evaluation of the Revised HurtLex lexicon.

Crossref

ARUd’A (Università “G. d’Annunzio” Chieti-Pescara)

Empirical analysis of foundational distinctions in linked open data

Author: Asprino Luigi
Basile Valerio
Presutti Valentina
Ciancarini Paolo
Publication date: 01/01/2018

The Web and its Semantic extension (i.e. Linked Open Data) contain open global-scale knowledge and make it available to potentially intelligent machines that want to benefit from it. Nevertheless, most Linked Open Data lack ontological distinctions and have sparse axiomatisation. For example, distinctions such as whether an entity is inherently a class or an individual, or whether it is a physical object or not, are hardly expressed in the data, although they have been extensively studied and formalised by foundational ontologies (e.g. DOLCE, SUMO). These distinctions also belong to common sense, which is relevant for many artificial intelligence tasks such as natural language understanding, scene recognition, and the like. There is a gap between foundational ontologies, which often formalise or are inspired by pre-existing philosophical theories and are developed with a top-down approach, and Linked Open Data, which mostly derive from existing databases or crowd-based efforts (e.g. DBpedia, Wikidata). We investigate whether machines can learn foundational distinctions over Linked Open Data entities, and whether they match common sense. We want to answer questions such as "does the DBpedia entity for dog refer to a class or to an instance?". We report on a set of experiments based on machine learning and crowdsourcing that show promising results.

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Institutional Research Information System University of Turin