Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0

Author: McCrae John
Ehrmann Maud
Navigli Roberto
Vannella Daniele
Cimiano Philipp
Cecconi Francesco
Publication date: 01/01/2014

Ehrmann M, Cecconi F, Vannella D, McCrae J, Cimiano P, Navigli R. Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0. Presented at the LREC 2014

Publications at Bielefeld University

ESC: Redesigning WSD with Extractive Sense Comprehension

Author: Tommaso Pasini
Roberto Navigli
Edoardo Barba
Publication date: 01/01/2021

Word Sense Disambiguation (WSD) is a historical NLP task aimed at linking words in contexts to discrete sense inventories and it is usually cast as a multi-label classification task. Recently, several neural approaches have employed sense definitions to better represent word meanings. Yet, these approaches do not observe the input sentence and the sense definition candidates all at once, thus potentially reducing the model performance and generalization power. We cope with this issue by reframing WSD as a span extraction problem — which we called Extractive Sense Comprehension (ESC) — and propose ESCHER, a transformer-based neural architecture for this new formulation. By means of an extensive array of experiments, we show that ESC unleashes the full potential of our model, leading it to outdo all of its competitors and to set a new state of the art on the English WSD task. In the few-shot scenario, ESCHER proves to exploit training data efficiently, attaining the same performance as its closest competitor while relying on almost three times fewer annotations. Furthermore, ESCHER can nimbly combine data annotated with senses from different lexical resources, achieving performances that were previously out of everyone’s reach. The model along with data is available at https://github.com/SapienzaNLP/esc

Archivio della ricerca- Università di Roma La Sapienza

Huge automatically extracted training sets for multilingual Word Sense Disambiguation

Author: Tommaso Pasini
Roberto Navigli
Francesco Maria Elia
Publication date: 01/01/2018

We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are accessible. The data is available at trainomatic.org

Archivio della ricerca- Università di Roma La Sapienza

Automated short answer grading: A simple solution for a difficult task

Author: Vittorini P.
Menini S.
de Gasperis G.
Tonelli S.
Publication date: 01/01/2019

The task of short answer grading is aimed at assessing the outcome of an exam by automatically analysing students’ answers in natural language and deciding whether they should pass or fail the exam. In this paper, we tackle this task by training an SVM classifier on real data taken from a University statistics exam, showing that simple concatenated sentence embeddings used as features yield results around 0.90 F1, and that adding more complex distance-based features leads only to a slight improvement. We also release the dataset, which to our knowledge is the first freely available dataset of this kind in Italian.

IRIS Università degli Studi dell'Aquila

A Comparative Study of Models for Answer Sentence Selection

Author: Silvia Severini
Federico Rossetto
Alessio Gravina
Publication date: 01/01/2019

Answer Sentence Selection is one of the steps typically involved in Question Answering. Question Answering is considered a hard task for natural language processing systems, since full solutions would require both natural language understanding and inference abilities. In this paper, we explore how the state of the art in answer selection has improved recently, comparing two of the best proposed models for tackling the problem: the Cross-attentive Convolutional Network and the BERT model. The experiments are carried out on two datasets, WikiQA and SelQA, both created for and used in open-domain question answering challenges. We also report on cross-domain experiments with the two datasets.

Archivio della Ricerca - Università di Pisa

Is “manovra” Really “del popolo”? Linguistic Insights into Twitter Reactions to the Annual Italian Budget Law

Author: Claudia Roberta Combei
Publication date: 01/01/2019

Relying on linguistic cues obtained by means of structural topic modeling as well as descriptive lexical analyses, this study contributes to the general understanding of Twitter users’ response to the annual Italian budget law approved at the end of December 2018. Some topics contained in the dataset of tweets are procedural or generic, but besides those, it often emerges that Twitter users expressed their concern with respect to the provisions of this law. Supportive attitudes seem to be less frequent. This paper also advocates that findings from inductive studies on Twitter data should be interpreted with caution, since the nature of tweets might not be adequate for drawing far-reaching generalizations.

ART

Enhancing a Text Summarization System with ELMo

Author: Mastronardo C.
Tamburini F.
Publication date: 01/01/2019

Text summarization has gained a considerable amount of research interest due to deep learning based techniques. We leverage recent results in transfer learning for Natural Language Processing (NLP) using pre-trained deep contextualized word embeddings in a sequence-to-sequence architecture based on pointer-generator networks. We evaluate our approach on the two largest summarization datasets: CNN/Daily Mail and the recent Newsroom dataset. We show how using pre-trained contextualized embeddings on Newsroom significantly improves the state-of-the-art ROUGE-1 measure and obtains comparable scores on the other ROUGE values.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Asymmetries in extraction from nominal copular sentences: A challenging case study for NLP tools

Author: Paolo Lorusso
Cristiano Chesi
Andrea Moro
Matteo Paolo Greco
Publication date: 01/01/2019

In this paper we discuss two types of nominal copular sentences (Canonical and Inverse, Moro 1997) and we demonstrate how the peculiarities of these two configurations are hardly considered by standard NLP tools that are currently publicly available. Here we show that example-based MT tools (e.g. Google Translate) as well as other NLP tools (UDpipe, LinguA, Stanford Parser, and Google Cloud AI API) fail to capture the critical distinctions between the two structures, in the end producing both wrong analyses and, possibly as a consequence of a non-coherent (or missing) structural analysis, incorrect translations in the case of MT tools. To support the proposed analysis, we also present an empirical study showing that native speakers are indeed sensitive to the critical distinctions. This poses a sharp challenge for NLP tools that aim at being cognitively plausible or at least descriptively adequate (Chowdhury & Zamparelli 2018).

Archivio istituzionale della ricerca - Università degli Studi di Udine

Florence Research

Text Frame Detector: Slot Filling Based On Domain Knowledge Bases

Author: Lucia Passaro
Martina Miliani
Alessandro Lenci
Publication date: 01/01/2019

In this paper we present a system called Text Frame Detector (TFD) which aims at populating a frame-based ontology in a graph-based structure. Our system organizes textual information into frames, according to a predefined set of semantically informed patterns linking pre-coded information such as named entities, simple and complex terms. Given the semi-automatic expansion of such information with word embeddings, the system can be easily adapted to new domains.

Archivio della Ricerca - Università di Pisa

Gender Detection and Stylistic Differences and Similarities between Males and Females in a Dream Tales Blog

Author: Johanna Monti
Antonio Pascucci
Raffaele Manna
Publication date: 01/01/2019

In this paper, we present the results of a gender detection experiment carried out on a corpus we built by downloading dream tales from a blog. We also highlight stylistic differences and similarities concerning lexical choices between men and women. In order to carry out the experiment we built a feed-forward neural network with traditional sparse n-hot encoding using the Keras open-source library.

ARCHIVIO ISTITUZIONALE DELLA RICERCA-UNIVERSITA' DEGLI STUDI DI NAPOLI "L'ORIENTALE"

Università degli Studi di Napoli L'Orientale: CINECA IRIS