    Crowdsourcing Temporal Relations in Italian and English

    This paper reports on two crowdsourcing experiments on Temporal Relation Annotation in Italian and English. The aim of these experiments is three-fold: first, to evaluate average Italian and English native speakers on their ability to identify and classify a temporal relation between two verbal events; second, to assess the feasibility of crowdsourcing for this kind of complex semantic task; third, to perform a preliminary analysis of the role of syntax within such a task. Two categories of temporal relations were investigated: relations between the main event and its subordinated event (e.g. So che hai visto Giovanni / I know you’ve seen John) and relations between two main events (e.g. Giovanni bussò ed entrò / John knocked and got in). Fifty aligned parallel sentences in the two languages were extracted from the MultiSemCor corpus. In each sentence, the source and the target verbs of the relations were highlighted and contributors were asked to select the temporal relation from 7 values (AFTER, BEFORE, INCLUDES, IS INCLUDED, SIMULTANEOUS, NO RELATION, and DON’T KNOW) inspired by the TimeML Annotation Guidelines. For each sentence, 5 judgments were collected. Annotator agreement is 0.41 for Italian and 0.32 for English. Analysis of the data has shown that annotating temporal relations is not a trivial task and that dependency relations between events have a major role in facilitating the annotation. Future work aims at conducting new experiments with an additional parameter, namely factivity, and with texts in a different domain, i.e. History.

    Specificity ratings for Italian data

    Abstraction enables us to categorize experience, learn new information, and form judgments. Language arguably plays a crucial role in abstraction, providing us with words that vary in specificity (e.g., highly generic: tool vs. highly specific: muffler). Yet, human-generated ratings of word specificity are virtually absent. We hereby present a dataset of specificity ratings collected from Italian native speakers on a set of around 1K Italian words, using the Best-Worst Scaling method. Through a series of correlation studies, we show that human-generated specificity ratings have low correlation coefficients with specificity metrics extracted automatically from WordNet, suggesting that WordNet does not reflect the hierarchical relations of category inclusion present in the speakers’ minds. Moreover, our ratings show low correlations with concreteness ratings, suggesting that the variables Specificity and Concreteness capture two separate aspects involved in abstraction and that specificity may need to be controlled for when investigating conceptual concreteness. Finally, through a series of regression studies we show that specificity explains a unique amount of variance in decision latencies (lexical decision task), suggesting that this variable has theoretical value. The results are discussed in relation to the concept and investigation of abstraction.

    A Narratology-Based Framework for Storyline Extraction

    Stories are a pervasive phenomenon of human life. They also represent a cognitive tool to understand and make sense of the world and of its happenings. In this contribution we describe a narratology-based framework for modeling stories as a combination of different data structures and for automatically extracting them from news articles. We introduce a distinction among three data structures (timelines, causelines, and storylines) that capture different narratological dimensions, respectively chronological ordering, causal connections, and plot structure. We developed the Circumstantial Event Ontology (CEO) for modeling (implicit) circumstantial relations as well as explicit causal relations and created two benchmark corpora: ECB+/CEO, for causelines, and the Event Storyline Corpus (ESC), for storylines. To test our framework and the difficulty in automatically extracting causelines and storylines, we develop a series of reasonable baseline systems.

    Identifying communicative functions in discourse with content types

    Texts are not monolithic entities but rather coherent collections of micro illocutionary acts which help to convey a unitary message of content and purpose. Identifying such text segments is challenging because they require a fine-grained level of analysis even within a single sentence. At the same time, accessing them facilitates the analysis of the communicative functions of a text as well as the identification of relevant information. We propose an empirical framework for modelling micro illocutionary acts at clause level, which we call content types, grounded in linguistic theories of text types, in particular the framework proposed by Werlich in 1976. We make available a newly annotated corpus of 279 documents (for a total of more than 180,000 tokens) belonging to different genres and temporal periods, based on a dedicated annotation scheme. We obtain an average Cohen’s kappa of 0.89 at token level. We achieve an average F1 score of 74.99% on the automatic classification of content types using a bi-LSTM model. Similar results are obtained on contemporary and historical documents, while performances on genres are more varied. This work promotes a discourse-oriented approach to information extraction and cross-fertilisation across disciplines through a computationally-aided linguistic analysis.

    Temporal Information Annotation: Crowd vs. Experts

    This paper describes two sets of crowdsourcing experiments on temporal information annotation conducted on two languages, i.e., English and Italian. The first experiment, launched on the CrowdFlower platform, was aimed at classifying temporal relations given target entities. The second one, relying on the CrowdTruth metric, consisted of two subtasks: one devoted to the recognition of events and temporal expressions and one to the detection and classification of temporal relations. The outcomes of the experiments suggest that crowdsourced annotations are valuable even for a complex task like Temporal Processing.

    Content Type Dataset - v1.5

    This repository contains:
    - the Content Type Dataset Version 1.5 (in the folder "Datasets");
    - the latest version of the guidelines for annotating Content Types;
    - the data statement related to CTD V1.5;
    - a set of spreadsheets containing metadata about the documents included in the dataset, e.g. year of publication, author's name, author's nationality, author's gender (in the folder "Documents_Metadata");
    - the data to replicate a set of experiments for the identification of Content Types (in the folder "Datasets");
    - the best model for the identification of Content Types obtained adopting the BiLSTM-CNN-CRF with ELMo-Representations for Sequence Tagging implementation by Nils Reimers and Iryna Gurevych (in the folder "Best_Model");
    - the data used to calculate the Inter-Annotator Agreement (in the folder "IAA"); the script used for calculating Cohen's k is available here: https://github.com/johnnymoretti/CAT_R_Kappa_Cohe

    Dead or Murdered? Predicting Responsibility Perception in Femicide News Reports

    Different linguistic expressions can conceptualize the same event from different viewpoints by emphasizing certain participants over others. Here, we investigate a case where this has social consequences: how do linguistic expressions of gender-based violence (GBV) influence who we perceive as responsible? We build on previous psycholinguistic research in this area and conduct a large-scale perception survey of GBV descriptions automatically extracted from a corpus of Italian newspapers. We then train regression models that predict the salience of GBV participants with respect to different dimensions of perceived responsibility. Our best model (fine-tuned BERT) shows solid overall performance, with large differences between dimensions and participants: salient _focus_ is more predictable than salient _blame_, and perpetrators' salience is more predictable than victims' salience. Experiments with ridge regression models using different representations show that features based on linguistic theory perform similarly to word-based features. Overall, we show that different linguistic choices do trigger different perceptions of responsibility, and that such perceptions can be modelled automatically. This work can be a core instrument to raise awareness of the consequences of different perspectivizations in the general public and in news producers alike.

    Specificity ratings for English data

    A dataset of specificity ratings for English words is hereby presented, analyzed and discussed in relation to other collections of speaker-generated ratings, including concreteness. Both specificity and concreteness are analyzed in their ability to explain decision latencies in lexical and semantic tasks, showing important individual contributions. Specificity ratings are collected through the best–worst scaling method on the words included in the ANEW dataset (Bradley and Lang in Affective norms for English words (ANEW): instruction manual and affective ratings (Tech. Rep.). Technical report C-1, the center for research in psychophysiology, 1999), chosen for its compatibility with many other collections of rating resources, and for its comparability with Italian specificity data (Bolognesi and Caselli in Behav Res Methods 55(7):3531–3548, 2023), allowing for cross-linguistic comparisons. Results suggest that specificity plays an important role in word processing and highlight the importance of taking specificity into consideration when investigating concreteness effects.

    FBK-TR: Applying SVM with multiple linguistic features for Cross-Level Semantic Similarity

    Recently, the task of measuring semantic similarity between given texts has drawn much attention from the Natural Language Processing community. Especially, the task becomes more interesting when it comes to measuring the semantic similarity between different-sized texts, e.g. paragraph-sentence, sentence-phrase, phrase-word, etc. In this paper, we, the FBK-TR team, describe our system participating in Task 3 "Cross-Level Semantic Similarity", at SemEval 2014. We also report the results obtained by our system, compared to the baseline and other participating systems in this task.