Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish

Author: Plank Barbara
Publication date: 01/01/2019

ADA University of Tartu

Lexical Resources for Low-Resource PoS Tagging in Neural Times

Author: Plank Barbara
Klerke Sigrid
Publication date: 01/01/2019

ADA University of Tartu

Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction

Author: Ramponi Alan
Plank Barbara
Lombardo Rosario
Publication date: 01/01/2020

Biomedical event extraction is a crucial task in order to automatically extract information from the increasingly growing body of biomedical literature. Despite advances in the methods in recent years, most event extraction systems are still evaluated in-domain and on complete event structures only. This makes it hard to determine the performance of intermediate stages of the task, such as edge detection, across different corpora. Motivated by these limitations, we present the first cross-domain study of edge detection for biomedical event extraction. We analyze differences between five existing gold standard corpora, create a standardized benchmark corpus, and provide a strong baseline model for edge detection. Experiments show a large drop in performance when the baseline is applied on out-of-domain data, confirming the need for domain adaptation methods for the task. To encourage research efforts in this direction, we make both the data and the baseline available to the research community: https://www.cosbi.eu/cfx/9985

Catalogo dei prodotti della ricerca Università degli Studi di Verona

The Lacunae of Danish Natural Language Processing

Author: Plank Barbara
Schluter Natalie
Kirkedal Andreas
Derczynski Leon
Publication date: 01/01/2019

ADA University of Tartu

When POS data sets don’t add up: Combatting sample bias

Author: Plank Barbara
Hovy Dirk
Søgaard Anders
Publication date: 01/01/2014

No abstract available

Archivio istituzionale della Ricerca - Bocconi

Copenhagen University Research Information System

Personality Traits on Twitter —or— How to Get 1500 Personality Tests in a Week

Author: Plank Barbara
Hovy Dirk
Publication date: 01/01/2015

Psychology research suggests that certain personality traits correlate with linguistic features. This correlation can be effectively modeled with statistical natural language processing techniques. Prediction accuracy of these models should improve with larger data samples and more features. Most existing work on personality prediction from text, however, focuses on small samples and closed-vocabulary investigations. Both factors limit generality and statistical power of the results. In this paper, we explore the use of social media as a resource for large-scale, open-vocabulary personality detection. We analyze which features are predictive of which personality traits, and present a novel corpus of 1.2M tweets with personality and gender annotation. Our results suggest that social media can be a valuable source for certain personality type predictions.

CiteSeerX

Crossref

Archivio istituzionale della Ricerca - Bocconi

Copenhagen University Research Information System

SenTube: A Corpus for Sentiment Analysis on YouTube Social Media

Author: Plank Barbara
Rotondi Agata
Severyn Aliaksei
Uryupina Olga
Moschitti Alessandro
Publication date: 01/01/2014

In this paper we present SenTube, a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. It contains annotations that allow the development of classifiers for several important NLP tasks: (i) sentiment analysis, (ii) text categorization (relatedness of a comment to video and/or product), (iii) spam detection, and (iv) prediction of comment informativeness. The SenTube corpus favors the development of research on indexing and searching YouTube videos exploiting information derived from comments. The corpus will cover several languages: at the moment, we focus on English and Italian, with Spanish and Dutch parts scheduled for the later stages of the project. For all the languages, we collect videos for the same set of products, thus offering possibilities for multi- and cross-lingual experiments. The paper provides annotation guidelines, corpus statistics and annotator agreement details.

ARCA (Univ. Ca'Foscari)

Copenhagen University Research Information System

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Biomedical Event Extraction as Sequence Labeling

Author: Ramponi Alan
Plank Barbara
Lombardo Rosario
van der Goot Rob
Publication date: 01/01/2020

We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model. BeeSL recasts the task as sequence labeling, taking advantage of a multi-label aware encoding strategy and jointly modeling the intermediate tasks via multi-task learning. BeeSL is fast, accurate, end-to-end, and unlike current methods does not require any external knowledge base or preprocessing tools. BeeSL outperforms the current best system (Li et al., 2019) on the Genia 2011 benchmark by 1.57% absolute F1 score reaching 60.22% F1, establishing a new state of the art for the task. Importantly, we also provide first results on biomedical event extraction without gold entity information. Empirical results show that BeeSL’s speed and accuracy make it a viable approach for large-scale real-world scenarios.

Catalogo dei prodotti della ricerca Università degli Studi di Verona

Linguistically debatable or just plain wrong?

Author: Plank Barbara
Hovy Dirk
Søgaard Anders
Publication date: 01/01/2014

No abstract available

Crossref

Archivio istituzionale della Ricerca - Bocconi

Copenhagen University Research Information System

Experiments with crowdsourced re-annotation of a POS tagging data set

Author: Plank Barbara
Hovy Dirk
Søgaard Anders
Publication date: 01/01/2014

No abstract available

Crossref

Archivio istituzionale della Ricerca - Bocconi

Copenhagen University Research Information System