
    Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction

    Biomedical event extraction is crucial for automatically extracting information from the rapidly growing body of biomedical literature. Despite recent methodological advances, most event extraction systems are still evaluated in-domain and on complete event structures only. This makes it hard to determine the performance of intermediate stages of the task, such as edge detection, across different corpora. Motivated by these limitations, we present the first cross-domain study of edge detection for biomedical event extraction. We analyze differences between five existing gold standard corpora, create a standardized benchmark corpus, and provide a strong baseline model for edge detection. Experiments show a large drop in performance when the baseline is applied to out-of-domain data, confirming the need for domain adaptation methods for the task. To encourage research efforts in this direction, we make both the data and the baseline available to the research community: https://www.cosbi.eu/cfx/9985

    Personality Traits on Twitter —or— How to Get 1500 Personality Tests in a Week

    Psychology research suggests that certain personality traits correlate with linguistic features. This correlation can be effectively modeled with statistical natural language processing techniques. Prediction accuracy of these models should improve with larger data samples and more features. Most existing work on personality prediction from text, however, focuses on small samples and closed-vocabulary investigations. Both factors limit generality and statistical power of the results. In this paper, we explore the use of social media as a resource for large-scale, open-vocabulary personality detection. We analyze which features are predictive of which personality traits, and present a novel corpus of 1.2M tweets with personality and gender annotation. Our results suggest that social media can be a valuable source for certain personality type predictions.

    SenTube: A Corpus for Sentiment Analysis on YouTube Social Media

    In this paper we present SenTube, a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. Its annotations support the development of classifiers for several important NLP tasks: (i) sentiment analysis, (ii) text categorization (relatedness of a comment to video and/or product), (iii) spam detection, and (iv) prediction of comment informativeness. The SenTube corpus supports research on indexing and searching YouTube videos by exploiting information derived from comments. The corpus will cover several languages: at the moment, we focus on English and Italian, with Spanish and Dutch parts scheduled for the later stages of the project. For all the languages, we collect videos for the same set of products, thus offering possibilities for multi- and cross-lingual experiments. The paper provides annotation guidelines, corpus statistics, and annotator agreement details.

    Biomedical Event Extraction as Sequence Labeling

    We introduce Biomedical Event Extraction as Sequence Labeling (BeeSL), a joint end-to-end neural information extraction model. BeeSL recasts the task as sequence labeling, taking advantage of a multi-label aware encoding strategy and jointly modeling the intermediate tasks via multi-task learning. BeeSL is fast, accurate, end-to-end, and, unlike current methods, does not require any external knowledge base or preprocessing tools. BeeSL outperforms the current best system (Li et al., 2019) on the Genia 2011 benchmark by 1.57% absolute F1 score, reaching 60.22% F1 and establishing a new state of the art for the task. Importantly, we also provide first results on biomedical event extraction without gold entity information. Empirical results show that BeeSL’s speed and accuracy make it a viable approach for large-scale real-world scenarios.