SimpleNLG-ZH: a Linguistic Realisation Engine for Mandarin

Author: van Deemter C.J.
Lin Chenghua
Chen G.
Publication date: 05/11/2018

We introduce SimpleNLG-ZH, a realisation engine for Mandarin that follows the software design paradigm of SimpleNLG. We explain the core grammar (morphology and syntax) and the lexicon of SimpleNLG-ZH, which are very different from those of English and other languages for which SimpleNLG engines have been built. The system was evaluated by regenerating expressions from a body of test sentences and a corpus of human-authored expressions. Human evaluation was conducted to estimate the quality of regenerated sentences.

Utrecht University Repository

Modelling Pro-drop with the Rational Speech Acts Model

Author: van Deemter C.J.
Lin Chenghua
Chen G.
Publication date: 05/11/2018

We extend the classic Referring Expressions Generation task by considering zero pronouns in pro-drop languages such as Chinese, modelling their use by means of the Bayesian Rational Speech Acts model. By assuming that highly salient referents are most likely to be referred to by zero pronouns (i.e., pro-drop is more likely for salient referents than for less salient ones), the model offers an attractive explanation of a phenomenon not previously addressed probabilistically.

Utrecht University Repository

Evaluating Sentence Representations for Biomedical Text: Methods and Experimental Results

Author: Tawfik N.
Spruit M.
Publication date: 2020

Text representations are one of the main inputs to various Natural Language Processing (NLP) methods. Given the fast developmental pace of new sentence embedding methods, we argue that there is a need for a unified methodology to assess these different techniques in the biomedical domain. This work introduces a comprehensive evaluation of novel methods across ten medical classification tasks. The tasks cover a variety of BioNLP problems such as semantic similarity, question answering, citation sentiment analysis and others with binary and multi-class datasets. Our goal is to assess the transferability of different sentence representation schemes to the medical and clinical domain. Our analysis shows that embeddings based on language models, which account for the context-dependent nature of words, usually outperform the others. Nonetheless, there is no single embedding model that perfectly represents biomedical and clinical texts with consistent performance across all tasks. This illustrates the need for a more suitable bio-encoder. Our MedSentEval source code, pre-trained embeddings and examples have been made available on GitHub.

Utrecht University Repository

M-RAM: a Mobile Risk Assessment Method for Enterprise Mobile Security

Author: Janssen Joey
Spruit Marco
Publication date: 01/01/2019

Mobile solutions seem to outrun the control and governance within enterprise organizations. The acceptance of smartphones and tablets in business has proceeded at such a pace that organizations are no longer able to oversee the risks of their mobile usage. Traditional risk assessment methods do not consider the usage of mobile devices (mobility), despite the fact that enterprise organizations struggle with managing mobile risks. We aim to fill this gap by introducing a Mobile Risk Assessment Method (M-RAM). The method is based on an evaluation of industry-standard risk methods and 22 interviews with mobile security experts. Three components compose the method: (1) a risk assessment process that is customized for mobility, (2) involved entities that oppose risks, and (3) attention areas that can contain vulnerabilities as well as controls. Moreover, the study provides a practical work program to conduct the M-RAM and validates the approach by conducting a case study.

Utrecht University Repository

How We Do Things With Words: Analyzing Text as Social and Cultural Data

Author: Nguyen Dong
Liakata Maria
Dedeo Simon
Eisenstein Jacob
Mimno David
Tromble Rebekah
Winters Jane
Publication date: 25/08/2020

In this article we describe our experiences with computational text analysis involving rich social and cultural concepts. We hope to achieve three primary goals. First, we aim to shed light on thorny issues not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of key questions that can guide work in this area. Our guidance is based on our own experiences and is therefore inherently imperfect. Still, given our diversity of disciplinary backgrounds and research practices, we hope to capture a range of ideas and identify commonalities that resonate for many. This leads to our final goal: to help promote interdisciplinary collaborations. Interdisciplinary insights and partnerships are essential for realizing the full potential of any computational text analysis involving social and cultural concepts, and the more we bridge these divides, the more fruitful we believe our work will be.

Utrecht University Repository

Listener's Social Identity Matters in Personalised Response Generation

Author: Davis Brian
Kelleher John
Du Yupei
Chen Guanyi
Sripada Yaji
Graham Yvette
Zheng Yinhe
Publication date: 01/12/2020

Personalised response generation enables generating human-like responses by means of assigning the generator a social identity. However, pragmatics theory suggests that human beings adjust their way of speaking based not only on who they are but also on whom they are talking to. In other words, when modelling personalised dialogues, it might be favourable if we also take the listener's social identity into consideration. To validate this idea, we use gender as a typical example of a social variable to investigate how the listener's identity influences the language used in Chinese dialogues on social media. Also, we build personalised generators. The experimental results demonstrate that the listener's identity indeed matters in the language use of responses and that the response generator can capture such differences in language use. More interestingly, by additionally modelling the listener's identity, the personalised response generator performs better in its own identity.

Utrecht University Repository

tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection

Author: Jurafsky Dan
Schluter Natalie
Nguyen Dong
Peinelt Nicole
Liakata Maria
Chai Joyce
Tetreault Joel
Publication date: 2020

Semantic similarity detection is a fundamental task in natural language understanding. Adding topic information has been useful for previous feature-engineered semantic similarity models as well as neural models for other tasks. There is currently no standard way of combining topics with pretrained contextual representations such as BERT. We propose a novel topic-informed BERT-based architecture for pairwise semantic similarity detection and show that our model improves performance over strong neural baselines across a variety of English language datasets. We find that the addition of topics to BERT helps particularly with resolving domain-specific cases.

Crossref

Utrecht University Repository

Fuzzy-Based Language Grounding of Geographical References: From Writers to Readers

Author: Alonso Jose M.
Van Deemter Kees
Gatt Albert
Reiter Ehud
Ramos Alejandro
Publication date: 01/01/2019

Jose M. Alonso is a Ramón y Cajal Researcher (RYC-2016-19802). This research was also funded by the Spanish Ministry of Science, Innovation and Universities (grants RTI2018-099646-BI00, TIN2017-84796-C2-1-R and TIN2017-90773-REDT) and the Galician Ministry of Education, University and Professional Training (grants ED431F2018/02, ED431C 2018/29 and “accreditation 2016-2019, ED431G/08”). All grants were co-funded by the European Regional Development Fund (ERDF/FEDER program). Peer reviewed.

Aberdeen University Research

Utrecht University Repository

Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings

Author: Hale Scott A.
Nguyen Dong
McGillivray Barbara
Shoemark Philippa
Liza Farhana Ferdousi
Publication date: 01/01/2019

Word embeddings are increasingly used for the automatic detection of semantic change; yet, a robust evaluation and systematic comparison of the choices involved has been lacking. We propose a new evaluation framework for semantic change detection and find that (i) using the whole time series is preferable over only comparing between the first and last time points; (ii) independently trained and aligned embeddings perform better than continuously trained embeddings for long time periods; and (iii) the reference point for comparison matters. We also present an analysis of the changes detected on a large Twitter dataset spanning 5.5 years.

University of Essex Research Repository

Crossref

Oxford University Research Archive

Apollo (Cambridge)

University of East Anglia digital repository

Utrecht University Repository

Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

Author: Šandrih B
Frank A
Plank B
Kuyu M
Elena-Apostol S
Gatt A
Erdem E
Lloret E
Turuta O
Berend G
Calixto I
Martinčić-Ipšić S
Erdem A
Babii A
Korvel G
Ciprian-Truică O
Pârcălăbescu L
Yagcioglu S
Publication date: 01/01/2022

Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions.

Utrecht University Repository