
    SimpleNLG-ZH: a Linguistic Realisation Engine for Mandarin

    We introduce SimpleNLG-ZH, a realisation engine for Mandarin that follows the software design paradigm of SimpleNLG. We explain the core grammar (morphology and syntax) and the lexicon of SimpleNLG-ZH, which is very different from English and other languages for which SimpleNLG engines have been built. The system was evaluated by regenerating expressions from a body of test sentences and a corpus of human-authored expressions. Human evaluation was conducted to estimate the quality of regenerated sentences

    Modelling Pro-drop with the Rational Speech Acts Model

    We extend the classic Referring Expressions Generation task by considering zero pronouns in pro-drop languages such as Chinese, modelling their use by means of the Bayesian Rational Speech Acts model. By assuming that highly salient referents are most likely to be referred to by zero pronouns (i.e., pro-drop is more likely for salient referents than the less salient ones), the model offers an attractive explanation of a phenomenon not previously addressed probabilistically

    Evaluating Sentence Representations for Biomedical Text: Methods and Experimental Results

    Text representations are one of the main inputs to various Natural Language Processing (NLP) methods. Given the fast developmental pace of new sentence embedding methods, we argue that there is a need for a unified methodology to assess these different techniques in the biomedical domain. This work introduces a comprehensive evaluation of novel methods across ten medical classification tasks. The tasks cover a variety of BioNLP problems such as semantic similarity, question answering, citation sentiment analysis and others with binary and multi-class datasets. Our goal is to assess the transferability of different sentence representation schemes to the medical and clinical domain. Our analysis shows that embeddings based on Language Models, which account for the context-dependent nature of words, usually outperform others. Nonetheless, there is no single embedding model that perfectly represents biomedical and clinical texts with consistent performance across all tasks. This illustrates the need for a more suitable bio-encoder. Our MedSentEval source code, pre-trained embeddings and examples have been made available on GitHub

    M-RAM: a Mobile Risk Assessment Method for Enterprise Mobile Security

    Mobile solutions seem to outrun the control and governance within enterprise organizations. The acceptance of smartphones and tablets in business has proceeded at such a high pace that organizations are no longer able to oversee the risks of their mobile usage. Traditional risk assessment methods do not consider the usage of mobile devices (mobility), despite the fact that enterprise organizations struggle with managing mobile risks. We aim to fill this gap by introducing a Mobile Risk Assessment Method (M-RAM). The method is based on an evaluation of industry standard risk methods and 22 interviews with mobile security experts. Three components compose the method: (1) a risk assessment process that is customized for mobility, (2) involved entities that oppose risks, and (3) attention areas that can contain vulnerabilities as well as controls. Moreover, the study provides a practical work program to conduct the M-RAM and validates the approach by conducting a case study

    How We Do Things With Words: Analyzing Text as Social and Cultural Data

    In this article we describe our experiences with computational text analysis involving rich social and cultural concepts. We hope to achieve three primary goals. First, we aim to shed light on thorny issues not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of key questions that can guide work in this area. Our guidance is based on our own experiences and is therefore inherently imperfect. Still, given our diversity of disciplinary backgrounds and research practices, we hope to capture a range of ideas and identify commonalities that resonate for many. This leads to our final goal: to help promote interdisciplinary collaborations. Interdisciplinary insights and partnerships are essential for realizing the full potential of any computational text analysis involving social and cultural concepts, and the more we bridge these divides, the more fruitful we believe our work will be

    Listener's Social Identity Matters in Personalised Response Generation

    Personalised response generation enables generating human-like responses by means of assigning the generator a social identity. However, pragmatics theory suggests that human beings adjust their way of speaking based not only on who they are but also on whom they are talking to. In other words, when modelling personalised dialogues, it might be favourable if we also take the listener's social identity into consideration. To validate this idea, we use gender as a typical example of a social variable to investigate how the listener's identity influences the language used in Chinese dialogues on social media. Also, we build personalised generators. The experimental results demonstrate that the listener's identity indeed matters in the language use of responses and that the response generator can capture such differences in language use. More interestingly, by additionally modelling the listener's identity, the personalised response generator performs better in its own identity

    tBERT: Topic Models and BERT Joining Forces for Semantic Similarity Detection

    Semantic similarity detection is a fundamental task in natural language understanding. Adding topic information has been useful for previous feature-engineered semantic similarity models as well as neural models for other tasks. There is currently no standard way of combining topics with pretrained contextual representations such as BERT. We propose a novel topic-informed BERT-based architecture for pairwise semantic similarity detection and show that our model improves performance over strong neural baselines across a variety of English language datasets. We find that the addition of topics to BERT helps particularly with resolving domain-specific cases

    Fuzzy-Based Language Grounding of Geographical References: From Writers to Readers

    Jose M. Alonso is a Ramon y Cajal Researcher (RYC-2016-19802). This research was also funded by the Spanish Ministry of Science, Innovation and Universities (grants RTI2018-099646-BI00, TIN2017-84796-C2-1-R and TIN2017-90773-REDT) and the Galician Ministry of Education, University and Professional Training (grants ED431F2018/02, ED431C 2018/29 and “accreditation 2016-2019, ED431G/08”). All grants were co-funded by the European Regional Development Fund (ERDF/FEDER program). Peer reviewed

    Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings

    Word embeddings are increasingly used for the automatic detection of semantic change; yet, a robust evaluation and systematic comparison of the choices involved has been lacking. We propose a new evaluation framework for semantic change detection and find that (i) using the whole time series is preferable to comparing only the first and last time points; (ii) independently trained and aligned embeddings perform better than continuously trained embeddings for long time periods; and (iii) the reference point for comparison matters. We also present an analysis of the changes detected on a large Twitter dataset spanning 5.5 years

    Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

    Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, the advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions