
    Towards a Multilingual Financial Narrative Processing System

    Large-scale financial narrative processing for UK annual reports has become possible only in the last few years, through our prior work on automatically understanding and extracting the structure of unstructured, glossy PDF reports. This has levelled the playing field somewhat relative to US research, where annual reports (10-K forms) have a rigid structure imposed on them by legislation and are submitted in plain-text format. Structure extraction is just the first step in a pipeline of analyses that examine disclosure quality and its change over time relative to financial results. In this paper, we describe and evaluate the use of similar Information Extraction and Natural Language Processing methods to extract and analyse annual financial reports in a second language (Portuguese), in order to assess the applicability of our techniques in another national context (Portugal). Extraction accuracy varies between languages, with English exceeding 95%. To further examine the robustness of our techniques, we apply the extraction methods to a comprehensive sample of annual reports published by UK and Portuguese non-financial firms between 2003 and 2015.

    Development of the multilingual semantic annotation system

    This paper reports on our research to generate multilingual semantic lexical resources and develop multilingual semantic annotation software, which assigns each word in running text to a semantic category based on a lexical semantic classification scheme. Such tools have an important role in developing intelligent multilingual NLP, text mining and ICT systems. In this work, we aim to extend an existing English semantic annotation tool to cover a range of languages, namely Italian, Chinese and Brazilian Portuguese, by bootstrapping new semantic lexical resources through automatic translation of existing English semantic lexicons into these languages. We used a set of bilingual dictionaries and word lists for this purpose. In our experiment, after minor manual improvement of the automatically generated semantic lexicons, the prototype tools based on the new lexicons achieved an average lexical coverage of 79.86% and an average annotation precision of 71.42% (if only precise annotations are considered) or 84.64% (if partially correct annotations are included) on the three languages. Our experiment demonstrates that it is feasible to rapidly develop prototype semantic annotation tools for new languages by automatically bootstrapping new semantic lexicons from existing ones.
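    The bootstrapping step can be pictured as projecting each English entry's ordered semantic tags onto its dictionary translations, then measuring lexical coverage as the share of running-text tokens the new lexicon recognises. The sketch below is a minimal illustration under stated assumptions: the words, tag labels (MONEY, GEOGRAPHY), the dictionary and the coverage definition are all hypothetical placeholders, not the authors' lexicons, tagset or evaluation code.

```python
# Sketch of lexicon bootstrapping by dictionary translation.
# All words, tags and dictionary entries below are hypothetical placeholders.

# English semantic lexicon: word -> semantic tags, most likely tag first.
EN_LEXICON = {
    "bank": ["MONEY", "GEOGRAPHY"],
    "river": ["GEOGRAPHY"],
    "money": ["MONEY"],
}

# Bilingual dictionary: English word -> target-language (here Portuguese) translations.
EN_TO_PT = {
    "bank": ["banco"],
    "river": ["rio"],
    "money": ["dinheiro"],
}


def bootstrap_lexicon(source_lexicon, bilingual_dict):
    """Copy each English entry's ordered tags onto its translations."""
    target = {}
    for word, tags in source_lexicon.items():
        for translation in bilingual_dict.get(word, []):
            merged = target.setdefault(translation, [])
            for tag in tags:
                if tag not in merged:  # keep order, drop duplicates
                    merged.append(tag)
    return target


def lexical_coverage(tokens, lexicon):
    """Share of running-text tokens found in the lexicon (assumed definition)."""
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t.lower() in lexicon)
    return covered / len(tokens)


pt_lexicon = bootstrap_lexicon(EN_LEXICON, EN_TO_PT)
tokens = "O banco fica perto do rio".split()
print(pt_lexicon["banco"])  # ['MONEY', 'GEOGRAPHY']
print(lexical_coverage(tokens, pt_lexicon))  # 2 of 6 tokens covered
```

    Keeping the English tag order means the most likely sense survives translation as the first choice; a real system would also need the manual clean-up step the abstract mentions, since translations can be ambiguous.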

    Using a semantic annotation tool for the analysis of metaphor in discourse

    This paper describes the application of semantic annotation software for analysing metaphor in corpora of different genres. In particular, we outline three projects analysing RELIGION and POLITICS metaphors in corporate mission statements, the WAR metaphor in business magazines, and MACHINE and LIVING ORGANISM metaphors in a novel and in a second collection of business magazine articles. This research was guided by the hypotheses that a) semantic tags allocated by the software can correspond to the source domains of metaphoric expressions, and b) more conventional metaphors feature a source domain tag as first choice in the type’s semantic profile. The tagger was adapted to better serve the needs of metaphor research and to automate more fully the extraction of first-choice and secondary semantic domains. Two of the three studies re-analyse previous manual and/or lexical corpus-based investigations, and the findings indicate that semantic annotation can yield more comprehensive results.
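    Hypothesis b) can be made concrete by aggregating, for one word type, how often each tag is ranked first versus lower by the tagger. The sketch below assumes the tagger emits a ranked list of candidate tags per token and that the type's "first choice" is its most frequent first-ranked tag; the tokens and tag labels (WARFARE, POLITICS, COMPETITION) are hypothetical and are not the tagger's real tagset or output format.

```python
from collections import Counter

# Hypothetical tagger output: (token, candidate tags ranked by likelihood).
# Tag labels are placeholders, not the tagger's actual tagset.
TAGGED = [
    ("battle", ["WARFARE", "COMPETITION"]),
    ("battle", ["WARFARE", "COMPETITION"]),
    ("campaign", ["POLITICS", "WARFARE"]),
    ("campaign", ["WARFARE", "POLITICS"]),
    ("campaign", ["POLITICS", "WARFARE"]),
]


def semantic_profile(tagged_tokens, word):
    """Count first-choice and secondary tags over all tokens of one word type."""
    first, secondary = Counter(), Counter()
    for token, tags in tagged_tokens:
        if token != word or not tags:
            continue
        first[tags[0]] += 1
        secondary.update(tags[1:])
    return first, secondary


def source_tag_is_first_choice(tagged_tokens, word, source_tag):
    """True if source_tag is the most frequent first-choice tag for the type."""
    first, _ = semantic_profile(tagged_tokens, word)
    return bool(first) and first.most_common(1)[0][0] == source_tag


print(source_tag_is_first_choice(TAGGED, "battle", "WARFARE"))    # True
print(source_tag_is_first_choice(TAGGED, "campaign", "WARFARE"))  # False
```

    Under the hypothesis, a conventionalised WAR metaphor would look like "battle" here (source domain ranked first), while "campaign" carries the source domain only as a secondary choice.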

    Tagging the Bard: Evaluating the Accuracy of a Modern POS Tagger on Early Modern English Corpora

    In this paper we focus on automatic part-of-speech (POS) annotation of historical English texts. Techniques originally developed for modern English have been applied to numerous other languages in recent years. Despite this diversification, the texts being analysed are still almost invariably from contemporary rather than historical sources. Although historical linguists recognise to some extent the advantages of annotation for retrieving lexical, grammatical and other linguistic phenomena, implementing such annotation by automatic methods is problematic. For example, changes in grammar over time will lead to a mismatch between probabilistic language models derived from, say, Present-day English and Middle English. Similarly, variability and changes in spelling can cause problems for POS taggers with fixed lexicons and rulebases. To determine the extent of the problem, and to develop possible solutions, we evaluate the accuracy of existing POS taggers, trained on modern English, when they are applied to Early Modern English (EModE) datasets. We focus here on the CLAWS POS tagger, a hybrid rule-based and statistical tool for English, and use as experimental data the Shakespeare First Folio and the Lampeter Corpus. First, using a manually post-edited test set, we evaluate the accuracy of CLAWS when no modifications are made either to its grammatical model or to its lexicon. We then compare this output with CLAWS' performance when using a pre-processor that detects spelling variants and matches them to modern equivalents. This experiment highlights (i) the extent to which handling orthographic variants is sufficient for the tagging accuracy of EModE data to approach the levels attained on modern-day texts, and (ii) in turn, whether the lexical resources and language models of POS taggers need to be revised.
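    The two-condition comparison (raw input versus input passed through a variant-normalising pre-processor) can be sketched with a toy setup. Everything here is an assumption for illustration: the variant map, the simplified tags (not the CLAWS tagset), and `toy_tag`, a hypothetical lexicon-lookup stand-in for a real tagger; only the accuracy measure (share of tokens matching a gold standard) follows the evaluation described above.

```python
# Hypothetical variant -> modern spelling map (illustrative entries only).
VARIANTS = {"doe": "do", "loue": "love", "vs": "us"}

# Hypothetical modern lexicon with simplified tags (not the CLAWS tagset).
MODERN_LEXICON = {"i": "PRON", "do": "VERB", "love": "VERB", "us": "PRON", "thee": "PRON"}


def normalise(tokens, variants):
    """Replace known spelling variants with modern equivalents; keep others."""
    return [variants.get(t.lower(), t) for t in tokens]


def toy_tag(tokens, lexicon, default="NOUN"):
    """Stand-in for a real tagger: lexicon lookup with a fixed fallback tag."""
    return [lexicon.get(t.lower(), default) for t in tokens]


def tagging_accuracy(predicted, gold):
    """Proportion of tokens whose predicted tag matches the gold-standard tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold sequences differ in length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


sentence = ["I", "doe", "loue", "thee"]
gold = ["PRON", "VERB", "VERB", "PRON"]
before = tagging_accuracy(toy_tag(sentence, MODERN_LEXICON), gold)
after = tagging_accuracy(toy_tag(normalise(sentence, VARIANTS), MODERN_LEXICON), gold)
print(before, after)  # 0.5 1.0
```

    The toy tagger fails on "doe" and "loue" only because they are missing from its modern lexicon; normalisation fixes exactly those tokens, which is the effect the experiment isolates. Grammatical change, the other source of error the abstract names, is not modelled here.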

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first five authors of each cited work. Results indicate that the picture produced by this non-traditional counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field under study than the picture produced by traditional first-author counting. Reasons for these effects are discussed.
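    The difference between the two counting schemes comes down to how many authors of each cited work enter the pool of co-cited authors for a citing paper. The sketch below assumes that two authors are co-cited once per citing paper whenever both appear among the counted authors of that paper's references (so co-authors of one cited work also count as co-cited); the study may define this detail differently. Author names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs co-cited by the same citing paper.

    citing_papers: list of reference lists; each reference is the cited
    work's ordered author list. max_authors=1 gives first-author counting,
    max_authors=5 counts the first five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        authors = set()
        for cited_authors in references:
            authors.update(cited_authors[:max_authors])
        # Each pair is counted at most once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each with two cited works.
CITING = [
    [["Smith", "Lee"], ["Chen"]],
    [["Lee", "Smith", "Kim"], ["Chen", "Park"]],
]

first_only = cocitation_counts(CITING, max_authors=1)
first_five = cocitation_counts(CITING, max_authors=5)
print(first_only[("Chen", "Smith")], first_five[("Chen", "Smith")])  # 1 2
print(len(first_only), len(first_five))  # 2 10
```

    Counting beyond first authors both raises existing pair counts and adds new pairs, which is one plausible route to the denser, more coherent author groups the abstract reports.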

    Geoparsing, GIS and textual analysis: current developments in spatial humanities research

    This document is the Accepted Manuscript version of a published work that appeared in final form in International Journal of Humanities and Arts Computing. To access the final edited and published work see http://www.euppublishing.com/doi/10.3366/ijhac.2015.0135. The spatial humanities constitute a rapidly developing research field that has the potential to create a step-change in the ways in which the humanities deal with geography and geographical information. As yet, however, research in the spatial humanities is only just beginning to deliver the applied contributions to knowledge that will prove its significance. Demonstrating the potential of innovations in technical fields is almost always a lengthy process, as it takes time to create the required datasets and to design and implement appropriate techniques for engaging with the information those datasets contain. Beyond this, there is the need to define appropriate research questions and to set parameters for interpreting findings, both of which can involve prolonged discussion and debate. The spatial humanities are still in the early phases of this process. Accordingly, the purpose of this special issue is to showcase a set of exemplary studies and research projects that not only demonstrate the field’s potential to contribute to knowledge across a range of humanities disciplines, but also suggest pathways for future research. Our ambition is both to demonstrate how the application of exploratory techniques in the spatial humanities offers new insights about the geographies embedded in a diverse range of texts (including letters, works of literature, and official reports) and, at the same time, to encourage other scholars to integrate these techniques in their research.

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, released posthumously in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourse rather than expressing opinions, an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, by contrast, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
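    The contrast between the two named measures is easy to see once Pearson is written as the cosine of mean-centred vectors. The sketch below compares them on hypothetical cocitation profiles (each vector holds an author's cocitation counts with the same set of other authors). It does not implement the two new measures, which the abstract does not name. One mathematical property it shows: adding an author cocited with neither profile leaves the cosine unchanged but shifts the Pearson value, because the means change.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention for an all-zero profile (undefined otherwise)
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation counts of two authors with three other authors.
x, y = [2, 1, 0], [1, 2, 0]
# The same profiles after adding one more author cocited with neither.
x_pad, y_pad = x + [0], y + [0]

print(round(cosine(x, y), 3), round(cosine(x_pad, y_pad), 3))    # 0.8 0.8
print(round(pearson(x, y), 3), round(pearson(x_pad, y_pad), 3))  # 0.5 0.636
```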