
    Proceedings of The 3rd Workshop on Multi-word Units in Machine Translation and Translation Technology (MUMTTT 2017)

    This volume documents the proceedings of the 3rd Workshop on Multi-word Units in Machine Translation and Translation Technology (MUMTTT 2017), held on 14 November 2017 as part of the EUROPHRAS 2017 conference: "Computational and Corpus-based Approaches to Phraseology: Recent advances and interdisciplinary approaches" (London, 13-14 November 2017), jointly organised by the European Association for Phraseology (EUROPHRAS), the University of Wolverhampton (Research Institute of Information and Language Processing) and the Association for Computational Linguistics – Bulgaria. The workshop was held under the auspices of the European Society of Phraseology (EUROPHRAS), the Special Interest Group on the Lexicon of the Association for Computational Linguistics (SIGLEX), and SIGLEX's Multiword Expressions Section (SIGLEX-MWE). The workshop was co-chaired by Ruslan Mitkov (University of Wolverhampton), Johanna Monti (Università degli Studi di Sassari), Gloria Corpas Pastor (Universidad de Málaga) and Violeta Seretan (Université de Genève). The topic of the workshop was the integration of multi-word units in machine translation and translation technology tools. Despite the progress achieved for particular types of units such as verb-particle constructions, the identification, interpretation and translation of multi-word units in general remain open challenges, from both a theoretical and a practical point of view. The idiosyncratic morpho-syntactic, semantic and translational properties of multi-word units pose many obstacles even to human translators, mainly because of intrinsic ambiguities, structural and lexical asymmetries between languages, and cultural differences. The aim of the workshop was to bring together researchers and practitioners working on MWU processing from various perspectives, in order to enable cross-fertilisation and foster the creation of innovative solutions that can only arise from interdisciplinary collaborations.
The present edition of the workshop provided a forum for researchers and practitioners in the fields of (Computational) Linguistics, (Computational) Phraseology, Translation Studies and Translation Technology to discuss recent advances in the area of multi-word unit processing and to coordinate research efforts across disciplines in order to improve the integration of multi-word units in machine translation and translation technology tools. The programme included five oral presentations and featured an invited talk by Carlos Ramisch, Aix-Marseille University, France. The accepted papers are indicative of the current efforts of researchers and developers who are actively engaged in improving the state of the art of multi-word unit translation. We would like to thank all authors who contributed papers to this workshop edition and the Programme Committee members who provided valuable feedback during the review process.

    Computational and Corpus-Based Phraseology

    Review of Computational and Corpus-Based Phraseology, Gloria Corpas Pastor and Ruslan Mitkov (Eds.), Cham (Switzerland), Springer, 2022, 252 pp.

    Multi-word unit processing in Machine Translation

    The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing but is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention, but there is much more to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. This volume provides a general overview of the field with particular reference to Machine Translation and Translation Technology and focuses on languages such as English, Basque, French, Romanian, German, Dutch and Croatian, among others. The chapters of the volume illustrate a variety of topics that address this challenge, such as the use of rule-based approaches, compound splitting techniques, MWU identification methodologies in multilingual applications, and MWU alignment issues.

    A Quantum-Like Approach to Word Sense Disambiguation

    This paper presents a novel algorithm for Word Sense Disambiguation (WSD) based on Quantum Probability Theory. The Quantum WSD algorithm requires concept representations as vectors in the complex domain, and thus we have developed a technique for computing complex word and sentence embeddings based on the Paragraph Vectors algorithm. Although the proposed method is quite simple and does not require long training phases, it exhibits state-of-the-art (SOTA) performance when evaluated on a standardized benchmark for this task.
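
    The abstract does not give the scoring rule, so the following is only a minimal sketch of one generic way quantum probability can rank senses. It assumes that each sense and the context are single complex vectors and that the score is the Born-rule value |⟨sense|context⟩|². The function names, the toy two-dimensional vectors and the sense labels are hypothetical illustrations, not the paper's embeddings or its actual algorithm.

```python
import math


def normalize(vec):
    """Scale a complex vector to unit length."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def born_probability(context, sense):
    """Return |<sense|context>|^2 for the normalized vectors (assumed scoring rule)."""
    c = normalize(context)
    s = normalize(sense)
    # Complex inner product: conjugate the sense ("bra") components.
    amplitude = sum(si.conjugate() * ci for si, ci in zip(s, c))
    return abs(amplitude) ** 2


def disambiguate(context, senses):
    """Pick the sense label whose vector has the highest Born-rule score."""
    return max(senses, key=lambda name: born_probability(context, senses[name]))


# Hypothetical toy vectors; real embeddings would be learned and high-dimensional.
senses = {
    "bank_river": [1 + 0j, 0j],
    "bank_finance": [0j, 1 + 0j],
}
context = [0.2 + 0.1j, 0.9 - 0.3j]
best = disambiguate(context, senses)
```

    Because the score is a squared magnitude, it always lies between 0 and 1 and is unaffected by a global phase on either vector, which is one reason complex-valued representations fit this kind of formulation.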

    Developing a new CAI-tool for RSI Interpreters’ Training: a pilot study

    Over the past few years, new technologies in the field of Interpreting have greatly reshaped the way interpreters work, leading to a technological turn in Simultaneous Interpreting (Fantinuoli 2018), due to the increasing use of Remote Simultaneous Interpreting (RSI) and Computer Assisted Interpreting tools (CAI tools). When there is no human boothmate, AI-based CAI tools are becoming “artificial boothmates” (Fantinuoli 2017), which support interpreters before and while they deliver Simultaneous Interpreting services through automatic terminology lookup, key term identification, automatic speech recognition, real-time speech transcription, and number highlighting. While a few researchers have investigated the field of Computer Assisted Interpreting, e.g. Fantinuoli (2017; 2018; 2019), Prandi (2018; 2020), Frittella (2022; 2023) and Defrancq (2020), more research into Computer Assisted Interpreting Training is needed, so that new technologies may be integrated into interpreter training and workflow, given their potential to help interpreters face this technological breakthrough. This pilot study, conducted within the IULM research project “Collaboration for translation and interpreting: tools and teaching applications”, investigates the training of interpreting students in these new technologies, in collaboration with the RSI platform Converso Education, by integrating the RSI platform with a new CAI tool specifically developed for teaching purposes. To the best of our knowledge, this RSI platform with a CAI tool specifically developed for interpreting students based on their needs is the first of its kind.

    v-trel: Vocabulary Trainer for Tracing Word Relations - An Implicit Crowdsourcing Approach

    In this paper, we present our work on developing a vocabulary trainer that uses exercises generated from language resources such as ConceptNet and crowdsources the responses of the learners to enrich the language resource. We performed an empirical evaluation of our approach with 60 non-native speakers over two days, which shows that new entries to expand ConceptNet can efficiently be gathered through vocabulary exercises on word relations. We also report on the feedback gathered from the users and a language-teaching expert, and discuss the potential of the vocabulary trainer application from the user and language learner perspective. The feedback suggests that v-trel has educational potential, although some shortcomings were identified in its current state.

    Multi-word Units in Machine Translation and Translation Technology

    The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing but is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention, but there is much more to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. This volume provides a general overview of the field with particular reference to Machine Translation and Translation Technology and focuses on languages such as English, Basque, French, Romanian, German, Dutch and Croatian, among others. The chapters of the volume illustrate a variety of topics that address this challenge, such as the use of rule-based approaches, compound splitting techniques, MWU identification methodologies in multilingual applications, and MWU alignment issues.

    Towards Automatic Annotation of Anaphoric Links in Corpora

    The paper proposes a methodology for the semi-automatic annotation of pronoun-antecedent pairs in corpora. The proposal is based on robust, knowledge-poor pronoun resolution followed by post-editing. The paper is structured as follows. The introduction comments on the fact that automatic identification of referential links in corpora has lagged behind similar lexical, syntactic, and even semantic tasks. The second section of the paper outlines the author's robust, knowledge-poor approach to pronoun resolution, which is subsequently put forward as the core of a larger architecture proposed for the automatic tagging of referential links. Section 3 briefly presents other related knowledge-poor approaches, while Section 4 discusses the limitations and advantages of the knowledge-poor approach outlined in Section 2. The main argument of the paper is to be found in Section 5, which presents the idea of developing a semi-automatic environment for annotating anaphoric links and outlines the components of such a program. Finally, the conclusion looks at the anticipated success rate of the approach.