
    Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technology (MUMTTT 2015)

    This volume documents the proceedings of the 2nd Workshop on Multi-word Units in Machine Translation and Translation Technology (MUMTTT 2015), held on 1-2 July 2015 as part of the EUROPHRAS 2015 conference: "Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives" (Málaga, 29 June – 1 July 2015). The workshop was sponsored by the European COST Action PARSing and Multi-word Expressions (PARSEME) under the auspices of the European Society of Phraseology (EUROPHRAS), the Special Interest Group on the Lexicon of the Association for Computational Linguistics (SIGLEX), and SIGLEX's Multiword Expressions Section (SIGLEX-MWE). The workshop was co-chaired by Gloria Corpas Pastor (Universidad de Málaga), Ruslan Mitkov (University of Wolverhampton), Johanna Monti (Università degli Studi di Sassari), and Violeta Seretan (Université de Genève). It was supported by an Advisory Board composed of Dmitrij O. Dobrovol'skij (Russian Academy of Sciences, Moscow), Kathrin Steyer (Institut für Deutsche Sprache, Mannheim), Agata Savary (Université François Rabelais Tours), Michael Rosner (University of Malta), and Carlos Ramisch (Aix-Marseille Université). The topic of the workshop was the integration of multi-word units in machine translation and translation technology tools. In spite of recent progress in machine translation and translation technology, the identification, interpretation and translation of multi-word units still represent open challenges, both from a theoretical and from a practical point of view. The idiosyncratic morpho-syntactic, semantic and translational properties of multi-word units pose many obstacles even to human translators, mainly because of intrinsic ambiguities, structural and lexical asymmetries between languages, and, finally, cultural differences.
After a successful first edition held in Nice on 3 September 2013 as part of the Machine Translation Summit XIV, the present edition provided a forum for researchers working in the fields of Linguistics, Computational Linguistics, Translation Studies and Computational Phraseology to discuss recent advances in multi-word unit processing and to coordinate research efforts across disciplines. The workshop was attended by 53 representatives of academic and industrial organisations. The programme included 11 oral and 4 poster presentations, and featured an invited talk by Kathrin Steyer, President of EUROPHRAS. We received 23 submissions and accepted 15 of them (the 11 oral and 4 poster presentations), giving MUMTTT 2015 an acceptance rate of 65.2%. The accepted papers are indicative of the current efforts of researchers and developers who are actively engaged in improving the state of the art in multi-word unit translation.

    Multiword units translation evaluation in machine translation: another pain in the neck?

    Recent studies have highlighted that the translation of Multiword Units (MWUs) by Machine Translation (MT) is still an open challenge, whatever the adopted approach (statistical, rule-based or example-based). The difficulties in automatically translating this recurrent, complex and varied lexical phenomenon originate from its lexical, syntactic, semantic, pragmatic and/or statistical, but also translational, idiomaticity. It is widely acknowledged that, in order to achieve significant improvements in Machine Translation and translation technologies, it is important to develop resources that can be used both for Statistical Machine Translation (SMT) training and for evaluation purposes. There is, therefore, a need to develop linguistic resources, mainly parallel corpora annotated with MWUs, which can help improve MT quality, in particular as regards the translation of MWUs in context and of discontinuous MWUs. In this paper, we analyse the state of the art concerning MWU-aware MT evaluation metrics and the availability of both benchmarking resources and annotation guidelines and procedures.

    MWU processing in an ontology-based CLIR model for specific domain collections

    This paper proposes a methodological approach to cross-language information retrieval (CLIR) applications for the development of a system that improves multi-word processing when domain-specific translation is required. The system is based on a multilingual ontology, which can improve both translation and retrieval accuracy and effectiveness. The proposed framework allows data and metadata to be mapped among language-specific ontologies in the Cultural Heritage (CH) domain. The accessibility of Cultural Heritage resources, as foreseen by important recent initiatives such as the European Library and Europeana, is closely related to the development of environments that enable the management of multilingual complexity. Interoperability between multilingual systems can be achieved only by means of accurate multi-word processing, which leads to more effective information extraction and semantic search, and to improved translation quality.

    Multiword units in machine translation and translation technology

    The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing (NLP), but it is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention, but we believe that there is much more to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. In this chapter, we present a survey of the field with particular reference to Machine Translation and Translation Technology.

    Development of Post-editing Rules for Improving the Comprehensibility of Machine Translated Scientific Article Abstracts

    This research work aims to assess the impact of a set of post-editing rules that have been specifically designed to improve the comprehensibility of Spanish machine translations. For this study, we have selected a corpus of scientific article abstracts originally written in English and published on the Internet. As these texts are easily available, we have used Google Translate to obtain our corpora. Our post-editing rules mainly focus on improving the syntax and style of ambiguous sentences that may prevent the general public from understanding the content. In order to evaluate the set of rules, we have carried out a bilingual human evaluation with 5 annotators and have computed two statistical tests. The results suggest that the set of post-editing rules has improved the raw translations.

    MT Summit workshop proceedings for: Multi-word Units in Machine Translation and Translation Technologies (Organised at the 14th Machine Translation Summit)

    Machine Translation (MT) has evolved along with different types of computer-assisted translation tools, and notable progress has been achieved in improving the quality of translations. However, in spite of recent positive developments in translation technologies, not all problems have been solved; in particular, the identification, interpretation and translation of multi-word units (MWUs) still represent open challenges, both from a theoretical and a practical point of view. The low standard of analysis and translation of MWUs in translation technologies suggests a need to invest in further research with the goal of improving the performance of the various translation applications. MWUs are a complex linguistic phenomenon, ranging from lexical units with a relatively high degree of internal variability to expressions that are frozen or semi-frozen. Such units are very frequent both in everyday language and in languages for special purposes. Their interpretation and translation sometimes present unexpected obstacles even to human translators, mainly because of intrinsic ambiguities, structural and lexical asymmetries between languages, and, finally, cultural differences. Current theoretical work on this topic deals with different formalisms and techniques relevant to MWU processing in MT as well as in other translation applications, such as: automatic recognition of MWUs in a monolingual or bilingual setting; alignment and paraphrasing methodologies; the development, features and usefulness of handcrafted monolingual and bilingual linguistic resources and grammars; and the use of MWUs in Statistical Machine Translation (SMT) domain adaptation, as well as empirical work concerning their modelling accuracy and descriptive adequacy across various language pairs.
At the practical level, the issue of MWUs has been addressed in various MT approaches, whether knowledge-based, statistical (word-based, phrase-based or factored) or hybrid. In general, MWU identification and translation problems are far from being solved, and there is still considerable room for improvement. MWU processing in MT and Translation Technologies has recently received growing attention, as it has been acknowledged that large-scale applications cannot be created without properly handling MWUs of all kinds. The focus of this workshop is to address the MWU issue in a synergetic way, taking advantage of recent developments in disciplines such as Linguistics, Translation Studies, Computational Linguistics, and Computational Phraseology. The main aim of the Workshop is, therefore, to bring together researchers working on various aspects of MWU processing in different disciplines in order to discuss and propose innovative ideas and methods in relation to MT and Translation Technologies. In particular, this workshop welcomes interaction between NLP researchers working on the computational treatment of multi-word units, experts in phraseology (including computational phraseology) working on challenging topics in their discipline, and translation practitioners, with the aim of applying their latest results to advance the state of the art in MWU translation.

    Bridging Collocational and Syntactic Analysis

    The advent of the computer era, which enabled the development of large text corpora and of sophisticated corpus processing tools, led to unprecedented advances in the area of collocational analysis. These advances were paralleled by significant achievements in the area of syntactic analysis, with parsing technologies becoming available for an increasing number of languages. But more often than not, these developments have taken place independently. The coupling of collocational and syntactic analyses has seldom been considered, despite the fact that each type of analysis could benefit the other. In this chapter, we focus on the integration of syntactic parsing and collocational analysis. First, we review the literature describing syntactically-informed approaches to collocation extraction. Second, we survey the work devoted to exploiting collocational resources for syntactic parsing. Finally, we refer to more recent work that proposes a joint approach to collocational and syntactic analysis, arguing that the two analyses are interdependent to such a degree that only a simultaneous process, one in which structure decoding and pattern identification go hand in hand, can provide a solid bridge between them.

    When Multiwords Go Bad in Machine Translation

    This paper addresses the impact of multiword translation errors in machine translation (MT). We have analysed translations of multiwords in the OpenLogos rule-based system (RBMT) and in the Google Translate statistical system (SMT) for the English-French, English-Italian, and English-Portuguese language pairs. Our study shows that, for distinct reasons, multiwords remain a problematic area for MT regardless of the approach, and require adequate linguistic quality evaluation metrics founded on a systematic categorization of errors by MT expert linguists. We propose an empirically-driven taxonomy for multiwords and highlight the need for the development of specific corpora for multiword evaluation. Finally, the paper presents the Logos approach to multiword processing, illustrating how semantico-syntactic rules contribute to multiword translation quality.