Proceedings of the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability (LREC-COLING 2024)
This volume includes the papers that were presented at the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability, co-located with the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) in Turin, Italy, on 25 May 2024
Data-driven semantic analysis for multilingual WSD and lexical selection in translation
A common way of describing the senses of ambiguous words in multilingual Word Sense Disambiguation (WSD) is by reference to their translation equivalents in another language. The theoretical soundness of the senses induced in this way can, however, be doubted. This type of cross-lingual sense identification has implications for multilingual WSD and MT evaluation as well. In this article, we first present some arguments in favour of a more thorough analysis of the semantic information that may be induced by the equivalents of ambiguous words found in parallel corpora. Then, we present an unsupervised WSD method and a lexical selection method that exploit the results of a data-driven sense induction method. Finally, we show how this automatically acquired information can be exploited for a multilingual WSD and MT evaluation more sensitive to lexical semantics
CFT13: A Resource for Research into the Post-editing Process
Evaluating the Effects of Interactivity in a Post-Editing Workbench
Annotating Qualia Relations in Italian and French Complex Nominals
The goal of this paper is to provide an annotation scheme for compounds based on generative lexicon theory (GL, Pustejovsky, 1995; Bassac and Bouillon, 2001). This scheme has been tested on a set of compounds automatically extracted from the Europarl corpus (Koehn, 2005) both in Italian and French. The motivation is twofold. On the one hand, it should help refine existing compound classifications and better explain lexicalization in both languages. On the other hand, we hope that the extracted generalizations can be used in NLP, for example for improving MT systems or for query reformulation (Claveau, 2003). In this paper, we focus on the annotation scheme and its ongoing evaluation
The IMAGACT Cross-linguistic Ontology of Action. A new infrastructure for natural language disambiguation
Action verbs, which are highly frequent in speech, cause disambiguation problems that are relevant to Language Technologies. This is a consequence of the peculiar way each natural language categorizes Action; that is, it is a consequence of semantic factors. Action verbs are frequently “general”, since they extend productively to actions belonging to different ontological types. Moreover, each language categorizes action in its own way and therefore the cross-linguistic reference to everyday activities is puzzling. This paper briefly sketches the IMAGACT project, which aims at setting up a cross-linguistic Ontology of Action for grounding disambiguation tasks in this crucial area of the lexicon. The project derives information on the actual variation of action verbs in English and Italian from spontaneous speech corpora, where references to action are high in frequency. Crucially, it makes use of the universal language of images to identify action types, avoiding the underdeterminacy of semantic definitions. Action concept entries are prototypic scenes and allow the implementation of all possible languages in the Ontology
Report from the European Language Resources Coordination (ELRC) Project Workshop in Ljubljana (8 December 2015)
The workshop of the European Language Resources Coordination (ELRC) project took place on 8 December 2015 at the Jožef Stefan Institute (IJS) in Ljubljana. It was organised by the Centre for Knowledge Transfer in Information Technologies and the Artificial Intelligence Laboratory of IJS, together with the European Commission Representation in Slovenia. The national coordinator of the event was Simon Krek of IJS, the ELRC representative in Slovenia, and the ELRC consortium was represented by Stelios Piperidis. The workshop was attended by 38 participants, mostly representatives of ministries and other public services, as well as computer experts and freelance translators. A video recording of the workshop and the individual presentations can be viewed on the Videolectures portal
The FLaReNet Databook
A collection of all the factual material collected during the activities of the FLaReNet project and a set of innovative initiatives and instruments that will remain in place for the continuous collection of such "facts". Editors: Paola Baroni, Claudia Soria, Nicoletta Calzolari. Contributors: Victoria Arranz, Núria Bel, Gerhard Budin, Tommaso Caselli, Khalid Choukri, Riccardo Del Gratta, Elina Desypri, Gil Francopoulo, Francesca Frontini, Sara Goggi, Olivier Hamon, Erhard Hinrichs, Penny Labropoulou, Lothar Lemnizer, Steven Krauwer, Valerie Mapelli, Joseph Mariani, Monica Monachini, Jan Odijk, Jungyeul Park, Stelios Piperidis, Adam Przepiorkowski, Valeria Quochi, Eva Revilla, Laurent Romary, Francesco Rubino, Irene Russo, Helmut Schmidt, Hans Uszkoreit, Peter Wittenburg
