Charles University

Biblio at Institute of Formal and Applied Linguistics

539 research outputs found

Jak se Mojžíš s Jozuem učili hindsky (aneb souboj s mizernými daty při překladu z angličtiny do hindštiny) [How Moses and Joshua Learned Hindi (or, Battling Poor Data in English-to-Hindi Translation)]

Authors: Bojar Ondřej; Zeman Daniel; Straňák Pavel
Publication date: 01/01/2009

A Contrastive Lexical description of Basic Verbs. Examples from Swedish and Czech.

Author: Cinková Silvie
Publication date: 01/01/2009

This paper provides a lexical description of uses of common Swedish lexical verbs that are frequent but not cognitively salient, set against the background of Czech, with some implications for the lexical description of basic verbs in general. It results in a draft production lexicon of Swedish basic verbs for advanced Czech learners of Swedish, focusing on their uses as light verbs.

Statistical Machine Translation between Related and Unrelated Languages

Authors: Kolovratník David; Bojar Ondřej; Klyueva Natalia
Publication date: 01/01/2009

In this paper we describe an attempt to compare how the relatedness of languages influences the performance of statistical machine translation (SMT). We apply the Moses toolkit to the Czech-English-Russian corpus UMC 0.1 to train two translation systems: Russian-Czech and English-Czech. Translation quality is evaluated on an independent test set of 1000 sentences parallel in all three languages, using an automatic metric (BLEU score) as well as manual judgments. We examine whether Russian-Czech translation quality is better thanks to the relatedness of the languages and their similar word order and morphological richness. Additionally, we present and discuss the most frequent translation errors for both language pairs.

Computer Aided Translation Backed by Machine Translation

Authors: Odcházel Ondřej; Bojar Ondřej
Publication date: 01/01/2009

A number of tools exist to support translators (computer-aided translation, CAT), as do many machine translation (MT) systems. So far, there has been little or no integration of the two system types. The aim of this paper is to examine a tighter coupling of MT and CAT. We introduce our web-based CAT tool, implemented using AJAX, which communicates with the Moses MT system on the server side to provide the translator with suggested translations of individual phrases of the source sentence, as well as several options for completing the rest of the output sentence. The suggested continuation is based on what has already been translated and what the user has already written as output. We hope that the proposed user interface and the MT system at the back end will accelerate and simplify the process of translation.

CzEng 0.9, Building a Large Czech-English Automatic Parallel Treebank

Authors: Bojar Ondřej; Žabokrtský Zdeněk
Publication date: 01/01/2009

We describe our ongoing efforts in collecting a Czech-English parallel corpus, CzEng. The paper provides full details on the current version 0.9 and focuses on its new features: (1) data from new sources were added, most importantly a few hundred electronically available books, technical documentation and some parallel web pages; (2) the full corpus has been automatically annotated up to the tectogrammatical layer (surface and deep syntactic analysis); (3) sentence segmentation has been refined; and (4) several heuristic filters to improve corpus quality were implemented. In total, we provide a sentence-aligned automatic parallel treebank of 8.0 million sentences, 93 million English and 82 million Czech words. CzEng 0.9 is freely available for non-commercial research purposes.

Získávání paralelních textů z webu [Acquiring Parallel Texts from the Web]

Authors: Ehrenberger Jan; Novák Michal; Bojar Ondřej; Fabian Peter; Klempová Hana
Publication date: 01/01/2009

We examine methods for collecting parallel Czech-English corpora from the web. We propose and evaluate automatic methods for finding source web sites, identifying languages and, most importantly, aligning the documents among the retrieved pages.

Annotation Quality Checking and Its Implications for Design of Treebank (in Building the Prague Czech-English Dependency Treebank)

Authors: Štěpánek Jan; Mikulová Marie
Publication date: 01/01/2009

The article presents a system for annotation quality checking, proposed and used during the building of the Czech part of the Prague Czech-English Dependency Treebank. First, the treebank project is introduced, along with its basic principles and annotation process. The second part of the article examines in detail one important phase of the annotation process, namely how the correctness of the annotated data is checked automatically and continuously during annotation. The system is demonstrated on several specific checking procedures concerning syntactic phenomena. We try to evaluate the contribution of the system not only to the quality of the data and annotation, but also to the corpus design, the annotation rules and the annotation process as a whole.

Tectogrammatical Annotation of the Wall Street Journal

Authors: Klimeš Václav; Šindlerová Jana; Mladová Lucie; Hajič Jan; Cinková Silvie; Žabokrtský Zdeněk; Tomšů Kristýna; Čermáková Kristýna; Toman Josef
Publication date: 01/01/2009

This paper gives an overview of the current state of the Prague English Dependency Treebank project. It is an updated version of a draft text that was released along with a CD presenting the first 25% of the PDT-like version of the WSJ section of the Penn Treebank (PEDT 1.0).

Towards English-Czech Parallel Valency Lexicon via Treebank Examples

Authors: Šindlerová Jana; Bojar Ondřej
Publication date: 01/01/2009

The paper describes an ongoing project of building a bilingual valency lexicon in the framework of Functional Generative Description. The bilingual lexicon is created by interlinking the frames and frame elements of two existing valency lexicons. First, we give an overall account of the character of the lexicons to be linked; second, we explain the process of frame linking; and third, we present a case study to show what the information contained in frame links tells us about cross-linguistic differences in general and about the linguistic theory applied.

58 full texts, 539 metadata records

Updated in last 30 days.
