Charles University

Biblio at Institute of Formal and Applied Linguistics


539 research outputs found


HPLT’s First Release of Data and Models

Authors: Ramírez-Sánchez Gema; Chen Pinzhen; Helcl Jindřich; Zaragoza-Bernabeu Jaume; Malik Bhavitvya; De Gibert Bonet Ona; Stepachev Pavel; Variš Dušan; Haddow Barry; Arefyev Nikolay; Tiedemann Jörg
Publication date: 01/01/2024

The High Performance Language Technologies (HPLT) project is a 3-year EU-funded project that started in September 2022. It aims to deliver free, sustainable, and reusable datasets, models, and workflows at scale using high-performance computing. We describe the first results of the project. The data release includes monolingual data in 75 languages (5.6T tokens) and parallel data in 18 language pairs (96M pairs), derived from 1.8 petabytes of web crawls. Building upon automated and transparent pipelines, the first machine translation (MT) models as well as large language models (LLMs) have been trained and released. Multiple data processing tools and pipelines have also been made public.

Are Large Language Models Actually Good at Text Style Transfer?

Authors: Mukherjee Sourabrata; Dušek Ondřej; Ojha Atul
Publication date: 01/01/2024

We analyze the performance of large language models (LLMs) on Text Style Transfer (TST), specifically focusing on sentiment transfer and text detoxification across three languages: English, Hindi, and Bengali. Text Style Transfer involves modifying the linguistic style of a text while preserving its core content. We evaluate the capabilities of pre-trained LLMs using zero-shot and few-shot prompting as well as parameter-efficient finetuning on publicly available datasets. Our evaluation using automatic metrics, GPT-4, and human evaluations reveals that while some prompted LLMs perform well in English, their performance on the other languages (Hindi, Bengali) remains average. However, finetuning significantly improves results compared to zero-shot and few-shot prompting, making them comparable to the previous state of the art. This underscores the necessity of dedicated datasets and specialized models for effective TST.

Paragraph Retrieval for Enhanced Question Answering in Clinical Documents

Authors: Pecina Pavel; Lanz Vojtěch
Publication date: 01/01/2024

Healthcare professionals often manually extract information from large clinical documents to address patient-related questions. The use of Natural Language Processing (NLP) techniques, particularly Question Answering (QA) models, is a promising direction for improving the efficiency of this process. However, document-level QA from large documents is often impractical or even infeasible (for model training and inference). In this work, we address document-level QA on clinical reports with a two-step approach: first, the entire report is split into segments and, for a given question, the most relevant segment is predicted by an NLP model; second, a QA model is applied to the question with the retrieved segment as context. We investigate the effectiveness of heading-based and naive paragraph segmentation approaches for various paragraph lengths on two subsets of the emrQA dataset. Our experiments reveal that the average paragraph length used as a parameter for the segmentation has no significant effect on performance during the whole document-level QA process. That means experiments focusing on segmentation into shorter paragraphs perform similarly to those focusing on entire unsegmented reports. Surprisingly, naive uniform segmentation is sufficient even though it is not based on prior knowledge of the clinical document's characteristics.
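The first step of the two-step approach described in this abstract (segment the report, then pick the segment most relevant to the question) can be illustrated with a minimal sketch. This is not the authors' method: the abstract says the relevant segment is predicted by an NLP model, whereas the sketch below swaps in a plain word-overlap score purely for illustration. The function names and the `max_words` parameter are hypothetical, and the second step (applying a QA model to the retrieved segment) is left out.

```python
import re
from collections import Counter


def split_uniform(text, max_words=50):
    """Naive uniform segmentation: cut the text every `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text):
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question, segment):
    """Count question tokens that also occur in the segment (illustrative stand-in for a trained model)."""
    q_counts = Counter(tokenize(question))
    s_counts = Counter(tokenize(segment))
    return sum(min(q_counts[tok], s_counts[tok]) for tok in q_counts)


def retrieve_segment(question, report, max_words=50):
    """Return the segment with the highest overlap score, or "" for an empty report."""
    segments = split_uniform(report, max_words)
    if not segments:
        return ""
    return max(segments, key=lambda seg: overlap_score(question, seg))


report = ("The patient was admitted with chest pain. "
          "Aspirin 81 mg daily was prescribed at discharge.")
question = "What dose of aspirin was prescribed?"
best = retrieve_segment(question, report, max_words=7)
```

In a full pipeline, `best` would be passed together with `question` to a QA model as context; which retrieval and QA models to use is outside what the abstract specifies.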

ReproHum #0043-4: Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility

Authors: Schmidtová Patrícia; Balloccu Simone; Dušek Ondřej; Lango Mateusz
Publication date: 01/01/2024

In this paper, we describe several reproductions of a human evaluation experiment measuring the quality of automatic dialogue summarization (Feng et al., 2021). We investigate the impact of the annotators’ highest level of education, field of study, and native language on the evaluation of the informativeness of the summary. We find that the evaluation is relatively consistent regardless of these factors, but the biggest impact seems to come from a specific prior background in natural language processing (as opposed to, e.g., a background in computer science). We also find that the experiment setup (asking for single vs. multiple criteria) may have an impact on the result.

Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems

Authors: Dušek Ondřej; Warczyński Jędrzej; Lango Mateusz
Publication date: 01/01/2024

We introduce a simple approach that uses a large language model (LLM) to automatically implement a fully interpretable rule-based data-to-text system in pure Python. Experimental evaluation on the WebNLG dataset showed that the resulting system produces text of better quality (according to the BLEU and BLEURT metrics) than the same LLM prompted to directly produce outputs, and produces fewer hallucinations than a BART language model fine-tuned on the same data. Furthermore, at runtime, the approach generates text in a fraction of the processing time required by neural approaches, using only a single CPU.
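To give a sense of what an interpretable rule-based data-to-text system in pure Python can look like, here is a minimal hand-written sketch. It is not the system from the paper, nor code produced by an LLM: the `RULES` table, its templates, and the function names are hypothetical, and the predicate names are only assumed to resemble the subject-predicate-object triples used in WebNLG.

```python
# Hypothetical rule table: one sentence template per predicate.
RULES = {
    "birthPlace": "{subject} was born in {object}.",
    "capital": "The capital of {subject} is {object}.",
}

# Generic fallback used when no rule covers the predicate.
FALLBACK = "{subject} has {predicate} {object}."


def clean(entity):
    """Turn an underscore-separated entity label into readable text."""
    return entity.replace("_", " ")


def verbalize(triple):
    """Render one (subject, predicate, object) triple with its template."""
    subject, predicate, obj = triple
    template = RULES.get(predicate, FALLBACK)
    return template.format(subject=clean(subject), predicate=predicate, object=clean(obj))


def generate(triples):
    """Verbalize each triple and join the sentences into one text."""
    return " ".join(verbalize(t) for t in triples)


text = generate([("Alan_Bean", "birthPlace", "Wheeler,_Texas")])
```

Because every output sentence can be traced to a single template, such a system is easy to inspect and runs on a CPU without a neural model at generation time, which is the kind of property the abstract highlights.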

Introducing the ELITR Project (Představení projektu ELITR)

Authors: Bojar Ondřej; Macháček Dominik
Publication date: 01/01/2024

I presented the result of the EU project ELITR: a live speech translation system from 99 to 43 languages.

Looking for LLMs' Limits in Dialogue & Data-to-text

Author: Dušek Ondřej
Publication date: 01/01/2024

An overview of our recent experiments aiming to find LLMs' limits in the tasks of dialogue modelling and data-to-text generation, including our survey of data leakage in LLMs.

Expand Your Color Palette: Evaluating Generated Texts in the Post-BLEU Era

Author: Kasner Zdeněk
Publication date: 01/01/2024

The texts we evaluate have become radically different over the past few years. Fluency is no longer an issue and semantic inconsistencies have become more nuanced. As a result, no single number can give us a clear picture of text quality. In this talk, I will present an alternative evaluation approach: annotating ("highlighting") individual text spans with custom categories. The approach combines multiple advantages: it is reference-free, customizable, and produces interpretable and visualizable results. Most importantly, automating and scaling this approach is now possible with LLM-evaluators, i.e., using zero-shot prompted large language models instead of human annotators. As a specific example, I will show how we used the span annotation approach to evaluate LLMs on data-to-text generation. I will also present factgenie: a toolkit we are developing to make this evaluation approach accessible to other researchers.

Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

Authors: Gkatzia Dimitra; Howcroft David; Sivaprasad Adarsa; Mahamood Saad; Schmidtová Patrícia; Plátek Ondřej; Dušek Ondřej; Gatt Albert; Balloccu Simone
Publication date: 01/01/2024

Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we present a survey on the use of automatic metrics, focusing particularly on natural language generation (NLG) tasks. We inspect which metrics are used as well as why they are chosen and how their use is reported. Our findings from this survey reveal significant shortcomings, including inappropriate metric usage, lack of implementation details, and missing correlations with human judgements. We conclude with recommendations that we believe authors should follow to enable more rigour within the field.

Language Technology Tools and Services

Authors: Jánoší Miroslav; Berrìo Aroca Cristian; Callizano Rémi; Roberts Ian; Straka Milan; Galanis Dimitris; Garcia-Silva Andres; Gómez-Pérez José Manuel; Germann Ulrich; Lagzdiņš Andis
Publication venue: Springer Nature Switzerland AG
Publication date: 01/01/2023

At the time of writing, the European Language Grid includes more than 800 LT services of varied types, including machine translation (MT), automatic speech recognition (ASR), text-to-speech synthesis (TTS), and text analysis ranging from simple tokenisers and part-of-speech taggers through to complete named entity recognition and sentiment analysis systems. This chapter gives a high-level summary of the development of the ELG service catalogue over time and digs deeper to discuss the process of service integration by looking at a few example services.

58 full texts, 539 metadata records. Updated in last 30 days.
