Charles University

Biblio at Institute of Formal and Applied Linguistics
    539 research outputs found

    HPLT’s First Release of Data and Models

    The High Performance Language Technologies (HPLT) project is a 3-year EU-funded project that started in September 2022. It aims to deliver free, sustainable, and reusable datasets, models, and workflows at scale using high-performance computing. We describe the first results of the project. The data release includes monolingual data in 75 languages at 5.6T tokens and parallel data in 18 language pairs at 96M pairs, derived from 1.8 petabytes of web crawls. Building upon automated and transparent pipelines, the first machine translation (MT) models as well as large language models (LLMs) have been trained and released. Multiple data processing tools and pipelines have also been made public

    Are Large Language Models Actually Good at Text Style Transfer?

    We analyze the performance of large language models (LLMs) on Text Style Transfer (TST), specifically focusing on sentiment transfer and text detoxification across three languages: English, Hindi, and Bengali. Text Style Transfer involves modifying the linguistic style of a text while preserving its core content. We evaluate the capabilities of pre-trained LLMs using zero-shot and few-shot prompting as well as parameter-efficient finetuning on publicly available datasets. Our evaluation using automatic metrics, GPT-4 and human evaluations reveals that while some prompted LLMs perform well in English, their performance in other languages (Hindi, Bengali) remains average. However, finetuning significantly improves results compared to zero-shot and few-shot prompting, making them comparable to previous state-of-the-art. This underscores the necessity of dedicated datasets and specialized models for effective TST

    Paragraph Retrieval for Enhanced Question Answering in Clinical Documents

    Healthcare professionals often manually extract information from large clinical documents to address patient-related questions. The use of Natural Language Processing (NLP) techniques, particularly Question Answering (QA) models, is a promising direction for improving the efficiency of this process. However, document-level QA from large documents is often impractical or even infeasible (for model training and inference). In this work, we address document-level QA from clinical reports with a two-step approach: first, the entire report is split into segments and, for a given question, the most relevant segment is predicted by an NLP model; second, a QA model is applied to the question and the retrieved segment as context. We investigate the effectiveness of heading-based and naive paragraph segmentation approaches for various paragraph lengths on two subsets of the emrQA dataset. Our experiments reveal that the average paragraph length used as a parameter for the segmentation has no significant effect on performance during the whole document-level QA process. That means experiments focusing on segmentation into shorter paragraphs perform similarly to those focusing on entire unsegmented reports. Surprisingly, naive uniform segmentation is sufficient even though it is not based on prior knowledge of the clinical document's characteristics

    ReproHum #0043-4: Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility

    In this paper, we describe several reproductions of a human evaluation experiment measuring the quality of automatic dialogue summarization (Feng et al., 2021). We investigate the impact of the annotators’ highest level of education, field of study, and native language on the evaluation of the informativeness of the summary. We find that the evaluation is relatively consistent regardless of these factors, but the biggest impact seems to be a prior specific background in natural language processing (as opposed to, e.g., a background in computer science). We also find that the experiment setup (asking for single vs. multiple criteria) may have an impact on the result

    Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems

    We introduce a simple approach that uses a large language model (LLM) to automatically implement a fully interpretable rule-based data-to-text system in pure Python. Experimental evaluation on the WebNLG dataset showed that such a constructed system produces text of better quality (according to the BLEU and BLEURT metrics) than the same LLM prompted to directly produce outputs, and produces fewer hallucinations than a BART language model fine-tuned on the same data. Furthermore, at runtime, the approach generates text in a fraction of the processing time required by neural approaches, using only a single CPU

    Představení projektu ELITR (Introducing the ELITR Project)

    I presented the result of the EU project ELITR: a live speech translation system from 99 to 43 languages

    Looking for LLMs' Limits in Dialogue & Data-to-text

    An overview of our recent experiments aiming to find LLMs' limits in the tasks of dialogue modelling and data-to-text generation, including our survey of data leakage in LLMs

    Expand Your Color Palette: Evaluating Generated Texts in the Post-BLEU Era

    The texts we evaluate have become radically different over the past few years. Fluency is no longer an issue and semantic inconsistencies have become more nuanced. As a result, no single number can give us a clear picture of text quality. In this talk, I will present an alternative evaluation approach: annotating ("highlighting") individual text spans with custom categories. The approach combines multiple advantages: it is reference-free, customizable, and produces interpretable and visualizable results. Most importantly, automating and scaling this approach is now possible with LLM-evaluators, i.e., using zero-shot prompted large language models instead of human annotators. As a specific example, I will show how we used the span annotation approach to evaluate LLMs on data-to-text generation. I will also present factgenie: a toolkit we are developing to make this evaluation approach accessible to other researchers

    Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

    Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we have conducted a survey on the use of automatic metrics, focusing particularly on natural language generation (NLG) tasks. We inspect which metrics are used as well as why they are chosen and how their use is reported. Our findings from this survey reveal significant shortcomings, including inappropriate metric usage, lack of implementation details and missing correlations with human judgements. We conclude with recommendations that we believe authors should follow to enable more rigour within the field

    Language Technology Tools and Services

    At the time of writing, the European Language Grid includes more than 800 LT services of varied types, including machine translation (MT), automatic speech recognition (ASR), text-to-speech synthesis (TTS), and text analysis ranging from simple tokenisers and part-of-speech taggers through to complete named entity recognition and sentiment analysis systems. This chapter gives a high-level summary of the development of the ELG service catalogue over time and digs deeper to discuss the process of service integration by looking at a few example services

    58 full texts
    539 metadata records
    Updated in last 30 days.