Charles University

Biblio at Institute of Formal and Applied Linguistics

539 research outputs found

Large Language Models in Chatbot Applications

Author: Dušek Ondřej
Publication date: 01/01/2024

A short description of the usage of LLMs in task-oriented dialogue, detailing the potential problems and a proposed solution. The presentation included a short demonstration of our recent LLM-based dialogue system

Leveraging Large Language Models for Building Interpretable Rule-Based Data-to-Text Systems

Authors: Dušek Ondřej; Warczyński Jędrzej; Lango Mateusz
Publication date: 01/01/2024

We introduce a simple approach that uses a large language model (LLM) to automatically implement a fully interpretable rule-based data-to-text system in pure Python. Experimental evaluation on the WebNLG dataset showed that such a constructed system produces text of better quality (according to the BLEU and BLEURT metrics) than the same LLM prompted to directly produce outputs, and produces fewer hallucinations than a BART language model fine-tuned on the same data. Furthermore, at runtime, the approach generates text in a fraction of the processing time required by neural approaches, using only a single CPU
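
To make the idea of an interpretable rule-based data-to-text system concrete, here is a minimal sketch of what such a system might look like for WebNLG-style triples. It is an illustration only: the predicate names, templates, and the `verbalize` and `_clean` helpers are assumptions made for this example and are not taken from the code produced in the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration: the rules and names below are assumptions,
# not the rules generated by the LLM in the paper.
Triple = Tuple[str, str, str]  # (subject, predicate, object)


def _clean(entity: str) -> str:
    """Turn an identifier such as 'Alan_Bean' into readable text."""
    return entity.replace("_", " ").strip('"')


# One human-readable rule per predicate; each rule is a plain template.
RULES: Dict[str, Callable[[str, str], str]] = {
    "birthPlace": lambda s, o: f"{s} was born in {o}.",
    "occupation": lambda s, o: f"{s} worked as a {o}.",
}


def verbalize(triples: List[Triple]) -> str:
    """Apply the matching rule to each triple and join the sentences."""
    sentences = []
    for subj, pred, obj in triples:
        s, o = _clean(subj), _clean(obj)
        rule = RULES.get(pred)
        if rule is None:
            # Generic fallback for predicates without a dedicated rule.
            sentences.append(f"The {pred} of {s} is {o}.")
        else:
            sentences.append(rule(s, o))
    return " ".join(sentences)
```

Because every output sentence can be traced to a single template, the text cannot mention facts that are absent from the input triples, and the whole system runs on a CPU without any model inference.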

LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

Authors: Dušek Ondřej; Kumar Nalin
Publication date: 01/01/2024

Linguistic entrainment, or alignment, represents a phenomenon where linguistic patterns employed by conversational participants converge to one another. While entrainment has been shown to produce a more natural user experience, most dialogue systems do not have any provisions for it. In this work, we introduce methods for achieving dialogue entrainment in a GPT-2-based end-to-end task-oriented dialogue system through the utilization of shared vocabulary. We experiment with training instance weighting, entrainment-specific loss, and additional conditioning to generate responses that align with the user. We demonstrate that all three approaches produce significantly better entrainment than the base, non-entrainment-optimized model, as confirmed by both automated and manual evaluation metrics
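
The notion of shared vocabulary can be illustrated with a simple lexical overlap score. The sketch below is an assumed proxy written for this listing, not the evaluation metric or training objective used in the paper; the function names `tokens` and `lexical_overlap` are hypothetical.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def lexical_overlap(user_utterance: str, system_response: str) -> float:
    """Share of distinct response tokens that also occur in the user's utterance.

    Assumed proxy for entrainment: higher values mean the response reuses
    more of the user's own vocabulary.
    """
    response_tokens = tokens(system_response)
    if not response_tokens:
        return 0.0
    shared = response_tokens & tokens(user_utterance)
    return len(shared) / len(response_tokens)
```

A score like this could be computed per dialogue turn and compared between a baseline system and an entrainment-optimized one.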

Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

Authors: Dušek Ondřej; Kasner Zdeněk
Publication date: 01/01/2024

We analyze the behaviors of open large language models (LLMs) on the task of data-to-text (D2T) generation, i.e., generating coherent and relevant text from structured data. To avoid the issue of LLM training data contamination with standard benchmarks, we design QUINTD – a tool for collecting novel structured data records from public APIs. We find that open LLMs (Llama 2, Mistral, and Zephyr) can generate fluent and coherent texts in zero-shot settings from data in common formats collected with QUINTD. However, we show that the semantic accuracy of the outputs is a major issue: both according to human annotators and our reference-free metric based on GPT-4, more than 80% of the outputs of open LLMs contain at least one semantic error. We publicly release the code, data, and model outputs

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Authors: Schmidtová Patrícia; Balloccu Simone; Dušek Ondřej; Lango Mateusz
Publication date: 01/01/2024

Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the problem of indirect data leaking, where models are iteratively improved by using data coming from users. In this work, we conduct the first systematic analysis of work using OpenAI’s GPT-3.5 and GPT-4, the most prominently used LLMs today, in the context of data contamination. By analysing 255 papers and considering OpenAI’s data usage policy, we extensively document the amount of data leaked to these models during the first year after the model’s release. We report that these models have been globally exposed to ∼4.7M samples from 263 benchmarks. At the same time, we document a number of evaluation malpractices emerging in the reviewed papers, such as unfair or missing baseline comparisons and reproducibility issues. We release our results as a collaborative project on https://leak-llm.github.io/, where other researchers can contribute to our efforts

Large Language Models: How they work and what they are good for

Author: Dušek Ondřej
Publication date: 01/01/2024

A brief explanation of how LLMs work and what they should and shouldn't be used for, including a showcase of failure cases and potential risks

UFAL Speech Corpus of North Levantine Arabic 1.0 - Part 2

Authors: Pecina Pavel; Pospíšil Adam; Krubiński Mateusz; Zemánek Petr; Sellat Hashem
Publication date: 01/01/2024

The corpus contains recordings by native speakers of North Levantine Arabic (apc), acquired during 2020, 2021, and 2023 in Prague, Paris, Kabardia, and St. Petersburg. The data provided in this repository corresponds to the test split of the dialectal Arabic to English shared task hosted at the 21st edition of the International Conference on Spoken Language Translation (IWSLT 2024)

Getting Structure in Dialogue with Large Language Models

Author: Dušek Ondřej
Publication date: 01/01/2024

An introduction to how LLMs work and the problems they have, as well as an overview of recent experiments using LLMs to model and evaluate dialogue

Data-to-Text Generation with Neural Language Models

Author: Kasner Zdeněk
Publication date: 01/01/2024

Data-to-text generation systems need to produce texts with high levels of semantic accuracy. Rule-based systems can guarantee this aspect, but their fluency and adaptability to new domains remain limited. Meanwhile, neural language models can easily generate fluent texts and adapt to new domains but are notoriously prone to producing inaccurate outputs. This thesis explores how to efficiently employ neural components in data-to-text generation systems to get the best of both worlds. We focus on approaches based on pretrained transformer language models. Primarily, the models serve as building blocks for data-efficient and robust data-to-text generation systems. Along with that, we introduce model-based evaluation metrics, focusing on detecting errors in data-to-text outputs, and a toolkit for preprocessing and visualizing data-to-text generation datasets. We also analyze the behavior of pretrained and large language models in specific scenarios, including describing individual relations in knowledge graphs and generating texts from standard data formats. We conclude that while employing neural language models in data-to-text generation remains a delicate endeavor, neural components can improve the fluency of the output texts and make the systems adaptable to new domains. At the same time, the semantic accuracy of the outputs can remain high if the models are used for specific, well-defined subtasks for improving text quality. For future research, we emphasize the need for benchmarking with suitable evaluation metrics on real-world use cases

Ask the experts: sourcing a high-quality nutrition counseling dataset through Human-AI collaboration

Authors: Sargsyan Rafael; Kumar Vivek; Reforgiato Recupero Diego; Riboni Daniele; Li Karen; Dušek Ondřej; Balloccu Simone; Reiter Ehud
Publication date: 01/01/2024

Large Language Models (LLMs) are being employed by end-users for various tasks, including sensitive ones such as health counseling, disregarding potential safety concerns. It is thus necessary to understand how adequately LLMs perform in such domains. We conduct a case study on ChatGPT in nutrition counseling, a popular use-case where the model supports a user with their dietary struggles. We crowdsource real-world diet-related struggles, then work with nutrition experts to generate supportive text using ChatGPT. Finally, experts evaluate the safety and text quality of ChatGPT’s output. The result is the HAI-Coaching dataset, containing ~2.4K crowdsourced dietary struggles and ~97K corresponding ChatGPT-generated and expert-annotated supportive texts. We analyse ChatGPT’s performance, discovering potentially harmful behaviours, especially for sensitive topics like mental health. Finally, we use HAI-Coaching to test open LLMs on various downstream tasks, showing that even the latest models struggle to achieve good performance. HAI-Coaching is available at https://github.com/uccollab/hai-coaching/

58 full texts, 539 metadata records. Updated in last 30 days.
