Charles University

Biblio at Institute of Formal and Applied Linguistics
    539 research outputs found

    Towards Semantic Tagging of Segmented Holocaust Narratives

    No full text
    With the increasing loss of Holocaust witnesses, it is becoming more and more important to preserve their memories. Items of cultural heritage, including textual data such as diaries or transcripts of video interviews, are abundant. However, large amounts of this data are not annotated, which poses a significant obstacle for domain experts curating digitized information regarding the Holocaust. A solution for this problem is a natural language processing model that links text segments to a rich domain-specific ontology of subject terms to automatically tag documents for further processing. While we have not yet achieved a comprehensive solution, we show that even a simple model fine-tuned on a small dataset of spoken narratives is a promising first step and transfers its capabilities to written testimonies reasonably well

    Exploring ReAct Prompting for Task-Oriented Dialogue: Insights and Shortcomings

    No full text
    Large language models (LLMs) have gained immense popularity due to their impressive capabilities in unstructured conversations. Empowering LLMs with advanced prompting strategies such as reasoning and acting (ReAct) (Yao et al., 2022) has shown promise in solving complex tasks traditionally requiring reinforcement learning. In this work, we apply the ReAct strategy to guide LLMs performing task-oriented dialogue (TOD). We evaluate ReAct-based LLMs (ReAct-LLMs) both in simulation and with real users. While ReAct-LLMs severely underperform state-of-the-art approaches on success rate in simulation, this difference becomes less pronounced in human evaluation. Moreover, compared to the baseline, humans report higher subjective satisfaction with ReAct-LLM despite its lower success rate, most likely thanks to its natural and confidently phrased responses

    When Multilingual Models Compete with Monolingual Domain-Specific Models in Clinical Question Answering

    No full text
    This paper explores the performance of general-domain multilingual models on the clinical Question Answering (QA) task, to assess their potential to provide medical support for languages that lack clinically trained models. In order to improve the model’s performance, we exploit multilingual data augmentation by translating an English clinical QA dataset into six other languages. We propose a translation pipeline that includes projection of the evidence (answers) into the target languages, and we thoroughly evaluate several multilingual models fine-tuned on the augmented data, in both mono- and multilingual settings. We find that the translation itself and the subsequent QA experiments present a differently challenging problem for each of the languages. Finally, we compare the performance of multilingual models with pretrained medical domain-specific English models on the original clinical English test set. Contrary to expectations, we find that monolingual domain-specific pretraining is not always superior to general-domain multilingual pretraining. The source code is available at https://github.com/lanzv/Multilingual-emrQ

    Large Language Models: How they work and what they are good for

    No full text
    A short introduction explaining the working of large language models and potential caveats of their usage

    Constraining LLM Output

    No full text
    This talk shows practical ways to make LLMs follow exact formats (from regex and JSON schemas to token-aware FSMs and CFGs) and explains how those constraints work during decoding. It surveys current tools and implementations, points out pitfalls like tokenization mismatches and unnatural formats, and gives an overview of best practices, focusing on MT use cases. A short demo shows constrained decoding in action and common failure modes to watch for
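The idea of constraining output during decoding can be illustrated with a minimal sketch. It is not taken from the talk: the vocabulary `VOCAB`, the state machine `FSM`, and the stand-in model `toy_scores` are all hypothetical, and real systems operate on subword tokens rather than whole words. At each step, tokens the state machine does not allow are masked out before the best token is picked.

```python
import math

# Hypothetical toy vocabulary; real tokenizers have far more entries.
VOCAB = ["yes", "no", "maybe", "{", "}", "<eos>"]

# Hypothetical finite-state machine for the format: "{" ("yes" | "no") "}" <eos>
# Maps state -> {allowed token -> next state}.
FSM = {
    0: {"{": 1},
    1: {"yes": 2, "no": 2},
    2: {"}": 3},
    3: {"<eos>": 4},
}
FINAL_STATE = 4


def constrained_greedy_decode(score_fn):
    """Greedy decoding where tokens not allowed by the FSM are masked out."""
    state, output = 0, []
    while state != FINAL_STATE:
        scores = score_fn(output)  # one score per VOCAB entry
        allowed = FSM[state]
        # Disallowed tokens get -inf so they can never be selected.
        masked = [
            s if tok in allowed else -math.inf
            for tok, s in zip(VOCAB, scores)
        ]
        best_index = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best_index]
        output.append(token)
        state = allowed[token]
    return output


def toy_scores(prefix):
    """Stand-in for a model: always prefers "maybe", then "yes"."""
    return [2.0, 1.0, 3.0, 0.5, 0.1, 0.0]


result = constrained_greedy_decode(toy_scores)
```

Although the stand-in model scores "maybe" highest at every step, the mask forces the output to follow the format, so the unconstrained favourite never appears in `result`.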

    Real-World Summarization: When Evaluation Reaches Its Limits

    No full text
    We examine evaluation of faithfulness to input data in the context of hotel highlights: brief LLM-generated summaries that capture unique features of accommodations. Through human evaluation campaigns involving categorical error assessment and span-level annotation, we compare traditional metrics, trainable methods, and LLM-as-a-judge approaches. Our findings reveal that simpler metrics like word overlap correlate surprisingly well with human judgments (r=0.63), often outperforming more complex methods when applied to out-of-domain data. We further demonstrate that while LLMs can generate high-quality highlights, they prove unreliable for evaluation as they tend to severely under- or over-annotate. Our analysis of real-world business impacts shows that incorrect and non-checkable information pose the greatest risks. We also highlight challenges in crowdsourced evaluations
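A word-overlap metric and its correlation with human judgments can be sketched as follows. The abstract does not specify which overlap variant was used, so `word_overlap` below (the fraction of summary words that also occur in the source) is only one simple assumption, and `pearson_r` is the standard Pearson coefficient written out by hand.

```python
import math


def word_overlap(source, summary):
    """Fraction of summary words that also occur in the source text.

    Assumed variant for illustration; the paper's exact metric may differ.
    """
    source_words = set(source.lower().split())
    summary_words = summary.lower().split()
    if not summary_words:
        return 0.0
    hits = sum(word in source_words for word in summary_words)
    return hits / len(summary_words)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Made-up example: three of the four summary words appear in the source.
score = word_overlap("free wifi and a pool", "free wifi and spa")
```

In practice, one would compute `word_overlap` for every highlight and pass the resulting scores, together with the human ratings, to `pearson_r`.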

    Evaluating LLM Outputs with Humans and LLMs

    No full text
    How well do LLMs perform on text generation tasks, and how can we tell? We present approaches based on annotating individual errors, using human evaluators as well as LLMs. For humans, we introduce our efficient annotation framework and schema. For LLM-based evaluation, we show a metric that uses an ensemble of open-source LLMs and provides reasoning for each annotated error. The metric is evaluated on various generation tasks and evaluation aspects (such as accuracy or fluency) and shows high correlation with human annotators. Both approaches allow us to use benchmarks with recent data unseen by LLMs during training, bypassing the data leakage problem that artificially inflates LLMs' performance on commonly used benchmarks

    HPLT’s Second Data Release

    No full text
    We describe the progress of the High Performance Language Technologies (HPLT) project, a 3-year EU-funded project that started in September 2022. We focus on the up-to-date results on the release of free text datasets derived from web crawls, one of the central objectives of the project. The second release used a revised processing pipeline, and an enlarged set of input crawls. From 4.5 petabytes of web crawls we extracted 7.6T tokens of monolingual text in 193 languages, plus 380 million parallel sentences in 51 language pairs. We also release MultiHPLT, a cross-combination of the parallel data, which produces 1,275 pairs, as well as releasing the containing documents for all parallel sentences in order to enable research in document-level MT. We report changes in the pipeline, analysis and evaluation results for the second parallel data release based on machine translation systems. All datasets are released under a permissive CC0 licence
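The figure of 1,275 pairs is consistent with cross-combining 51 languages that share a common pivot, since the number of unordered pairs among 51 items is 51 × 50 / 2 = 1,275. The sketch below checks this arithmetic. It assumes that all 51 released pairs share one pivot language, which the abstract does not state explicitly, and the language codes are hypothetical placeholders.

```python
from itertools import combinations
from math import comb


def pivot_pairs(languages):
    """All unordered pairs of languages that can be linked through a shared pivot."""
    return list(combinations(sorted(languages), 2))


# Hypothetical placeholder codes for the 51 non-pivot languages.
languages = [f"lang{i:02d}" for i in range(51)]
pairs = pivot_pairs(languages)
pair_count = len(pairs)
```

Here `pair_count` equals `comb(51, 2)`, matching the number reported in the abstract.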

    Jak funguje dnešní AI a k čemu (ne)může být (How Today's AI Works and What It Can(not) Be For)

    No full text
    Artificial intelligence (AI) has become ubiquitous in recent years and provides answers to any question, but the quality of those answers varies considerably. In this article, I would first like to show why this is the case, or rather how large language models (LLMs), on which today's AI is based, work. I will then focus on the question of what AI can be used for when working with text, and I will show several examples of possible inputs

    Can Large Language Models Personalize Dialogues to Generational Styles?

    No full text
    We investigate how large language models (LLMs) can produce personalized dialogue responses, specifically focusing on whether they reflect linguistic styles pertaining to different generations: Baby Boomers, Generation X, Generation Y, and Generation Z. We create P-MultiWoZ, a personalized, generation-specific version of MultiWOZ 2.2, by prompting LLMs, and validate its alignment with the original dataset through automatic and human evaluations. To validate the appropriateness of generational linguistic traits, we introduce GeMoSC, a corpus of generation-annotated movie dialogues. Linguistic analysis and a perplexity test suggest that P-MultiWoZ reflects patterns consistent with GeMoSC. Finally, a human evaluation reveals that annotators were mostly able to correctly identify the generation behind P-MultiWoZ dialogues, based only on a single query-reply pair

    58 full texts, 539 metadata records. Updated in last 30 days.