AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

Author: Vlachos Andreas
Dodge Jesse
Peters Matthew
Augenstein Isabelle
Chronopoulou Alexandra
Fraser Alexander
Publication date: 01/01/2023

Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel domain at test time. In this paper, we introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains. Our approach is embarrassingly parallel: first, we train a set of domain-specific adapters; then, for each novel domain, we determine which adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance on new domains without extra training. We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results. We explore various approaches for choosing which adapters to combine, such as text clustering and semantic similarity. We find that using clustering leads to the most competitive results on novel domains.
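The weight-space averaging described in this abstract can be illustrated with a minimal sketch. It assumes each adapter is a mapping from parameter names to flat lists of floats and that the selected adapters are averaged uniformly; the parameter names (`down_proj`, `up_proj`) and the function `average_adapters` are hypothetical, and the step that chooses which adapters to average (clustering or semantic similarity) is not shown.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weights.
Weights = Dict[str, List[float]]


def average_adapters(adapters: List[Weights]) -> Weights:
    """Uniformly average each parameter across adapters with identical shapes."""
    if not adapters:
        raise ValueError("need at least one adapter")
    averaged: Weights = {}
    for name in adapters[0]:
        # zip(*) pairs up the i-th weight of this parameter from every adapter.
        columns = zip(*(adapter[name] for adapter in adapters))
        averaged[name] = [sum(col) / len(adapters) for col in columns]
    return averaged


# Two toy "domain adapters" chosen for a novel domain (selection not shown).
adapter_a = {"down_proj": [0.0, 2.0], "up_proj": [1.0]}
adapter_b = {"down_proj": [2.0, 4.0], "up_proj": [3.0]}
soup = average_adapters([adapter_a, adapter_b])
print(soup)
```

Because the average is computed once over stored weights, no gradient step is involved, which is consistent with the abstract's claim of improving new-domain performance without extra training.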

Open Access LMU (Ludwig-Maximilians-Univ. München)

A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing

Author: Friedrich Annemarie
Henning Sophie
Beluch William
Vlachos Andreas
Augenstein Isabelle
Fraser Alexander
Publication date: 01/01/2023

Many natural language processing (NLP) tasks are naturally imbalanced, as some target categories occur much more frequently than others in the real world. In such scenarios, current NLP models tend to perform poorly on less frequent classes. Addressing class imbalance in NLP is an active research topic, yet finding a good approach for a particular task and imbalance scenario is difficult. In this survey, the first overview on class imbalance in deep-learning based NLP, we first discuss various types of controlled and real-world class imbalance. Our survey then covers approaches that have been explicitly proposed for class-imbalanced NLP tasks or, originating in the computer vision community, have been evaluated on them. We organize the methods by whether they are based on sampling, data augmentation, choice of loss function, staged learning, or model design. Finally, we discuss open problems and how to move forward.

Open Access LMU (Ludwig-Maximilians-Univ. München)

Efficiency and Effectiveness of LLM-Based Summarization of Evidence in Crowdsourced Fact-Checking

Author: Mizzaro Stefano
Roitero Kevin
Soprano Michael
Augenstein Isabelle
Wright Dustin
Publication date: 01/01/2025

Assessing the truthfulness of information is a critical task in fact-checking, and is typically performed using binary or coarse ordinal scales (2-6 levels), though fine-grained scales (e.g., 100 levels) have also been explored. Magnitude Estimation (ME) takes this approach further by allowing assessors to assign any value in the range (0, +∞). However, it introduces challenges, including the need for aggregation of assessments from individuals with different interpretations of the scale. Despite these challenges, its successful applications in other domains suggest its potential suitability for truthfulness assessment. We conduct a crowdsourcing study by collecting assessments on claims sourced from the PolitiFact fact-checking organization using ME. To the best of our knowledge, this is the first systematic investigation of ME in the context of truthfulness assessment. Our results show that while aggregation methods significantly impact assessment quality, optimal aggregation strategies yield accuracy and reliability comparable to traditional scales. More importantly, ME allows capturing subtle differences in truthfulness, offering richer insights than conventional coarse-grained scales.

Archivio istituzionale della ricerca - Università degli Studi di Udine

Copenhagen University Research Information System

Guide the Learner

Author: Amirkhani Hossein
Vlachos Andreas
Modarressi Ali
Augenstein Isabelle
Pilehvar Mohammad Taher
Publication date: 01/01/2023

Several proposals have been put forward in recent years for improving out-of-distribution (OOD) performance through mitigating dataset biases. A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model. Here, the underlying assumption is that the biased model resorts to shortcut features. Hence, those training examples that are correctly predicted by the biased model are flagged as being biased and are down-weighted during the training of the main model. However, assessing the importance of an instance merely based on the predictions of the biased model may be too naive. It is possible that the prediction of the main model can be derived from another decision-making process that is distinct from the behavior of the biased model. To circumvent this, we introduce a fine-tuning strategy that incorporates the similarity between the main and biased model attribution scores in a Product of Experts (PoE) loss function to further improve OOD performance. With experiments conducted on natural language inference and fact verification benchmarks, we show that our method improves OOD results while maintaining in-distribution (ID) performance.
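For readers unfamiliar with the Product of Experts loss mentioned in this abstract, the following is a minimal sketch of the commonly used baseline form only: the log-probabilities of the main and biased models are summed, renormalised, and scored with cross-entropy. The attribution-similarity term that the abstract introduces is not reproduced here, since the listing does not give its formula; the function names and toy logits are illustrative assumptions.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def poe_loss(main_logits: List[float], bias_logits: List[float], label: int) -> float:
    """Cross-entropy of the renormalised product of the two experts' distributions."""
    combined = [a + b for a, b in zip(log_softmax(main_logits), log_softmax(bias_logits))]
    return -log_softmax(combined)[label]


# When the biased model is already confident on the gold label, the combined
# loss is small, so the main model receives little training signal there.
uninformative_bias = poe_loss([2.0, 0.0], [0.0, 0.0], label=0)
confident_bias = poe_loss([2.0, 0.0], [3.0, 0.0], label=0)
print(uninformative_bias, confident_bias)
```

With a uniform biased model the expression reduces to ordinary cross-entropy on the main model, which is a convenient sanity check for an implementation.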

Open Access LMU (Ludwig-Maximilians-Univ. München)

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

Author: Ostendorff Malte
Rethmeier Nils
Rehm Georg
Augenstein Isabelle
Gipp Bela
Publication date: 14/02/2022

Learning scientific document representations can be substantially improved through contrastive learning objectives, where the challenge lies in creating positive and negative training samples that encode the desired similarity semantics. Prior work relies on discrete citation relations to generate contrast samples. However, discrete citations enforce a hard cut-off to similarity. This is counter-intuitive to similarity-based learning, and ignores that scientific papers can be very similar despite lacking a direct citation, a core problem of finding related research. Instead, we use controlled nearest neighbor sampling over citation graph embeddings for contrastive learning. This control allows us to learn continuous similarity, to sample hard-to-learn negatives and positives, and also to avoid collisions between negative and positive samples by controlling the sampling margin between them. The resulting method SciNCL outperforms the state-of-the-art on the SciDocs benchmark. Furthermore, we demonstrate that it can train (or tune) models sample-efficiently, and that it can be combined with recent training-efficient methods. Perhaps surprisingly, even training a general-domain language model this way outperforms baselines pretrained in-domain.
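One way to picture the nearest-neighbor sampling with a margin described in this abstract is the sketch below. It is an assumption-laden illustration, not the SciNCL procedure itself: it ranks other papers by Euclidean distance in a toy embedding space, takes the closest `k_pos` as positives, and takes negatives from a later rank band, so that the unused ranks in between act as the margin. The function name, the parameters `k_pos`, `neg_start`, `neg_end`, and the toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sample_candidates(
    query: str,
    embeddings: Dict[str, Sequence[float]],
    k_pos: int,
    neg_start: int,
    neg_end: int,
) -> Tuple[List[str], List[str]]:
    """Return (positives, negatives) for `query` from distance-ranked neighbors."""
    if neg_start < k_pos:
        raise ValueError("negative band must not overlap the positives")
    others = [paper for paper in embeddings if paper != query]
    # Nearest neighbors first; math.dist is the Euclidean distance (Python 3.8+).
    ranked = sorted(others, key=lambda p: math.dist(embeddings[query], embeddings[p]))
    positives = ranked[:k_pos]
    # Ranks k_pos .. neg_start-1 are skipped: this gap is the sampling margin.
    negatives = ranked[neg_start:neg_end]
    return positives, negatives


toy_embeddings = {"q": (0.0, 0.0), "a": (1.0, 0.0), "b": (2.0, 0.0),
                  "c": (5.0, 0.0), "d": (9.0, 0.0)}
positives, negatives = sample_candidates("q", toy_embeddings, k_pos=1, neg_start=2, neg_end=3)
print(positives, negatives)
```

Moving the negative band closer to the positives yields harder negatives but a smaller margin, which is the trade-off the abstract refers to when it mentions avoiding collisions between positive and negative samples.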

GRO.publications (Univ. Göttingen)

Vote’n’Rank: Revision of Benchmarking with Social Choice Theory

Author: Mikhailov Vladislav
Shavrina Tatiana
Tutubalina Elena
Vlachos Andreas
Kravchenko Andrey
Karabekyan Daniel
Artemova Ekaterina
Augenstein Isabelle
Rofin Mark
Florinsky Mikhail
Publication date: 01/01/2023

The development of state-of-the-art systems in different applied areas of machine learning (ML) is driven by benchmarks, which have shaped the paradigm of evaluating generalisation capabilities from multiple perspectives. Although the paradigm is shifting towards more fine-grained evaluation across diverse tasks, the delicate question of how to aggregate the performances has received particular interest in the community. In general, benchmarks follow the unspoken utilitarian principles, where the systems are ranked based on their mean average score over task-specific metrics. Such an aggregation procedure has been viewed as a sub-optimal evaluation protocol, which may have created the illusion of progress. This paper proposes Vote’n’Rank, a framework for ranking systems in multi-task benchmarks under the principles of social choice theory. We demonstrate that our approach can be efficiently utilised to draw new insights on benchmarking in several ML sub-fields and identify the best-performing systems in research and development case studies. Vote’n’Rank’s procedures are more robust than the mean average while being able to handle missing performance scores and determine conditions under which the system becomes the winner.
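The contrast this abstract draws between mean-score aggregation and social-choice aggregation can be made concrete with a small sketch. It uses the Borda count, a standard social-choice rule chosen here purely for illustration; the listing does not say which rules the framework implements. The system names and scores are invented toy data, and ties are not handled.

```python
from typing import Dict, List


def mean_ranking(scores: Dict[str, List[float]]) -> List[str]:
    """Rank systems by their average score over tasks, best first."""
    return sorted(scores, key=lambda s: -sum(scores[s]) / len(scores[s]))


def borda_ranking(scores: Dict[str, List[float]]) -> List[str]:
    """Rank systems by Borda points: each task awards 0 to the worst system,
    1 to the next, and so on; points are summed over tasks."""
    systems = list(scores)
    n_tasks = len(next(iter(scores.values())))
    points = {s: 0 for s in systems}
    for task in range(n_tasks):
        worst_first = sorted(systems, key=lambda s: scores[s][task])
        for rank, system in enumerate(worst_first):
            points[system] += rank
    return sorted(systems, key=lambda s: -points[s])


# System A wins one task by a wide margin; system B wins the other two narrowly.
toy_scores = {
    "A": [0.99, 0.50, 0.50],
    "B": [0.60, 0.62, 0.61],
    "C": [0.55, 0.45, 0.45],
}
by_mean = mean_ranking(toy_scores)
by_borda = borda_ranking(toy_scores)
print(by_mean, by_borda)
```

In this toy benchmark the mean places A first because of a single outlying task, while the Borda count places B first because it wins on more tasks, which illustrates why the choice of aggregation rule can change the declared winner.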

Open Access LMU (Ludwig-Maximilians-Univ. München)

How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs?

Author: Augenstein Isabelle
Samory Mattia
Flöck Fabian
Wagner Claudia
Sen Indira
Publication date: 01/01/2021

As NLP models are increasingly deployed in socially situated settings such as online abusive content detection, it is crucial to ensure that these models are robust. One way of improving model robustness is to generate counterfactually augmented data (CAD) for training models that can better learn to distinguish between core features and data artifacts. While models trained on this type of data have shown promising out-of-domain generalizability, it is still unclear what the sources of such improvements are. We investigate the benefits of CAD for social NLP models by focusing on three social computing constructs: sentiment, sexism, and hate speech. Assessing the performance of models trained with and without CAD across different types of datasets, we find that while models trained on CAD show lower in-domain performance, they generalize better out-of-domain. We unpack this apparent discrepancy using machine explanations and find that CAD reduces model reliance on spurious features. Leveraging a novel typology of CAD to analyze their relationship with model performance, we find that CAD which acts on the construct directly or a diverse set of CAD leads to higher performance.

Crossref

Copenhagen University Research Information System

Publikationsserver der RWTH Aachen University

Archivio della ricerca - Università di Roma La Sapienza

Social Bias Probing: Fairness Benchmarking for Language Models

Author: Marchiori Manerba Marta
Guidotti Riccardo
Augenstein Isabelle
Stańczak Karolina
Publication date: 01/01/2024

While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to their affiliation with a sensitive demographic group. We curate SOFA, a large-scale benchmark designed to address the limitations of existing fairness collections. SOFA expands the analysis beyond the binary comparison of stereotypical versus anti-stereotypical identities to include a diverse range of identities and stereotypes. Comparing our methodology with existing benchmarks, we reveal that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized. Benchmarking LMs on SOFA, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models. Finally, our findings indicate that real-life adversities faced by various groups such as women and people with disabilities are mirrored in the behavior of these models.

Archivio della Ricerca - Università di Pisa

Copenhagen University Research Information System