Exploring cultural competence in language and multimodal models

Author: Bhatia Mehar
Publication date: 2024

This thesis explores the concept of cultural competence and its role in enhancing language and multimodal models to understand and interpret human behaviour across diverse cultural contexts. In response to the increasing need for culturally inclusive models, this work addresses three major research questions by developing new methodologies, metrics, and tools. Firstly, we present GD-COMET, a geo-diverse variant of the COMET model, designed to generate culturally nuanced commonsense inferences. Secondly, we introduce GlobalRG, a benchmark tailored to evaluate the multicultural understanding of vision-language models. This benchmark highlights existing cultural biases and gaps in representation, providing a comprehensive assessment in diverse settings. Furthermore, we introduce CulturalSnap, a large-scale dataset comprising image-text pairs from 50 diverse cultures, and design an approach for inclusive representation learning by leveraging carefully designed contrastive learning objectives to improve model performance across varied cultural contexts. By addressing these critical areas, this work contributes to the development of models that are not only more aware of cultural diversity but also more adept at interacting fairly and effectively with socioculturally diverse audiences. Through these advancements, we aim to foster a more equitable and inclusive AI landscape.

Science, Faculty of; Computer Science, Department of

University of British Columbia: cIRcle - UBC's Information Repository

A bottom-up framework for cross-cultural evaluation of GPT-4o’s social norm biases via implicit narrative invocation

Author: Liu Zhuozhuo
Publication date: 2025

Large Language Models (LLMs) have been demonstrated to align with the values of Western or North American cultures. Prior work predominantly showed this effect through leveraging surveys that directly ask – originally people and now also LLMs – about their values. However, it is not clear that these explicitly stated beliefs actually correspond to the slant that LLMs take in real tasks. To address that, we take a bottom-up approach, asking LLMs to recall cultural norms invoked by narratives from different cultures. We find that GPT-4o tends to generate norms that, while not necessarily incorrect, are significantly less culture-specific. In addition, while it avoids overtly generating stereotypes, the stereotypical representations of certain cultures are merely hidden rather than suppressed in the model, and such stereotypes can be easily recovered. Addressing these challenges is a crucial step towards developing LLMs that fairly serve their diverse user base.

Science, Faculty of; Computer Science, Department of

University of British Columbia: cIRcle - UBC's Information Repository

ROUGE-K: Do your summaries have keywords?

Author: Ponzetto Simone Paolo
Takeshita Sotaro
Eckert Kai
Publication date: 01/01/2024

MAnnheim DOCument Server (Univ. Mannheim)

SuperNMT: neural machine translation with semantic supersenses and syntactic supertags

Author: Way Andy
Vanmassenhove Eva
Publication date: 2018

In this paper we incorporate semantic supersense tags and syntactic supertag features into EN–FR and EN–DE factored NMT systems. In experiments on various test sets, we observe that such features (particularly when combined) help NMT model training converge faster and improve model quality according to BLEU scores.

Crossref

Irish Universities

DCU Online Research Access Service

Multi-Level Alignments As An Extensible Representation Basis for Textual Entailment Algorithms

Author: Noh Tae Gil
Eichler Kathrin
Padó Sebastian
Nastase Viviana Antonela
Adler Meni
Shwartz Vered
Kotlerman Lili
Dagan Ido
Publication date: 01/01/2015

A major problem in research on Textual Entailment (TE) is the high implementation effort for TE systems. Recently, interoperable standards for annotation and preprocessing have been proposed. In contrast, the algorithmic level remains unstandardized, which makes component re-use in this area very difficult in practice. In this paper, we introduce multi-level alignments as a central, powerful representation for TE algorithms that encourages modular, reusable, multilingual algorithm development. We demonstrate that a pilot open-source implementation of multi-level alignment with minimal features competes with state-of-the-art open-source TE engines in three languages.

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Dissertation Abstract: Learning High Precision Lexical Inferences

Author: Shwartz Vered
Publication date: 01/01/2021

The fundamental goal of natural language processing is to build models capable of human-level understanding of natural language. One of the obstacles to building such models is lexical variability, i.e. the ability to express the same meaning in various ways. Existing text representations excel at capturing relatedness (e.g. blue / red), but they lack the fine-grained distinction of the specific semantic relation between a pair of words. This article is a summary of a Ph.D. dissertation submitted to Bar-Ilan University in 2019, under the supervision of Professor Ido Dagan of the Computer Science Department. The dissertation explored methods for recognizing and extracting semantic relationships between concepts (cat is a type of animal), the constituents of noun compounds (baby oil is oil for babies), and verbal phrases (‘X died at Y’ means the same as ‘X lived until Y’ in certain contexts). The proposed models outperform highly competitive baselines and improve the state-of-the-art in several benchmarks. The dissertation concludes by discussing two challenges in the way of human-level language understanding: developing more accurate text representations and learning to read between the lines.

Digital Library of Gesellschaft für Informatik e.V.

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.

E-LIS
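
The two counting schemes contrasted in the abstract above can be illustrated with a minimal sketch. This is not the study's implementation: the function name `cocitation_counts`, the `max_authors` parameter, the toy data, and the rule that each author pair is counted once per citing paper (including co-authors of the same cited work) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often pairs of authors are cited together.

    citing_papers: list of reference lists, one per citing paper; each
        reference is an ordered list of the cited work's author names.
    max_authors: number of leading authors credited per cited work
        (1 = first-author counting, 5 = first-five-authors counting).
    Hypothetical helper, not taken from the study.
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the authors credited by this citing paper.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Assumption: each unordered pair counts once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Toy data: two citing papers, each citing two works.
papers = [
    [["Smith", "Lee"], ["Garcia", "Chen"]],
    [["Smith"], ["Garcia"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(first_author)
print(len(first_five))
```

On this toy data, first-author counting records a single co-cited pair, while crediting up to five authors also brings in the second authors, which is the kind of difference in coverage the abstract compares.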

The Emergence of High-Level Semantics in a Signaling Game

Author: Mickus Timothee
Takamura Hiroya
Bernard Timothée
Publication date: 01/06/2024

The symbol grounding problem (how to connect a symbolic system to the outer world) is a longstanding question in AI that has recently gained prominence with the progress made in NLP in general and surrounding large language models in particular. In this article, we study the emergence of semantic categories in the communication protocol developed by neural agents involved in a well-established type of signaling game. In its basic form, the game requires one agent to retrieve an image based on a message produced by a second agent. We first show that the agents are able to, and do, learn to communicate high-level semantic concepts rather than low-level features of the images, even from a very indirect training signal to that end. Second, we demonstrate that the introduction of an adversarial agent in the game fosters the emergence of semantics by producing an appropriate training signal when no other method is available.

Helsingin yliopiston digitaalinen arkisto

Variations on the Author

Author: Sayad Cecilia
Publication date: 01/01/2016

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

Crossref

Kent Academic Repository