Association for the Advancement of Artificial Intelligence: AAAI Publications

Not a member yet

26155 research outputs found

Sort by

Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English

Author: Anand Avinash
Prasad Kritarth
Kirtani Chhavi
Nair Ashwin R
Nema Manvendra Kumar
Jaiswal Raj
Shah Rajiv Ratn
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

Large Language Models (LLMs) excel in linguistic tasks but struggle with mathematical reasoning, particularly in non- English languages like Hindi. This research aims to en- hance the mathematical reasoning skills of smaller, resource- efficient open-source LLMs in both Hindi and English. We evaluate models like OpenHathi 7B, LLaMA-2 7B, Wizard- Math 7B, Mistral 7B, LLeMMa 7B, MAmmoTH 7B, Gemini Pro, and GPT-4 using zero-shot, few-shot chain-of-thought (CoT) methods, and supervised fine-tuning. Our approach in- corporates curriculum learning, progressively training mod- els on increasingly difficult problems, a novel Decompo- sition Strategy to simplify complex arithmetic operations, and a Structured Solution Design that divides solutions into phases. Our experiments result in notable performance en- hancements. WizardMath 7B exceeds Gemini’s accuracy on English datasets by +6% and matches Gemini’s performance on Hindi datasets. Adopting a bilingual approach that com- bines English and Hindi samples achieves results comparable to individual language models, demonstrating the capability to learn mathematical reasoning in both languages. This re- search highlights the potential for improving mathematical reasoning in open-source LLMs

MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL

Author: Askari Arian
Poelitz Christian
Tang Xinye
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

Self-correction in text-to-SQL is the process of prompting large language model (LLM) to revise its previously incorrectly generated SQL, and commonly relies on manually crafted self-correction guidelines by human experts that are not only labor-intensive to produce but also limited by the human ability in identifying all potential error patterns in LLM responses. We introduce MAGIC, a novel multi-agent method that automates the creation of the self-correction guideline. MAGIC uses three specialized agents: a manager, a correction, and a feedback agent. These agents collaborate on the failures of an LLM-based method on the training set to iteratively generate and refine a self-correction guideline tailored to LLM mistakes, mirroring human processes but without human involvement. Our extensive experiments show that MAGIC's guideline outperforms expert human's created ones. We empirically find out that the guideline produced by MAGIC enhances the interpretability of the corrections made, providing insights in analyzing the reason behind the failures and successes of LLMs in self-correction

Sentence-level Aggregation of Lexical Metrics Correlates Stronger with Human Judgements than Corpus-level Aggregation

Author: Cavalin Paulo
Domingues Pedro H.
Pinhanez Claudio
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

In this paper we show that corpus-level aggregation hinders considerably the capability of lexical metrics to accurately evaluate machine translation (MT) systems. With empirical experiments we demonstrate that averaging individual segment-level scores can make metrics such as BLEU and chrF correlate much stronger with human judgements and make them behave considerably more similar to neural metrics such as COMET and BLEURT. We show that this difference exists because corpus- and segment-level aggregation differs considerably owing to the classical average of ratio versus ratio of averages Mathematical problem. Moreover, as we also show, such difference affects considerably the statistical robustness of corpus-level aggregation. Considering that neural metrics currently only cover a small set of sufficiently-resourced languages, the results in this paper can help make the evaluation of MT systems for low-resource languages more trustworthy

Security Attacks on LLM-based Code Completion Tools

Author: Cheng Wen
Sun Ke
Zhang Xinyu
Wang Wei
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

The rapid development of large language models (LLMs) has significantly advanced code completion capabilities, giving rise to a new generation of LLM-based Code Completion Tools (LCCTs). Unlike general-purpose LLMs, these tools possess unique workflows, integrating multiple information sources as input and prioritizing code suggestions over natural language interaction, which introduces distinct security challenges. Additionally, LCCTs often rely on proprietary code datasets for training, raising concerns about the potential exposure of sensitive data. This paper exploits these distinct characteristics of LCCTs to develop targeted attack methodologies on two critical security risks: jailbreaking and training data extraction attacks. Our experimental results expose significant vulnerabilities within LCCTs, including a 99.4% success rate in jailbreaking attacks on GitHub Copilot and a 46.3% success rate on Amazon Q. Furthermore, We successfully extracted sensitive user data from GitHub Copilot, including 54 real email addresses and 314 physical addresses associated with GitHub usernames. Our study also demonstrates that these code-based attack methods are effective against general-purpose LLMs, highlighting a broader security misalignment in the handling of code by modern LLMs. These findings underscore critical security challenges associated with LCCTs and suggest essential directions for strengthening their security frameworks

Multi-Turn Jailbreaking Large Language Models via Attention Shifting

Author: Du Xiaohu
Mo Fan
Wen Ming
Gu Tu
Zheng Huadi
Jin Hai
Shi Jie
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

Large Language Models (LLMs) have achieved significant performance in various natural language processing tasks but also pose safety and ethical threats, thus requiring red teaming and alignment processes to bolster their safety. To effectively exploit these aligned LLMs, recent studies have introduced jailbreak attacks based on multi-turn dialogues. These attacks aim to prompt LLMs to generate harmful or biased content by guiding them through contextual content. However, the underlying reasons for the effectiveness of multi-turn jailbreaks remain unclear. Existing attacks often focus on optimizing queries and escalating toxicity to construct dialogues, lacking a thorough analysis of the inherent vulnerabilities of LLMs. In this paper, we first conduct an in-depth analysis of the differences between single-turn and multi-turn jailbreaks and find that successful multi-turn jailbreaks can effectively disperse the attention of LLMs on keywords associated with harmful behaviors, especially in historical responses. Based on this, we propose ASJA, a new multi-turn jailbreak approach by shifting the attention of LLMs, specifically by iteratively fabricating the dialogue history through a genetic algorithm to induce LLMs to generate harmful content. Extensive experiments on three LLMs and two datasets show that our approach surpasses existing approaches in jailbreak effectiveness, the stealth of jailbreak prompts, and attack efficiency. Our work emphasizes the importance of enhancing the robustness of LLMs' attention mechanism in multi-turn dialogue scenarios for a better defense strategy

Investigating the Security Threat Arising from “Yes-No” Implicit Bias in Large Language Models

Author: Du Yanrui
Zhao Sendong
Ma Ming
Chen Yuhan
Qin Bing
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

Large Language Models (LLMs) have gained significant attention for their exceptional performance across various domains. Despite their advancements, concerns persist regarding their implicit bias, which often leads to negative social impacts. Therefore, it is essential to identify the implicit bias in LLMs and investigate the potential threat posed by it. Our study focused on a specific type of implicit bias, termed the ''Yes-No'' implicit bias, which refers to LLMs' inherent tendency to favor ''Yes'' or ''No'' responses to a single instruction. By comparing the probability of LLMs generating a series of ''Yes'' versus ''No'' responses, we observed different inherent response tendencies exhibited by LLMs when faced with different instructions. To further investigate the impact of such bias, we developed an attack method called Implicit Bias In-Context Manipulation, attempting to manipulate LLMs' behavior. Specifically, we explored whether the ''Yes'' implicit bias could manipulate ''No'' responses into ''Yes'' in LLMs' responses to malicious instructions, leading to harmful outputs. Our findings revealed that the ''Yes'' implicit bias brings a significant security threat, comparable to that of carefully designed attack methods. Moreover, we offered a comprehensive analysis from multiple perspectives to deepen the understanding of this security threat, emphasizing the need for ongoing improvement in LLMs' security

TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching

Author: Guo Wenxiang
Zhang Yu
Pan Changhao
Huang Rongjie
Tang Li
Li Ruiqi
Hong Zhiqing
Wang Yongqi
Zhao Zhou
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

Singing voice synthesis has made remarkable progress in generating natural and high-quality voices. However, existing methods rarely provide precise control over vocal techniques such as intensity, mixed voice, falsetto, bubble, and breathy tones, thus limiting the expressive potential of synthetic voices. We introduce TechSinger, an advanced system for controllable singing voice synthesis that supports five languages and seven vocal techniques. TechSinger leverages a flow-matching-based generative model to produce singing voices with enhanced expressive control over various techniques. To enhance the diversity of training data, we develop a technique detection model that automatically annotates datasets with phoneme-level technique labels. Additionally, our prompt-based technique prediction model enables users to specify desired vocal attributes through natural language, offering fine-grained control over the synthesized singing. Experimental results demonstrate that TechSinger significantly enhances the expressiveness and realism of synthetic singing voices, outperforming existing methods in terms of audio quality and technique-specific control

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Author: Huang Weiyu
Hu Yuezhou
Jian Guohao
Zhu Jun
Chen Jianfei
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have attempted to compress LLMs using one-shot pruning methods. However, these methods often suffer from considerable performance degradation on complex language understanding tasks, raising concerns about the feasibility of pruning in LLMs. To address this issue, we propose Adaptive Sparse Trainer (AST), a novel and efficient retraining framework tailored for semi-structured sparse models. AST enables models to learn optimal masks during the weight update process without incurring additional computational overhead. Furthermore, we demonstrate that incorporating knowledge distillation significantly improves retraining efficiency and enhances model performance under fixed computational constraints. Additionally, a supplementary set of well-initialized parameters is integrated to further augment the model's efficacy. AST achieves state-of-the-art performance with minimal training cost. When applied to the LLaMA2-7B model, AST reduces the perplexity and zero-shot accuracy gap between dense and 2:4 semi-structured sparse models to 0.6 and 1.16%, respectively, utilizing less than 0.4% of the pretraining tokens and GPU hours. Our work demonstrates the feasibility of deploying semi-structured sparse LLMs and offers a promising alternative for achieving highly compressed models when combined with existing quantization techniques

Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning

Author: Liu Dunqiang
Huang Shujun
Li Wen
Shen Siqi
Wang Cheng
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

Language-based localization is a crucial task in robotics and computer vision, enabling robots to understand spatial positions through language. Recent methods rely on contrastive learning to establish correspondences between global features of texts and point clouds. However, the inherent ambiguity of textual descriptions makes it difficult to convey geometric information accurately, forcing alignment of them in the feature space may compromise the expressiveness of the point clouds. Unlike previous methods, this paper proposes using language as a filter to distinguish dissimilar locations. To this end, we propose a robust framework of multi-level negative contrastive learning for language-based localization, fully leveraging the descriptive power of language for spatial localization. Our method learns multiple mismatched factors by minimizing the similarity of different locations at different levels, including global-level, instance-level and relationlevel, respectively. Extensive experiments conducted on the KITTI360Pose benchmark demonstrate that our method outperforms better that the state-of-the-art methods. Specifically, we achieve a 56.3% improvement in Top-1 retrieval recall and a 45.9% improvement in 5m localization recall

VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

Author: Liu Tao
Ma Ziyang
Chen Qi
Chen Feilong
Fan Shuai
Chen Xie
Yu Kai
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 11/04/2025
Field of study

We present VQTalker, a Vector Quantization-based framework for multilingual talking head generation that addresses the challenges of lip synchronization and natural motion across diverse languages. Our approach is grounded in the phonetic principle that human speech comprises a finite set of distinct sound units (phonemes) and corresponding visual articulations (visemes), which often share commonalities across languages. We introduce a facial motion tokenizer based on Group Residual Finite Scalar Quantization (GRFSQ), which creates a discretized representation of facial features. This method enables comprehensive capture of facial movements while improving generalization to multiple languages, even with limited training data. Building on this quantized representation, we implement a coarse-to-fine motion generation process that progressively refines facial animations. Extensive experiments demonstrate that VQTalker achieves state-of-the-art performance in both video-driven and speech-driven scenarios, particularly in multilingual settings. Notably, our method achieves high-quality results at a resolution of 512 × 512 pixels while maintaining a lower bitrate of approximately 11 kbps. Our work opens new possibilities for cross-lingual talking face generation

0

full texts

26,155

metadata records

Updated in last 30 days.

Association for the Advancement of Artificial Intelligence: AAAI Publications

Access Repository Dashboard

Do you manage Open Research Online? Become a CORE Member to access insider analytics, issue reports and manage access to outputs from your repository in the CORE Repository Dashboard! 👇