NII Repository (National Institute of Informatics)
SPARC Japan NewsLetter NO.48
■ SPARC Japan Activity Reports
Support for arXiv.org
Support for CLOCKSS
Support for the SCOAP3
Contributions allocated for SCOAP3 Phase 4 (2025–2027)
■ SPARC Japan Seminar Report
Outline
Presentation Abstracts and Speakers
Panel Discussion
Attendee Feedback
Afterword
article
Multiple Tables in Relational Databases
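Training program: FY2025 Comprehensive IT Training for University Librarians
Dates: August 20 (Wed) to August 22 (Fri), 2025
Organizer: National Institute of Informatics
conference presentation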
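Handouts for the 44th Meeting of the Committee on Building Future Scholarly Information Systems
Meeting: 44th Meeting of the Committee on Building Future Scholarly Information Systems
Venue: Online
Date and time: October 30, 2025, 10:00 to 12:00
conference output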
Overview of the NTCIR-18 Automatic Evaluation of LLMs (AEOLLM) Task
In this paper, we provide an overview of the NTCIR-18 Automatic Evaluation of LLMs (AEOLLM) task. As large language models (LLMs) become popular in both academia and industry, effectively evaluating their capabilities has become an increasingly critical yet still challenging problem. Existing methods fall into two types: manual evaluation, which is expensive, and automatic evaluation, which is limited in both task format (most benchmarks consist of multiple-choice questions) and evaluation criteria (dominated by reference-based metrics). To advance automatic evaluation, we propose the AEOLLM task, which focuses on generative tasks and encourages reference-free methods. In addition, we set up diverse subtasks, namely dialogue generation, text expansion, summary generation, and non-factoid question answering, to test different methods comprehensively. This year, we received a total of 48 runs from 4 teams. This paper describes the background of the task, the dataset, the evaluation measures, and the evaluation results.
conference paper
SCUNLP-3 at the NTCIR-18 FinArg-2 Task: Template-Based Prompting and Augmentation
The validity of social media claims often shifts over time, which affects downstream tasks such as misinformation detection, financial prediction, and domain-specific decision-making. This study proposes an approach that merges the original text with automatically generated template text to highlight temporal cues. By integrating this enriched data into training, the model better gauges how long a claim remains reliable, even when its relevance changes rapidly. This strategy addresses the challenge of ephemeral statements whose validity fluctuates as new information emerges. In experiments, the method achieves a macro-F1 score of 78.10%. These findings underscore the importance of systematically assessing claim longevity, providing a pathway to more robust content analysis and better-informed decisions in fast-changing online environments.
conference paper
vitrivr-engine at the NTCIR-18 Lifelog-6 Task
This paper discusses vitrivr's participation in the Lifelog Semantic Access subtask of the sixth NTCIR Lifelog task (Lifelog-6). The system is based on the one that participated in the 2024 Lifelog Search Challenge; the only change is that the interactive query interface is replaced with an LLM-based query transformation method. All results are generated in a single pass without any further re-processing or refinement.
conference paper
NTCIR-18 MedNLP-CHAT Determining Medical, Ethical and Legal Risks in Patient-Doctor Conversations: Task Overview
This paper presents an overview of the Medical Natural Language Processing for AI Chat (MedNLP-CHAT) task, conducted as a shared task at NTCIR-18. Medical chatbot services have recently emerged as a promising way to address the shortage of medical and healthcare professionals. However, the potential risks associated with these chatbots remain insufficiently understood. We therefore designed the MedNLP-CHAT task to evaluate medical chatbots from multiple risk perspectives: medical, legal, and ethical. Participants were required to analyze a given medical question together with the corresponding chatbot response and determine whether the response posed a potential medical, legal, or ethical risk (binary classification). Nine teams participated, applying a variety of approaches that yielded valuable insights.
conference paper
IMNTPU at NTCIR-18 MedNLP-CHAT Task: Evaluating Agentic AI for Multilingual Risk Assessment in Medical Chatbots
The IMNTPU team presents a multilingual evaluation of Agentic AI for chatbot risk classification in the NTCIR-18 MedNLP-CHAT task. Our framework integrates fine-tuned small models, optimized few-shot prompting with GPT-4o, and multi-agent aggregation via majority and trust-weighted voting. Results show that Agentic AI improves decision consistency, especially for subjective tasks such as ethical risk assessment, but yields limited gains in more structured domains such as medical and legal risk assessment. Language-specific results reveal that annotation quality and linguistic complexity jointly affect model performance, with the Japanese systems showing the greatest stability. Confidence analysis highlights a decoupling between model certainty and accuracy, underscoring the need for adaptive trust and calibration strategies. Building on these insights, we propose a Trust-Guided Agentic AI architecture featuring self-consistency filtering, dynamic trust updating, and Chain-of-Thought prompting to further improve reliability in safety-critical AI systems.
conference paper
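SPARC Japan Seminar 2024 "Beyond the Open Access Mandate: Toward the World to Come": Strengthening Research Capabilities and Open Access in Japan (Presentation Slides)
SPARC Japan Seminar 2024 "Beyond the Open Access Mandate: Toward the World to Come"
Venue: Online
Date and time: January 30, 2025 (Thu), 13:00 to 17:00
conference presentation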