NII Repository (National Institute of Informatics)

2035 research outputs found

Embedding Tables in Text Context: NTCIR-18 U4 Tasks

Author: Hiroyuki Higa
Maeyama Yuuki
Kazuhiro Takeuchi
Publication venue: NII Institutional Repository
Publication date: 06/06/2025

Financial reports, such as securities reports, contain various figures and tables that play a crucial role in conveying structured information. In this study, we focus on the analysis of tables by integrating both textual and tabular data. We present a method that leverages natural language processing (NLP) techniques to assess the correctness of extracted information.

conference paper

Handouts for the 1st Meeting of the Research Data Infrastructure Steering Committee, FY2025 (Reiwa 7)

Publication venue: Research Data Infrastructure Steering Committee (研究データ基盤運営委員会)
Publication date: 2025

conference output

Summary of Proceedings: 43rd Meeting of the Committee for the Future Scholarly Information System

Publication venue: Committee for the Future Scholarly Information System (これからの学術情報システム構築検討委員会)
Publication date: 25/06/2025

Meeting: 43rd Meeting of the Committee for the Future Scholarly Information System; Venue: Online; Date and time: Wednesday, 25 June 2025, 13:00 to 15:00

conference output

SPARC Japan Seminar 2024 "What Lies Beyond the Open Access Mandate: Toward the Coming World": The Expansion of Open Collaborative Mapping and Data Utilization (Document)

Author: 瀬戸寿一 (Toshikazu Seto)
Publication venue: National Institute of Informatics (国立情報学研究所)
Publication date: 30/01/2025

SPARC Japan Seminar 2024 "What Lies Beyond the Open Access Mandate: Toward the Coming World"; Venue: Online; Date and time: Thursday, 30 January 2025, 13:00 to 17:00

conference presentation

SPARC Japan Seminar 2024 "What Lies Beyond the Open Access Mandate: Toward the Coming World": History of Open Access in the Life Sciences (Presentation Slides)

Author: 川島秀一 (Shuichi Kawashima)
Publication venue: National Institute of Informatics (国立情報学研究所)
Publication date: 30/01/2025

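Handouts for the 30th Meeting of the Council for Promoting Collaboration and Cooperation between University Libraries and the National Institute of Informatics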

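Publication venue: Council for Promoting Collaboration and Cooperation between University Libraries and the National Institute of Informatics (大学図書館と国立情報学研究所との連携・協力推進会議)
Publication date: 15/07/2025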

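Meeting: 30th Meeting of the Council for Promoting Collaboration and Cooperation between University Libraries and the National Institute of Informatics; Venue: Online; Date and time: Tuesday, 15 July 2025, 13:30 to 15:30

conference output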

UCLWI at the NTCIR-18 AEOLLM Task: A Low-Cost Comparison of RAGs

Author: Xiao Fu
Navdeep Singh Bedi
Noriko Kando
Fabio Crestani
Aldo Lipani
Publication venue: NII Institutional Repository
Publication date: 06/06/2025

We propose an efficient evaluation pipeline for Retrieval-Augmented Generation (RAG) systems tailored for low-resource settings. Our method uses ensemble similarity measures combined with a logistic regression classifier to assess answer quality from multiple system outputs using only the available queries and replies. Experiments across diverse tasks demonstrate competitive accuracy and a reasonable correlation with ground-truth rankings, establishing our approach as a reliable metric.

conference paper
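The abstract does not name the similarity measures or the classifier features, so the following is only a minimal sketch of the general idea under stated assumptions: each (query, answer) pair is turned into a small vector of similarity scores (token Jaccard, bag-of-words cosine, and the answer's mean cosine agreement with the other systems' answers), and a logistic regression trained on labeled pairs maps that vector to a quality probability. The feature set, the function names (`features`, `train_logreg`, `predict_proba`) and the plain gradient-descent fitting are all hypothetical choices, not the UCLWI implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def _tokens(text: str) -> List[str]:
    # Naive whitespace tokenization (assumption; a real pipeline would use a proper tokenizer).
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two texts."""
    sa, sb = set(_tokens(a)), set(_tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words count vectors."""
    ca, cb = Counter(_tokens(a)), Counter(_tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def features(query: str, answer: str, other_answers: Sequence[str]) -> List[float]:
    """Hypothetical ensemble: query-answer overlap plus agreement with other systems' answers."""
    consensus = [cosine(answer, other) for other in other_answers]
    return [
        jaccard(query, answer),
        cosine(query, answer),
        sum(consensus) / len(consensus) if consensus else 0.0,
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Estimated probability that an answer is acceptable."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit logistic regression with batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log loss)/d(logit)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

In use, one would build a feature row per (query, answer) pair, fit `train_logreg` on the labeled pairs, and rank systems by their mean `predict_proba` over the test queries; in practice a library implementation of logistic regression would replace the hand-written fitting loop.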

ISLab at the NTCIR-18 AEOLLM: An Evaluator for Machine-Generated Text based on Data Augmentation and ORPO

Author: Chia-Hui Lin
Cen-Chieh Chen
Tao-Hsing Chang
Fu-Yuan Hsu
Publication venue: NII Institutional Repository
Publication date: 06/06/2025

In recent years, large language models (LLMs) have been widely applied to various natural language processing (NLP) tasks, demonstrating exceptional performance. To evaluate the output quality of these LLMs, numerous studies use one LLM as an evaluator to assess the quality of outputs from other LLMs, showing promising results on public benchmarks. However, the performance of LLMs as evaluators on many unpublished benchmarks still needs improvement. To achieve better evaluation performance, some studies have attempted to fine-tune evaluators on large amounts of data, incurring significant manual costs and posing substantial limitations in practical applications. Therefore, this paper leverages data augmentation to increase the volume of training data and employs the odds ratio preference optimization (ORPO) algorithm for reinforcement learning to optimize the evaluator. This study uses the dataset provided by NTCIR-18's Automatic Evaluation of LLMs (AEOLLM) task for training and testing. The proposed method achieves an accuracy of 0.7658 on the summary generation subtask of AEOLLM, the highest among all compared models. Additionally, it yields the second-highest performance in both Kendall's tau and Spearman correlation coefficient on the summary generation and text expansion subtasks among all compared models.

conference paper
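For reference, ORPO as introduced by Hong et al. (2024) adds an odds-ratio penalty to the ordinary supervised fine-tuning loss, so no separate reference model is required. The objective below follows that general formulation; the abstract does not report the weight $\lambda$ or how the preferred and rejected evaluator outputs $(y_w, y_l)$ were derived from the augmented data, so those are left symbolic here. $\mathcal{L}_{\mathrm{SFT}}$ is the negative log-likelihood of the preferred output $y_w$, and $m$ is the length of the output being scored.

```latex
\begin{aligned}
P_\theta(y \mid x) &= \exp\!\Big(\tfrac{1}{m}\textstyle\sum_{t=1}^{m} \log P_\theta(y_t \mid x, y_{<t})\Big) \\
\mathrm{odds}_\theta(y \mid x) &= \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)} \\
\mathcal{L}_{\mathrm{OR}} &= -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right) \\
\mathcal{L}_{\mathrm{ORPO}} &= \mathbb{E}_{(x,\,y_w,\,y_l)}\big[\mathcal{L}_{\mathrm{SFT}} + \lambda\,\mathcal{L}_{\mathrm{OR}}\big]
\end{aligned}
```

The odds-ratio term grows when the model assigns relatively more probability to the rejected output, which pushes the evaluator toward the preferred judgments while the SFT term keeps it anchored to them.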

THUIR at the NTCIR-18 FairWeb-2 Task

Author: Huixue Su
Haitao Li
Yiteng Tu
Qingyao Ai
Yiqun Liu
Publication venue: NII Institutional Repository
Publication date: 06/06/2025

The fairness of search systems remains a critical challenge in information retrieval. Building upon our previous work in FairWeb-1, this paper presents the THUIR team's approach to the NTCIR-18 FairWeb-2 Task. Specifically, we developed a simple yet effective retrieval pipeline that integrates multiple neural rerankers and aggregates their results via Reciprocal Rank Fusion to generate balanced search rankings across various entity types. Additionally, we submitted a revived run that combines a PM2-based result diversification algorithm with dense retrieval scores. Our runs achieve competitive performance on multiple evaluation metrics, demonstrating that enhancements in retrieval relevance inherently promote balanced group fairness. With the right combination of techniques, it is possible to achieve a synergistic reinforcement between relevance and fairness.

conference paper
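Reciprocal Rank Fusion scores each document by summing $1/(k + \text{rank})$ over all input rankings in which it appears, so documents ranked highly by several rerankers rise to the top. A minimal sketch follows, assuming each reranker returns an ordered list of document IDs. The constant `k = 60` is the value commonly used with RRF, not one reported in this abstract, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1 for the top document of each list.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing `["d1", "d2", "d3"]`, `["d2", "d1", "d3"]` and `["d2", "d3", "d1"]` yields `["d2", "d1", "d3"]`. Because RRF uses only ranks, it needs no score calibration across rerankers, which is one reason it is a common choice for combining heterogeneous models.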

SCUNLP-2 at the NTCIR-18 FigArg-2 Task: Apply Repeat-Error-Correction Learning on Text Classification

Author: Tong-Ru Wu
Jheng-Long Wu
Publication venue: NII Institutional Repository
Publication date: 06/06/2025

Large Language Models (LLMs) have shown promising capabilities for zero-shot text classification, yet they often do not outperform fine-tuned traditional models like BERT when those are trained on sufficient labeled data. However, acquiring large-scale human-labeled datasets can be challenging, particularly in specialized domains. To address this gap, we propose Repeat-Error-Correction Learning, a framework that iteratively identifies and rewrites misclassified samples to augment the training set. First, we train a base BERT model using available text–label pairs. Next, the trained model infers labels on the same dataset, and we collect the misclassified samples. An LLM, such as GPT-4o-mini, then rewrites these erroneous texts while preserving their original labels. The rewritten texts are reintroduced into the training set, and the model is fine-tuned on this expanded corpus. By iteratively refining the training data through error correction and text rewriting, the proposed method aims to achieve robust classification performance despite limited initial annotations. Our results indicate that fine-tuning the base model with the added rewritten misclassified texts achieved the highest validation-set Micro-F1 score (77.33%). These findings contribute to a deeper understanding of a cost-friendly and efficient way to generate data for augmenting text classification models.

conference paper
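The loop described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: `train`, `predict` and `rewrite` are hypothetical callables standing in for BERT fine-tuning, BERT inference and the LLM rewriting step, and `train` may either retrain from scratch or continue fine-tuning the previous model. The number of rounds is also an assumed parameter.

```python
from typing import Callable, List, Tuple, TypeVar

Model = TypeVar("Model")
Example = Tuple[str, str]  # (text, label)


def repeat_error_correction(
    data: List[Example],
    train: Callable[[List[Example]], Model],
    predict: Callable[[Model, str], str],
    rewrite: Callable[[str, str], str],
    rounds: int = 3,
) -> Tuple[Model, List[Example]]:
    """Iteratively rewrite misclassified samples and add them to the training set."""
    training_set = list(data)
    model = train(training_set)  # base model on the original text-label pairs
    for _ in range(rounds):
        # Re-run inference on the original labeled data and collect the errors.
        errors = [(text, label) for text, label in data if predict(model, text) != label]
        if not errors:
            break
        # Rewrite each erroneous text (an LLM in the paper) while keeping its label.
        training_set.extend((rewrite(text, label), label) for text, label in errors)
        # Fine-tune on the expanded corpus.
        model = train(training_set)
    return model, training_set
```

Keeping the original label on each rewritten text is what makes the augmentation label-preserving; the stopping condition simply ends early once the model no longer misclassifies any original sample.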

2,022 full texts and 2,035 metadata records (updated in the last 30 days).
