NII Repository (National Institute of Informatics)
2035 research outputs found
Evaluation Results of UTUtLB25 Team in NTCIR-18 U4 Task of Table Question Answering of Securities Reports
The goal of this paper is to develop a system for participating in the task of extracting information from tables in securities reports (NTCIR-18 U4 Task). The NTCIR-18 U4 Task consists of two distinct subtasks: (1) retrieving the table that contains the relevant data, and (2) extracting the desired data from that table to answer the question. For the first subtask, we will use a pre-trained model that has demonstrated strong performance in table retrieval and fine-tune it for this specific task. For the second subtask, we will employ the latest Large Language Models (LLMs), which have shown excellent results across a variety of Natural Language Processing tasks. This approach is expected to achieve state-of-the-art performance, surpassing existing pre-trained BERT-based models.
conference paper
Structured Evaluation of Legal Reasoning in LLMs: Chain-of-Thought Prompting and Human Scoring for Retrieval Robustness
This study investigates the legal reasoning abilities of
Large Language Models (LLMs) in Taiwan’s Status Law (family
and inheritance law) and evaluates the effects of
Chain-of-Thought (CoT) prompting on answer quality. Six
essay questions from past judicial and graduate law exams
were decomposed into 68 sub-questions targeting issue
spotting, statutory application, legal reasoning, and
property calculation. Four LLMs (ChatGPT-4o, Gemini,
Copilot, and Grok3) were evaluated using a two-stage
framework: decomposed sub-question accuracy (Stage 1) and
full-length essay response performance with and without CoT
prompting (Stage 2), with human scoring conducted by a law
professor and a student.
Results show that CoT prompting consistently improves legal
reasoning quality across models, notably enhancing issue
coverage, statutory citation accuracy, and reasoning
structure. Gemini achieved the most significant accuracy
gains (from 83.2% to 94.5%, p < 0.05) and was selected for
detailed qualitative analysis. Beyond model-specific
findings, this study contributes to retrieval evaluation
research by addressing statistical consistency challenges
in human scoring, proposing a diagnostic evaluation method
adaptable for multilingual and multimedia legal corpora,
and suggesting extensions for evaluating enterprise-level
legal information systems. These findings underscore the
value of structured prompting strategies in supporting more
interpretable, transferable, and scalable legal AI
evaluation frameworks.
conference paper
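Toward a Book "Discovery" Environment That Connects Researchers and University Libraries
Conference: Open Forum on Academic Information Infrastructure 2025 (学術情報基盤オープンフォーラム2025)
Venue: CiNii Research track, "What Comes Next for CiNii Research?"
Date: June 16 (Mon) to June 18 (Wed), 2025
conference output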
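SPARC Japan Seminar 2024, "Beyond the Open Access Mandate: Toward the World to Come": A Sketch of the Next 10 Years of Scholarly Communication (Document)
SPARC Japan Seminar 2024, "Beyond the Open Access Mandate: Toward the World to Come"
Venue: Held online
Date: January 30, 2025 (Thu), 13:00-17:00
conference presentation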
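Handouts of the 42nd Meeting of the Committee for Building Future Scholarly Information Systems (これからの学術情報システム構築検討委員会)
Conference: 42nd Meeting of the Committee for Building Future Scholarly Information Systems
Venue: Online
Date: January 24, 2025 (Wed), 15:00-17:00
conference output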
Overview of NTCIR-18
The NTCIR project, organized by the National Institute of Informatics (NII) in Japan, has been a key platform for information retrieval (IR) and natural language processing (NLP) research since 1997. NTCIR-18, running from January 2024 to June 2025, features seven core tasks and three pilot tasks covering LLM evaluation, advanced IR, domain-specific NLP, and personal data management. A total of 113 teams worldwide participated, registering 178 times across tasks. This paper provides an overview of NTCIR-18, highlighting its objectives, methodologies, and key findings, along with future directions.
conference paper
TMUNLPG1 at the NTCIR-18 FinArg-2 Task
The TMUNLPG1 team participated in the FinArg-2 Task of NTCIR-18, focusing on the Detection of Argument Temporal References and Assessment of the Claim's Validity Period in the finance domain using Earnings Conference Call and Social Media datasets. The team ranked 6th and 2nd in these subtasks, respectively. This paper presents the team's methodologies, results, and conclusions. For Earnings Conference Call (ECC) Argument Temporal References, we combined feature engineering, an ensemble strategy, and data augmentation to achieve a Micro F1 score of 0.6905. For Social Media Assessment of the Claim's Validity Period, we developed an enhanced approach combining domain-specific transformer architectures with statistical feature engineering. By integrating FinBERT with Log-Likelihood Ratio (LLR) and Pointwise Mutual Information (PMI) features, we achieved a Micro F1 score of 0.742 on the unified dataset and demonstrated robust performance on the test set. The methodology incorporates weighted pooling strategies and adaptive learning rate optimization to improve temporal validity prediction accuracy. Our results highlight the effectiveness of combining domain-specific language models with traditional statistical approaches in financial text analysis, contributing to advancements in temporal natural language processing for the financial domain.
conference paper
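As an illustration of the kind of statistical feature the abstract above mentions, the following is a minimal sketch of document-level Pointwise Mutual Information between a term and a class label. It is not the TMUNLPG1 team's implementation: the function name `pmi_table`, the toy posts, and the labels "short" and "long" are assumptions made for this example only.

```python
import math
from collections import Counter


def pmi_table(docs, labels):
    """Return {(term, label): PMI} using document-level counts.

    PMI(term, label) = log2( P(term, label) / (P(term) * P(label)) ),
    where each probability is the fraction of documents involved.
    Only (term, label) pairs that co-occur at least once are returned.
    """
    n = len(docs)
    term_counts = Counter()
    joint_counts = Counter()
    label_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_counts[term] += 1
            joint_counts[(term, label)] += 1
    return {
        (term, label): math.log2(
            (joint / n) / ((term_counts[term] / n) * (label_counts[label] / n))
        )
        for (term, label), joint in joint_counts.items()
    }


# Hypothetical toy data: tokenized posts with a validity-period label.
toy_docs = [
    ["next", "quarter"],
    ["next", "year"],
    ["last", "quarter"],
    ["last", "year"],
]
toy_labels = ["short", "short", "long", "long"]

scores = pmi_table(toy_docs, toy_labels)
print(scores[("next", "short")])     # "next" appears only in "short" posts
print(scores[("quarter", "short")])  # "quarter" is independent of the label
```

In this toy data, a positive score marks a term that co-occurs with a label more often than chance, and a score of zero marks a term that carries no information about it. Scores like these could then be attached to a classifier's input as extra features.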
SCaLAR IT at the NTCIR-18 FinArg-2: Temporal Inference of Financial Arguments
The SCaLAR IT team participated in the Detection of Argument Temporal References subtask of the NTCIR-18 FinArg-2 Task. This paper presents our approach to classifying financial arguments based on their temporal references. We explored multiple architectures combining a BERT-based model with knowledge-based and temporal feature extraction techniques. To improve performance, we integrated BERT with TF-IDF-based temporal features, extracted using Stanza, and with BERT embeddings to enhance temporal reference detection. Our first model, BERTForSequenceClassifier, achieves a Micro F1 score of 70.24% and a Macro F1 score of 67.85%, outperforming most approaches of other teams. However, incorporating additional temporal features improved the Macro F1 score, indicating better performance across all classes. We analyze the effectiveness of different feature representations in our research.
conference paper
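The abstract above treats a higher Macro F1 as evidence of better performance across all classes. A minimal sketch of why the two averages can diverge follows; it is illustrative only, and the function name `micro_macro_f1` and the toy labels are assumptions, not the team's data or code.

```python
def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    tp_sum = fp_sum = fn_sum = 0
    per_class_f1 = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Micro F1 pools the counts over classes, so frequent classes dominate.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    # Macro F1 averages per-class scores, so every class weighs the same.
    macro = sum(per_class_f1) / len(per_class_f1)
    return micro, macro


# Hypothetical imbalanced data: the classifier never predicts class "b".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

Here the pooled (micro) score stays high because the majority class is predicted well, while the macro score drops sharply because the minority class gets an F1 of zero. A change that raises Macro F1 therefore suggests the rarer classes are being handled better.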
LSAT Focus: EAGLE’s Embedded Entities Highlighting Technique for NTCIR-18 Lifelog-6
This paper presents our work in the Lifelog Semantic Access Task (LSAT) at NTCIR-18, focusing on automatic search methods for finding distinct life moments. Our experiments explore and compare different retrieval strategies, including keyword matching-based search combined with embedding extraction, vector embedding-based semantic search using a multimodal model, and hybrid methods that take advantage of both approaches. Our proposed method improved retrieval accuracy by directing the model's attention to key query terms while prioritizing semantic relevance and the presence of requested entities in the retrieved moments. Experimental results demonstrated that the best-performing method relies on embeddings incorporating extended descriptions and highlighted keywords. Conversely, the hybrid methods in our experiments were less effective, likely due to limitations in the keyword-matching search algorithm. These findings underscore the value of richer descriptive entities within queries for enhancing the retrieval of life moments, ensuring a focus on core semantic and visual elements.
conference paper
NTCIR-18 RadNLP 2024 Overview: Dataset and Solutions for Automated Lung Cancer Staging
Radiology reports play a vital role in clinical workflows, serving as a primary means for radiologists to communicate imaging findings to physicians. However, the increasing number of imaging studies has made it challenging to produce and interpret comprehensive reports in a timely manner. Natural language processing (NLP) has shown potential to alleviate this burden, yet most existing studies are limited to English, while clinical reports are often written in local languages. To address this gap, we have developed and released Japanese medical text datasets through a series of shared tasks. Our recent efforts, including NTCIR-16 Real-MedNLP and NTCIR-17 RR-TNM, focused on automating lung cancer staging from radiology reports using the TNM classification system. This task is clinically significant, yet challenging due to the implicit nature of staging information and the complexity of TNM criteria. In this paper, we introduce the NTCIR-18 RadNLP 2024 shared task, which extends the previous task with finer-grained classification, a larger and bilingual corpus, and new sentence-level subtasks. We present the dataset, participating systems, and evaluation results, aiming to provide practical insights into building NLP systems for cancer staging support.
conference paper