NII Repository (National Institute of Informatics)
2035 research outputs found
From Divergent LLM Predictions to Reliable Lung Cancer Staging with Ensemble Fusion: CYUT at the NTCIR-18 RadNLP Main Task
This study investigates the application of Large Language Models (LLMs) to automated lung cancer staging from radiology reports, as part of the CYUT team’s participation in the NTCIR-18 RadNLP Main Task. Data analysis showed a moderate correlation among the T, N, and M staging classes, and experiments indicated that prompting LLMs to predict all three classes jointly improves performance. Standardizing measurement units to millimeters, rather than centimeters, also proved more effective. Based on these findings, we refined our prompting methodology and applied it to both LLMs and reasoning-augmented models, including OpenAI’s O-series and DeepSeek-R1. These reasoning models, enhanced through post-training with Chain-of-Thought (CoT) reasoning, demonstrated superior staging accuracy. Because LLMs are generative models, their outputs may vary across runs, which makes predictions inconsistent. To mitigate this variability, we adopted an ensemble learning strategy that consolidates divergent LLM outputs into a more stable and reliable lung cancer staging system. Experimental results demonstrate that ensemble methods consistently outperform individual models, improving both the robustness and the reliability of staging from radiology reports. Our approach achieved second place in the NTCIR-18 RadNLP Main Task (English), underscoring the effectiveness of LLM-based ensemble techniques for TNM classification. The implementation is available on GitHub: anson70242/NTCIR-18-RadNLP-CYUT.
conference paper
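As a rough illustration of how divergent runs might be consolidated, the sketch below fuses several (T, N, M) predictions by per-factor majority vote. The abstract does not state which fusion rule the CYUT system uses, so majority voting, the tie-breaking rule, and the names `majority_vote` and `fuse_tnm` are all assumptions made for this example.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

def fuse_tnm(runs):
    """Fuse per-run (T, N, M) tuples by voting on each factor independently.

    Assumption: simple majority voting; the paper's actual fusion rule may differ.
    """
    t_votes, n_votes, m_votes = zip(*runs)
    return majority_vote(t_votes), majority_vote(n_votes), majority_vote(m_votes)

# Three hypothetical runs on the same report.
runs = [("T2a", "N1", "M0"), ("T2a", "N0", "M0"), ("T2b", "N1", "M0")]
print(fuse_tnm(runs))
```

Voting on each factor separately keeps the example small; a system that exploits the reported correlation among T, N, and M might instead vote on the whole tuple.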
NLI24 at the NTCIR-18 RadNLP
The management of lung cancer relies heavily on precise staging, which is traditionally derived from comprehensive radiology reports generated through imaging techniques such as CT and MRI. However, these reports often lack explicit staging details, so healthcare professionals must extract the relevant information manually. To address this issue, we propose an automated solution as part of our submission to the RadNLP (Natural Language Processing for Radiology) shared task at the NTCIR-18 international conference. Our approach uses tailored Natural Language Processing (NLP) techniques to process radiology reports. In this paper, we describe our methodology for the RadNLP subtask, which involves document segmentation to identify eight key classes within radiology reports, and for the main task, which focuses on automated TNM staging of lung cancer. For the subtask, we employed an ensemble of three fine-tuned, hyperparameter-optimized BERT-based medical language models, which yielded an overall micro F2 score of 0.9433, securing the top rank in the competition. For the main task, we developed individual pipelines for T, N, and M staging, consisting of BERT-based models and LLMs in a multistage processing framework, resulting in a joint accuracy of 0.5679 and an overall 4th place finish in the competition. Our solution streamlines the extraction of critical information and aims to improve the accuracy and efficiency of cancer staging, ultimately supporting clinical decision-making and contributing to better patient outcomes.
conference paper
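For readers unfamiliar with the two metrics quoted above, the following sketch computes a micro-averaged F-beta score (beta = 2 weights recall more heavily than precision) and a joint accuracy over (T, N, M) tuples. These are the standard textbook definitions; the exact scoring rules of the RadNLP task are not given in the abstract, so the function names and input shapes here are assumptions.

```python
def micro_fbeta(counts, beta=2.0):
    """Micro-averaged F-beta from a list of per-class (tp, fp, fn) counts.

    Counts are pooled across classes before the score is computed.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def joint_accuracy(gold, pred):
    """Fraction of cases whose T, N, and M labels are all correct at once."""
    if not gold:
        return 0.0
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)

# Hypothetical counts for two classes and two staged cases.
print(micro_fbeta([(8, 2, 1), (2, 0, 1)]))
print(joint_accuracy([("T1", "N0", "M0"), ("T2", "N1", "M0")],
                     [("T1", "N0", "M0"), ("T2", "N0", "M0")]))
```

Joint accuracy is strict: a case with a correct T and M but a wrong N counts as a miss, which helps explain why joint scores sit well below per-factor scores.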
ORAD at NTCIR-18 RadNLP 2024 Shared Task
Here, we report our approach to the NTCIR-18 RadNLP2024 Shared Task (Japanese Track, Main Task). In this study, we developed a system to determine the TNM classification of lung cancer from Japanese radiology reports. Specifically, we provided Google DeepMind’s Gemini 2.0 Flash Experimental (gemini-2.0-flash-exp) with a prompt that combines Chain-of-Thought (CoT) and Many-Shot In-Context Learning (ICL), enabling automatic prediction of the T, N, and M factors for each case. Besides accuracy, interpretability is crucial in the medical domain; thus, having the model output the rationale for its TNM classification ensures a degree of transparency. Moreover, by including numerous examples of CoT-based reasoning that explain how the TNM classification is derived, written by a radiologist with 5 years of dedicated experience in diagnostic radiology, we achieved improved inference accuracy. Furthermore, to address privacy concerns and the need for local inference without network connectivity in clinical settings, we performed Supervised Fine-Tuning (SFT) using Gemma2-9b-it, a comparatively lightweight open-source model. By providing the model with CoT-based reasoning steps leading to TNM classification as training data, we observed improved inference accuracy. These findings demonstrate that additional data and prompt strategies supporting large language model (LLM)-based inference can be highly effective in automating TNM classification, and they indicate that interpretability is feasible in LLM-based medical applications.
conference paper
RAD-PHI3 at the NTCIR-18 HIDDEN-RAD: Hidden Causality Inclusion in Radiology Reports with Multimodal Small Language Models
This paper presents the participation of the Microsoft Research RADPHI3 team in the Hidden-RAD Challenge: Hidden Causality Inclusion in Radiology Reports. The task aims to recover hidden causality from radiology reports, optionally accompanied by their corresponding frontal chest X-rays (CXRs). We fine-tune small language models, specifically Rad-Phi-3.5 Vision-CXR, to recover causality analysis in both language-only and multi-modal settings, given radiology reports and radiology images as inputs. We also include baselines of various general-domain models, including models specifically tuned for reasoning tasks, such as GPT-4o, LLaMA 3.3, Phi4, DeepSeek, OpenAI o1, OpenAI o1-mini, and OpenAI o3-mini. Through these experiments, we evaluated the effectiveness of general-domain, reasoning-specialized, and fine-tuned domain-specific small language models in generating causal explanations from radiology reports, with images as optional inputs.
conference paper
Optimizing Causality-Based Radiology Reporting with Retrieval-Augmented and Structured Reasoning Approaches for the NTCIR-18 HIDDEN-RAD Task
The nash team participated in the NTCIR-18 Hidden-RAD Task, focusing on generating causality-based diagnostic inferences from radiology reports. In Subtask 1, we applied a cost-efficient API-driven inference pipeline to recover hidden causalities within MIMIC-CXR reports. Our pipeline integrates few-shot in-context learning, retrieval-enhanced prompting, and strict candidate selection using an evaluation checklist. By leveraging retrieved similar cases to enrich the prompt dynamically, this approach achieved the highest ranking (1st place) in the official evaluation. In Subtask 2, we explored structured diagnostic reasoning using PRISMA-Guided Causal Explanation, applying prompt-based systematic reasoning to enhance interpretability. Our method, leveraging structured PRISMA flow with large language models, secured 2nd place in the official evaluation. Additionally, we investigated an alternative approach that combined fine-tuning and domain-specific prompting to improve model adaptability. While this method was not included in the final ranking, it demonstrated potential in enhancing domain-specific model interpretability. These findings contribute to the advancement of explainable AI (XAI) in radiology, bridging the gap between automated diagnosis and human expert decision-making.
conference paper
AKBL at NTCIR-18 U4 TableRetrieval and TableQA
In this paper, we propose a three-stage method for the U4 TableQA task. The method first analyzes and segments the target table into header and data cell sections using a machine learning classifier. Then, it generates natural language descriptions for each data cell using sentence templates based on the table structure. Finally, it retrieves relevant sentences matching the input question from the generated sentence set to form the TableQA result. This approach is also extended to the Table Retrieval task. Evaluation experiments showed that the Table Retrieval task achieved an accuracy of 0.3569, whereas for the TableQA task, the accuracy of cell_id prediction was 0.7797, and the value prediction was 0.7168.
conference paper
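The three stages described above can be pictured with a toy example. This is only a sketch under strong assumptions: a fixed "first row and first column are headers" rule stands in for the paper's machine learning classifier, the sentence template is invented, and retrieval is plain token overlap. The names `cells_to_sentences` and `answer` are hypothetical.

```python
def cells_to_sentences(table):
    """Stages 1 and 2: split headers from data cells, then verbalize each cell.

    Assumption: row 0 and column 0 are headers (a stand-in for the classifier).
    Returns a list of ((row, col), value, sentence) triples.
    """
    col_headers = table[0][1:]
    sentences = []
    for r, row in enumerate(table[1:], start=1):
        row_header = row[0]
        for c, value in enumerate(row[1:], start=1):
            text = f"{row_header} for {col_headers[c - 1]} is {value}"
            sentences.append(((r, c), value, text))
    return sentences

def answer(question, sentences):
    """Stage 3: pick the sentence sharing the most lowercase tokens with the question."""
    q = set(question.lower().split())

    def overlap(item):
        return len(q & set(item[2].lower().split()))

    cell_id, value, _ = max(sentences, key=overlap)
    return cell_id, value

# A made-up financial table.
table = [["Item", "FY2022", "FY2023"],
         ["Revenue", "100", "120"],
         ["Profit", "10", "15"]]
print(answer("What is the profit for FY2023", cells_to_sentences(table)))
```

Because each sentence carries its cell coordinates, the same retrieval step yields both a cell_id and a value, which mirrors the two TableQA accuracies reported in the abstract.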
AIREV at the NTCIR-18 U4 Task
The AIREV team participated in the NTCIR-18 U4 shared task, which comprises two subtasks, Table Retrieval (TR) and Table Question Answering (TQA), designed to evaluate and advance system capabilities for handling real-world financial documents. This paper reports our approach to the two subtasks and discusses the experimental results. Our approaches are primarily based on fine-tuning pre-trained LLMs on specific downstream tasks and involve several key components: converting tabular data to natural language representations, well-designed prompts, BERT-based re-ranking, and LLM-based retrieval. They ranked second on the leaderboard among the participating teams in both the TR and TQA subtasks, demonstrating the effectiveness of our proposed method.
conference paper
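SPARC Japan Seminar 2024 "Beyond the Open Access Mandate: Toward the World to Come": "The 2030 Digital Library" as a Vision of University Libraries after the Open Access Mandate (Presentation Materials)
SPARC Japan Seminar 2024 "Beyond the Open Access Mandate: Toward the World to Come"
Venue: Held online
Date and time: Thursday, January 30, 2025, 13:00 to 17:00
conference presentation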
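Summary of Proceedings of the 30th Meeting of the Council for Promoting Collaboration and Cooperation between University Libraries and the National Institute of Informatics
Meeting: The 30th Meeting of the Council for Promoting Collaboration and Cooperation between University Libraries and the National Institute of Informatics
Venue: Online
Date and time: Tuesday, July 15, 2025, 13:30 to 15:30
conference output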