NII Repository (National Institute of Informatics)
2035 research outputs found
SCUNLP-1 at the NTCIR-18 FinArg-2 Task: Collaborative Large Language Models for Temporal Classification
The SCU-1 team participated in the "Detection of Argument Temporal References in Earnings Conference Calls" subtask of the NTCIR-18 FinArg-2 task. This study reports our approach to solving the problem and discusses the official results. We analyze the impact of step-by-step reasoning, model collaboration, and prompt design on the classification performance of large language models (LLMs). Through a series of experiments, we found that providing detailed explanations and incorporating previous model predictions significantly improved classification accuracy. Additionally, we compared different LLM discussion mechanisms and prompt design strategies, revealing that allowing models to reference each other and reason based on prior outputs effectively enhances decision-making quality. Run 3, which included complete reasoning steps and prior model outputs, achieved the best performance, highlighting the advantages of cross-model reference and optimized prompt design. These findings offer new directions for improving LLM-based classification tasks.
conference paper
UTSolve at the NTCIR-18 MedNLP-CHAT: Leveraging BioBERT for Medical Text Classification
Our team, UTSolve, participated in the Medical Natural Language Processing for AI Chat (MedNLP-CHAT) task (https://sociocom.naist.jp/mednlp-chat/) at NTCIR-18. The task involved classifying various medical texts into medical, ethical, and legal risks. In this report, we used BioBERT, a biomedical language model pre-trained on a large amount of biological text data, to predict the risk level of medical texts. We also evaluated the medical and clinical language models MedBERT and ClinicalBERT. Based on prediction performance, BioBERT achieved the best classification results, with a weighted F1 score of 0.7812 for medical risk, 0.8629 for ethical risk, and 0.7288 for legal risk.
conference paper
Domain Adaptation with Medical Vocabulary-Aware Tokenizer for Radiology Report Analysis in RadNLP at KAIYO03
Recent advances in language models (LMs) have significantly improved the handling of complex medical narratives compared to classical methods. However, one major obstacle to the practical use of these LMs in the medical domain is that the models lack training on medical knowledge. In particular, standard tokenizers trained on open-domain corpora fail to accurately capture domain-specific terminology, abbreviations, and writing styles in radiology reports or clinical notes. To address this issue, we propose a two-step domain-transfer method that updates both the tokenizer vocabulary and the LM representations. First, we replace low-frequency tokens in the original general-domain vocabulary with high-frequency bi- and tri-grams extracted from medical text, ensuring that domain-relevant tokens are learned. Second, we continually pre-train the LM on the medical corpus using masked language modeling to align the model parameters more closely with the domain-specific language. We evaluated the effectiveness of this approach in the RadNLP 2024 shared task on lung cancer staging from radiology reports, covering both English and Japanese. Experimental results indicate that our method improves performance on this specialized task, suggesting that customizing tokenizers and re-training language models can substantially mitigate the domain gap. In future work, we plan to address the standardization of radiology report formats to facilitate more robust and accurate automated analysis.
conference paper
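The first step of the abstract above (swapping rare vocabulary entries for frequent medical n-grams) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes word-level tokens, a plain frequency table for the vocabulary, and the hypothetical helper names `top_ngrams` and `replace_rare_tokens`. The second step, continual pre-training, is not shown.

```python
from collections import Counter


def top_ngrams(token_lists, k):
    """Return the k most frequent bi- and tri-grams, joined with spaces."""
    counts = Counter()
    for tokens in token_lists:
        for n in (2, 3):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [gram for gram, _ in counts.most_common(k)]


def replace_rare_tokens(vocab_freq, new_tokens):
    """Drop the least frequent vocabulary entries and append new_tokens.

    The vocabulary size stays fixed (assumption: one rare token is
    dropped for every domain token added).
    """
    rare_first = sorted(vocab_freq, key=vocab_freq.get)
    to_drop = set(rare_first[:len(new_tokens)])
    kept = [tok for tok in vocab_freq if tok not in to_drop]
    return kept + list(new_tokens)


# Toy data, invented for illustration only.
corpus = [["lung", "cancer", "stage"], ["lung", "cancer", "nodule"]]
vocab_freq = {"the": 100, "zebra": 1, "lung": 50}
new_vocab = replace_rare_tokens(vocab_freq, top_ngrams(corpus, 1))
```

In a real tokenizer the embedding rows of the replaced tokens would also need to be re-initialized or re-learned, which is presumably what the continual pre-training step addresses.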
NITKC at the NTCIR-18 RadNLP shared task: Using Graph-RAG in a lung cancer staging method with Natural Language Processing for Radiology
The NITKC team participated in the RadNLP Shared task of TNM classification from lung cancer radiology reports written in English, using an LLM-based approach. LLM accuracy varies depending on training methods and the number of parameters. We aimed to solve this task using open-source LLMs with fewer parameters than closed-source, proprietary LLMs and made improvements accordingly. Open-source LLMs have less prior knowledge than closed-source LLMs, putting them at a disadvantage for TNM classification. To address this, we used Graph-RAG to improve accuracy and address issues by representing domain knowledge for unfamiliar tasks as a graph and incorporating it as knowledge into the LLM. This method uses a graph database to represent domain knowledge for TNM classification in a graph structure. It dynamically incorporates the graph information into LLM prompts, compensating for the knowledge gaps in open-source LLMs and enabling more accurate inference. Additionally, to enhance performance, we trained BioBERT and MedBERT on a dataset labeled with lung cancer progression stages and utilized these inference results concurrently. As a result, we achieved a joint accuracy of 0.2963 in the TNM classification task. This demonstrates that our approach effectively mitigates the limitations of open-source LLMs in TNM classification.
conference paper
ATILF at NTCIR-18 RadNLP 2024 Shared Task: With less radiology reports, comes less performance
We present our results on the main task and subtask of the NTCIR-18 RadNLP 2024 shared task for the English language. We tested to what extent Large Language Models (LLMs) and Pretrained Language Models (PLMs) can identify and classify tumor types and subtypes. Our results for the main task showed that LLMs have difficulty understanding different subtypes of tumors. For the tumor sentence segment classification subtask, we obtained a competitive overall score of 0.83 on the micro F2.0 metric with pretrained language models. Our results showed that in a low-data setting, clinical PLMs perform better than general and domain-specific LLMs. Providing additional information, such as definitions in the case of clinical staging classification, can help LLMs achieve better scores on fine-grained classification.
conference paper
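For readers unfamiliar with the metric named above, the following sketch computes a micro-averaged F-beta score from pooled counts. It assumes the standard definition F_beta = (1 + beta^2) * P * R / (beta^2 * P + R) with beta = 2; the shared task's exact scoring script is not described in the abstract, and the function name `micro_f_beta` is hypothetical.

```python
def micro_f_beta(tp, fp, fn, beta=2.0):
    """Micro-averaged F-beta from counts summed over all classes.

    tp, fp, fn are true positives, false positives, and false negatives
    pooled across classes. beta > 1 weights recall more than precision.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented counts for illustration: precision 0.75, recall 0.6.
score = micro_f_beta(tp=6, fp=2, fn=4)
```

With beta = 2, a system that misses tumor-related sentences is penalized more heavily than one that over-predicts them.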
TUSNLP at the NTCIR-18 RadNLP Task: Explainable Classification Approach by Domain Knowledge-Based Bag-of-Words
We developed highly interpretable classification models of lung cancer stage using Bag-of-Words representations that consist of predefined key terms based on domain knowledge. These models had high medical validity and provided new clinical insights. This study demonstrates the effectiveness of domain knowledge in improving model accuracy and the usefulness of model interpretability in the medical field.
conference paper
KASYS at the NTCIR-18 SUSHI Task
This paper describes the KASYS team's participation in the NTCIR-18 SUSHI Task by presenting a multi-level metadata aggregation and retrieval approach for Subtask A, which focuses on retrieving undigitized historical materials with sparse item-level metadata. Our system leverages the hierarchical organization of the data (comprising Box, Folder, and Item levels) by aggregating metadata from lower to higher levels and applying two search strategies ("Merge" and "Each"). We evaluate traditional BM25 alongside dense retrieval models (E5 and ColBERT) without fine-tuning, and hyperparameter optimization using Optuna is employed to determine the optimal weight for each level. Although our multi-level score aggregation strategy was designed to exploit the hierarchical structure of the data, it did not yield a significant performance improvement over a simpler BM25 baseline. Future work will explore improved preprocessing of noisy metadata, hybrid retrieval methods combining BM25 with dense re-ranking, and model fine-tuning to further enhance performance in searching undigitized archival collections.
conference paper
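The multi-level score aggregation mentioned above can be pictured as a weighted sum of per-level retrieval scores. The sketch below is an illustration under assumptions, not the KASYS system: it assumes each level's hits have already been mapped to the box they belong to, that levels are combined linearly, and that the weights are given (in the paper they are tuned with Optuna). The name `aggregate_scores` and all data are hypothetical.

```python
from collections import defaultdict


def aggregate_scores(level_scores, weights):
    """Combine per-level retrieval scores into one ranked list of boxes.

    level_scores: {"box": {box_id: score}, "folder": {...}, "item": {...}},
                  each inner dict keyed by the box the hit belongs to.
    weights:      {"box": w_box, "folder": w_folder, "item": w_item}.
    Returns (box_id, combined_score) pairs, best first.
    """
    combined = defaultdict(float)
    for level, scores in level_scores.items():
        for box_id, score in scores.items():
            combined[box_id] += weights[level] * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Invented scores and weights for illustration only.
level_scores = {
    "box": {"B1": 1.0, "B2": 0.5},
    "folder": {"B1": 0.2, "B2": 0.9},
    "item": {"B2": 0.4},
}
weights = {"box": 0.5, "folder": 0.3, "item": 0.2}
ranked = aggregate_scores(level_scores, weights)
```

A box with no hit at some level simply receives no contribution from that level, which is one simple way to cope with sparse item-level metadata.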
Biting into SUSHI: The University of Maryland at NTCIR-18
The University of Maryland participated in both subtasks of the SUSHI Pilot Task. This paper describes the design of the systems used for each task, and it presents some preliminary analysis of the available results. The generation of data that has been shared with other participating teams is also described.
conference paper
LLMs and offline test collections: a dangerous distraction or a vital new tool?
conference paper
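Toward the Next Five Years of CiNii
Conference: Open Forum on Academic Information Infrastructure 2025 (学術情報基盤オープンフォーラム2025)
Venue: CiNii Research track, "What Comes Next for CiNii Research?"
Date: Monday, June 16 to Wednesday, June 18, 2025
conference output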