NII Repository (National Institute of Informatics)
    2035 research outputs found

    Evaluation Results of UTUtLB25 Team in NTCIR-18 U4 Task of Table Question Answering of Securities Reports

    The goal of this paper is to develop a system for participating in the task of extracting information from tables in securities reports (NTCIR-18 U4 Task). The NTCIR-18 U4 Task consists of two distinct subtasks: (1) retrieving the table that contains the relevant data, and (2) extracting the desired data from the table to answer the question. For the first subtask, we will utilize a pre-trained model that has demonstrated strong performance in table retrieval, and we will fine-tune the model to enhance its effectiveness for this specific task. For the second subtask, we will employ the latest Large Language Models (LLMs), which have shown excellent results across a variety of Natural Language Processing tasks. This approach is expected to achieve state-of-the-art performance, surpassing existing pre-trained BERT-based models.
    conference paper

    Structured Evaluation of Legal Reasoning in LLMs: Chain-of-Thought Prompting and Human Scoring for Retrieval Robustness

    This study investigates the legal reasoning abilities of Large Language Models (LLMs) in Taiwan’s Status Law (family and inheritance law) and evaluates the effects of Chain-of-Thought (CoT) prompting on answer quality. Six essay questions from past judicial and graduate law exams were decomposed into 68 sub-questions targeting issue spotting, statutory application, legal reasoning, and property calculation. Four LLMs (ChatGPT-4o, Gemini, Copilot, and Grok3) were evaluated using a two-stage framework: decomposed sub-question accuracy (Stage 1) and full-length essay response performance with and without CoT prompting (Stage 2), with human scoring conducted by a law professor and a student. Results show that CoT prompting consistently improves legal reasoning quality across models, notably enhancing issue coverage, statutory citation accuracy, and reasoning structure. Gemini achieved the most significant accuracy gains (from 83.2% to 94.5%, p < 0.05) and was selected for detailed qualitative analysis. Beyond model-specific findings, this study contributes to retrieval evaluation research by addressing statistical consistency challenges in human scoring, proposing a diagnostic evaluation method adaptable for multilingual and multimedia legal corpora, and suggesting extensions for evaluating enterprise-level legal information systems. These findings underscore the value of structured prompting strategies in supporting more interpretable, transferable, and scalable legal AI evaluation frameworks.
    conference paper

    Toward a Book "Discovery" Environment That Connects Researchers and University Libraries

    Conference: Academic Information Infrastructure Open Forum 2025. Venue: CiNii Research track, "What Comes Next for CiNii Research?" Date: June 16 (Mon) to June 18 (Wed), 2025.
    conference output

    SPARC Japan Seminar 2024, "Beyond the Open Access Mandate: Toward the World to Come": A Sketch of the Next Ten Years of Scholarly Communication (Document)

    SPARC Japan Seminar 2024, "Beyond the Open Access Mandate: Toward the World to Come". Venue: held online. Date: January 30 (Thu), 2025, 13:00 to 17:00.
    conference presentation

    Handouts of the 42nd Meeting of the Committee on Building Future Scholarly Information Systems

    Conference: 42nd Meeting of the Committee on Building Future Scholarly Information Systems. Venue: online. Date: January 24 (Wed), 2025, 15:00 to 17:00.
    conference output

    Overview of NTCIR-18

    The NTCIR project, organized by the National Institute of Informatics (NII) in Japan, has been a key platform for information retrieval (IR) and natural language processing (NLP) research since 1997. NTCIR-18, running from January 2024 to June 2025, features seven core tasks and three pilot tasks covering LLM evaluation, advanced IR, domain-specific NLP, and personal data management. A total of 113 teams worldwide participated, registering 178 times across tasks. This paper provides an overview of NTCIR-18, highlighting its objectives, methodologies, and key findings, along with future directions.
    conference paper

    TMUNLPG1 at the NTCIR-18 FinArg-2 Task

    The TMUNLPG1 team participated in the FinArg-2 Task of NTCIR-18, focusing on the Detection of Argument Temporal References and Assessment of the Claim's Validity Period in the finance domain using Earnings Conference Call and Social Media datasets. The team ranked 6th and 2nd in these subtasks, respectively. This paper presents the team's methodologies, results, and conclusions. For Earnings Conference Call (ECC) Argument Temporal References, we utilized a combination of feature engineering, ensemble strategy, and data augmentation to achieve a Micro F1 score of 0.6905. In Social Media Assessment of the Claim's Validity Period, we developed an enhanced approach combining domain-specific transformer architectures with statistical feature engineering. By integrating FinBERT with Log-Likelihood Ratio (LLR) and Pointwise Mutual Information (PMI) features, we achieved a Micro F1 score of 0.742 on the unified dataset and demonstrated robust performance on the test set. The methodology incorporates weighted pooling strategies and adaptive learning rate optimization to improve temporal validity prediction accuracy. Our results highlight the effectiveness of combining domain-specific language models with traditional statistical approaches in financial text analysis, contributing to advancements in temporal natural language processing for the financial domain.
    conference paper
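The abstract above names Pointwise Mutual Information (PMI) as one of its statistical features without defining it. As a rough illustration only, and not the team's actual pipeline, the sketch below computes the textbook quantity PMI(t, c) = log2(P(t, c) / (P(t) P(c))) from document-level counts. The function name `pmi_table`, the toy corpus, and the labels "short" and "long" are all hypothetical.

```python
import math
from collections import Counter


def pmi_table(docs):
    """Return {(term, label): PMI in bits} from document-level counts.

    docs is a list of (tokens, label) pairs. Probabilities are estimated
    as the fraction of documents containing the term, carrying the label,
    or both. Only pairs that co-occur at least once are returned.
    """
    n = len(docs)
    term_df = Counter()   # documents containing each term
    label_df = Counter()  # documents carrying each label
    joint_df = Counter()  # documents with both the term and the label
    for tokens, label in docs:
        label_df[label] += 1
        for term in set(tokens):  # count each term once per document
            term_df[term] += 1
            joint_df[(term, label)] += 1
    return {
        (term, label): math.log2(k * n / (term_df[term] * label_df[label]))
        for (term, label), k in joint_df.items()
    }


# Hypothetical toy corpus; the labels are illustrative, not the task's label set.
toy_docs = [
    (["next", "quarter"], "short"),
    (["next", "week"], "short"),
    (["long", "term"], "long"),
    (["next", "decade"], "long"),
]
table = pmi_table(toy_docs)
```

A positive value means the term appears with that label more often than independence would predict, and a negative value means less often. In a feature-engineering setting, such scores could be appended to a transformer's pooled representation, though how the authors combined them is not stated in the abstract.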

    SCaLAR IT at the NTCIR-18 FinArg-2: Temporal Inference of Financial Arguments

    The SCaLAR IT team participated in the Detection of Argument Temporal References subtask of the NTCIR-18 FinArg-2 Task. This paper presents our approach to classifying financial arguments based on temporal references. We explored multiple architectures combining a BERT-based model with knowledge-based and temporal feature extraction techniques. To improve performance, we integrated BERT with TF-IDF-based temporal features, extracted using Stanza, and with BERT embeddings to enhance temporal reference detection. Our first model, BERTForSequenceClassifier, achieves a Micro F1 score of 70.24% and a Macro F1 score of 67.85%, outperforming most approaches of other teams. However, incorporating additional temporal features improved the Macro F1 score, indicating better performance across all classes. We analyze the effectiveness of different feature representations in our research.
    conference paper
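The abstract above reads a higher Macro F1 as better performance across all classes. The following minimal sketch, which uses made-up labels rather than the FinArg-2 data, shows why the two averages can diverge: Micro F1 pools true positives, false positives, and false negatives over all classes, while Macro F1 averages the per-class F1 scores so that rare classes weigh as much as frequent ones. The helper names `f1` and `micro_macro_f1` are illustrative.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Made-up imbalanced example: a classifier that always predicts class "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
```

Here the micro average is 0.8 because most examples belong to the majority class, while the macro average drops to about 0.44 because class "b" is never predicted. A model change that helps minority classes can therefore raise Macro F1 even when Micro F1 barely moves.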

    LSAT Focus: EAGLE’s Embedded Entities Highlighting Technique for NTCIR-18 Lifelog-6

    This paper presents our work in the Lifelog Semantic Access Task (LSAT) at NTCIR-18, focusing on automatic search methods for finding distinct life moments. Our experiments explore and compare different retrieval strategies, including keyword matching-based search combined with embedding extraction, vector embedding-based semantic search using a multimodal model, and hybrid methods that take advantage of both approaches. Our proposed method improved retrieval accuracy by directing the model's attention to key query terms while prioritizing semantic relevance and the presence of requested entities in the retrieved moments. Experimental results demonstrated that the best-performing method relies on embeddings incorporating extended descriptions and highlighted keywords. Conversely, the hybrid methods in our experiments were less effective, likely due to limitations in the keyword-matching search algorithm. This work's findings underscore the value of richer descriptive entities within queries for enhancing the retrieval of life moments, ensuring a focus on core semantic and visual elements.
    conference paper

    NTCIR-18 RadNLP 2024 Overview: Dataset and Solutions for Automated Lung Cancer Staging

    Radiology reports play a vital role in clinical workflows, serving as a primary means for radiologists to communicate imaging findings to physicians. However, the increasing number of imaging studies has made it challenging to produce and interpret comprehensive reports in a timely manner. Natural language processing (NLP) has shown potential to alleviate this burden, yet most existing studies are limited to English, while clinical reports are often written in local languages. To address this gap, we have developed and released Japanese medical text datasets through a series of shared tasks. Our recent efforts, including NTCIR-16 Real-MedNLP and NTCIR-17 RR-TNM, focused on automating lung cancer staging from radiology reports using the TNM classification system. This task is clinically significant, yet challenging due to the implicit nature of staging information and the complexity of TNM criteria. In this paper, we introduce the NTCIR-18 RadNLP 2024 shared task, which extends the previous task with finer-grained classification, a larger and bilingual corpus, and new sentence-level subtasks. We present the dataset, participating systems, and evaluation results, aiming to provide practical insights into building NLP systems for cancer staging support.
    conference paper

    2,022 full texts
    2,035 metadata records
    Updated in last 30 days.