Focused Issue on Digital Library Challenges to Support the Open Science Process
Open Science is a broad term covering several practices that aim to remove the barriers to sharing any kind of output, resource, method, or tool at any stage of the research process (https://book.fosteropenscience.eu/en/). The Open Science process is a set of transparent research practices that help to improve the quality of scientific knowledge and that support the most basic aspects of the scientific process by means of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. Thanks to research transparency and accessibility, we can evaluate the credibility of scientific claims, make the research process reproducible, and make the obtained results replicable. In this context, digital libraries play a pivotal role in supporting the Open Science process by facilitating the storage, organization, and dissemination of research outputs, including open access publications and open data. In this focused issue, we invited researchers to discuss innovative solutions, including solutions to technical challenges, concerning the identifiability of digital objects, the use of metadata and ontologies to support replicable and reusable research, the adoption of standards and semantic technologies to link information, and the evaluation of the application of the FAIR principles.
Benchmarking Automatic Tools for Neologisms Extraction: Issues and Challenges
Human language is constantly evolving, driven by societal, technological, and cultural shifts that lead to the creation of new terms and expressions. The rise of digital platforms, including social media and academic publications, has accelerated the introduction and spread of these neologisms. This paper explores current advancements and challenges in benchmarking automated and semi-automated tools for extracting neologisms. In particular, we discuss challenges in dataset creation and evaluation procedures, such as defining neologisms, ensuring diverse text sources, managing annotation variability, and evaluating these tools.
IMS-UNIPD @ CLEF eHealth Task 1: A memory based reproducible baseline
In this paper, we report the results of our participation in the CLEF eHealth 2021 Task on “Multilingual Information Extraction”. This year, the task focuses on Named Entity Recognition from Spanish clinical text in the domain of radiology reports. In particular, the main objective is to classify entities into seven different classes as well as hedge cues. Our main contributions can be summarized as follows: 1) we continue the study of a minimal, reproducible pipeline for text analysis baselines using a tidyverse approach in the R language; 2) we evaluate the simplest memory-based classifiers without optimization.
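As an illustration of what a memory-based baseline of this kind might look like, the following minimal sketch memorizes the most frequent label seen for each training token and tags unseen text by exact lookup. It is written in Python rather than the R tidyverse used by the authors, and the function names, the label names, and the `"O"` default are hypothetical choices for this example, not the task's actual schema or the authors' implementation.

```python
from collections import Counter, defaultdict


def train_memory_tagger(labeled_tokens):
    """Memorize, for each lowercased token, its most frequent training label."""
    counts = defaultdict(Counter)
    for token, label in labeled_tokens:
        counts[token.lower()][label] += 1
    # Keep only the majority label per token (no tuning, no smoothing).
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def tag(tokens, memory, default="O"):
    """Label each token by exact lookup; unseen tokens get the default label."""
    return [memory.get(t.lower(), default) for t in tokens]


# Hypothetical training pairs (token, label); labels are illustrative only.
train = [
    ("hígado", "ANATOMY"),
    ("quiste", "FINDING"),
    ("quiste", "FINDING"),
    ("quiste", "ANATOMY"),
]
memory = train_memory_tagger(train)
predicted = tag(["Quiste", "en", "hígado"], memory)
```

Such a lookup has no parameters to optimize, which is what makes it usable as a reproducible baseline.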
UniPadova @ LeQua 2022: A Preliminary Study of a BM25 Approach to Quantification
Our participation in the LeQua lab continues the sequence of experiments dedicated to minimal coding that use the R Tidyverse packages to build reproducible source code for experiments in IR-related tasks. In this specific case, we focused on the two-dimensional interpretation of the BM25 ranking formula, which considers the distribution of documents in a two-dimensional space, to study the quantification task without any type of optimization.
"Interactive" undergraduate students: UNIPD at iCLEF 2008
This is the first year of participation of the University of Padua in the interactive CLEF track. A group of Linguistics students from the Faculty of Humanities was asked to participate in the experiment. An analysis of the questionnaires, together with some log analysis, is carried out with the aim of studying the interaction of the user with a cross-lingual system, the solutions users find for a given task, and the tools that a system should provide in order to assist the user in the task.
Technology Assisted Review Systems: Current and Future Directions
Technology-Assisted Review (TAR) systems are becoming indispensable in domains demanding extensive document screening with high precision, notably in eDiscovery and systematic biomedical reviews. Recent advancements in machine learning, particularly the emergence of Large Language Models (LLMs), have expanded the capabilities of TAR systems, enabling them to handle voluminous text data more efficiently and accurately. Despite these strides, significant challenges remain, including the development of effective stopping criteria, the availability of high-quality domain-specific datasets, and robust evaluation metrics to ensure reproducibility and defensibility in high-stakes applications. This paper surveys recent trends and emerging methodologies in TAR, with an emphasis on approaches aimed at improving document relevance screening, query generation, and validation protocols across active learning (AL) and reinforcement learning (RL) frameworks. We examine the use of LLMs for Boolean query refinement and abstract screening, particularly in enhancing systematic review workflows. Additionally, we discuss the role of specialized datasets and data-driven approaches in addressing the unique requirements of TAR systems in fields like biomedical research and eDiscovery.
A study on a mixed stopping strategy for total recall tasks
How do we calculate how many relevant documents are in a collection? In this abstract, we discuss our line of research on total recall systems, such as interactive systems for systematic reviews based on an active learning framework [4–6]. In particular, we will present 1) the problem in mathematical terms, and 2) the experiments with an interactive system that continuously monitors the cost of reviewing additional documents and advises the user whether to continue the search, based on the remaining available resources. We will discuss the results of this system on the ongoing CLEF 2019 eHealth task.
As Simple as Possible: Using the R Tidyverse for Multilingual Information Extraction. IMS Unipd at CLEF eHealth 2020 Task 1
In this paper, we report the results of our participation in the CLEF eHealth 2020 Task on “Multilingual Information Extraction”. This task focuses on the coding of medical textual data in Spanish using the International Statistical Classification of Diseases and Related Health Problems (ICD). The main objective of our participation in this task is the study of reproducible experiments that require minimal effort to set up and run and that can be used as a baseline. The contribution of our experiments to this task can be summarized as follows: the implementation of a reproducible pipeline for text analysis that uses universal dependency parsing; and an evaluation of simple classifiers based on perfect matches at different morphological levels, together with a tf-idf approach.
A study on lemma vs stem for legal information retrieval using R tidyverse. IMS UniPD @ AILA 2020 Task 1
In this paper, we describe the results of the participation of the Information Management Systems (IMS) group in AILA 2020 Task 1, precedents and statutes retrieval. In particular, we participated in both subtasks: precedents retrieval (task a) and statutes retrieval (task b). The goal of our work was to compare and evaluate the efficacy of a simple reproducible approach based on the use of either lemmas or stems with a tf-idf vector space model and a plain BM25 model. The results vary significantly from one subtask and evaluation measure to another. For the statutes retrieval subtask, our approach performed well, being second only to a participant that used BERT to represent documents.
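For readers unfamiliar with the "plain BM25 model" mentioned in this abstract, the sketch below shows one common textbook form of BM25 scoring over pre-tokenized documents. It is written in Python rather than R, and the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the toy documents are assumptions made for illustration; the abstract does not state which BM25 variant or parameters the authors used. The tokens could equally be lemmas or stems, which is the comparison the paper studies.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    Assumes docs is a non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy collection; tokens stand in for lemmas or stems.
docs = [
    ["court", "statute", "appeal"],
    ["weather", "report"],
    ["statute", "statute", "law"],
]
scores = bm25_scores(["statute"], docs)
```

Documents that do not contain any query term score zero, and repeated occurrences of a term raise the score with diminishing returns controlled by `k1`.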
Behavior and design of screwed-head fasteners in reinforced concrete under tensile loading
The screwed-head fastener is a common fabricated hold-down bolt for steel structures and machine foundations. Although different models are available for evaluating its structural behavior, there are still aspects that need to be investigated. In particular, conflicting design approaches can be found among the European design-oriented documents. Within this context, a comprehensive experimental study on screwed-head fasteners under tensile loading was recently carried out at Milan Polytechnic. In this paper, some results are presented and discussed, concerning (a) the presence of cracks and (b) the presence of supplementary reinforcement. In the discussion, predictive models are recalled, demonstrating the need for a specific design approach that considers the geometry and the resistance of the fastening system, including that of the concrete member. The paper ends with some design recommendations as useful guidance for the structural designer.
