International Journal of Digital Curation

605 research outputs found

Managing Retractions and their Afterlife: A Tripartite Framework for Research Datasets

Author: Curty Renata
Publication venue: University of Edinburgh
Publication date: 09/06/2025

Retractions serve as a critical, albeit last-resort, post-publication correction mechanism in scholarly publishing, playing an important role in upholding the integrity of the scientific record. By formally retracting flawed or misleading research, the scientific community mitigates the harm caused by errors or misconduct that may have escaped detection during peer review. While retractions of research articles have been extensively discussed across scientific disciplines and are well integrated into most publishers' workflows, the retraction of research datasets remains underexplored and rarely implemented. This paper seeks to address this gap by reviewing recent developments in this area, analyzing a sample of publicly available retracted dataset records in light of existing recommendations and guidelines, and putting forward several points for discussion, particularly for cases where datasets have been published and correction is no longer feasible, or where all efforts to amend the dataset have been exhausted. These considerations are organized into three main categories: (1) preventive actions and timely response, (2) purposeful damage control, and (3) community engagement and shared standards. Although still preliminary, this framework aims to stimulate future debate and inform actionable strategies for addressing the unique challenges of managing retracted datasets whose scientific rigor has been compromised. By contributing to the discussion on dataset retractions, this work seeks to better equip data curators, repository managers, and other stakeholders with tools to enhance accountability and transparency throughout the data preservation process, while also helping to mitigate the error cascade effect in science.

FAIR Principles Implementation in ML/AI - Findings from Skills4EOSC Delphi Study

Author: Osmenaj Elda
Sharma Curtis J M
Moschini Ugo
Berberi Lisana
Pasquale Valentina
Publication venue: University of Edinburgh
Publication date: 18/08/2025

Implementing the FAIR (Findable, Accessible, Interoperable, and Reusable) principles for scientific data management in machine learning (ML) and artificial intelligence (AI) offers numerous benefits, including higher model reliability, more collaborative research, and greater reproducibility. Despite these advantages, there is a lack of clear, practical guidelines for improving the FAIRness of ML/AI outputs, especially the models themselves. To address this gap, Skills4EOSC Task 6.3.3a of Work Package 6 (Professional Networks for Lifelong Learning) conducted a Delphi study, consisting of two rounds of surveys followed by an online meeting, to gather expert consensus on implementing FAIR principles in ML/AI model development. In the first round, ML/AI experts from Europe and beyond rated suggested FAIR practices and proposed additional ones. The second round involved feedback on and re-evaluation of these practices. The final meeting included detailed discussions of the Top 10 practices for implementing FAIR principles in ML/AI. The resulting Top 10 practices aim to provide guidelines for researchers and data management professionals seeking to implement FAIR principles in ML/AI.

Data Curation: Introducing a Competency Framework for the Social Sciences

Author: Behrens Kathrin
Kvetnaya Tatiana
Publication venue: University of Edinburgh
Publication date: 13/06/2025

Research data management is about more than the question of how researchers handle their data. In the sense of the FAIR principles, it also concerns the sustainable safeguarding and organized reusability of research data. For data-intensive social science research, research data centers and their data curation staff are therefore becoming increasingly important: data curators usually take on curation-specific tasks such as data preparation, securing research data in suitable archival environments, ensuring data accessibility, and controlling the conditions under which third parties may reuse the data. Hence, they specialize in the entire data curation process and, in particular, take on the tasks of archiving research data and providing it for reuse. Although the standards of comprehensive research data management are becoming increasingly specific, this trend has not yet been reflected in the corresponding training and continuing education measures. As a result, there is a gap between the growing demands on data curators and the development of competencies in research data management with a focus on data curation. The competency framework presented in this article is intended to help close this gap: based on a Data Curation Lifecycle Model, it has been developed to support the design of targeted training and continuing education programs in data curation, the formulation of learning objectives, and the evaluation of the corresponding trainings. The article points out the need to advance the development of competencies in this field, illustrates the schematic substructure of the data curation lifecycle, describes how the framework was developed and its core elements, and discusses its perspectives. Overall, the competency framework is aimed in particular at (future) data curators and can serve as a schematic basis for training the relevant personnel.
The focus is primarily on the data-intensive discipline of the social sciences, although much of the framework can be adapted to other disciplines and their data curation. The competency framework and this companion article are thereby intended to help advance the sustainable professionalization of data curation, a previously understudied field of competency.

Research Data Lifecycle (RDLC): An Investigation into the Disciplinary Focus, Use Cases, Creator Backgrounds, Stages and Shapes of RDLC Models

Author: Jiang Jie
Maurici-Pollock Danielle
Tang Rong
Publication venue: University of Edinburgh
Publication date: 09/02/2025

In this paper, we report the results of a study examining 78 Research Data Lifecycle (RDLC) models located through a review of the literature. Through synthesis-analysis and the nominal group technique, we investigated the RDLC models from the point of view of their disciplinary focus, use cases, and creators, as well as their specific stages and shapes. Our study revealed that most of the models had a generic, scientific, or multidisciplinary focus; models originating in the social sciences and humanities are less common. The use cases varied widely, spanning a total of 34 different scenarios. The creators and authors of the RDLC models came from more than 20 countries, and most of the models resulted from collaboration within or across organizations. Our stage and shape analysis also outlined key characteristics of the RDLC models by showing the commonalities and variations in named stages and the varying structures of the models. As one of the first empirical investigations examining the substance of RDLC models in depth, our study provides significant insights into the context and setting in which the models were developed, as well as details of their stages and shapes, and thereby identifies gaps that may affect the use and value of the models. As such, our study establishes a foundation for further studies on the practical use of RDLC models in research data management practice and education.

Data Makers and Users' Views on Useful Paradata: Priorities in Documenting Data Creation, Curation, Manipulation and Use in Archaeology

Author: Huvila Isto
Andersson Lisa
Sköld Olle
Liu Ying-Hsang
Publication venue: University of Edinburgh
Publication date: 11/02/2025

Understanding data and making it (re)usable requires adequate documentation of the data, but also information on how it has been created, curated, manipulated and used, termed paradata in the data documentation literature. This paper reports the results of a survey study (N=91) of data-creating and data-(re)using archaeologists' views on which information related to data creation, curation, manipulation and use (termed here paradata) they consider important when working with data. Data makers' and users' perceptions align to a considerable degree. It is important to have an explanation of the original general context of data creation and to know the purpose, procedures and methods of data making, analysis and documentation. The findings underline the need to continue developing and testing ideas for how to capture and document paradata, and to find ways to help data makers adopt proven practices that facilitate paradata making. At the same time, it is crucial that paradata aimed at facilitating data use is relevant to data users rather than consisting of, for instance, technical or administrative details considered useful primarily by data makers.

Event Notifications and Event Logs: Transparent Sharing of Artifact Life Cycle Data

Author: Hochstenbach Patrick
Verborgh Ruben
Van de Sompel Herbert
Publication venue: University of Edinburgh
Publication date: 26/03/2025

The “Event Notifications in Value-Adding Networks” specification provides an interoperable fabric that can be used in scholarly communication to exchange messages between data nodes, which make scholarly artifacts available to the network, and service nodes, which add value to these artifacts. For example, a data repository can have a request-response conversation with a long-term archive that results in the latter relaying the coordinates of an archived version of the dataset to the repository. The push-oriented notification protocol is based on W3C Recommendations for both the messaging protocol and the payloads. Implementations of the protocol are at various stages of maturity, the most advanced being the COAR Notify effort, which focuses on overlay peer review as a service. An important consequence, and indeed a design goal, of the conversational interoperability approach is the ability to bi-directionally interlink the scholarly artifact and the service result in real time, providing an attractive alternative to current interlinking approaches, which are largely heuristic-based and generate results with significant delays. Another consequence is the ability to publish an Event Log for each scholarly artifact that lists all event notifications exchanged about it, providing full transparency about its entire life cycle, including where and how it was registered, archived, reviewed, commented upon, etc. This paper describes essential aspects of the Event Notification protocol and illustrates them using a scenario. Next, it describes the Event Logs concept and illustrates it by means of the same scenario. Finally, it gives an overview of challenges related to specifying Event Logs that are currently under investigation and that largely relate to equipping Event Logs with affordances that make them verifiable and trustworthy.
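To make the Event Log idea more concrete, the following is a minimal sketch that groups exchanged notifications by the artifact they concern and orders them in time, mirroring the repository-to-archive conversation described above. It assumes notifications are JSON-like dictionaries loosely modeled on Activity Streams 2.0 payloads; the field names (`id`, `type`, `origin`, `target`, `object`, `published`), the example URLs, and the helper `build_event_logs` are hypothetical illustrations, not the specification's actual schema or COAR Notify's exact message format.

```python
from collections import defaultdict

# Hypothetical notifications, loosely modeled on Activity Streams 2.0 payloads.
# Field names and URLs are illustrative assumptions, not the specification's schema.
notifications = [
    {"id": "urn:uuid:1", "type": "Offer", "published": "2025-03-01T10:00:00Z",
     "origin": "https://repo.example.org", "target": "https://archive.example.org",
     "object": "https://repo.example.org/dataset/42"},
    {"id": "urn:uuid:2", "type": "Announce", "published": "2025-03-01T12:30:00Z",
     "origin": "https://archive.example.org", "target": "https://repo.example.org",
     "object": "https://repo.example.org/dataset/42"},
    {"id": "urn:uuid:3", "type": "Offer", "published": "2025-03-02T09:00:00Z",
     "origin": "https://repo.example.org", "target": "https://review.example.org",
     "object": "https://repo.example.org/dataset/7"},
]


def build_event_logs(messages):
    """Group notifications by the artifact they concern, ordered by time."""
    logs = defaultdict(list)
    for msg in messages:
        logs[msg["object"]].append(msg)
    for entries in logs.values():
        # ISO 8601 UTC timestamps in one fixed format sort chronologically as strings
        entries.sort(key=lambda m: m["published"])
    return dict(logs)


logs = build_event_logs(notifications)
for msg in logs["https://repo.example.org/dataset/42"]:
    print(msg["published"], msg["type"], msg["origin"], "->", msg["target"])
```

Here the log for dataset 42 shows the repository's `Offer` to the archive followed by the archive's `Announce` back to the repository. A real Event Log would also need the verifiability and trust affordances the abstract lists as open challenges, which this sketch does not attempt.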

Researchers and Research Data: Improving and Incentivising Sharing and Archiving

Author: Ventsel Minna
Montague-Hellen Beth
Publication venue: University of Edinburgh
Publication date: 28/01/2025

There has been much discussion within the scientific community about reproducibility in research, with questions raised about the integrity of research because the findings of some studies could not be reproduced or confirmed. Researchers need to adhere to the FAIR (findable, accessible, interoperable, and reusable) principles to contribute to collaborative and open science, and these open data principles can also support reproducibility and data integrity. This article uses observations and metrics from data sharing and research integrity activities undertaken by a Research Integrity and Data Specialist at the Francis Crick Institute to discuss potential reasons for the slow uptake of FAIR data practices. We then describe solutions adopted at the Francis Crick Institute that other institutes and universities can follow to improve the integrity of research from a data perspective. One major solution discussed is the implementation of a data archive system at the Francis Crick Institute to ensure the long-term integrity of data, comply with our funders’ data management requirements, and safeguard our researchers against any potential future research integrity allegations.

Bridging the Gap Between Process and Procedural Provenance for Statistical Data

Author: McPhillips Timothy
Gager Jack
Thelen Thomas
Alter George Charles
Iverson Jeremy
Ludäscher Bertram
Smith Dan
Publication venue: University of Edinburgh
Publication date: 08/09/2025

We show how two models of provenance can work together to answer basic questions about data provenance, such as “What computed variables were affected by values of variable X?”. Questions like this are central to understanding how data is managed and modified. W3C PROV is a widely used standard for describing the people, activities, and sources that create things such as documents, works of art, and datasets. PROV associates processes with inputs and outputs, but it has no way to describe how data are changed within a process: it has no language for program components such as mathematical expressions or table joins. Structured Data Transformation Language (SDTL) was designed to provide machine-actionable representations of data transformation commands in statistical analysis software. SDTL describes the inner workings of programs that are black boxes in PROV. However, SDTL is detailed and verbose, and simple queries can become very complicated when expressed in SDTL. Structured Data Transformation History (SDTH) bridges the gap between PROV and SDTL. SDTH extends the PROV data model to answer questions about data preparation and management operations that cannot be answered with PROV alone.
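The example question “What computed variables were affected by values of variable X?” amounts to a reachability query over variable-level dependencies. The sketch below answers it with a breadth-first traversal, assuming each transformation step has already been reduced to the variable it computes and the variables it reads. The `steps` structure and the `affected_by` helper are hypothetical simplifications for illustration; they are not the SDTL or SDTH data models.

```python
from collections import deque

# Hypothetical, simplified stand-in for parsed transformation commands:
# each step records the variable it computes and the variables it reads.
steps = [
    {"command": "Compute", "target": "income_k", "sources": ["income"]},
    {"command": "Compute", "target": "log_income", "sources": ["income_k"]},
    {"command": "Compute", "target": "age_group", "sources": ["age"]},
    {"command": "Compute", "target": "score", "sources": ["log_income", "age_group"]},
]


def affected_by(variable, steps):
    """Return computed variables whose values depend, directly or indirectly, on `variable`."""
    # Forward dependency graph: source variable -> variables computed from it
    downstream = {}
    for step in steps:
        for src in step["sources"]:
            downstream.setdefault(src, set()).add(step["target"])
    # Breadth-first traversal collecting every reachable computed variable
    seen, queue = set(), deque([variable])
    while queue:
        current = queue.popleft()
        for nxt in downstream.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(affected_by("income", steps)))  # ['income_k', 'log_income', 'score']
```

One simplification worth noting: this sketch treats each variable name as a single node, so it would conflate values when a script overwrites the same variable several times. Distinguishing such successive versions is the kind of detail a within-process provenance model has to capture.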

ASSURED: Assuring Safe Research by Safe People

Author: Wiltshire Deborah
Parker Simon
González Ribao Vanessa
Publication venue: University of Edinburgh
Publication date: 24/09/2025

The ASSURED project aims to address the need for standardised training for researchers and professionals working with sensitive data in Trusted Research Environments (TREs) in Germany. As data sharing for research continues to grow, safeguarding sensitive data is critical, particularly as Open Science and FAIR data principles promote wider access. However, ensuring secure data access while minimising risks requires robust safeguards, such as the Five Safes Model. An essential element of this model is the ‘Safe People’ component, which emphasises the importance of well-trained individuals who understand data confidentiality and disclosure risks. Currently, training for researchers and TRE staff in Germany is inconsistent, with few formal systems in place. To remedy this, the ASSURED project has developed an e-learning programme offering flexible, modular training to ensure that researchers and TRE staff meet essential data security standards. The programme includes core modules applicable to all users, with additional role-specific content tailored to various TRE services. By integrating the training with an Authentication and Authorisation Infrastructure (AAI), the programme streamlines tracking of completion and facilitates cross-service access. The ASSURED project aims to enhance data protection and support the European Open Science Cloud initiative, promoting responsible data use across borders.

TROV - A Model and Vocabulary for Describing Transparent Research Objects

Author: Li Meng
McPhillips Timothy
Willis Craig
Parulian Nikolaus
Ludäscher Bertram
Kowalik Kacper
Vilhuber Lars
Lewis Thu-Mai
Gooch Mandy
Publication venue: University of Edinburgh
Publication date: 12/02/2025

The Transparent Research Object Vocabulary (TROV) is a key element of the Transparency Certified (TRACE) approach to ensuring research trustworthiness. In contrast with methods that entail repeating computations in part or in full to verify that the descriptions of methods included in a publication are sufficient to reproduce reported results, the TRACE approach depends on a controlled computing environment, termed a Transparent Research System (TRS), to guarantee that accurate, sufficiently complete, and otherwise trustworthy records are captured when results are obtained in the first place. Records identifying (1) the digital artifacts and computations that yielded a research result, (2) the TRS that witnessed the artifacts and supervised the computations, and (3) the specific conditions enforced by the TRS that warrant trust in these records together constitute a Transparent Research Object (TRO). Digital signatures provided by the TRS and by a trusted third-party timestamp authority (TSA) guarantee the integrity and authenticity of the TRO. The controlled vocabulary TROV provides the means to declare and query the properties of a TRO, to enumerate the dimensions of trustworthiness the TRS asserts for a TRO, and to verify that each such assertion is warranted by the documented capabilities of the TRS. Our approach to describing, publishing, and working with TROs imposes no restrictions on how computational artifacts are packaged or otherwise shared, and aims to be interoperable with, rather than to replace, current and future Research Object standards, archival formats, and repository layouts.
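The integrity guarantee rests on recording fingerprints of the artifacts when results are produced and checking them later. The sketch below shows only that fingerprinting step: it records a SHA-256 digest per artifact, derives a digest of the canonicalized manifest, and re-checks artifacts against it. The helpers `build_manifest` and `verify` and the manifest layout are hypothetical and are not TROV terms. The TRS and TSA digital signatures described above are deliberately omitted, so on its own this check detects changes only relative to a manifest copy that is already trusted; it provides no authenticity.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict) -> dict:
    """Record a SHA-256 digest for each artifact (name -> bytes)."""
    entries = {name: sha256_hex(content) for name, content in sorted(artifacts.items())}
    # Canonical JSON so identical entries always serialize to identical bytes;
    # in a TRO-like setting, this digest is what a TRS and TSA would sign (not shown)
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":")).encode()
    return {"artifacts": entries, "manifest_digest": sha256_hex(canonical)}


def verify(artifacts: dict, manifest: dict) -> bool:
    """Check that the current artifacts still match the recorded manifest."""
    return build_manifest(artifacts) == manifest


artifacts = {"data.csv": b"id,value\n1,3.5\n", "analysis.py": b"print('hello')\n"}
manifest = build_manifest(artifacts)
print(verify(artifacts, manifest))  # True

tampered = {**artifacts, "data.csv": b"id,value\n1,9.9\n"}
print(verify(tampered, manifest))  # False
```

Canonical serialization matters here because the digest must not change merely because the same entries were written in a different key order.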

522 full texts and 605 metadata records (updated in the last 30 days).
