International Journal of Digital Curation

605 research outputs found

Caring for Data’s Soul : The development of a Curation Impact Factor to pinpoint the effects of data curation activities on data quality

Author: Bechert Insa
Beck Kerstin
Solanes Ros Ivet
Publication venue: University of Edinburgh
Publication date: 15/08/2025

Curation matters for data quality! Hardly any survey data user would disagree with this statement. But how much of a difference it makes is difficult to quantify. In this paper, we use data from two cross-national social survey programs, the European Values Study (EVS) and the International Social Survey Programme (ISSP), to illustrate the most common errors that occur in uncurated international comparative data and draw attention to the problems that such errors can cause in analysis results. To facilitate quality assessment and enable the assessment of data quality variation between countries within a survey, we developed a scheme that categorizes these errors, helps quantify them, and assigns them to possible curation measures. Based on this scheme, we developed an indicator called the Curation Impact Factor (CIF), which puts a concrete number on the data quality improvement due to curation effort and allows for comparability even across surveys. The CIF could therefore be used to justify the use of resources for data curation in any survey data life cycle (e.g., in grant applications).

Informed Consent Contexts in a Multidisciplinary Research Data Repository

Author: Jackson Brian
Azmi Nurulamirah
Publication venue: University of Edinburgh
Publication date: 10/12/2025

Secondary use of research data requires an understanding of the contexts in which it was collected. While depositors are often encouraged to describe methodological and structural contexts in the form of metadata and documentation, ethical contexts have received much less attention. As open data mandates and an ethos of FAIR (findable, accessible, interoperable, reusable) data proliferate across disciplines, participant consent for unknown future secondary uses of data is increasingly sought, even for minimal risk research. Terms of broad consent generally establish limitations on data reuse, but those limitations may not be clear when data are accessed via an open repository. The absence of these contexts increases the risk that secondary uses of data will be inconsistent with the expectations of original research participants and may place unnecessary burden on research ethics boards. This study examines the dataset records in a large, multidisciplinary data repository to determine the extent to which and how informed consent information is communicated to secondary users, and the degree to which conditions of access and use of data adhere to terms of informed consent. We identified all records published in Borealis: The Canadian Dataverse Repository between January 2022 and September 2024 containing individual-level human data. From those records, we analysed the frequency with which consent information was included and the methods used to do so. We further compared terms of consent with the licensing, textual, and technological conditions placed on access and use of the data. Results indicate that informed consent contexts are infrequently provided alongside data and that access and use conditions align with terms of consent for a slim majority of the sample datasets.
Based on these findings, we provide recommendations for the development of repository policy and guidelines that harmonise terms of consent and data use, the standardisation of language establishing access and use conditions, the adoption of metadata schema describing ethical contexts, and additional collaboration among data stewards and research ethics boards.

Using Metadata to Promote Transparency in Health Research: Creating the COVID Measures Archive at ICPSR

Author: Chenoweth Megan
Kubale John
Publication venue: University of Edinburgh
Publication date: 17/02/2025

Data sharing is a key strategy for fostering transparency, reproducibility, and trust in scientific research. Data sharing is endorsed and even required by many funders, such as the National Institutes of Health (NIH) in the United States. However, many NIH-funded projects face obstacles to data sharing, whether to protect research participants’ privacy, safeguard proprietary data, or remain compliant with data use agreements. Yet even researchers who cannot openly share data still benefit from openness and transparency into one another’s work, and from making their own research more transparent where possible. The Social, Behavioral, and Economic COVID Coordinating Center at ICPSR (SBE CCC) has launched a new archive aimed at addressing these challenges within the domain of social, behavioral, and economic (SBE) research into the COVID-19 pandemic. In September 2023, SBE CCC launched the COVID measures archive with the dual goals of a) offering researchers the ability to compare measures across SBE studies of COVID while b) protecting contributors’ needs for privacy and confidentiality in health research. The COVID measures archive primarily holds variable-level metadata, which provides visibility into the individual variables and measures employed in studies without necessitating the sharing of confidential or restricted data. This brief report describes the features of the COVID measures archive and illustrates how it can be used to foster transparency and consistency across SBE COVID studies.

In Sharing We Trust. Taking Advantage of a Diverse Consortium to Build a Transparent Data Service in Catalonia

Author: Llebot Clara
Alcalá Mireia
Anglada i de Ferrer Lluís M.
Publication venue: University of Edinburgh
Publication date: 28/01/2025

The Consorci de Serveis Universitaris de Catalunya (CSUC) is a consortium that serves 13 universities and 33 research centers in Catalonia and neighboring communities. In 2017 the Consortium created an Open Science department to collaborate with universities and research centers on facilitating the adoption of Open Science requirements. Even though CSUC also offers services to researchers directly (for example, its supercomputing resources), this report will focus on CSUC’s work with its member institutions to create and offer data management services. We will explain how CSUC has led the creation of a robust shared governance system, and how it takes advantage of the diversity of its members to create useful, high quality, and transparent services for all researchers in the Catalan research system. By sharing their experiences, values and priorities, members arrive at a result that is better than separate ad-hoc solutions. The process also creates a community of practitioners that develop expertise together with the help of professional development opportunities organized by CSUC, like recurrent self-learning labs focused on data curation tools, techniques and processes.

Developing Specialized Data Curation Curricula to Meet Growing Demands: A Community-based and Evolving Approach

Author: Lafferty-Hess Sophia
Erickson Seth
Keshavarzian Neggin
Marsolek Wanda
Moore Jennifer
Narlock Mikala
Publication venue: University of Edinburgh
Publication date: 13/05/2025

Data curation is “the encompassing work and actions taken by curators of a data repository in order to provide meaningful and enduring access to data” (Johnston et al., 2018a). It can be multifaceted and complex based on the types of data, the expertise of the curator, disciplinary expectations, and repository policies. With evolving data sharing practices and standards, ensuring data curators and stewards have access to high-quality, extensible instruction on specific data types is essential for supporting the goals of open research and accessible data sharing, particularly in the landscape where funders (National Institutes of Health, 2020) and journals (Naughton & Kernohan, 2016) are mandating data publication for the purpose of reproducibility, reuse, and external validation. In brief, data need to be curated for effective re-use and in alignment with the FAIR principles (Wilkinson et al., 2016). The Data Curation Network (DCN) (Johnston et al., 2018b) has been actively developing education and training programs to expand capacity in data curation along multiple axes. This paper will explore the progression of the DCN’s education program based primarily within the United States, highlighting a recent effort to develop specialized data curation education for four specialized data types. We will conclude with lessons learned, reflections on growth of education efforts in the DCN more broadly, and potential next steps.

A Country-level Case Study: On the Evolution of UK Institutional Research Data Services

Author: Mallalieu Ruth
Rice Robin
Publication venue: University of Edinburgh
Publication date: 24/11/2025

This paper examines milestones and unique service aspects of six research-intensive Higher Education Institutions’ approaches to research data management policy and service, almost one decade on from their respective beginnings, based on findings from a 2024 internal benchmarking study conducted by the University of Oxford, which consisted of interviews with library-based research data management service providers at five peer UK institutions. Both similarities and differences are examined, and milestones are mapped against external events and policies in the RDM field. Future directions, and areas of convergence and divergence especially, will be explored across six institutions: the Universities of Cambridge, Edinburgh, Manchester and Oxford, Imperial College London, and University College London (UCL).

Evaluating the efficacy and impact of a pilot programme for FAIR data stewardship at a UK university

Author: Zagrodzka Zuzanna
Adams Jenni
Campbell Richard
Foster Helen
Publication venue: University of Edinburgh
Publication date: 20/11/2025

Increasingly, funders, publishers, and institutions expect researchers to comply with the FAIR principles to ensure that data is findable, accessible, interoperable, and reusable. In an institutional context, however, questions remain as to how organisations can move beyond a broad commitment to FAIR, coupled with support for researchers to comply nominally with related grant conditions, to a more embedded and sustainable approach with a meaningful and pervasive impact on the FAIRness of research outputs. A data stewardship model offers one way to achieve this, yet in contrast to universities in mainland Europe and especially in the Netherlands, the UK is substantially lacking in such infrastructure at an institutional level, hampering efforts to evidence its potential impact within UK institutions and thereby advocate for its adoption. This article examines efforts to address this challenge via a recent project at the University of Sheffield to establish a pilot support service around FAIR data stewardship. It also provides a case study of how the benefits and impact of such an intervention might be identified and articulated through an evidence-led evaluation.

Curatorial Agency in IR Migrations: A Case Study of the University of Toledo Digital Repository

Author: Sabharwal Arjun
Publication venue: University of Edinburgh
Publication date: 10/12/2025

This case study focuses on the role of curatorial agency in the migration of the University of Toledo Digital Repository (UTDR). Institutional repository (IR) migrations are necessary preservation actions intended to ensure long-term access to digital content. Disruptions resulting from iterative migrations may diminish user trust in IR services and present other risks. Curatorial agency refers to the responsibility and authority of curators mediating between digital media and audiences and can mitigate some unforeseen or unavoidable effects of data migrations. Curatorial agency is established through connections and negotiations within heterogeneous actor-networks, which result in transformational processes, such as those associated with data migrations. This case study therefore takes a sociotechnical approach, using an analytical framework that merges elements of actor-network theory with those of the Digital Curation Centre’s (DCC) curation lifecycle model and a Levels of Representation in Digital Collections framework based on Lee’s model. It focuses on the vital role of curatorial agency in UTDR migrations. Using a detailed account of the repository migration and framework analysis, this case study offers significant insight into the role of curatorial agency in managing migrations and establishing new curation strategies, including virtual exhibitions. Key findings include increased transparency of transformational processes in the UTDR migrations and in the role of curatorial agency in the preservation framework.

Base4NFDI: Fostering A Cross-Disciplinary Service Landscape For The German National Research Data Infrastructure

Author: Zänkert Sandra
Manske Antje
Miller Bernhard
Fluck Juliane
Publication venue: University of Edinburgh
Publication date: 01/08/2025

Base4NFDI is a joint initiative by the 26 consortia of the German National Research Data Infrastructure (NFDI), aiming to develop essential cross-disciplinary basic services that enable FAIR data practices. Through a proposal-driven, bottom-up process, Base4NFDI supports technical and organisational solutions, such as identity and access management, computing, software, and workflows that serve the NFDI community. Proposals emerge from NFDI sections, where domain and infrastructure experts collaborate across disciplines. The role of Base4NFDI is to provide and orchestrate a multi-stakeholder process to decide which services to fund and to ensure coherence through structured development phases (Initialisation, Integration, and Ramp-Up), supported by staff who facilitate coordination and quality assurance. So far, eight candidates are under development, such as IAM4NFDI, TS4NFDI, and Jupyter4NFDI. This brief report introduces the Base4NFDI approach, outlines the decision-making and support processes, introduces current service candidates, shares early experiences and challenges, and provides an outlook on sustainability and international interoperability, particularly with the European Open Science Cloud (EOSC).

Collaborative Data Cleaning Framework: a Pilot Case Study for Machine Learning Development

Author: Parulian Nikolaus
Ludäscher Bertram
Publication venue: University of Edinburgh
Publication date: 09/12/2024

This study experiments with collaborative data cleaning, a pivotal phase in data preparation for both analysis and machine learning. We used a provenance Data Cleaning Model (DCM) for multi-user scenarios to track changes to a dataset and conducted comprehensive experiments that simulate multiple data curators working collaboratively on a dataset. Furthermore, we analyzed how different data-cleaning scenarios, aimed at improving the completeness and correctness of a dataset, can affect downstream machine learning model performance.
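
To make the idea of tracking multi-user changes concrete, the following is a minimal sketch, not the authors' Data Cleaning Model. It assumes a dataset held as a list of dictionaries and a simple append-only log of cell-level changes; the names `CellChange`, `TrackedDataset`, `set_cell`, `completeness`, and `changes_by` are hypothetical and chosen only for illustration. Completeness is taken here to mean the share of non-missing cells, which is one common definition and may differ from the metric used in the paper.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class CellChange:
    """One recorded edit: who changed which cell, and from what to what."""
    curator: str
    row: int
    column: str
    old: Any
    new: Any


@dataclass
class TrackedDataset:
    """Hypothetical dataset wrapper that logs every cell-level change."""
    rows: list                      # list of dicts, one dict per record
    log: list = field(default_factory=list)

    def set_cell(self, curator: str, row: int, column: str, value: Any) -> None:
        """Apply an edit and append it to the provenance log (no-ops are skipped)."""
        old = self.rows[row].get(column)
        if old == value:
            return
        self.rows[row][column] = value
        self.log.append(CellChange(curator, row, column, old, value))

    def completeness(self) -> float:
        """Share of cells that are not None (assumed definition of completeness)."""
        cells = [v for r in self.rows for v in r.values()]
        if not cells:
            return 1.0
        return sum(v is not None for v in cells) / len(cells)

    def changes_by(self, curator: str) -> list:
        """All logged changes made by one curator, in the order they were applied."""
        return [c for c in self.log if c.curator == curator]


# Illustrative usage with made-up records and curator names.
dataset = TrackedDataset(rows=[
    {"age": 34, "country": "DE"},
    {"age": None, "country": "ES"},
])
before = dataset.completeness()
dataset.set_cell("curator_a", 1, "age", 41)
after = dataset.completeness()
print(before, after, dataset.changes_by("curator_a"))
```

Because the log is append-only and records the previous value of each cell, it can in principle be replayed or filtered per curator, which is the kind of information needed to compare cleaning scenarios against downstream model performance.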

522 full texts; 605 metadata records. Updated in the last 30 days.
