International Journal of Digital Curation
Improving the Reproducibility of LaTeX Documents by Enriching Figures with Embedded Scripts and Data
The introduction of open-access data policies by research councils, the enforcement of best practices, and the deployment of persistent online repositories have enabled datasets which support results in scientific papers to become more widely accessible. Unfortunately, despite this advancement in the curation/publishing workflow, the data-driven figures within a paper often remain difficult to reproduce. Plotting or analysis scripts rarely accompany the manuscript or any associated software release; and even if they do, it may be unclear exactly which version was used. Furthermore, the precise commands and parameters used to execute the scripts are often not included in a README file or in the paper itself. This paper introduces a new open-source digital curation tool, Pynea, for improving the reproducibility of LaTeX documents. Each figure within a document is enriched by automatically embedding the plotting script and data files required to generate it, such that it can be regenerated by readers of the paper in the future. The command used to execute the plotting script is also added to the figure's metadata, along with details of the specific version of the script used (if the script is tracked with the Git version control system). If the document is to be recompiled with a figure that has since changed, or had its plotting script or data files modified, the figure is regenerated such that the author can be confident that the latest version of the figure and its dependencies are included.
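The regeneration check described in this abstract can be illustrated with a minimal sketch. It is not Pynea's actual implementation: the function names, the use of SHA-256 digests, and the idea of a stored digest record are all assumptions made here for illustration only.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_digests(paths):
    """Map each path (as a string) to its current digest.

    Hypothetical helper: stands in for whatever record is kept at build time.
    """
    return {str(p): file_digest(p) for p in paths}


def needs_regeneration(figure, dependencies, recorded):
    """Decide whether a figure should be regenerated (illustrative only).

    `dependencies` lists the plotting script and data files; `recorded` holds
    the digests stored at the last build. The figure is considered stale if it
    is missing, or if it or any dependency no longer matches its record.
    """
    if not Path(figure).exists():
        return True
    for path in [figure, *dependencies]:
        if recorded.get(str(path)) != file_digest(path):
            return True
    return False
```

Comparing content digests rather than modification times is one possible design choice; it avoids spurious regeneration when a file is touched but not changed.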
Building the Picture Behind a Dataset
As part of the European Commission funded FREYA project, the British Library wanted to explore the possibility of developing provenance information in datasets derived from the British Library’s collections, the data.bl.uk collection. Provenance information is defined in this context as ‘information relating to the origin, source and curation of the datasets’. Provenance information is also identified within the FAIR principles as an important aspect of being able to reuse and understand research datasets. According to the FAIR principles, the aim is to understand how to cite and acknowledge the dataset as well as understanding how the dataset was created and has been processed. There is also reference to the importance of this metadata being machine readable. By enhancing the metadata of these datasets with additional persistent identifiers and metadata, a fuller picture of the datasets and their content could be understood. This also adds to the veracity of the datasets and to their understanding by end users of data.bl.uk.
The Road to Partnership: a Stepwise, Iterative Approach to Organisational Collaboration in RDM, Archives and Records Management
Research data management (RDM) sits at the confluence of a number of related roles. The shape an RDM confluence takes depends on several factors including the nature of an organisation and the research that it undertakes. At St George’s, University of London, the UK’s only university dedicated to medical and health sciences education, training and research, RDM has been intricately interwoven with organisational information governance roles since its inception. RDM is represented on our institutional Information Governance Steering Group and our Information Management Team consisting of information governance, data protection, freedom of information, archives, records management and RDM.
This paper reports on how RDM, archives and records management have collaborated using a step-wise, iterative process to streamline and harmonise our guidance and workflows in relation to the stewardship, curation and preservation of research data. As part of this we consistently develop, conduct and evaluate small projects on managing, curating and preserving data. We present three projects that we collaborated on to transform research data services across each of our departments:
planning for, conducting and reporting on interviews with wet laboratory researchers
advocating, building a case for and delivering a university-wide digital preservation system
ongoing work to recover, preserve and facilitate access to a unique national health database
Learnings from these projects are used to develop our guidance, improve our activities and integrate our workflows, the outcomes of which may be further evaluated. Learnings are also used to improve our ways of working together. Through deeper integration of our activities and workflows, rather than simply aligning aspects of our work, we are increasingly becoming partners on research data stewardship, curation and preservation. This approach offers several benefits to the organisation as it allows us to build on our related knowledge and skills and deliver outcomes that demonstrate greater value to the organisation and the researchers we support.
Quality and Trust in the European Open Science Cloud
The European Open Science Cloud (EOSC) has the objective to provide a virtual environment offering open and seamless services for the re-use of research data across borders and scientific disciplines. This ambitious vision sets significant challenges that the research community must meet if the benefits of EOSC are to be realised. One of those challenges, which has both technical and cultural aspects, is to determine the “Rules of Participation” that enable users to assess the quality of the data and services provided through EOSC and thereby enable them to trust the data and services they access. This paper discusses some issues relevant to determining the Rules of Participation that will enable EOSC to meet these objectives.
 
Do Open Data Badges Influence Author Behaviour? a Case Study at Springer Nature
Digital badges have previously been shown to incentivise journal authors to share their data openly. In this paper we introduce an Open data badging project at the Springer Nature journal BMC Microbiology. The development of the Open data badge is described, as well as the challenges of developing standard badging criteria and ensuring authors’ awareness of the badges. Next steps for the badging project are outlined, which are based on the experiences of the team assessing the badges, the number of badges awarded at the journal to date, and the results of an author survey.
Data Curator in the Middle: Curating Data for a Diverse Community of Stakeholders
The Prevention and Early Intervention Research Initiative is an archiving project to preserve the data and reports that were generated by twelve years of philanthropic and state investment into prevention and early intervention approaches in the children and youth sector in Ireland and Northern Ireland. The investment resulted in an extensive collection of evaluation data and reports, which collectively provide an evidence base for continued investment into PEI programmes that are shown to be effective. In 2016, the Prevention and Early Intervention Research Initiative (PEI-RI) was established to preserve the outputs from these evaluations in the national data archives, as a publicly available evidence base. The political and social significance of this collection is manifest in the range of stakeholder groups that the project is engaging with, including the community and not-for-profit organisations that operated the PEI programmes, the research teams from academic institutions that evaluated these programmes, and representatives from government departments that co-funded many of these programmes with Atlantic.
This paper tells the story of the PEI-RI archiving project, describing the steps we’ve taken since 2016 to preserve and promote the PEI data. During the course of the project we realised that it would not be enough to provide access to the data alone, as "[g]enerating and collating the evidence is of no use if it never reaches the commissioners and professionals who need it" (What Works Network, 2014, p. 6). In the second phase of our project we are creating a range of resources for practitioner and decision maker audiences which provide a pathway to the data using the archival infrastructure.
The project provides a case study of curating a digital collection that is intended for multiple stakeholders with different expectations of the archived material. The PEI-RI data curator is located in the middle of a triad of data creators, data consumers and data archives, and is tasked with balancing the interests, expectations and limitations of each.
Towards Trusted Identities for Swiss Researchers and their Data
In this paper we report on efforts to enhance the Swiss persistent identifier (PID) ecosystem. We will firstly describe the current situation and the need for improvement in order to describe in full detail the steps undertaken to create a Swiss-wide model. A case study was undertaken by using several data sets from the domains of art and design in the context of the ICOPAD project. We will provide a set of recommendations to enable a PID service that could mint Archival Resource Key (ARK) identifiers or a flavour of Research Resource Identifiers (RRIDs) as a complement to Digital Object Identifiers (DOIs). We will conclude with some remarks concerning the transferability of this approach to other areas and the requirements for a national hub for PID management in Switzerland.
Role of Content Analysis in Improving the Curation of Experimental Data
As researchers are increasingly seeking tools and specialized support to perform research data management activities, the collaboration with data curators can be fruitful. Yet, establishing a timely collaboration between researchers and data curators, grounded in sound communication, is often demanding. In this paper we propose manual content analysis as an approach to streamline the data curator workflow. With content analysis curators can obtain domain-specific concepts used to describe experimental configurations in scientific publications, to make it easier for researchers to understand the notion of metadata and for the development of metadata tools. We present three case studies from experimental domains, one related to sustainable chemistry, one to photovoltaic generation and another to nanoparticle synthesis. The curator started by performing content analysis in research publications, proceeded to create a metadata template based on the extracted concepts, and then interacted with researchers. The approach was validated by the researchers with a high rate of accepted concepts, 84 per cent. Researchers also provided feedback on how to improve some proposed descriptors. Content analysis has the potential to be a practical, proactive task, which can be extended to multiple experimental domains and bridge the communication gap between curators and researchers.
[This paper is a conference pre-print presented at IDCC 2020 after lightweight peer review.]
Embedding Analytics within the Curation of Scientific Workflows
This paper reports on the ongoing activities and curation practices of the National Center for Biomolecular NMR Data Processing and Analysis. Over the past several years, the Center has been developing and extending computational workflow management software for use by a community of biomolecular NMR spectroscopists. Previous work had been to refactor the workflow system to utilize the PREMIS framework for reporting retrospective provenance as well as for sharing workflows between scientists and to support data reuse. In this paper, we report on our recent efforts to embed analytics within the workflow execution and within provenance tracking. Important metrics for each of the intermediate datasets are included within the corresponding PREMIS intellectual object, which allows for both inspection of the operation of individual actors as well as visualization of the changes throughout a full processing workflow.
These metrics can be viewed within the workflow management system or through standalone metadata widgets. Our approach is a hybrid one, supporting both automated workflow execution and manual intervention and metadata management. In this combination, the workflow system and metadata widgets encourage the domain experts to be avid curators of the data which they create, fostering both computational reproducibility and scientific data reuse.
 
CiTAR - Preserving Software-based Research
In contrast to books or published articles, pure digital output of research projects is more fragile and, thus, more difficult to preserve, to make available and to reuse by a wider research community. Not only does a fast-growing format diversity in research data sets require additional software preservation but also today’s computer assisted research disciplines increasingly devote significant resources to creating new digital resources and software-based methods.
In order to adopt FAIR data principles, especially to ensure re-usability of a wide variety of research outputs, novel ways for preservation of software and additional digital resources are required as well as their integration into existing research data management strategies.
This article addresses preservation challenges and preservation options of containers and virtual machines to encapsulate software-based research methods as portable and preservable software-based research resources, provides a preservation plan as well as an implementation.