International Journal of Digital Curation
605 research outputs found
Updating the Data Curation Continuum: not just Data, still focussed on Curation, more about Domains
The Data Curation Continuum was developed as a way of thinking about data repository infrastructure. Since its original development over a decade ago, a number of things have changed in the data infrastructure domain. This paper revisits the thinking behind the original data curation continuum and updates it to respond to changes in research objects, storage models, and the repository landscape in general.
 
Getting to Beta: Building a Model Collection in a World of Digital One-Offs
Libraries and archives are increasingly producing subject-based digital collections alongside, but separate from, their main digital collections. These smaller projects are often treated as digital one-offs: they are created, launched, promoted, and then largely forgotten. The authors of this study argue that small-scale digital collections should instead be treated as test cases for their institutions’ main digitization programs. Because they are lightweight and relatively low-stakes, these collections move through the system quickly and can illuminate its workings and shortcomings in snapshot form. The authors treat their own experience developing the Animal Welfare Act History Digital Collection (AWAHDC) at the National Agricultural Library as a case study in using a digital collection to test and revise an institution’s digitization program. In so doing, this study suggests how agile projects like the AWAHDC can be core components of digital curation policies and their implementation.
From Passive to Active, From Generic to Focussed: How Can an Institutional Data Archive Remain Relevant in a Rapidly Evolving Landscape?
Founded in 2008 as an initiative of the libraries of three of the four technical universities in the Netherlands, the 4TU.Centre for Research Data (4TU.Research Data) has provided a fully operational, cross-institutional, long-term archive since 2010, storing data from all subjects in applied sciences and engineering. Presently, over 90% of the data in the archive is geoscientific data coded in netCDF (Network Common Data Form) – a data format and data model that, although generic, is mostly used in climate, ocean and atmospheric sciences. In this practice paper, we explore the question of how 4TU.Research Data can stay relevant and forward-looking in a rapidly evolving research data management landscape. In particular, we describe the motivation behind this question and how we propose to address it.
Experimental Data Curation at Large Instrument Facilities with Open Source Software
The National Synchrotron Light Source II (NSLS-II), operating at Brookhaven National Laboratory since 2014 for the US Department of Energy, is one of the newest and brightest storage-ring synchrotron facilities in the world. Like other facilities, NSLS-II provides its users with pre-processing of the raw data and some analysis capabilities. We describe the research collaborations and open source infrastructure developed at large instrument facilities such as NSLS-II for curating high-value scientific data during the early stages of the data lifecycle. Data acquisition and curation tasks include storing experiment configurations and detector metadata, and acquiring raw data with infrastructure that converts proprietary instrument formats to industry standards. In addition, we describe a specific effort to discover sample information at NSLS-II and to trace the provenance of analyses performed on acquired images. We show that curation tasks must be embedded into software along the data lifecycle to be effective and easy to use, and that loosely defined collaborations evolve around shared open source tools. Finally, we discuss best practices for experimental metadata capture in such facilities, data access, and the new challenges of scale and complexity posed by AI-based discovery for the synthesis of new materials.
Are Research Datasets FAIR in the Long Run?
Currently, initiatives in Germany are developing infrastructure to accept and preserve dissertation data together with the dissertation texts (at the state level, bwDATA Diss[1]; at the federal level, eDissPlus[2]). In contrast to specialized data repositories, these services will accept data from all kinds of research disciplines. To ensure adherence to the FAIR data principles (Wilkinson et al., 2016), preservation plans are required, because ensuring accessibility, interoperability and re-usability even for a minimum ten-year data retention period can become a major challenge. File formats matter for both longevity and re-usability. To ensure access to data, the data’s encoding, i.e. its technical and structural representation in the form of file formats, needs to be understood. Hence, given the fast technical lifecycle, interoperability, re-use and in some cases even accessibility depend on the data’s format and on our future ability to parse or render it.
This leads to several practical questions regarding quality assurance, potential access options and necessary future preservation steps. In this paper, we analyze datasets from public repositories and apply a file-format-based long-term preservation risk model to support workflows and services for non-domain-specific data repositories.
[1] bwDATA Diss (bw Data for Dissertations): https://www.alwr-bw.de/kooperationen/bwdatadiss/
[2] eDissPlus (DFG project Electronic Dissertations Plus): https://www2.hu-berlin.de/edissplus
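The abstract above mentions a file-format-based long-term preservation risk model without giving its details. The sketch below only illustrates the general idea of tallying a dataset's files by format risk. The risk table, its levels and the function name `assess_formats` are hypothetical assumptions, not the paper's model, and matching on file extensions is a crude stand-in for signature-based format identification such as that performed by DROID against the PRONOM registry.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical risk levels keyed by file extension. These values are
# illustrative only and are NOT the risk model applied in the paper.
FORMAT_RISK = {
    ".csv": "low",
    ".txt": "low",
    ".nc": "low",
    ".pdf": "medium",
    ".xlsx": "medium",
    ".sav": "high",
    ".mat": "high",
}


def assess_formats(filenames):
    """Tally files per risk level; extensions not in the table count as 'unknown'."""
    counts = Counter()
    for name in filenames:
        # Extension matching is a crude stand-in for signature-based identification.
        ext = PurePosixPath(name).suffix.lower()
        counts[FORMAT_RISK.get(ext, "unknown")] += 1
    return counts


if __name__ == "__main__":
    dataset = ["results.csv", "model.MAT", "notes.txt", "raw.bin", "thesis.pdf"]
    print(dict(assess_formats(dataset)))
```

Files that land in "unknown" are the ones a repository would most likely need to inspect by hand before accepting them for preservation.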
Remediation Data Management Plans: A Tool for Recovering Research Data from Messy, Messy Projects
Data Management Plans (DMPs) have been used over the last decade as widely adopted, preventive tools to encourage good data management practices among researchers. They are traditionally written during the planning stage of a project, before data collection, and are often required for grant proposals. In this paper, we use a case study to argue that DMPs can also improve the management of data in research projects that have moved beyond the planning stage of the research life cycle. In particular, we focus on active projects where data has already been collected and is still being analyzed. We discuss the differences and commonalities in structure between preventive and remedial DMPs, and describe in detail the additional considerations needed when writing remedial DMPs: the goals and audience of the document, the data inventory, and an implementation plan.
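One component of a remedial DMP named above is the data inventory. As a minimal sketch, assuming the project's files already sit in a single directory tree, a first-pass inventory could record each file's relative path, size, extension and checksum. The function names and record fields below are hypothetical choices for illustration, not the authors' procedure.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """List every file under root with its relative path, size, extension and checksum."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            records.append({
                "path": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
                "extension": path.suffix.lower() or "(none)",
                "sha256": sha256_of(path),
            })
    return records


if __name__ == "__main__":
    # Build a throwaway project folder and inventory it.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "raw").mkdir()
        Path(tmp, "raw", "survey.csv").write_text("id,score\n1,5\n")
        Path(tmp, "README").write_text("draft notes")
        for record in build_inventory(tmp):
            print(record)
```

Checksums make duplicate files easy to spot, which is a common problem in the messy, already-active projects the paper addresses.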
Progress in Research Data Services: An international survey of university libraries
University libraries have played an important role in constructing an infrastructure of support for Research Data Management at an institutional level. This paper presents a comparative analysis of two international surveys of libraries about their involvement in Research Data Services conducted in 2014 and 2018. The aim was to explore how services had developed over this time period, and to explore the drivers and barriers to change. In particular, there was an interest in how far the FAIR data principles had been adopted.
Services in nearly every area were more developed in 2018 than in 2014, but technical services remained less developed than advisory services. Progress on institutional policy was also evident. However, priorities did not seem to have shifted significantly. Open-ended answers suggested that funder policy, rather than researcher demand, remained the main driver of service development, and that resources and skills gaps remained issues. Although the FAIR principles were widely understood as an important reference point and standard, they had not been widely adopted explicitly in policy, because of their relatively recent publication date.
Curating Scientific Workflows for Biomolecular Nuclear Magnetic Resonance Spectroscopy
This paper describes our recent and ongoing efforts to enhance the curation of scientific workflows in order to improve the reproducibility and reusability of biomolecular nuclear magnetic resonance (bioNMR) data. Our efforts have focused both on developing a workflow management system, called CONNJUR Workflow Builder (CWB), and on refactoring our workflow data model to use the PREMIS model for digital preservation. This revised workflow management system will be available through the NMRbox cloud-computing platform for bioNMR. In addition, we are implementing a new file structure that bundles the original binary data files with PREMIS XML records describing the provenance of the data. These are packaged together using a standardized file archive utility. In this manner, the provenance and data curation information is kept together with the scientific data. The benefits and limitations of these approaches, as well as future directions, are discussed in this paper.
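The bundling idea described above (binary data plus PREMIS XML provenance, packaged with a standard archive utility) can be sketched as follows. This is not CWB's actual file structure: the choice of gzipped tar as the archive format, the `.premis.xml` naming convention, the file name `fid` and the function names are all assumptions, and the XML is a simplified PREMIS-style record that has not been validated against the PREMIS schema.

```python
import hashlib
import io
import os
import tarfile
import tempfile
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_record(filename, data, event_type, detail):
    """Build a simplified PREMIS-style record: one object with a SHA-256 fixity
    value and one event linked to it. Not validated against the PREMIS schema."""
    root = ET.Element(q("premis"), {"version": "3.0"})

    obj = ET.SubElement(root, q("object"))
    obj_id = ET.SubElement(obj, q("objectIdentifier"))
    ET.SubElement(obj_id, q("objectIdentifierType")).text = "local"
    ET.SubElement(obj_id, q("objectIdentifierValue")).text = filename
    fixity = ET.SubElement(ET.SubElement(obj, q("objectCharacteristics")), q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = hashlib.sha256(data).hexdigest()

    event = ET.SubElement(root, q("event"))
    ev_id = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ev_id, q("eventIdentifierType")).text = "UUID"
    ET.SubElement(ev_id, q("eventIdentifierValue")).text = str(uuid.uuid4())
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = datetime.now(timezone.utc).isoformat()
    info = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(info, q("eventDetail")).text = detail
    link = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(link, q("linkingObjectIdentifierType")).text = "local"
    ET.SubElement(link, q("linkingObjectIdentifierValue")).text = filename

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def bundle(archive_path, filename, data, premis_xml):
    """Write the binary data and its PREMIS record side by side into a gzipped tar."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name, payload in ((filename, data), (filename + ".premis.xml", premis_xml)):
            member = tarfile.TarInfo(name)
            member.size = len(payload)
            tar.addfile(member, io.BytesIO(payload))


if __name__ == "__main__":
    raw = b"\x00\x01\x02\x03"  # stand-in bytes for a hypothetical NMR data file
    xml_bytes = premis_record("fid", raw, "ingestion", "hypothetical workflow step")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experiment.tar.gz")
        bundle(path, "fid", raw, xml_bytes)
        with tarfile.open(path, "r:gz") as tar:
            print(tar.getnames())
```

Storing a checksum in the record lets anyone who later unpacks the archive verify that the data file still matches its provenance description.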
Secure Data for the Future: A Risk Assessment
The guarantee of secure and authentic future access to digital data is a major concern both for those who work with data now and for those responsible for keeping it accessible in the future. There is a wide range of threats to digital data that these people need to take into consideration. The goal of the PreservIA project was to assess the risks of using analogue 35mm film to store and preserve digital information, and to define its strengths and weaknesses for the long-term secure preservation of all kinds of digital data.
The research project examined the application of Piql technology to ensure the security, integrity and authenticity of information stored on a unique storage medium. PiqlFilm has been designed for a life span of 500 years or more, and the research assesses how well this solution could maintain the authenticity and availability of the information, independently of internal and external changes in the surrounding environment over time.
The research project was designed using a scenario-based approach, with the morphological method of scenario development used to define a set of scenarios covering the risks to the service.
The scenario classes used were accident, technical error, natural disaster, crime, sabotage, espionage, terrorism, armed conflict and nuclear war. A scenario template has been included for the purpose of describing current and future scenarios. The final scenario analysis identified potential vulnerabilities.
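In general terms, the morphological method builds a "box" of dimensions, enumerates every combination of their values, and discards combinations judged inconsistent. The sketch below applies that generic procedure to the scenario classes listed above. The additional dimensions (affected asset, current versus future time frame) and the single excluded pair are hypothetical illustrations, not the dimensions or consistency judgements used in PreservIA.

```python
from itertools import product

# Scenario classes as listed in the PreservIA abstract.
SCENARIO_CLASSES = [
    "accident", "technical error", "natural disaster", "crime", "sabotage",
    "espionage", "terrorism", "armed conflict", "nuclear war",
]
# Hypothetical further dimensions of the morphological box (not from the paper).
AFFECTED_ASSET = ["storage site", "film reels", "reading equipment"]
TIME_FRAME = ["current", "future"]

# Hypothetical cross-consistency judgement: (class, asset) pairs to discard.
# Purely illustrative; a real assessment would justify each exclusion.
INCONSISTENT = {("espionage", "reading equipment")}


def build_scenarios():
    """Enumerate every combination of dimension values, dropping inconsistent ones."""
    scenarios = []
    for cls, asset, frame in product(SCENARIO_CLASSES, AFFECTED_ASSET, TIME_FRAME):
        if (cls, asset) in INCONSISTENT:
            continue
        scenarios.append({"class": cls, "asset": asset, "time_frame": frame})
    return scenarios


if __name__ == "__main__":
    scenarios = build_scenarios()
    print(len(scenarios), "candidate scenarios")
    print(scenarios[0])
```

With these assumed dimensions the box yields 9 × 3 × 2 = 54 combinations, of which 2 are excluded. Each remaining combination could then be written up using a scenario template and assessed for vulnerabilities.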
The paper briefly shows how the holistic preservation approach of Piql Preservation Services works, defines a methodology for selecting the scenarios for the assessment, and then studies the vulnerabilities and security challenges of the solution in those scenarios. The project also includes a comparison with other existing storage media to evaluate their robustness to the addressed scenarios relative to Piql technology.