International Journal of Digital Curation
605 research outputs found
Organising RDM and Open Science Services: Case Finland and Aalto University
This paper describes how the Finnish Ministry of Education and Culture launched an initiative on research data management and open data, open access publishing, and open and collaborative ways of working in 2014. Most of the universities and research institutions took part in the collaborative initiative, building new tools and training material for Finnish research needs. Measures taken by one university, Aalto University, are described in detail, analysed, and compared with the activities taking place in other universities.
The focus of this paper is on the changing roles of experts at Aalto University, and on the organisational transformation that offers possibilities to serve academic personnel better. Various ways of building collaboration and arranging services are described, and their benefits and drawbacks are discussed.
Putting the Trust into Trusted Data Repositories: A Federated Solution for the Australian National Imaging Facility
The National Imaging Facility (NIF) provides Australian researchers with state-of-the-art instrumentation, including magnetic resonance imaging (MRI), positron emission tomography (PET), X-ray computed tomography (CT) and multispectral imaging, and expertise for the characterisation of animals, plants and materials.
To maximise research outcomes, as well as to facilitate collaboration and sharing, it is essential not only that the data acquired using these instruments be managed, curated and archived in a trusted data repository service, but also that the data itself be of verifiable quality. In 2017, several NIF nodes collaborated on a national project to define the requirements and best practices necessary to achieve this, and to establish exemplar services for both preclinical MRI data and clinical ataxia MRI data.
In this paper we describe the project, its key outcomes, challenges and lessons learned, and future developments, including extension to other characterisation facilities and instruments/modalities.
A Framework for the Preservation of a Docker Container
Reliably building and maintaining systems across environments is a continuing problem. A project or experiment may run for years, during which the software, the hardware and the operating system may all change. Containerisation is a technology used by a variety of companies, such as Google, Amazon and IBM, and by scientific projects to deploy a set of services rapidly and repeatably. Using Dockerfiles to ensure that a container is built repeatably, and to allow conformance and easy updating when changes take place, is becoming common within projects. It is seen as part of sustainable software development. Containerisation technology occupies a dual space: it is both a repository of software and software itself. In considering Docker in this fashion, we should verify that the Dockerfile can be reproduced. Using a subset of the Dockerfile specification, a domain-specific language is created to ensure that Dockerfiles can be reused at a later stage to recreate the original environment. We provide a simple framework to address the question of the preservation of containers and their environments. We present experiments on an existing Dockerfile and conclude with a discussion of future work. Building on this work, a pipeline was implemented to check that a defined Dockerfile conforms to our desired model and to extract the Docker and operating system details. This will help the reproducibility of results by recreating the machine environment and package versions. It also helps development and testing by ensuring that the system is built repeatably and that any changes in the software environment can be shared in the Dockerfile. This work supports not only the citation process but also the open scientific one by providing environmental details of the work. As part of the pipeline to create the container, we capture the processes used and record them using the W3C PROV ontology.
This makes it possible to give it a persistent identifier and to trace the processes used to preserve the metadata. Our future work will look at linking this output to a workflow ontology to preserve the complete workflow, including the commands and parameters to be given to the containers. We see this provenance within the build process as useful for providing a complete overview of the workflow.
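The conformance check described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the allowed instruction set `ALLOWED` and the function name `check_dockerfile` are hypothetical, chosen only to show how a Dockerfile might be tested against a restricted subset of instructions while its base image details are extracted.

```python
import re

# Hypothetical subset of Dockerfile instructions; the subset actually used
# in the paper is not stated in the abstract.
ALLOWED = {"FROM", "RUN", "COPY", "ENV", "WORKDIR", "CMD", "LABEL"}


def check_dockerfile(text):
    """Return (base_images, violations) for a Dockerfile given as a string."""
    base_images, violations = [], []
    # Join backslash line continuations so each instruction is one line.
    logical = re.sub(r"\\[ \t]*\n", " ", text)
    for line in logical.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        instruction, _, args = line.partition(" ")
        instruction = instruction.upper()
        if instruction not in ALLOWED:
            violations.append(instruction)
        elif instruction == "FROM":
            # Ignore flags such as --platform=...; the first remaining
            # token is the base image reference.
            tokens = [t for t in args.split() if not t.startswith("--")]
            if tokens:
                base_images.append(tokens[0])
    return base_images, violations
```

A caller would treat an empty violations list as conformance and record the returned base image (for example an operating system image and tag) as part of the preserved environment description.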
Developing a Digital Archive for Symbolic Resources in Urban Environments - the Latina Project
The project described in this paper was funded to establish the foundation for a digital archival resource for researchers interested in the way people interact with urban environments through graphic communications. The research was internally funded by Loughborough University as part of its Research Challenge Programme and involved two members of academic staff and two library staff.[1] Two PhD students also participated.
The archive consists of a small number of images and will act as a proof of concept, not only for this project but also for current and future funding applications. It is hoped that an extended archive will be useful not only to visual communication researchers, but also historians, architects, town planners and others. This paper will describe the data collection process, the challenges facing the project team in data curation and data documentation, and the creation of the pilot archive.
The creation of the archive posed challenges for both the researchers and Library staff. For the researchers:
Choosing a small number of images that formed a discrete collection but also demonstrated the utility of the project to other disciplinary areas;
Acquiring the necessary knowledge and skills to enable good curation and usability of the digital objects, e.g. file formats, metadata creation;
Understanding what the technical solution enabled and where compromises would have to be made.
For library staff:
Demonstrating the utility of the Data Repository;
Understanding the intellectual background to the project and the purpose of the Data Archive within the project;
Clearly explaining the purpose of metadata and documentation.
The Latina Project has demonstrated the value of a true partnership between the academic community and the professional services. All parties involved have learnt from the creation of the pilot archive and their practices have evolved. For example, it has made the researchers think more carefully about data curation questions and the professional services staff identify more closely with the research purposes for data creation. By working together so closely and sharing ideas from our different perspectives we have also identified potential technical developments which could be explored in future projects. All members of the group hope that the relationships built during this project will continue through other projects.
[1] Academic staff: Drs Harland and Liguori. Library staff: Gareth Cole and Barbara Whetnall.
Revealing the Detailed Lineage of Script Outputs Using Hybrid Provenance
We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance, i.e., the conceptual workflows latent in scripts, via simple YesWorkflow annotations, embedded as script comments. Runtime observables can be linked to prospective provenance via relational views and queries. These observables may be hidden in filenames or folder structures, recorded in log files, or captured automatically using tools such as noWorkflow or the DataONE RunManagers. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open source repository.
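As a rough illustration of annotations embedded as script comments, the following Python sketch pulls `@begin`, `@in`, `@out`, `@param` and `@end` tags out of a script's comments. It is not the YesWorkflow toolkit itself: the function name `extract_annotations`, the regular expression, and the sample script are assumptions made for this example, and only a handful of annotation keywords are handled.

```python
import re

# Matches comment annotations such as "# @begin clean_data".
# Only a few keywords are covered in this sketch.
ANNOTATION = re.compile(r"#\s*@(begin|end|in|out|param)\s+(\S+)")


def extract_annotations(script_text):
    """Return (keyword, name) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in ANNOTATION.finditer(script_text)]


# Hypothetical annotated script, held as a string for demonstration.
script = """
# @begin clean_data
# @in raw_csv
# @out clean_csv
rows = load("raw.csv")
# @end clean_data
"""

for keyword, name in extract_annotations(script):
    print(keyword, name)
```

The extracted pairs form a simple description of the conceptual workflow (a step named `clean_data` with one input and one output), which is the kind of prospective provenance that runtime observables could later be joined against.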
Embedded Metadata Patterns Across Web Sharing Environments
This research project sought to determine whether and how embedded metadata followed a digital object as it was shared on social media platforms. It used ExifTool, a variety of social media platforms and user profiles, and the embedded metadata extracted from selected New York Public Library (NYPL) and Europeana images, PDFs from open access science journals, and captured mobile phone images. The goal of the project was to clarify which embedded metadata fields, if any, migrated with the object as it was shared across social media.
Keep Calm and Fill in Your DMP: Lessons Learnt from a Swiss DMP-Template Initiative
Aligning with other funders such as Horizon 2020, the Swiss National Science Foundation (SNSF) requires researchers who apply for project funding to provide a Data Management Plan (DMP) as an integral part of their research proposal. In an attempt to assist and guide researchers filling out this document, and to provide as efficient a service as possible, the libraries of the Ecole Polytechnique Fédérale de Lausanne (EPFL) and ETH Zurich took the lead in developing a DMP template with content suggestions and recommendations. In this practice paper, we describe the collaborative effort between the two Swiss federal institutes of technology, namely EPFL and ETH Zurich, as well as some partners of the national Data Life Cycle Management (DLCM) project, which resulted in a document that our researchers report to be very helpful.
Mobilising a Nation: RDM Training and Education in South Africa
The South African Network of Data and Information Curation Communities (NeDICC) was formed to promote the development and use of standards and best practices among South African data stewards and data librarians (NeDICC, 2015). The steering committee has members from various South African HEIs and research councils. As part of their service offerings NeDICC arranges seminars, workshops and conferences to promote awareness regarding digital curation. NeDICC has contributed to the increase in awareness, and growth of knowledge, on the subject of digital and data curation in South Africa (Kahn et al., 2014). NeDICC members are involved in the UP M.IT and Continued Professional Development training, and serve as external examiners for the UCT M.Phil in Digital Curation degree. NeDICC is responsible for the Research Data Management track at the annual e-Research conference in SA[1] and develops an annual training-focussed programme to provide workshop opportunities with both SA and foreign trainers. This paper specifically addresses the efforts by this community to mobilise and upskill South African librarians so that they would be willing and able to provide the necessary RDM services that would strengthen the national data effort.
[1] eResearch conference: http://www.eresearch.ac.za
Making Everything Available. British Library Research Services and Research Data Strategy
The way that researchers generate, analyse and share information keeps evolving at a rapid pace. To ensure that it is well equipped to serve its global user base for years to come, the British Library is transforming the way it works too, from the physical buildings to its digital service portfolio. One key programme, Everything Available, will ensure the Library’s continued support for research with services to enable access to information in an open and timely manner. This paper will describe the activities planned within Everything Available, with a particular focus on the aims of the Library’s recently refreshed Research Data Strategy. It will give an insight into the challenges and opportunities faced by a National Library in providing relevant services in an ‘open’ world.
The Impact on Authors and Editors of Introducing Data Availability Statements at Nature Journals
This article describes the adoption of a standard policy for the inclusion of data availability statements in all research articles published in the Nature family of journals, and the subsequent research which assessed the impacts that these policies had on authors, editors, and the availability of datasets. The key findings of this research project include the determination of average and median times required to add a data availability statement to an article, and a correlation between the way researchers make their data available and the time required to add a data availability statement.