International Journal of Digital Curation
Data Management Planning for an Eight-Institution, Multi-Year Research Project
While data management planning for grant applications has become commonplace alongside articles providing guidance for such plans, examples of data plans as they have been created, implemented, and used for specific projects are only beginning to appear in the scholarly record. This article describes data management planning for an eight-institution, multi-year research project. The project leveraged four data management plans (DMP) in total, one for the funding application and one for each of the three distinct project phases. By understanding researcher roles, development and content of each DMP, team internal and external challenges, and the overall benefits of creating and using the plans, these DMPs provide a demonstration of the utility of this project management tool
Data Showcases: the Data Journal in a Multimodal World
As an experiment, the Research Data Journal for the Humanities and Social Sciences (RDJ) has temporarily extended the usual format of the online journal with so-called ‘showcases’, separate web pages containing a quick introduction to a dataset, embedded multimedia, interactive components, and facilities to directly preview and explore the dataset described. The aim was to create a coherent hyper document with content communicated via different media (multimodality) and provide space for new forms of scientific publication such as executable papers (e.g. Jupyter notebooks). This paper discusses the objectives, technical implementations, and the need for innovation in data publishing considering the advanced possibilities of today's digital modes of communication. The data showcases experiment proved to be a useful starting point for an exploration of related developments within and outside the humanities and social sciences. It turns out that small-scale experiments are relatively easy to perform thanks to the easy availability of digital technology. However, real innovation in publishing affects organization and infrastructure and requires the joint effort of publishers, editors, data repositories, and authors. It implies a thorough update of the concept of publication and adaptation of the production process. This paper also pays attention to these obstacles to taking new paths
Synchronic Curation for Assessing Reuse and Integration Fitness of Multiple Data Collections
Data-driven applications often require using data integrated from different, large, and continuously updated collections. These collections may present gaps, overlap, hold conflicting information, or complement each other. Thus, a curation need is to continuously assess if data from multiple collections are fit for integration and reuse. To assess different large data collections at the same time, we present the Synchronic Curation (SC) framework. SC involves processing steps to map the different collections to a unifying data model that represents research problems in a scientific area. The data model, which includes the collections' provenance and a data dictionary, is implemented in a graph database where collections are continuously ingested and can be queried. SC has a collection analysis and comparison module to track updates, and to identify gaps, changes, and irregularities within and across collections. Assessment results can be accessed interactively through a web-based interactive graph. In this paper we introduce SC as an interdisciplinary enterprise, and illustrate its capabilities through its implementation in ASTRIAGraph, a space sustainability knowledge system
Privacy Impact Assessments for Digital Repositories
Trustworthy data repositories ensure the security of their collections. We argue they should also ensure the privacy of researcher and research subject data. We demonstrate the use of a privacy impact assessment (PIA) to evaluate potential privacy risks to researchers using the ICPSR’s Researcher Passport as a case study. We present our workflow and discuss potential privacy risks and mitigations for those risks.
[A previous version of this article is available as an IDCC2020 Conference Paper] 
Uncommon Commons? Creative Commons Licencing in Horizon 2020 Data Management Plans
As policies, good practices and mandates on research data management evolve, more emphasis has been put on the licencing of data, which allows potential re-users to quickly identify what they can do with the data in question. In this paper I analyse a pre-existing collection of 840 Horizon 2020 public data management plans (DMPs) to determine which ones mention Creative Commons licences and, among those that do, which licences are being used.
I find that 36% of DMPs mention Creative Commons and among those a number of different approaches towards licencing exist (overall policy per project, licencing decisions per dataset, licencing decisions per partner, licensing decision per data format, licensing decision per perceived stakeholder interest), often clad in rather vague language with CC licences being “recommended” or “suggested”. Some DMPs also “kick the can further down the road” by mentioning that “a” CC licence will be used, but not which one. However, among those DMPs that do mention specific CC licences, a clear favourite emerges: the CC-BY licence, which accounts for half of all mentions of a specific licence.
The fact that 64% of DMPs did not mention Creative Commons at all is an indication of the need for further training and awareness raising on data management in general and licencing in particular in Horizon Europe. For those DMPs that do mention specific licences, 60% would be compliant with Horizon Europe requirements (CC-BY or CC0). However, it should be carefully monitored whether content similar to the 40% that is currently licenced with non-Horizon Europe compliant licences will in the future move to CC-BY or CC0 or whether such content will simply be kept fully closed by projects (by invoking the “as open as possible, as closed as necessary” principle), which would be an unintended and potentially damaging consequence of the policy
Where There's a Will, There's a Way: In-House Digitization of an Oral History Collection in a Lone-Arranger Situation
Analog audio materials present unique preservation and access challenges for even the largest libraries. These challenges are magnified for smaller institutions where budgets, staffing, and equipment limit what can be achieved. Because in-house migration of analog audio to digital is often out of reach for smaller institutions, the choice is between finding room in the budget to outsource a project and sitting by to watch important materials decay. Cost is the most significant barrier to audio migration. Audio preservation labs can charge hundreds or even thousands of dollars to migrate analog to digital. Top-tier audio preservation equipment is equally expensive. When faced with the decomposition of an oral history collection recorded on cassette tape, one library decided that where there was a will, there was a way. The College of Education One-Room Schoolhouse Oral History Collection consisted of 247 audio cassettes containing interviews with one-room schoolhouse teachers from 68 counties in Kansas. The cassette tapes in this collection were between 20 and 40 years old and generally inaccessible for research due to fear the tapes could be damaged during playback. This case study looks at how a single Digital Curation Librarian with no audio digitization experience migrated nearly 200 hours of audio to digital using a $40 audio converter from Amazon and a campus subscription to Adobe Audition. This case study covers the decision to digitize the collection, the digitization process including audio clean-up, metadata collection and creation, presentation of the collection in CONTENTdm, and final preservation of audio files. The project took 20 months to complete and resulted in significant lessons learned that have informed decisions regarding future audio conversion projects.
 
Software Must be Recognised as an Important Output of Scholarly Research
Software now lies at the heart of scholarly research. Here we argue that as well as being important from a methodological perspective, software should, in many instances, be recognised as an output of research, equivalent to an academic paper. The article discusses the different roles that software may play in research and highlights the relationship between software and research sustainability and reproducibility. It describes the challenges associated with the processes of citing and reviewing software, which differ from those used for papers. We conclude that whilst software outputs do not necessarily fit comfortably within the current publication model, there is a great deal of positive work underway that is likely to make an impact in addressing this
Towards a Semantic Interoperable Flemish Research Information Space: Development and Implementation of a Flemish Application Profile for Research Datasets
In Flanders, Research Performing Organizations (RPO) are required to provide information on publicly financed research to the Flemish Research Information Space (FRIS), a current research information system and research discovery platform hosted by the Flemish Department of Economics, Science and Innovation. FRIS currently discloses information on researchers, research institutions, publications, and projects. Flemish decrees on Special and Industrial research funding, and the Flemish Open Science policy require RPOs to also provide metadata on research datasets to FRIS. To ensure accurate and uniform delivery of information on research datasets to FRIS across all information-providing institutions, it is necessary to develop a common application profile for research datasets. This article outlines the development of the Flemish application profile for research datasets by the Flemish Open Science Board (FOSB) Working Group Metadata and Standardization. The main challenge was to achieve interoperability among stakeholders, some of which had existing metadata schemes and research information infrastructures in place, while others were still in the early stages of development
Assessment, Usability, and Sociocultural Impacts of DataONE: A Global Research Data Cyberinfrastructure Initiative
DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research
Identifying Opportunities for Collective Curation During Archaeological Excavations
Archaeological excavations are carried out by interdisciplinary teams that create, manage, and share data as they unearth and analyse material culture. These team-based settings are ripe for collective curation during these data lifecycle stages. However, findings from four excavation sites show that the data interdisciplinary teams create are not well integrated. Knowing this, we recommend opportunities for collective curation to improve use and reuse of the data within and outside of the team