IASSIST Quarterly (Journal)
Stewarding our resources: Building a sustainable IPUMS archival document access system
IPUMS at the University of Minnesota has created the world’s largest accessible database of census and survey microdata. The IPUMS suite of products contains nine harmonized data products. The largest of these projects, IPUMS International (IPUMS-I), has supported the curation and preservation of ancillary materials received during data acquisition efforts. Archival staff have preserved thousands of unique pieces of census and survey documentation, creating bibliographic records using an extended Dublin Core profile that supports the use of controlled vocabularies to enhance findability for the project staff and outside users. The goal of this curation work was to create a findable, searchable, and downloadable document access system for our internal use and to support IPUMS researchers. This paper describes our experience constructing a tool that supports exploration and dissemination of these archived materials. During this development, we gained valuable insights about stewarding our resources that are applicable to research organizations responsible for curating, preserving, and disseminating archival materials.
Exploratory and directed search strategies at a social science data archive
Researchers need to be able to find, access, and use data to participate in open science. To understand how users search for research data, we analyzed textual queries issued at a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). We collected unique user queries from 988,475 user search sessions over four years (2012-16). Overall, we found that only 30% of site visitors entered search terms into the ICPSR website. We analyzed search strategies within these sessions by extending existing dataset search taxonomies to classify a subset of the 1,554 most popular queries. We identified five categories of commonly-issued queries: keyword-based (e.g., date, place, topic); name (e.g., study, series); identifier (e.g., study, series); author (e.g., institutional, individual); and type (e.g., file, format). While the dominant search strategy used short keywords to explore topics, directed searches for known items using study and series names were also common. We further distinguished exploratory browsing from directed search queries based on their page views, refinements, search depth, duration, and length. Directed queries were longer (i.e., they had more words), while sessions with exploratory queries had more refinements and associated page views. By comparing search interactions at ICPSR to other natural language interactions in similar web search contexts, we conclude that dataset search at ICPSR is underutilized. We envision how alternative search paradigms, such as those enabled by recommender systems, can enhance dataset search.
Developing canonical ‘safe researcher’ training materials for trusted research environments
Social science and humanities research infrastructures allow the sharing and safe use of confidential, sensitive data for research via physical safe havens. In recent years there has been a shift towards virtual data enclaves or Remote Desktop systems that offer fewer physical controls. These controls need to be replaced with other safeguards, including mandatory ‘Safe Researcher’ training. This training aims to ensure that researchers are equipped with the knowledge required to use secure data safely. Developing training is resource intensive so canonical training materials are an economical approach to providing standardized, high-quality training.
The Social Sciences and Humanities Open Cloud project deliverable ‘Training materials of workshop for secure data facility professionals’ had two objectives. The first was the development of a set of canonical training materials that Trusted Research Environments (TREs) could use as a framework on which to build their own training course. The second objective was to hold a virtual workshop where the training materials could be demonstrated to a credible audience to gather feedback to inform the future development of the materials.
We have now developed the canonical materials, building on the wealth of expertise and experience of UK-based TREs. These training materials were then demonstrated at a virtual, two-hour Stakeholder Workshop that we organized in September 2021. Following our demonstration of the materials, we facilitated small group discussions to gather vital feedback. The discussion groups formed a consensus that the materials were both comprehensive and clearly structured and would be a valuable resource to the TRE community.
The IPUMS Business Process Model: Instituting a workflow mapping strategy to support archival processes
The IPUMS Preservation Archive is instituting a workflow mapping strategy to further identify IPUMS process and metadata capture points to expand its holdings in the data archive. Drawing on two business process models, the Generic Statistical Business Process Model (GSBPM) and the Generic Longitudinal Business Process Model (GLBPM), archival staff have created an IPUMS Business Process Model (IPUMS BPM). The IPUMS BPM reflects the use of secondary data sources and the work of harmonization and integration to create a data infrastructure that supports research across time and space. Internally, the IPUMS BPM provides a clear visualization of the IPUMS workflow from external submission of data, harmonization process, documentation, extraction systems, and archival preservation of metadata. The challenge for archival staff is to further the understanding and adoption of the IPUMS BPM within the IPUMS project groups, and to identify metadata production points that require the intervention of the archive for provenance and preservation purposes. It is part of an ongoing effort to clearly define the role of the archive within IPUMS as an integral part of the IPUMS organization and workflow. This paper identifies the value of instituting this mapping approach to gain a clearer understanding of the role of the archive within project work cycles, points where production and preservation activities intersect, and opportunities to expand archival holdings.
Digitising old Yoruba newspapers at Kenneth Dike Library
The Kenneth Dike Library and the Nigeria National Archives are especially rich in ancient collections, particularly those unique to southwestern Nigeria, home to many people of the Yoruba extraction. These facilities house print and non-print materials such as personal notes and written collections of prominent persons, old manuscripts, ancient and modern maps, journals, and old Yoruba newspapers. Many of these print materials, especially the newspapers, are deteriorating. In a bid to prolong shelf-life, access to these old materials is limited. As newspapers serve as gateways to the past, this restricted access can impact the research experience of users.
The paper begins by presenting the project framework, which was designed before the project began. It goes on to detail the nuances involved in the several stages of the digitisation process and considers the aftermath of digitising the papers in terms of ownership, storage, backup, and access. This project revealed two things: first, though digitisation solves the problem of access and preservation, it is still necessary to preserve the original materials to prevent loss due to technical issues. Second, funding and international partnership work hand in hand with digitisation, as it is a capital-intensive activity. Last, the paper contributes to the ongoing debates on the cultural and socio-political discourses entwined with the technical processes of digitisation. The highlighted project was sponsored by the European Research Council (ERC) in collaboration with local partners. The website, https://yorubaprints.wordpress.com/yoruba-erc-project/, raises awareness for the project.
Working towards securing and building a trusted institutional research data repository through the CoreTrustSeal process: case of Cape Peninsula University of Technology data repository
In support of the open science movement and as a signatory of the Berlin Declaration, the Cape Peninsula University of Technology has since 2013 developed various systems, infrastructures and workflows to support open access and good research data management practices at the institution, providing a highly functional environment. Institutional policies that include a Research Data Management Policy and an Open Access Policy, data deposit guidelines and data deposit platforms are currently in place and utilized by affiliated postgraduate students and researchers from faculties, research units and entities as well as researchers from academic support units, in alignment with the FAIR principles. Requiring postgraduate students to submit their research data together with their theses for graduation purposes has increased the advocacy and publishing of datasets and includes supervisors in the review process. The purpose of this paper is therefore to highlight the initial developmental trajectory and what has been achieved to date. This includes the selection of the platform through the ilifu project in the Western Cape; the implementation and strengthening of the repository review workflows to include a number of key role players to ensure the quality and integrity of the data as well as ethics approval checks; the development of the data management planning tool and a recent upgrade to include a section for POPIA compliance; and the advocacy, training and processes that the institution has embarked on to secure the research data platform through proper preservation approaches, as a preservation platform was recently procured. Some challenges, and how they were addressed, will also be discussed.
The paper will also outline how the institution embarked on applying to have the data repository certified as trustworthy by an international body, the CoreTrustSeal, and will describe the nearly three-year journey towards meeting its 17 requirements.
Evaluating new technologies and organizational structures
Welcome to the last issue of IASSIST Quarterly for 2024, IQ 48(4).
We are excited to share news of several developments that we have been working on over the last few months:
The IASSIST Qualitative Social Science and Humanities Data Interest Group (QSSHDIG) is planning an IASSIST Quarterly special issue dedicated to the complexities of sharing qualitative data. For this special issue, we invite submissions of abstract proposals focused on the ethical challenges, methodological concerns, and labor involved in making qualitative data and research materials publicly available. The full CfP and details on how to submit an abstract can be viewed on the IASSIST Quarterly website: https://iassistquarterly.com/index.php/iassist/announcement/view/7 . The deadline for proposing articles is January 31st (full articles won’t be needed until later).
We are delighted to welcome Minglu Wang as a new IQ Editorial Board member (as of October 2024). Minglu is the Research Data Management Librarian in the Open Scholarship Department at York University Libraries, York, Ontario, Canada. Among other qualifications, she brings experience as a member of the Editorial Board for ACRL’s College & Research Libraries (C&RL) (2019–2025), and she led the project group for that Board to investigate a data policy for C&RL.
A new feature recently enabled on the OJS platform allows reviewers to link their profile with their ORCID iD. We mentioned last time that this will enable auto-loading of your articles to your ORCID profile, but the other effect is that it provides an opportunity for reviewers to receive credit and be acknowledged for their professional contributions. Note that the credit will merely note that you have served as a reviewer for the IQ—it will not indicate which article(s) you reviewed.
Unfortunately, the IQ editorial team had to retract a paper from publication this fall due to plagiarism. The paper titled “Data protection and right to privacy legislation in Kenya” by Mankone, A. M. (2023), was published in IQ, 47(3-4). The full retraction notice can be found here.
This new issue of IQ 48(4) presents four excellent papers. The first two evaluate methods to enhance findability of data deposited in data repositories. The subsequent two papers focus on organizational structure and improving organizational workflows.
Kokila Jamwal, in “Boosting data findability: The role of AI-enhanced keywords”, examines the use of Artificial Intelligence (AI) to supplement keywords that may be missing or inaccurately defined as a method to improve metadata and boost data findability. The author suggests that using this relatively new technology may reduce the time and effort required by data repository staff for data curation and may enhance data findability and usability.
Co-authors Knut Wenzig and Xiaoyao Han examine the findability of data deposited in data repositories that use DDI metadata standards. Their paper “State of DDI Cloud” investigates the availability and the comprehensive element usage of DDI standards across 29 repositories registered on re3data.org. Based on their findings, they provide recommendations for various stakeholders including the repositories, Dataverse developers, re3data.org, and the DDI Alliance.
The article “The IPUMS Business Process Model: Instituting a workflow mapping strategy to support archival processes” introduces the IPUMS workflow from external submission of data, harmonization process, documentation, extraction systems, and archival preservation of metadata. Author Diana Magnuson explains the value of instituting this mapping approach, and demonstrates the power of a clear business process model for developing archival goals in an organizational setting in which the archive function is vital but secondary to the main product.
In “Understanding motivations and future needs for data deposits at Korea Social Sciences Data Archive”, authors Hyowon Kim, Do Won Kim and Jungwon Yang evaluate the current data deposit process of the Korea Social Science Data Archive (KOSSDA). The data archive recently transitioned into an independent research center under Seoul National University. Using interviews with stakeholders, they identify future needs and suggest a long-term strategy to ensure that the archive meets the needs of the academic community it supports.
Wishing you a happy holiday season, and peace, health, and happiness in the New Year.
Ofira Schwartz and Michele Hayslett, December 2024
Boosting data findability: The role of AI-enhanced keywords
In today’s data-driven world, finding relevant data in a vast expanse of information is increasingly important. Researchers have been exploring various methods to improve the findability, accessibility, interoperability, and reusability of data, for example, by using controlled vocabularies to enhance data findability. Although the use of controlled vocabularies is growing, challenges remain for findability when users provide their own keywords, known as user-defined keywords, or do not provide keywords at all. Finding data in data archives based on metadata fields with user-defined or missing keywords is challenging, or even impossible. Here, we show the use of artificial intelligence (AI) techniques from the subfield of deep learning to automate the assignment of keywords using controlled vocabulary, leading to improved data findability. The main results demonstrate that AI automation performs well on the test set. In addition, we compare our deep learning model against a large language model (LLM) on the task of automated topic assignment. Automated topic assignments will reduce the time and effort required for data curation, enhancing data findability and usability for data producers and consumers. The application of AI to automate metadata assignment offers practical solutions for improving data findability and reusability, not only in research data archives but across various data-driven domains. Overall, this approach highlights the potential of AI in addressing data findability challenges, paving the way for more efficient and effective data discovery and utilization in the era of big data and information abundance.
Research Analysis: A World Data System and Canadian CoreTrustSeal Cohort Needs Assessment
From July 2022 to December 2022, the World Data System (WDS) International Technology (ITO) and International Program (IPO) Offices conducted a review of strategic plans and technical roadmaps of all current WDS members and the set of Canadian repositories that participated in the Digital Research Alliance of Canada’s CoreTrustSeal Certification Support and Funding Pilot (Digital Research Alliance of Canada, 2022). In this paper, we describe how a new organizational assessment method was designed and utilized to identify the needs and challenges faced by the WDS and Canadian CTS Pilot members. Our method relied on reviewing public-facing documentation provided by the repositories, with a priority on strategic plans and technical roadmaps. In total, we reviewed 95 sources of information, including 33 strategic plans and 3 technical roadmaps describing a total of 95 out of the original 147 target organizations. In this paper, we also describe our assessment tool and the overarching challenges and goals we identified through the usage of this tool. Finally, we will describe the limitations of our methodology and provide recommendations from the World Data System on how best to assist the WDS members and the cohort of Canadian data repositories based on our findings.
Building infrastructure and networks – rewards and challenges
Welcome to the third issue of IASSIST Quarterly for 2024, IQ 48(3).
As we move towards an open research environment, institutions are building infrastructures that will enable sharing data and other research resources with a wider audience. The authors of the three papers in this issue offer our readers the benefit of their experience by sharing what they have learned through the process of establishing new infrastructures and networks.
The article “Future models and architecture of data repositories in African universities” describes the existing landscape of data repositories in African universities. Chigwada and Chiware use a review of existing literature to identify requirements for establishing an institutional data depository, and also identify successes and challenges. Based on their research, they offer a roadmap for universities in Africa that are interested in establishing a data repository.
The second article, titled “Working towards securing and building a trusted institutional research data repository through the CoreTrustSeal process: case of Cape Peninsula University of Technology data repository”, seems like a natural extension of the previous one. The three authors, Lockhart, Xesi and Chiware (a co-author of the previous paper), describe the process of establishing a research data repository at Cape Peninsula University of Technology. They provide details about the journey, starting with developing Open Access (OA) and Research Data Management (RDM) policies, identifying tools and developing the infrastructure needed for an institutional data repository and data preservation, and developing a training program for faculty, students and staff. Additionally, the authors comment on their experience, challenges and lessons learned from the application for CoreTrustSeal (CTS) certification for their newly created repository, eSango.
In their article, “Building human networks to drive forward innovations in international data access: Introducing the International Secure Data Facility Professionals Network (ISDFPN),” authors Wiltshire, Lichtwardt and Bishop describe the motivation and process of establishing the International Secure Data Facility Professionals Network (ISDFPN), a forum that brings together international colleagues to share expertise and experience, and to collaborate in developing Trusted Research Environments (TREs).
We hope you enjoy reading.
Ofira Schwartz and Michele Hayslett, September 202