IASSIST Quarterly (Journal)
Reproducibility literature analysis - a federal information professional perspective
This article examines a cross-section of literature and other resources to reveal common reproducibility issues faced by stakeholders regardless of subject area or focus. We identify a variety of issues named as reproducibility barriers, the solutions to such barriers, and reflect on how researchers and information professionals can act to address the ‘reproducibility crisis.’ The finished products of this work include an annotated list of 122 published resources and a primer that identifies and defines key concepts from the resources that contribute to the crisis.
Standards and scoring to increase transparency for archived public opinion data
Faced with increased diversification of methodologies in the polling industry, the Roper Center for Public Opinion Research is embarking on a major initiative aimed at increasing methodological transparency across the field of public opinion survey research by increasing minimum disclosure requirements and providing users with transparency scoring for new submissions to the archive.
Roper Center, the world’s largest archive of public opinion survey data, has long enforced disclosure requirements for archival submissions based on transparency standards developed by professional organizations in the polling industry, particularly the American Association for Public Opinion Research (AAPOR). Roper Center’s new requirements and scoring mechanism expand longstanding policies and procedures to better meet the challenges of today’s research environment. In this paper, Roper Center’s new standards will be described in the context of the historical development of transparency expectations in the polling community. The paper presentation will also detail the implementation process, providing an account of how standards were translated into actionable DDI-based metadata to drive an automatic scoring system, how new workflows were developed with input from data providers to facilitate maximum disclosure, and how the display of the user interface was designed to ensure the transparency information can be easily viewed and understood.
Provenance metadata for statistical data: An introduction to Structured Data Transformation Language (SDTL)
Structured Data Transformation Language (SDTL) provides structured, machine actionable representations of data transformation commands found in statistical analysis software. The Continuous Capture of Metadata for Statistical Data Project (C2Metadata) created SDTL as part of an automated system that captures provenance metadata from data transformation scripts and adds variable derivations to standard metadata files. SDTL also has potential for auditing scripts and for translating scripts between languages. SDTL is expressed in a set of JSON schemas, which are machine actionable and easily serialized to other formats. Statistical software languages have a number of special features that have been carried into SDTL. We explain how SDTL handles differences among statistical languages and complex operations, such as merging files and reshaping data tables from “wide” to “long”. 
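To make the “wide” to “long” reshaping mentioned above concrete, the following is a minimal sketch in plain Python. It is not SDTL syntax and does not reproduce any C2Metadata tool; the function name `wide_to_long`, its parameters, and the sample column names are hypothetical and chosen only for illustration.

```python
def wide_to_long(rows, id_key, value_columns, name_key="variable", value_key="value"):
    """Turn one row per unit (wide) into one row per unit-variable pair (long).

    All names here are illustrative assumptions, not part of SDTL.
    """
    long_rows = []
    for row in rows:
        for column in value_columns:
            # Each wide column becomes its own long row, keyed by the unit id.
            long_rows.append(
                {id_key: row[id_key], name_key: column, value_key: row[column]}
            )
    return long_rows


# Hypothetical wide table: one row per respondent, one column per year.
wide = [
    {"id": 1, "income_2019": 10, "income_2020": 12},
    {"id": 2, "income_2019": 7, "income_2020": 9},
]

long_table = wide_to_long(wide, "id", ["income_2019", "income_2020"])
for record in long_table:
    print(record)
```

A provenance record for a command like this would need to capture at least the identifier column, the list of reshaped columns, and the names of the new columns, which is the kind of detail a structured representation has to carry.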
Are we ready to share qualitative research data? Knowledge and preparedness among qualitative researchers, IRB members, and data repository curators
Data sharing maximizes the value of data, which are time- and resource-intensive to collect. Major funding bodies in the United States (US), like the National Institutes of Health (NIH), require data sharing, and researchers frequently share de-identified quantitative data. In contrast, qualitative data are rarely shared in the US, but the increasing trend towards data sharing and open science suggests this may be required in future. Qualitative methods are often used to explore sensitive health topics, raising unique ethical challenges regarding protecting confidentiality while maintaining enough contextual detail for secondary analyses. Here, we report findings from semi-structured in-depth interviews with 30 data repository curators, 30 qualitative researchers, and 30 IRB staff members to explore their experience and knowledge of qualitative data sharing (QDS). Our findings indicate that all stakeholder groups lack preparedness for QDS. Researchers are the least knowledgeable and are often unfamiliar with the concept of sharing qualitative data in a repository. Curators are highly supportive of QDS, but not all have experience curating qualitative data sets, and they indicated they would like guidance and standards specific to QDS. IRB members lack familiarity with QDS, although they support it as long as proper legal and regulatory procedures are followed. IRB members and data curators are not prepared to advise researchers on legal and regulatory matters, potentially leaving researchers, who have the least knowledge, with no guidance. Ethical and productive QDS will require overcoming barriers, creating standards, and changing long-held practices among all stakeholder groups.
How many ways can we teach data literacy?
Academic Libraries are ideally positioned to teach data literacy. What is ‘data literacy’ in the first place? Is it the new information literacy? Will the ways we teach information literacy limit imaginative ways to teach data literacy?
With those questions in mind, the Library of New York University Shanghai has explored multiple ways to teach data literacy to undergraduate students through university events, ‘for-class’ instruction and workshops, and online casebooks. (1) We initiated the yearlong series of events titled ‘Lying with Data’, inviting faculty across disciplines to each address one core data literacy question that students of data science may elude. (2) We offered workshops and in-class instruction that are up-to-date with the latest technology and that fit with the curriculum. (3) We created online casebooks on various topics in the data lifecycle, tackling user needs at different levels. Essential to our teaching activities are two core values: ‘let the quality speak for itself’, and ‘outreach by teaching’. 
Methods reporting that supports reader confidence for systematic reviews in psychology: assessing the reproducibility of electronic searches and first-level screening decisions.
Recent discussions and research in psychology show a significant emphasis on reproducibility. Concerns for reproducibility pertain to methods as well as results. We evaluated the reporting of the electronic search methods used for systematic reviews (SR) published in psychology. Such reports are key for determining the reproducibility of electronic searches. The use of SR has been increasing in psychology, and we report on the status of reporting of electronic searches in recent SR in psychology.
We used 12 checklist items to evaluate reporting for basic electronic strategies. Kappa results for those items developed from evidence-based recommendations ranged from fair to almost perfect. Additionally, using a set of those items to represent a “PRISMA” type of recommended reporting showed that only one of the 25 randomly selected psychology SR from 2009-2012 reported recommended information for all items in the set, and none of the 25 psychology SR from 2014-2016 did so. Using a second less stringent set of items found that only 36% of the psychology SR reported basic information that supports confidence in the reproducibility of electronic searches. Similar results were found for a set of psychology SR published in 2017.
An area for improvements in SR in psychology involves fuller and clearer reporting of the steps used for electronic searches in SR. Such improvements will provide a strong basis for confidence in the reproducibility of searches. That confidence, in turn, can strengthen reader confidence more generally in the results and conclusions reached in SR in psychology.
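The abstract above reports kappa results ranging “from fair to almost perfect” without saying how they were computed. As a minimal sketch, assuming two raters and Cohen's kappa (the abstract does not name the variant used), agreement on a checklist item could be calculated as below. The verbal labels follow the widely used Landis and Koch (1977) scale; the function names and the sample ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) verbal scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings: did each review report the item ("y") or not ("n")?
rater_1 = ["y", "y", "n", "n"]
rater_2 = ["y", "n", "n", "n"]
kappa = cohen_kappa(rater_1, rater_2)
print(kappa, landis_koch_label(kappa))
```

Because kappa discounts agreement expected by chance, it can be well below the raw percentage of agreement when one category dominates the ratings.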
Advocating for reproducibility
As guest editors, we are excited to publish this special double issue of IASSIST Quarterly. The topics of reproducibility, replicability, and transparency have been addressed in past issues of IASSIST Quarterly and at the IASSIST conference, but this double issue is entirely focused on these issues.
In recent years, efforts “to improve the credibility of science by advancing transparency, reproducibility, rigor, and ethics in research” have gained momentum in the social sciences (Center for Effective Global Action, 2020). While few question the spirit of the reproducibility and research transparency movement, it faces significant challenges because it goes against the grain of established practice.
We believe the data services community is in a unique position to help advance this movement given our data and technical expertise, training and consulting work, international scope, established role in data management and preservation, and more. As evidence of the movement, several initiatives exist to support research reproducibility infrastructure and data preservation efforts:
Center for Open Science (COS) / Open Science Framework (OSF)[i]
Berkeley Initiative for Transparency in the Social Sciences (BITSS)[ii]
CUrating for REproducibility (CURE)[iii]
Project Tier[iv]
Data Curation Network[v]
UK Reproducibility Network[vi]
While many new initiatives have launched in recent years, we know that even before the phrase “reproducibility crisis” came into common use and Ioannidis published the essay “Why Most Published Research Findings are False” (Ioannidis, 2005), the data services community was supporting reproducibility in a variety of ways (e.g., data management, data preservation, metadata standards) in well-established consortiums such as the Inter-university Consortium for Political and Social Research (ICPSR).
The articles in this issue address several important aspects of reproducible research:
Identification of barriers to reproducibility and solutions to such barriers
Evidence synthesis as related to transparent reporting and reproducibility
Reflection on how information professionals, researchers, and librarians perceive the reproducibility crisis and how they can partner to help solve it.
The issue begins with “Reproducibility literature analysis” which looks at existing resources and literature to identify barriers to reproducibility and potential solutions. The authors have compiled a comprehensive list of resources with annotations that include definitions of key concepts pertinent to the reproducibility crisis.
The next article addresses data reuse from the perspective of a large research university. The authors examine instances of both successful and failed data reuse and identify best practices for librarians interested in conducting research involving the common forms of data collected in an academic library.
Systematic reviews are a research approach that involves the quantitative and/or qualitative synthesis of data collected through a comprehensive literature review. “Methods reporting that supports reader confidence for systematic reviews in psychology” looks at the reproducibility of electronic literature searches reported in psychology systematic reviews.
A fundamental challenge in reproducing or replicating computational results is the need for researchers to make available the code used in producing these results. But sharing code and having it run correctly for another user can present significant technical challenges. In “Reproducibility, preservation, and access to research with Reprozip, Reproserver” the authors describe open source software that they are developing to address these challenges.
Taking a published article and attempting to reproduce the results is an exercise that is sometimes used in academic courses to highlight the inherent difficulty of the process. The final article in this issue, “ReprohackNL 2019: How libraries can promote research reproducibility through community engagement,” describes an innovative library-based variation on this exercise.
Harrison Dekker, Data Librarian, University of Rhode Island
Amy Riegelman, Social Sciences Librarian, University of Minnesota
References
Center for Effective Global Action (2020), About the Berkeley Initiative for Transparency in the Social Sciences. Available at: https://www.bitss.org/about (accessed 23 June 2020).
Ioannidis, J.P. (2005) ‘Why most published research findings are false’, PLoS Medicine, 2(8), p. e124. doi: https://doi.org/10.1371/journal.pmed.0020124
[i] https://osf.io
[ii] https://www.bitss.org/
[iii] http://cure.web.unc.edu
[iv] https://www.projecttier.org/
[v] https://datacurationnetwork.org/
[vi] https://ukrn.or
Capturing their “first” dataset: A graduate course to walk PhD students through the curation of their dissertation data
The data set accompanying a thesis is a valuable intellectual property asset, both from the viewpoint of the PhD student, who can procure employment and build publications and research grants from the work for years to come, and the university, which owns the data and has invested in the work. However, the data set has generally not been captured as a finished product in a similar manner to the published thesis. A course has been developed which walks PhD students through the process of identifying an archival data set, selecting a repository or long-term storage location, creating metadata and documentation for the data package, and completing the deposit process. A pre- and post-assessment has been designed to ascertain the level of data literacy the students gain through curating their own dataset. PIs for the projects have input into the repositories and metadata standards selected. The university thesis office was consulted as the course was developed, so that accurate procedures and practices are reflected throughout the course. This first-of-its-kind class is open to students of any discipline at a Research-1 university. The resulting mixture of data types creates a unique course every time it is offered.
Sustainability through the liaison with data archive users
As a social science data archive, we focus on collecting research data and archiving it. However, there are more responsibilities that come with data archiving: cooperation on international social surveys (ISSP, ESS), supporting secondary data analysis and much more. A significant part of our work is to communicate with students and researchers and to educate them about data management and data analysis. Although the relationship we have is functional and seems sufficient, we tend to ask ourselves: who are the data archive users and what do they expect from us?
We decided to employ user-centered design methods and tools to define a typical user of our services and to find out what their motivations for using our data archive are and which specific functions they use and (do not) appreciate, so that we would have a better picture of their needs. Moreover, we wondered about the role of open science and its impact on the users’ needs and future requirements arising from the open science environment. The information obtained is a point of departure for redesigning archival services to satisfy the new demands our users have regarding more data resources, new techniques of scientific work and better interconnection between different platforms.