IASSIST Quarterly (Journal)
Systemic racism in data practices
Positionality statement
As we begin to discuss this issue, its origins, and its importance in contemporary society, I wanted to acknowledge my positionality and the role that it may play in the formation of this issue. Jonathan O. Cain is an African-American male working in the LIS field. Before moving into administration, I taught data and digital literacy and worked on developing programs that focused on improving access to these critical skills at zero cost to learners.
It is important to acknowledge my positionality and the lens through which I see the data science field. Trevor Watkins is an African American male working in the LIS field at an academic institution in an academic library. I teach critical data literacy workshops and engage in diversity and BIPOC-related digital projects with faculty, students, and the broader academic community across the country. I am also a researcher and practitioner in artificial intelligence (AI) and data science.
The global pandemic, its impacts, and why it matters
We first met in August 2020, about five months into the pandemic, to discuss the possibilities of this special issue. We spent a good chunk of that meeting getting to know each other and, most importantly, discussed the toll the pandemic placed on our communities and us. It is probably safe to say that many of you, at some point, were uncertain of the future. Like most people worldwide, we lost family and friends or knew of people who succumbed to Covid-19 and other illnesses that weren't treated because the focus shifted to Covid-19. We get it. At one point, Covid-19 killed over three thousand people per day (Centers for Disease Control and Prevention (CDC), 2022). According to data from the CDC, 90% of the 385,676 people who died between March and December 2020 had Covid-19 listed as the underlying cause of death on their death certificate. The murders of Ahmaud Arbery in February, Breonna Taylor in March, and George Floyd in May 2020 sparked civic unrest across the United States (US) and protests across the globe in solidarity against racial injustice. When we announced this special issue and initiated a call for papers, we didn't get much of a response initially. We expected and acknowledged that it would probably take some time before we received inquiries or proposals about the issue, the intent to submit, or any submissions.
Like many of you, we are still picking up the pieces from 2020 and dealing with the aftermath of Covid-19. The pandemic may be over now, depending on whom you ask, but the emotional scars are still there and may remain so for quite some time. Patience was the one quality we all had throughout this process, which is why we can present this publication today.
Data and liberatory technology
Liberatory technology. This is a concept that invited contemplation as we sat down to record our reflections on this special issue. In drawing together scholars, educators, and practitioners to address the issue of data and its relationship to race, ethnicity, and representation, we, as coeditors, were making a statement about the importance of data and the material impact that this seemingly abstract and ethereal object can and does have on individual and community lives. Thinking about that impact brought liberatory technology to the front of our minds. The definition of liberatory technology offered by the Ida B. Wells Just Data Lab intrigues us and invites us to grapple with that topic. They defined liberatory as something that "supports the increased freedom and wellbeing of marginalized people, especially black people outside of capitalism and settler colonial power structures" and technology as "a tool used to accomplish a task" (LIBERATORY TECHNOLOGY AND DIGITAL MARRONAGE, n.d.). As we contemplate this set of definitions, we are left to question whether data can be a liberatory technology or not.
In Liberation Technology: Black Printed Protest in the Age of Franklin, Richard S. Newman draws parallels between asserting ownership and mastery of new communication technologies and black liberation activities. Reflecting on the transformative nature of print technology, he writes, "If the Marquis de Condorcet was right in 1793 that print had unshackled Europe from medieval modes of thought and action, then it is also true that print was perhaps the first technology to liberate blacks from the servile images that had long haunted their existence in Western culture." He also draws on a 19th-century example of how print expressly connects to black lives post-emancipation, noting, "W. E. B. Du Bois certainly thought that black history and print history worked in tandem. Wherever one found newspapers in the post-Civil War South, he observed, one found some form of black freedom" (Richard S. Newman, 2009, p. 175). He even observes that, according to scholars, black activists embraced other communication technologies like photography "to reshape the image of African Americans in nineteenth-century culture" (Richard S. Newman, 2009, p. 175).
We have no shortage of examples of how data and data-driven technologies fail to support the "increased freedom and wellbeing of marginalized people outside of capitalism and settler colonial power structures." In 2016, ProPublica published Machine Bias, a report that looks at risk assessment technologies used in arraignment and sentencing. They report that "The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants" and "white defendants were mislabeled as low risk more often than black defendants" (Julia Angwin, 2016). The authors of a 2021 article, Fairness in Criminal Justice Risk Assessments: The State of the Art, noted in their analysis, "The false negative rate is much higher for whites so that violent white offenders are more likely than violent black offenders to be incorrectly classified as nonviolent. The false positive rate is much higher for blacks so that nonviolent black offenders are more likely than nonviolent white offenders to be incorrectly classified as violent. Both error rates mistakenly inflate the relative representation of blacks predicted to be violent. Such differences can support claims of racial injustice. In this application, the trade-off between two different kinds of fairness has real bite." (Berk et al., 2021, p. 33)
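To make the two error rates in the passage above concrete, the following is a minimal sketch of how a false positive rate and a false negative rate can be computed separately for each group. The group labels and all counts are invented for illustration and are not drawn from ProPublica, Berk et al., or any real dataset; the function and variable names are likewise hypothetical.

```python
from collections import Counter


def error_rates(records):
    """Return (false_positive_rate, false_negative_rate).

    Each record is a (predicted_high_risk, actually_reoffended) pair of booleans.
    """
    counts = Counter(records)
    fp = counts[(True, False)]   # flagged high risk, did not reoffend
    tn = counts[(False, False)]  # flagged low risk, did not reoffend
    fn = counts[(False, True)]   # flagged low risk, did reoffend
    tp = counts[(True, True)]    # flagged high risk, did reoffend
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


def make_records(tp, fp, tn, fn):
    """Expand confusion-matrix counts into individual (predicted, actual) pairs."""
    return ([(True, True)] * tp + [(True, False)] * fp
            + [(False, False)] * tn + [(False, True)] * fn)


# Invented toy counts for two hypothetical groups (NOT real data).
toy = {
    "group_a": make_records(tp=30, fp=40, tn=60, fn=20),
    "group_b": make_records(tp=20, fp=20, tn=80, fn=30),
}

rates = {group: error_rates(records) for group, records in toy.items()}
for group, (fpr, fnr) in rates.items():
    print(f"{group}: false positive rate={fpr:.2f}, false negative rate={fnr:.2f}")
```

In this toy setup one group has the higher false positive rate while the other has the higher false negative rate, which is the shape of disparity the quoted passage describes. A single overall accuracy figure would hide it.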
These are just a few examples of how these technological developments, on their own merits, fail to meet the definition offered by the authors of the "Liberatory Technology and Digital Marronage" zine from the Ida B. Wells Just Data Lab. Reflecting on the technological path illustrated by Newman, the work of ownership and mastery of the tool provides the potential for it to be liberatory. Through this lens, the work of the Just Data Lab is exemplary for this meditation; it draws a direct line from technology, education, and mastery to liberatory technology.
Data in higher education
Data literacy education is an area that has been a focus of our careers in librarianship. It's a space where we saw the libraries' ability to make a meaningful impact. Data has had a tremendous impact on college campuses, from how research is conducted to the pressures colleges feel from stakeholder groups (students, governments, funders, donors, and employers) to prepare students with the data and technology skills to gain employment in the knowledge economy.
As colleges and universities have turned (with varying degrees of success) to meet the needs of these communities, a myriad of explorations have emerged on the importance of representing marginalized communities in these systems, with the aim of combating and dismantling the harmful practices that we see embedded in the systems that drive society and the potentially debilitating consequences they produce. That is partly why the works in this special issue are so important at this moment in time. These scholars and scholar-practitioners are engaging with the issues that drive the opaque structures surrounding us. And hopefully, their work can give us another perspective on how to engage with these structures and transform them to support liberatory practices.
The entries in this issue
We have some fantastic articles for you to read in this issue. We open with an article by Kevin Manuel, Rosa Orlandini, and Alexandra Cooper, who discuss how the collection process of racial, ethnic, and indigenous data has evolved in the Canadian Census since 1871, the erasure of minorities and indigenous citizens from those censuses, and the work to restore and accurately identify and categorize racialized groups.
In the next article, Leigh Phan, Stephanie Labou, Erin Foster, and Ibraheem Ali present a model for data ethics instruction for non-experts by designing and implementing two data ethics workshops. They make important points about the failure of academia to incorporate the ethical use of data in course curriculums and digital literacy training and demonstrate how academic libraries have become an essential resource for the academic community. Their workshop structure can be modeled for any academic library that endeavors to provide a similar service to its community.
In the third article, Natasha Johnson, Megan Sapp Nelson, and Katherine Yngve interrogate the collective and local purposes of institutional data collection and its impact on student belongingness and propose a framework based on data feminism that centers the student as a person rather than a commodity.
Finally, our closing article from Thema Monroe-White focuses on marginalized and underrepresented people in the data science field. The author proposes that racially relevant and responsive teaching is necessary to recruit more people from these groups and diversify the field. She discusses how the Ladson-Billings model of culturally relevant pedagogy has been applied and is beneficial to STEM curriculums, and how a liberatory data science curriculum could promote a student's voice and sense of belonging.
Conclusion
We want to thank all those involved in producing this special issue. We want to thank the authors first. Their patience, dedication, and perseverance throughout this process were much appreciated. The reviewers provided timely, very detailed, and thorough feedback. We would be remiss if we didn't acknowledge their hard work and labor. We would like to thank the IQ Editorial Team, Michele Hayslett and Karsten Boye Rasmussen, for working with us over the last two years, and Ofira Schwartz-Soicher, for helping us get to the finish line.
Trevor Watkins
Jonathan O. Cain
References
Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2021). Fairness in Criminal Justice Risk Assessments: The State of the Art. Sociological Methods & Research, 50(1), 3–44. https://doi.org/10.1177/0049124118782533
Flipsnack. (n.d.). Liberatory Technology Zine. Flipsnack. Retrieved December 17, 2022, from https://www.flipsnack.com/EBC8CD77C6F/liberatory-technology-zine.html
LIBERATORY TECHNOLOGY AND DIGITAL MARRONAGE. (n.d.). IDA B. WELLS JUST DATA LAB. Retrieved December 17, 2022, from https://www.thejustdatalab.com/tools-1/liberatory-technology-and-digital-marronage
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine Bias. ProPublica. Retrieved December 17, 2022, from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Newman, R. S. (2009). Liberation Technology: Black Printed Protest in the Age of Franklin. Early American Studies: An Interdisciplinary Journal, 8(1), 173–198. https://doi.org/10.1353/eam.0.003
Who is counted? Ethno-racial and indigenous identities in the Census of Canada, 1871-2021
Finding data on race, racialized populations, and anti-racism in Canada can be a complex process when conducting research. One source of data is the Census of Canada, which has been collecting socio-demographic data since 1871. However, the collection of racial, ethnic, or Indigenous data has changed throughout the years and from Census to Census. In response to the need for more support in finding ethno-racial and Indigenous data, the Ontario Council of University Libraries’ Ontario Data Community has created an online guide to provide guidance, in part, about the terminology used for Indigenous and racialized identities over time in the Census. In this article, the modifications to how ethno-racial origin questions have been asked, and the ongoing changes to sociocultural perceptions impacting the Census, are reviewed.
Going qual in: Towards methodologically inclusive data work in academic libraries
Data literacy and research data services are a growing part of the work of academic libraries. Data in this context is often presumed to mean only numeric data or statistics, leaving open the question of what role qualitative research plays in services and programming for research data and data literacy. In this paper, we report on the results of interviews with academic librarians about their understanding of data literacy, qualitative research, and academic library infrastructure around qualitative research. From the interviews, we propose a model of data literacy that incorporates both interpretive and instrumental elements. We conclude with suggestions for incorporating qualitative data and analysis methods into academic library programming and services around data literacy and research data.
Deposit data - including qualitative data - and support students in obtaining the skills for data-driven research
We talk data. We do data.
Welcome to the third issue of IASSIST Quarterly for the year 2022 - IQ vol. 46(3).
In Denmark we sometimes retrieve an old quote from a member of the Danish Parliament: 'If those are the facts, then I deny the facts'. We have laughed at that for more than a hundred years, but now fact denial is apparently the new normal in many places. And we are not amused. Data can become dangerous as facts can be fabricated. Therefore, a critical approach to data is fundamental to producing reliable information: facts. The articles in this issue are about teaching students good data behavior, and how researchers with great care and attention can carry out the task of fact production.
The first article is about improvement in teaching data: 'Investigating teaching practices in quantitative and computational Social Sciences: a case study' by Rebecca Greer and Renata G. Curty. The authors are both at the University of California, Santa Barbara Library, where Rebecca Greer is director of Teaching & Learning and Renata Curty is social science research facilitator. They are investigating data education and present some of the findings from a local report - part of a national project - into how instructors adapt curricula and pedagogy to advance undergraduates' computational and statistical knowledge in the social sciences. The core goal of the instructors concerns 'data thinking' - the critical understanding and evaluation of data. Many students have a preconceived fear of mathematics that influences other areas. Personally, I feel that data thinking is essential to life and participation in society, and I believe that it should be achievable even with a background of math fear. However, for social science students I also expect they have acquired some level of 'data doing'. I agree with the authors that the necessary support for data is more often found in the areas of Science, Technology, Engineering and Mathematics than it is in Social Sciences. However, many IASSIST members successfully work to relate data to social science students. And the implicit relationship via data to STEM areas will furthermore often improve job success for social science students. The local study interviewed instructors and the article presents among other things the learning goals and the explicit skills contained in these goals. The study uses many quotations from the interviewees, including quotes on sharing among the instructors. This leads to how the instructors can be further supported and how the library can support them, including a partnership between the library's Research Data Services and Teaching & Learning.
With the second article we continue at a university. Now the focus shifts from teaching to research - the other main area of university work, and more specifically the data in research. The article 'Research data integrity: A cornerstone of rigorous and reproducible research' is by Patricia B. Condon, Julie F. Simpson and Maria E. Emanuel. All three are in positions at the University of New Hampshire, Durham, USA. The article starts with the foundation of the four Rs of research: rigor, reproducibility, replication, and reuse. The interest in data integrity came from a question at a graduate seminar on the difference between data integrity and data quality. When exploring the data quality component, they found that research data integrity is closely associated with data management as well as with data security. The aims of the article are several, but the first is to establish practical explanations of research data integrity and its components. Training and documentation are fundamental and form the surroundings in the proposed Research Data Integrity Model that also graphically presents the overlapping areas between the components: data quality, data management, and data security. I find this focus on the sharing between components a structurally clear approach, and with good outcome too. When juggling concepts that often are regarded as being more or less identical, it is clearly positive to make these relationships and distinctions explicit. This positive structural approach is continued as the authors relate research data integrity to the research data lifecycle to produce an implementation schema. The last section relates research data integrity to the four Rs.
Submissions of papers for the IASSIST Quarterly are always very welcome. We welcome input from IASSIST conferences or other conferences and workshops, from local presentations or papers especially written for the IQ. When you are preparing such a presentation, give a thought to turning your one-time presentation into a lasting contribution. Doing that after the event also gives you the opportunity of improving your work after feedback. We encourage you to log in or create an author profile at https://www.iassistquarterly.com (our Open Journal System application). We permit authors to have 'deep links' into the IQ as well as deposition of the paper in your local repository. Chairing a conference session or workshop with the purpose of aggregating and integrating papers for a special issue IQ is also much appreciated as the information reaches many more people than the limited number of session participants and will be readily available on the IASSIST Quarterly website at https://www.iassistquarterly.com. Authors are very welcome to take a look at the instructions and layout:
https://www.iassistquarterly.com/index.php/iassist/about/submissions
Authors can also contact me directly via e-mail: [email protected]. Should you be interested in compiling a special issue for the IQ as guest editor(s) I will also be delighted to hear from you.
Karsten Boye Rasmussen - November 202
Emancipating data science for Black and Indigenous students via liberatory datasets and curricula
Despite findings highlighting the severe underrepresentation of women and minoritized groups in data science, most scholarly research has focused on new methodologies, tools, and algorithms as opposed to who data scientists are or how they learn their craft. This paper proposes that increased representation in data science can be achieved via advancing the curation of datasets and pedagogies that empower Black, Indigenous, and other minoritized people of color to enter the field. This work contributes to our understanding of the obstacles facing minoritized students in the classroom and solutions to mitigate their marginalization.
Open geospatial data: A comparison of data cultures in local government
Public geospatial data (geodata) is created at all levels of government, including federal, state, and local (county and municipal). Local governments, in particular, are critical sources of geodata because they produce foundational datasets, such as parcels, road centerlines, address points, land use, and elevation. These datasets are sought after by other public agencies for aggregation into state and national frameworks, by researchers for analysis, and by cartographers to serve as base map layers. Despite the importance of this data, policies about whether it is free and open to the public vary from place to place. As a result, some regions offer hundreds of free and open datasets to the public, while their neighbors may have zero, preferring to restrict them due to privacy, economic, or legal concerns.
Minnesota relies on an approach that allows counties to choose for themselves if their geodata is free and open. By contrast, its neighboring state of Wisconsin has passed legislation requiring that specific foundational geospatial datasets created by counties must be freely available to the public. This paper compares the implications and outcomes of these diverging data cultures.
Investigating teaching practices in quantitative and computational Social Sciences: A case study
Data education is gaining traction across disciplines and degree levels in higher education. Teaching data skills in the Social Sciences in today's data-driven world is vital for preparing the next generation of data literate and critical social scientists. The ability to identify, assess, analyze, and communicate well and responsibly with data is key for scholars and professionals to navigate dynamic and expansive information ecosystems. This paradigm shift requires instructors to adapt their curricula and pedagogy to advance students’ computational and statistical knowledge. This paper presents some of the findings from a local report of a larger national project which explored pedagogical techniques and instructional support needs for teaching undergraduates with quantitative data in the Social Sciences. Results revealed that the core learning goal of instructors is to develop students' critical thinking skills with data, including the conceptual understanding of the research methods employed in the field; the ability to critically evaluate research methodologies, findings, and data sets; and prowess using quantitative and computational tools and technologies. A recurring theme across interviews was students’ fear of math and technology and the challenges these fears pose to data-related instruction. Instructors value participation in a community of practice and are eager for more institutional support to advance their computational skills. Based on these findings, we suggest avenues for academic libraries to further develop services, activities, and partnerships to aid data instruction efforts in the Social Sciences.
Engineering a machine learning pipeline for automating metadata extraction from longitudinal survey questionnaires
Data Documentation Initiative-Lifecycle (DDI-L) introduced a robust metadata model to support the capture of questionnaire content and flow and, through support for versioning and provenance objects such as BasedOn, encouraged the reuse of existing question items. However, the dearth of questionnaire banks including both question text and response domains has meant that an ecosystem to support the development of DDI-ready Computer Assisted Interviewing (CAI) tools has been limited. Archives hold the information in PDFs associated with surveys, but extracting it efficiently into DDI-Lifecycle is a significant challenge.
While CLOSER Discovery has been championing the provision of high-quality questionnaire metadata in DDI-Lifecycle, this has primarily been done manually. More automated methods need to be explored to ensure scalable metadata annotation and uplift.
This paper presents initial results in engineering a machine learning (ML) pipeline to automate the extraction of questions from survey questionnaires as PDFs. Using CLOSER Discovery as a ‘training and test dataset’, a number of machine learning approaches have been explored to classify parsed text from questionnaires to be output as valid DDI items for inclusion in a DDI-L compliant repository.
The developed ML pipeline adopts a continuous build and integrate approach, with processes in place to keep track of various combinations of the structured DDI-L input metadata, ML models and model parameters against the defined evaluation metrics, thus enabling reproducibility and comparative analysis of the experiments. Tangible outputs include a map of the various metadata and model parameters with the corresponding evaluation metrics’ values, which enable model tuning as well as transparent management of data and experiments.
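The "map" of metadata, models, and parameters against evaluation metrics described above could look roughly like the sketch below. This is an illustration only, not the authors' implementation: the class and function names (`RunKey`, `ExperimentLog`, `make_key`), the metadata version labels, the model names, the parameters, and the metric values are all hypothetical placeholders rather than results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunKey:
    """Identifies one experiment: input metadata version, model, and parameters."""
    metadata_version: str
    model_name: str
    params: tuple  # sorted (name, value) pairs, so the key is hashable


def make_key(metadata_version, model_name, params):
    """Build a hashable key from a plain parameter dictionary."""
    return RunKey(metadata_version, model_name, tuple(sorted(params.items())))


class ExperimentLog:
    """Maps each (metadata, model, parameters) combination to its metrics."""

    def __init__(self):
        self._runs = {}

    def record(self, key, metrics):
        # Store a copy so later changes to the caller's dict do not alter the log.
        self._runs[key] = dict(metrics)

    def best(self, metric):
        """Return the (key, metrics) pair with the highest value for `metric`."""
        return max(self._runs.items(), key=lambda item: item[1][metric])


# Placeholder runs: every value below is invented for illustration.
log = ExperimentLog()
log.record(make_key("ddi-input-v1", "model_a", {"c": 1.0}), {"f1": 0.71})
log.record(make_key("ddi-input-v2", "model_a", {"c": 0.5}), {"f1": 0.78})
log.record(make_key("ddi-input-v2", "model_b", {"depth": 4}), {"f1": 0.74})

best_key, best_metrics = log.best("f1")
print(best_key.metadata_version, best_key.model_name,
      dict(best_key.params), best_metrics)
```

Keying each run on the full combination of inputs means any result can be traced back to the exact metadata version and parameters that produced it, which is the reproducibility property the abstract emphasizes.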