UCT Computer Science Research Document Archive
1270 research outputs found
ReproHum #0866-04: Another Evaluation of Readers’ Reactions to News Headlines
The reproduction of Natural Language Processing (NLP) studies is important in establishing their reliability. Nonetheless, many papers in NLP have never been reproduced. This paper presents a reproduction of the work of Gabriel et al. (2022) to establish the extent to which their findings can be confirmed. Those findings concern the utility of large language models (T5 and GPT2) for automatically generating writers’ intents from headlines in order to curb misinformation. Our results show no evidence to support two of their four findings and only partially support the other two. Specifically, while we confirmed that all the models are judged to be capable of influencing readers’ trust or distrust, there was a difference in T5’s capability to reduce trust. Our results show that its generations are more likely to have greater influence in reducing trust, while Gabriel et al. (2022) found more cases where they had no impact at all. In addition, most of the model generations are considered socially acceptable only if we relax the criteria for determining a majority to mean more than chance rather than the apparent > 70% of the original study. Overall, while they found that “machine-generated MRF implications alongside news headlines to readers can increase their trust in real news while decreasing their trust in misinformation”, we found that they are more likely to decrease trust in both cases than to have no impact at all.
Redzone stream compaction: removing k items from a list in parallel O(k) time
Stream compaction, the parallel removal of selected items from a list, is a fundamental building block in parallel algorithms. It is extensively used both in computer graphics (for shading, collision detection, and ray tracing) and in general computing (for example, tree traversal and database selection). In this paper we present Redzone stream compaction, the first parallel stream compaction algorithm to remove k items from a list with n ≥ k elements in O(k) rather than O(n) time. Based on our benchmark experiments on both GPU and CPU, if k is proportionally small (k ≪ n), Redzone outperforms existing parallel stream compaction by orders of magnitude, while if k is close to n it underperforms by a constant factor. Redzone removes items in-place and needs only O(1) auxiliary space. However, unlike current O(n) algorithms, it is unstable (i.e., the order of elements is not preserved) and it needs a list of the items to be removed.
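The O(k) bound and the loss of ordering can be illustrated with a small sequential sketch. This is not the paper's parallel algorithm: the function name `remove_unstable` is hypothetical, the indices to remove are assumed to be valid, distinct, and non-negative, and the sketch uses an O(k) set for convenience where the abstract states that Redzone needs only O(1) auxiliary space. The idea shown is simply that only the last k positions and the k removed positions need to be touched.

```python
def remove_unstable(items, remove_idx):
    """Remove the elements at remove_idx from items in place in O(k) time.

    Sequential illustration only (assumption: indices are valid, distinct,
    and non-negative). The order of surviving elements is not preserved.
    """
    n, k = len(items), len(remove_idx)
    doomed = set(remove_idx)
    if len(doomed) != k:
        raise ValueError("indices to remove must be distinct")
    boundary = n - k  # the final list occupies positions [0, boundary)

    # Holes: removed positions that fall inside the final list.
    holes = [i for i in remove_idx if i < boundary]
    # Survivors: kept elements currently sitting in the last k positions.
    survivors = (j for j in range(boundary, n) if j not in doomed)

    # There are exactly as many survivors as holes, so pair them up.
    for hole, src in zip(holes, survivors):
        items[hole] = items[src]

    del items[boundary:]  # drop the last k positions
    return items


data = list(range(10))
result = remove_unstable(data, [1, 8, 3])
```

Only 2k positions are read or written, regardless of n, which is why the cost does not grow with the length of the list. The price is that survivors from the tail are moved into earlier holes, so the original order is lost.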
Automating Robot Design with Multi-Level Evolution
In evolutionary robotics, Multi-Level Evolution (MLE) has been shown to produce effective robot designs using a bottom-up approach, first evolving which materials to use for modular components and then how these components are connected into a functional robot design. This paper evaluates MLE as an evolutionary robotic design method across various task (robot ambulation) environments, in comparison to human-designed robots (pre-designed robot controller-morphology couplings). Results indicate that the MLE method evolves robots that are effective across increasingly difficult (locomotion) task environments, out-performing pre-designed robots, and thus provide further support for the efficacy of MLE as an evolutionary robotic design method. Furthermore, results indicate that the MLE method enables the evolution of suitable robotic designs for various environments, where such designs would be non-intuitive and unlikely in conventional robotic design.
New tools for Automated Particle Deagglomeration: Machine-Learning from Mineralogy Data
Rock classification depends on mineral composition and morphology, such as size, angularity, or mineral associations. Traditionally, optical petrography by skilled and experienced professionals was used for this purpose. Many tools have been developed to provide data on bulk mineralogy, such as X-Ray Fluorescence (XRF), Short-Wave Infrared (SWIR), and Fourier Transform InfraRed (FTIR). Automated mineralogy, by contrast, provides insightful mineralogical and textural information, but it is limited in its ability to recognize individual particles in a granular specimen, because it relies on programmatic and rules-based methods to deagglomerate particles, where a particle is defined as a mineral area surrounded by background phases. This is especially noticeable for fine particle-size specimens, where traditional deagglomeration techniques struggle to distinguish an irregularly shaped particle from multiple touching particles, a distinction that a trained analyst could make. We describe a new automated mineralogy computational tool for particle classification and analysis, leveraging the general classification capacity of large neural networks (deep-learning), multi-label classification, and established computer-vision (machine-learning) techniques to improve particle deagglomeration across various granular specimens.
IsiXhosa.click: online, open, user-friendly, and searchable isiXhosa-English dictionary software
IsiXhosa.click is an open-source, online, easy-to-use isiXhosa-English dictionary. It supports typo-tolerant live search, allowing users to find words quickly by typing their first few letters. Word pages provide example sentences, related words, and linguistic information. The software allows users to submit corrections and new vocabulary, which are published after review. IsiXhosa.click is a community-driven, crowd-sourced dictionary project that still enforces quality standards. The website’s database and source code are freely available under open-source licenses.
Milk Matters 4.0: Bridging Milk Donor, Staff and Student Needs towards a Purposeful and Maintainable System
Milk Matters is a Cape Town-based non-profit milk bank. Their primary role is to collect expressed breastmilk from donor mothers, pasteurize it and distribute it to recipient infants in need. This dissertation explores the design and deployment of a donor-facing mobile application and staff-facing web application developed with and for the non-profit organisation (NPO) over the course of postgraduate student projects from 2016 to 2023. A particular focus is on the effects that the communication and feedback provided by the application have on donors’ motivation to donate breastmilk. The staff-facing web application allows staff to manage the dynamic content of the mobile application. Additionally, we ask questions about the challenges associated with university-NPO collaborations on mobile development and reflect on design for this context. Technical and procedural challenges faced when getting the mobile application into a deployable state were noted. A pilot study was performed with three donors, followed by a deployment evaluation phase with seven donors and two NPO staff. Qualitative evaluation was done through semi-structured interviews and quantitative data was collected through usage analytics. The mobile application has shown the ability to increase donors’ motivation to donate through increased communication between the NPO and its donors, and to make donors feel more appreciated. This occurred through direct communication from within the Donor App, automatic in-app feedback and passive app content. The extent to which donors engage with the mobile application and benefit from it depends on their personal reasoning for becoming a donor. Donors’ usage of the application also results in operational benefits for the milk bank. Challenges encountered in the deployment and maintenance of university-led mobile application development for this low-resourced NPO highlight the effort required to sustain mobile applications in the app stores. To reduce barriers to future project continuity, recommendations include a clear handover of access to all project-related accounts to the project supervisor, secure online access to all project-related information and planning for continued contact with outgoing students.
Investigating Markov Model Accuracy in Representing Student Programming Behaviours
Problem-solving skills are an integral component of the computer science field. Because students follow different learning and programming behaviours, it is challenging to track and identify when students get overwhelmed while writing programs. When students are overwhelmed, they are unable to complete learning objectives on time and follow prescribed pathways, depriving them of the opportunity to learn new concepts. In this paper, we developed and evaluated the quality of Markov models that encode student programming behaviours based on the evolution of source code submissions during formative practical assignments. In doing so, we use Abstract Syntax Trees (ASTs) extracted from the source code, which are used for clustering similar submissions and tracking students’ progressive approaches within the Markov models. An approach based on MinHashLSH is presented that works on AST nodes as input to emphasise structural similarity and related programming approaches. As such, the effectiveness of the Modified MinHashLSH approach is judged by the clusters that make up the Markov model. Our results show that a high-quality model can be created from previous data. This model could be used to inform the development of learning interventions that would move students out of their stuck states.
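As a rough illustration of how a Markov model can encode submission histories, the sketch below estimates first-order transition probabilities from sequences of cluster labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `fit_markov` and the labels "A", "B", "C" are hypothetical, and the clustering step (MinHashLSH over AST nodes in the abstract) is assumed to have already assigned each submission to a cluster.

```python
from collections import Counter, defaultdict


def fit_markov(sequences):
    """Estimate first-order transition probabilities between cluster labels.

    sequences: one list of cluster labels per student, in submission order
    (hypothetical input format; clustering is assumed to be done elsewhere).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each consecutive pair (current state -> next state).
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1

    # Normalise the counts for each state into probabilities.
    return {
        state: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for state, nxts in counts.items()
    }


histories = [
    ["A", "B", "C"],
    ["A", "B", "B", "C"],
    ["A", "C"],
]
model = fit_markov(histories)
```

In such a model, a state with a high self-transition probability (here "B" moves to itself one time in three) is one possible signal of a student repeatedly resubmitting structurally similar code.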
SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation
Subword regularized models leverage multiple subword tokenizations of one target sentence during training. Previous decoding algorithms select one tokenization during inference, leading to the underutilization of knowledge learned about multiple tokenizations. To address this, we propose the SubMerge algorithm to rescue the ignored Subword tokenizations through Merging equivalent ones during inference. SubMerge is a nested search algorithm where the outer beam search treats words as the minimal units, and the inner beam search provides a list of word candidates and their probabilities by merging subword tokenizations that form the same word. Experimental results on six machine translation datasets show more accurate word probability estimation and higher translation quality using SubMerge than beam search. Additionally, we provide a time complexity analysis and investigate the effect of different beam sizes, training set sizes, and dropout rates, as well as whether SubMerge is effective on non-regularized models.
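The merging step can be sketched as summing the probabilities of all subword tokenizations that spell the same word. The following is a minimal illustration, not the SubMerge algorithm itself: the function name `merge_tokenizations`, the example subwords, and the probabilities are hypothetical, subwords are assumed to concatenate directly without boundary markers, and the nested beam search is omitted.

```python
import math
from collections import defaultdict


def merge_tokenizations(candidates):
    """Merge subword tokenizations that form the same word.

    candidates: list of (subwords, log_prob) pairs, where subwords is a
    tuple of strings (hypothetical format). Returns word -> log of the
    summed probability of all tokenizations that spell that word.
    """
    grouped = defaultdict(list)
    for subwords, log_prob in candidates:
        grouped["".join(subwords)].append(log_prob)

    merged = {}
    for word, log_probs in grouped.items():
        # Numerically stable log-sum-exp over equivalent tokenizations.
        top = max(log_probs)
        merged[word] = top + math.log(sum(math.exp(lp - top) for lp in log_probs))
    return merged


candidates = [
    (("un", "related"), math.log(0.20)),
    (("unrel", "ated"), math.log(0.10)),
    (("other",), math.log(0.25)),
]
word_scores = merge_tokenizations(candidates)
best_word = max(word_scores, key=word_scores.get)
```

In this toy example the single most probable tokenization spells "other" (0.25), but once the two tokenizations of "unrelated" are merged (0.20 + 0.10 = 0.30) that word ranks first, which is the kind of probability mass a single-tokenization decoder would ignore.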
Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation
Most data-to-text datasets are for English, so the difficulties of modelling data-to-text for low-resource languages are largely unexplored. In this paper we tackle data-to-text for isiXhosa, which is low-resource and agglutinative. We introduce Triples-to-isiXhosa (T2X), a new dataset based on a subset of WebNLG, which presents a new linguistic context that shifts modelling demands to subword-driven techniques. We also develop an evaluation framework for T2X that measures how accurately generated text describes the data. This enables future users of T2X to go beyond surface-level metrics in evaluation. On the modelling side we explore two classes of methods: dedicated data-to-text models trained from scratch and pretrained language models (PLMs). We propose a new dedicated architecture aimed at agglutinative data-to-text, the Subword Segmental Pointer Generator (SSPG). It jointly learns to segment words and copy entities, and outperforms existing dedicated models for two agglutinative languages (isiXhosa and Finnish). We investigate pretrained solutions for T2X, which reveals that standard PLMs come up short. Fine-tuning machine translation models emerges as the best method overall. These findings underscore the distinct challenge presented by T2X: neither well-established data-to-text architectures nor customary pretrained methodologies prove optimal. We conclude with a qualitative analysis of generation errors and an ablation study.
NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages
The Nguni languages have over 20 million home language speakers in South Africa. There has been considerable growth in the datasets for Nguni languages, but so far no analysis of the performance of NLP models for these languages has been reported across languages and tasks. In this paper we study pretrained language models for the four Nguni languages: isiXhosa, isiZulu, isiNdebele, and Siswati. We compile publicly available datasets for natural language understanding and generation, spanning 6 tasks and 11 datasets. This benchmark, which we call NGLUEni, is the first centralised evaluation suite for the Nguni languages, allowing us to systematically evaluate the Nguni-language capabilities of pretrained language models (PLMs). Besides evaluating existing PLMs, we develop new PLMs for the Nguni languages through multilingual adaptive finetuning. Our models, Nguni-XLMR and Nguni-ByT5, outperform their base models and large-scale adapted models, showing that performance gains are obtainable through limited language group-based adaptation. We also perform experiments on cross-lingual transfer and machine translation. Our models achieve notable cross-lingual transfer improvements in the lower-resourced Nguni languages (isiNdebele and Siswati). To facilitate future use of NGLUEni as a standardised evaluation suite for the Nguni languages, we create a web portal to access the collection of datasets and publicly release our models.