1,720,965 research outputs found

    Sentiment analysis using unsupervised learning for local government elections in South Africa

    Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2023. Understanding public sentiment is vital for political parties so that they can structure their election campaigns around voter expectations. The study uses unsupervised learning to assess the variation of sentiment polarity in tweets during the 2021 South African local government election campaign. It uses the pre-trained twitter-roberta-base-sentiment-latest model from Hugging Face and two unsupervised, lexicon-based pre-trained approaches, VADER and TextBlob, to determine sentiment polarity, both to gain insight that could inform political campaigns and to see whether there are distinct sentiment patterns or shifts during different phases of the 2021 local government election campaigns. Furthermore, the study applies suspicious-pattern detection and K-Means to classify users as either bots or humans, in order to identify the user behind the keyboard. The study also uses an OpenAI GPT model to label the dataset for fine-tuning and addresses the issue of class imbalance. The VADER and TextBlob results differ significantly from those of the twitter-roberta-base-sentiment-latest models when the statistical distributions of the sentiment results and the user classification results are compared. The results show significant variation across all sentiment classes, and the classes vary over time. Furthermore, the results reveal that TRBSL and TRBSL** outperform VADER and TextBlob on weighted accuracy and F1-scores. Most of the tweets were found to be generated by humans, with only a few identified as bot-generated and carrying negative sentiment. Computer Science; MIT (Big Data Science); Unrestricted; Faculty of Engineering, Built Environment and Information Technolog

    Visually grounded keyword detection and localisation for low-resource languages

    Thesis (PhD)--Stellenbosch University, 2023. ENGLISH ABSTRACT: Visually grounded speech (VGS) models are trained on images paired with unlabelled spoken captions. Such models could be used to build speech systems in settings where it is impossible to get transcribed data, e.g. for documenting unwritten languages. We investigate keyword localisation in speech (finding where in an utterance a given written keyword occurs) using VGS models trained in a real low-resource setting. Existing VGS studies fall short in two areas. Firstly, previous work has shown that VGS models can be used for tasks such as cross-modal retrieval, keyword detection and keyword spotting, but keyword localisation has not been explored. Secondly, most previous VGS studies use datasets where images are paired with speech in English (or another well-resourced language). English is therefore often used as a proxy for a low-resource language, making it difficult to accurately assess how these models perform in a real low-resource setting. Based on this, we address the following two overarching research questions: (i) Is keyword localisation possible with VGS models? (ii) In a real low-resource setting, can we do visually grounded keyword localisation cross-lingually? To address the first question, we augment and extend existing VGS models with the ability to not only detect, but also localise written keywords. For this research question, we constrain ourselves to the artificial low-resource setting where English VGS data is used, allowing us to compare and directly extend previous work. As a starting point, we use an existing methodology for training VGS models to detect keywords in speech: training images are tagged with soft textual labels using an existing offline image tagger, and these tags are then used as targets to train a speech network. That is, the model receives a noisy target for whether words occur in an utterance, but not where or in which order.
We extend this model using four localisation methods. Input masking masks the input signal at different locations and measures the difference in the output unit for a particular keyword. Attention localisation requires an attention layer that pools features over the temporal axis; we use the attention weights as localisation scores. Grad-CAM is a saliency-based method that can be applied to any convolutional neural network to determine which parts of the network input most contribute to a particular output decision. The score aggregation method uses a particular type of pooling so that the output score can be regarded as an aggregation of local scores; these can be used to select the most likely temporal location for a query keyword. In an oracle localisation test (where the model is told that a keyword is present in an utterance and then asked where it occurs), the masking-based localisation method achieves an accuracy of 57.0%, outperforming all the other approaches, with the attention-based method coming second with 46.0%. To tackle the second research question (cross-lingual keyword localisation in a real low-resource setting), we start by collecting and releasing a new VGS dataset. The Yorùbá Flickr Audio Caption Corpus (YFACC) dataset contains spoken captions for 6k Flickr images produced by a single speaker in Yorùbá: a real low-resource language spoken in Nigeria. Using this data, we consider the problem of cross-lingual keyword detection and localisation: given an English text query, we detect whether the query occurs in Yorùbá speech, and if it is detected, we localise where in the utterance the query occurs. To build this VGS system, images are automatically tagged with English visual labels serving as targets for an attention-based model that takes Yorùbá speech as input. Then we apply the attention-based localisation method to do cross-lingual keyword detection and localisation for the first time in a real low-resource setting.
The cross-lingual model obtains a precision of 16.0% in actual keyword localisation, which involves first detecting whether a keyword occurs before doing localisation. Although this result is modest when viewed in isolation, this is a model trained without any parallel English-Yorùbá data or any transcriptions. We find that the performance can be improved by initialising the cross-lingual model from a model pretrained on the English image–speech dataset, giving a result of 22.8%. In answering the two main research questions, we make the following concrete contributions: (1) We propose a new VGS model for keyword detection and keyword spotting using attention, and carry out a thorough comparison to existing VGS-based methods. (2) VGS models are extended with four localisation methods. (3) We present a detailed quantitative and qualitative analysis revealing the limits of the models above, showcasing their success and failure modes. We observe good localisation matches for some of the 67 keywords in the system’s vocabulary (black, pool, soccer, tree), while others are confused with semantically related words: ocean → surfer; ball → soccer; swimming → pool. (4) We release a new multimodal, multilingual dataset which enables VGS modelling in a real low-resource setting, resembling a language documentation scenario. The dataset extends the Flickr8k image–text dataset to include Yorùbá spoken captions. (5) We introduce a system for cross-lingual keyword detection and keyword localisation in a real low-resource setting using our new Yorùbá speech–image dataset. (6) We provide a comprehensive analysis of the cross-lingual VGS model. We observe that there are keywords with good performance, such as brown (búráùn; 100.0% precision), bike (kẹ̀kẹ́; 94.1%) and grass (koríko; 90.9%). But there are many others on which the model struggles due to poor visual grounding and confusion between semantically related concepts.
In summary, we show that VGS models can be used for a limited form of keyword localisation in a real low-resource setting. We hope that our new dataset and new findings will stimulate more research in the use of VGS models for real low-resource languages. AFRIKAANS SUMMARY: No summary available. Doctora
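
The input masking idea described in the abstract above (mask the input at different locations and measure the change in a keyword's output score) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `localise_by_masking`, `toy_keyword_score`, the window width, and the toy frame values are hypothetical, and the scorer is a stand-in for a trained speech network, not the thesis's model.

```python
def localise_by_masking(frames, score_fn, width=2, stride=1, mask_value=0.0):
    """Return (start, drop) for the window whose masking lowers the score most.

    frames   : sequence of per-frame features (plain floats in this toy setup)
    score_fn : hypothetical detector mapping a frame sequence to a keyword score
    """
    base = score_fn(frames)
    best_start, best_drop = 0, float("-inf")
    for start in range(0, len(frames) - width + 1, stride):
        masked = list(frames)
        # Overwrite one window of the input with the mask value.
        masked[start:start + width] = [mask_value] * width
        # A large drop suggests the keyword evidence sat inside this window.
        drop = base - score_fn(masked)
        if drop > best_drop:
            best_start, best_drop = start, drop
    return best_start, best_drop


def toy_keyword_score(frames):
    # Stand-in for a network output unit: responds to the strongest frame.
    return max(frames)


frames = [0.1, 0.2, 0.1, 0.9, 0.8, 0.1, 0.2, 0.1]
start, drop = localise_by_masking(frames, toy_keyword_score)
print(f"most influential window starts at frame {start} (score drop {drop:.2f})")
```

In this toy input the two high-valued frames sit at indices 3 and 4, so masking the window that covers both removes most of the score. A real system would mask spans of the speech features and read the drop from the keyword's output unit.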

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
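
The difference between the two counting schemes in the abstract above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the study's procedure: the function name `cocitation_counts`, the author names, and the toy reference lists are hypothetical, and the sketch assumes each author pair is counted once per citing paper (including pairs who co-authored the same cited work).

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs that are cited together in the same paper.

    citing_papers : list of reference lists; each reference is a list of
                    author names in byline order
    max_authors   : how many leading authors of each cited work to credit
                    (1 = first-author counting, 5 = first five authors)
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each citing two works.
papers = [
    [["Adams", "Baker"], ["Chen"]],
    [["Adams", "Diaz"], ["Chen", "Evans"]],
]
first_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(len(first_only), "pairs with first-author counting")
print(len(first_five), "pairs when the first five authors are credited")
```

With first-author counting only the Adams and Chen pair appears, while crediting up to five authors brings the second and later authors into the co-citation matrix.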

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourse rather than expressing opinions; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
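
As a small illustration of how the Pearson correlation and the cosine can disagree on cocitation profiles, the sketch below compares two profiles before and after padding them with zeros (authors with whom neither is cocited). This is one property often raised in this discussion and is not necessarily the argument made in the paper; the profile values are made up for the example.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors over four other authors.
profile_a = [3, 1, 2, 1]
profile_b = [1, 3, 1, 2]

# The same profiles extended with six authors cocited with neither of them.
padded_a = profile_a + [0] * 6
padded_b = profile_b + [0] * 6

print("cosine :", cosine(profile_a, profile_b), cosine(padded_a, padded_b))
print("pearson:", pearson(profile_a, profile_b), pearson(padded_a, padded_b))
```

In this example the cosine is unchanged by the added zeros, while the Pearson correlation moves from negative to positive, so the apparent similarity depends on how many unrelated authors are included in the analysis.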

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore how much author rankings produced by different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.

    Author Index

    Not provided

    koamabayili/VECTRON-author-checklist: VECTRON author checklist

    We have done our best to complete the author checklist relating to the use of animals in the hut study. Note that the objective of the hut study was to evaluate the IRS treatment applications for residual efficacy against Anopheles mosquitoes, including the local An. coluzzii mosquito population. Cows were only used to attract mosquitoes into the huts and no tests were carried out directly on the cows. The author checklist is intended for use with studies where experiments are carried out on animals, which is why we have had such difficulty in completing it for the hut study, as many of the questions do not relate to how the cows were used.