
    Part of speech (POS) tagging in Roman Urdu: datasets and models

    Roman Urdu is a prevalent medium of expression on social media, news websites, and text messages in the subcontinent, making it a valuable data source for social media and text analytics, particularly in the Indo-Pak perspective. However, despite the immense potential, limited efforts have been made in the area of Roman Urdu text analytics due to various complexities, such as the lack of a standard lexicon, the informal nature of the text, and the lack of text processing tools. The development of a Roman Urdu Part-of-Speech (POS) dataset and the implementation of a robust tagger hold immense importance for text analytics in Roman Urdu. In this work, we created a comprehensive, large-scale Roman Urdu POS dataset and developed a Roman Urdu POS tagger, laying the foundation for future advances in text analysis. Our approach used Hidden Markov Models, Neural Networks, state-of-the-art transformer models, and Large Language Models as baselines. We curated two distinct test datasets: one with lexical variation and the other without such variation. This allowed us to test the model’s robustness in handling the linguistic challenges posed by lexical variation. Our tagger yields high-quality output, with an accuracy of 96% on test data without lexical variation and 86% on test data with lexical variation. We also evaluated state-of-the-art Large Language Models (GPT-4o and Llama-3-8B) in zero-shot and few-shot settings, with GPT-4o achieving up to 53.78% accuracy in the few-shot configuration, demonstrating a substantial performance gap compared to specialized models. This work establishes a comprehensive framework for Roman Urdu POS tagging that effectively addresses lexical variation challenges, providing essential resources and benchmarks for advancing Roman Urdu natural language processing research.

    From uncertainty to trust: kernel dropout for AI-powered medical predictions

    AI-driven medical predictions with trustworthy confidence are essential for ensuring the responsible use of AI in healthcare applications. The growing capabilities of AI raise questions about their trustworthiness in healthcare, particularly due to opaque decision-making and limited data availability. This paper proposes a novel approach to address these challenges, introducing a Bayesian Monte Carlo Dropout model with kernel modelling. Our model is designed to enhance reliability on small medical datasets, a crucial barrier to the wider adoption of AI in healthcare. This model leverages existing language models for improved effectiveness and seamlessly integrates with current workflows. Extensive evaluations on public medical datasets showcase our model's superior performance across diverse tasks. We demonstrate significant improvements in reliability, even with limited data, offering a promising step towards building trust in AI-driven medical predictions and unlocking their potential to improve patient care.

    BAKER: Bayesian kernel uncertainty in domain-specific document modelling

    In critical domains such as healthcare and law, accurately modelling the uncertainty of automatic computational models is essential. For instance, healthcare models must produce reliable estimates to guide human decision-making. However, modelling uncertainty remains challenging, particularly for models handling low-resource datasets and complex, domain-specific vocabulary. Most existing predictive models produce point estimates rather than probability distributions, limiting our ability to quantify model uncertainty. This paper introduces a novel model, BAKER, designed to address these limitations. BAKER combines the strengths of Bayesian inference, known for its effectiveness in modelling uncertainty, and kernel methods, which excel at capturing complex data relationships. Incorporating kernel functions enhances model performance, particularly by reducing overfitting in data-limited scenarios. Our experimental analysis shows that BAKER significantly improves uncertainty reasoning compared to existing models.

    Uncertainty modelling in under-represented languages with Bayesian deep Gaussian processes

    NLP models often face challenges with under-represented languages due to a lack of sufficient training data and language complexities. This can result in inaccurate predictions and a failure to capture the inherent uncertainties within these languages. This paper introduces a new method for modelling uncertainty in under-represented languages by employing deep Bayesian Gaussian Processes. We develop a novel framework that integrates prior knowledge and leverages kernel functions. This helps enable the quantification of uncertainty in predictions to overcome the data limitations in under-represented languages. The efficacy of our approach is validated through various experiments, and the results are benchmarked against existing methods to highlight the enhancements in prediction accuracy and measurement of uncertainty.

    Would you trust an AI doctor? Building reliable medical predictions with kernel dropout uncertainty

    The growing capabilities of AI raise questions about their trustworthiness in healthcare, particularly due to opaque decision-making and limited data availability. This paper proposes a novel approach to address these challenges, introducing a Bayesian Monte Carlo Dropout model with kernel modelling. Our model is designed to enhance reliability on small medical datasets, a crucial barrier to the wider adoption of AI in healthcare. This model leverages existing language models for improved effectiveness and seamlessly integrates with current workflows. We demonstrate significant improvements in reliability, even with limited data, offering a promising step towards building trust in AI-driven medical predictions and unlocking its potential to improve patient care.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
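
    The two counting schemes contrasted in this abstract can be illustrated with a minimal sketch. It is not the study's actual procedure: the function name `cocitation_counts`, the toy data, the choice to count each author pair at most once per citing paper, and the choice to treat co-authors of the same cited work as co-cited are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a collection of citing papers.

    citing_papers: list of reference lists; each reference is the ordered
                   author list of one cited work (hypothetical data layout).
    max_authors:   how many leading authors of each cited work are counted
                   (1 = first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect every counted author that this citing paper cites.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Assumption: each pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Toy data: two citing papers, author names are placeholders.
papers = [
    [["A", "B"], ["C"]],
    [["A", "B"], ["D", "C"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(sorted(first_author.items()))
print(sorted(first_five.items()))
```

    With first-author counting, authors B and C never appear as second authors in any pair, so the resulting matrix covers fewer author pairs than the first-five variant on the same toy data.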

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
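
    As a rough illustration of the two well-known measures named in this abstract, the sketch below computes the Pearson correlation and the cosine for a pair of toy cocitation profiles. The profiles and function names are made up, and the property shown (padding both profiles with zeros changes Pearson but not the cosine) is a general mathematical fact, not necessarily the argument the paper makes.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine(centred_x, centred_y)


# Hypothetical cocitation profiles of two authors with three other authors.
profile_1 = [3, 1, 0]
profile_2 = [1, 3, 0]

# The same profiles after adding two authors cocited with neither of them.
padded_1 = profile_1 + [0, 0]
padded_2 = profile_2 + [0, 0]

print(cosine(profile_1, profile_2), cosine(padded_1, padded_2))
print(pearson(profile_1, profile_2), pearson(padded_1, padded_2))
```

    The cosine is the same for both pairs, while the Pearson correlation rises once the extra zeros are appended, so the Pearson value depends on which unrelated authors happen to be included in the analysis.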

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.