
    In Crowd Veritas: Leveraging Human Intelligence To Fight Misinformation

    The spread of online misinformation has important effects on the stability of democracy. The sheer size of digital content on the web and social media, and the ability to immediately access and share it, have made it difficult to perform timely fact-checking at scale. Truthfulness judgments are usually made by experts, such as journalists for political statements. An alternative approach is to rely on a (non-expert) crowd of human judges to perform fact-checking. This leads to the following research question: can such human judges detect and objectively categorize online (mis)information? To answer it, several extensive crowdsourcing-based studies are performed. Thousands of truthfulness judgments over two datasets are collected by recruiting a crowd of workers from crowdsourcing platforms, and the crowd judgments are compared with the expert ones. The results allow us to conclude that the workers are indeed able to do so. There is a limited understanding of the factors that influence worker participation in longitudinal studies across different crowdsourcing marketplaces. A large-scale survey aimed at understanding how such studies are performed using crowdsourcing is therefore run across multiple platforms. The answers collected are analyzed both quantitatively and qualitatively. A list of recommendations for task requesters to conduct these studies effectively is provided, together with a list of best practices for crowdsourcing platforms. Truthfulness is a subtle matter: statements can be merely biased, imprecise, wrong, etc., and a unidimensional truth scale cannot account for such differences. The crowd workers are therefore asked to judge seven different dimensions of truthfulness, selected based on existing literature. The newly collected crowdsourced judgments show that the workers are indeed reliable when compared to an expert-provided gold standard.
Cognitive biases are human processes that often help minimize the cost of making mistakes but keep assessors away from an objective judgment of information. A review of the cognitive biases that might manifest during the fact-checking process is presented, together with a list of countermeasures that can be adopted. An exploratory study on the previously collected dataset is then performed. The findings are used to formulate hypotheses about which individual characteristics of statements or judges, and which cognitive biases, may affect crowd workers' truthfulness judgments. The findings suggest that crowd workers' degree of belief in science has an impact, that they generally overestimate truthfulness, and that their judgments are indeed affected by various cognitive biases. Automated fact-checking systems to combat the spread of misinformation exist; however, their complexity usually makes them opaque to the end user, making it difficult to foster trust in the system. The E-BART model is introduced with the aim of making progress on this front. E-BART can provide a truthfulness prediction for a statement and jointly generate a human-readable explanation. An extensive human evaluation of the impact of the explanations generated by the model is conducted, showing that the explanations increase the human ability to spot misinformation. The whole set of data collected and analyzed in this thesis is publicly released to the research community at: https://doi.org/10.17605/OSF.IO/JR6VC.
The spread of online misinformation has important effects on the stability of democracy. The information that is consumed every day influences human decision-making processes. The sheer size of digital content on the web and social media, and the ability to immediately access and share it, have made it difficult to perform timely fact-checking at scale. Indeed, fact-checking is a complex process that involves several activities.
A long-term goal could be to build a so-called human-in-the-loop system that copes with (mis)information by measuring the truthfulness of information items in real time (e.g., as they appear on social media, news outlets, and so on), using a combination of crowd-powered data, human intelligence, and machine learning techniques. In recent years, crowdsourcing has become a popular method for collecting reliable truthfulness judgments in order to scale up and help study the manual fact-checking effort. Initially, this thesis investigates whether human judges can detect and objectively categorize online (mis)information, and which environment yields the best results. Then, the impact of cognitive biases on human assessors judging information truthfulness is addressed. A categorization of cognitive biases is proposed, together with countermeasures to combat their effects and a bias-aware judgment pipeline for fact-checking. Lastly, an approach able to predict information truthfulness and, at the same time, generate a natural language explanation supporting the prediction is proposed. The machine-generated explanations are evaluated to understand whether they help human assessors better judge the truthfulness of information items. A collaborative process between systems, crowd workers, and expert fact-checkers would provide a scalable and decentralized hybrid mechanism to cope with the increasing volume of online misinformation.
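As a rough illustration of comparing crowd judgments with expert ones, the sketch below averages the crowd scores collected for each statement and measures rank agreement with the expert labels using Spearman's correlation. Mean aggregation, Spearman's rho, and every function name here are assumptions chosen for illustration; they are not presented as the exact aggregation functions or agreement metrics used in the thesis.

```python
from collections import defaultdict
from statistics import mean


def aggregate_crowd(judgments):
    """Average the crowd scores per statement.

    judgments: iterable of (statement_id, worker_id, score) tuples (hypothetical format).
    """
    per_statement = defaultdict(list)
    for statement_id, _worker_id, score in judgments:
        per_statement[statement_id].append(score)
    return {sid: mean(scores) for sid, scores in per_statement.items()}


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (undefined if either list is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def crowd_expert_agreement(judgments, expert):
    """Spearman's rho between aggregated crowd scores and expert labels."""
    crowd = aggregate_crowd(judgments)
    shared = sorted(set(crowd) & set(expert))
    x = [crowd[s] for s in shared]
    y = [expert[s] for s in shared]
    return pearson(average_ranks(x), average_ranks(y))


judgments = [("s1", "w1", 1), ("s1", "w2", 2), ("s2", "w1", 3),
             ("s2", "w2", 4), ("s3", "w1", 5), ("s3", "w2", 5)]
expert = {"s1": 0, "s2": 2, "s3": 5}
print(crowd_expert_agreement(judgments, expert))  # crowd and expert orderings match: close to 1.0
```

Rank correlation is used in this sketch because crowd and expert scales may differ (for example, a 6-level expert scale against a finer crowd scale); only the ordering of statements is compared.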

    Crowdsourcing Peer Review in the Digital Humanities?

    We propose an alternative to the standard peer review activity that aims to exploit the otherwise lost opinions of the readers of publications; this approach, called Readersourcing, was originally proposed by Mizzaro [1]. It can be formalized by means of different models that share the same general principles. These models should define a way to measure the overall quality of a publication as well as the reputation of a reader as an assessor; moreover, from these measures it should be possible to derive the reputation of a scholar as an author. We describe an ecosystem called Readersourcing 2.0, which provides an implementation of two Readersourcing models [2, 3], by outlining its goals and requirements. Readersourcing 2.0 will be used in the future to gather fresh data to analyze and validate

    Crowdsourcing Peer Review: As We May Do

    This paper describes Readersourcing 2.0, an ecosystem providing an implementation of the Readersourcing approach. Readersourcing is proposed as an alternative to the standard peer review activity that aims to exploit the otherwise lost opinions of readers. To achieve this, Readersourcing 2.0 implements two different models based on so-called co-determination algorithms. We describe the requirements, present the overall architecture, and show how the end user can interact with the system. In the future, Readersourcing 2.0 will also be used to study other topics, such as the idea of shepherding users to achieve better-quality reviews, and the differences between review activities carried out with a single-blind or a double-blind approach. Preprint for the IRCDL 2019 conference.
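Co-determination means that paper quality and reader reputation are computed from each other, and that author reputation can then be derived from paper quality. The sketch below shows a generic mutual-reinforcement loop of that kind: a paper's quality is the reputation-weighted mean of its ratings, and a reader's reputation is higher when their ratings agree with the current quality estimates. It is a hypothetical illustration only, not either of the two Readersourcing models implemented in Readersourcing 2.0; the update rules, the [0, 1] rating scale, the fixed iteration count, and all function names are assumptions.

```python
from collections import defaultdict


def codetermine(ratings, iterations=20, eps=1e-6):
    """Hypothetical co-determination loop.

    ratings: list of (reader, paper, score) tuples, score in [0, 1].
    Returns (paper_quality, reader_reputation) dictionaries.
    """
    reputation = {reader: 1.0 for reader, _, _ in ratings}  # start with equal trust
    quality = {}
    for _ in range(iterations):
        # Paper quality: reputation-weighted mean of the ratings it received.
        num, den = defaultdict(float), defaultdict(float)
        for reader, paper, score in ratings:
            num[paper] += reputation[reader] * score
            den[paper] += reputation[reader]
        quality = {p: num[p] / den[p] for p in num}
        # Reader reputation: 1 minus the mean absolute error w.r.t. current quality,
        # floored at eps so that no paper ever ends up with zero total weight.
        errors = defaultdict(list)
        for reader, paper, score in ratings:
            errors[reader].append(abs(score - quality[paper]))
        reputation = {r: max(eps, 1.0 - sum(e) / len(e)) for r, e in errors.items()}
    return quality, reputation


def author_reputation(authorship, quality):
    """Hypothetical author score: mean quality of the author's rated papers."""
    scores = {}
    for author, papers in authorship.items():
        rated = [quality[p] for p in papers if p in quality]
        if rated:
            scores[author] = sum(rated) / len(rated)
    return scores


ratings = [("a", "p1", 0.9), ("b", "p1", 0.8), ("c", "p1", 0.1),
           ("a", "p2", 0.2), ("b", "p2", 0.3), ("c", "p2", 0.9)]
quality, reputation = codetermine(ratings)
print(quality, reputation)  # reader "c", who disagrees with the others, gets the lowest reputation
```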

    Reproduce and Improve: An Evolutionary Approach to Select a Few Good Topics for Information Retrieval Evaluation

    Effectiveness evaluation of information retrieval systems by means of a test collection is a widely used methodology. However, it is rather expensive in terms of resources, time, and money; therefore, many researchers have proposed methods for cheaper evaluation. One particular approach, on which we focus in this article, is to use fewer topics: in TREC-like initiatives, system effectiveness is usually evaluated as the average effectiveness on a set of n topics (usually n = 50, although more than 1,000 topics have also been adopted); instead of using the full set, it has been proposed to find the best subsets of a few good topics that evaluate the systems in the most similar way to the full set. The computational complexity of the task has so far limited the analyses that have been performed. We develop a novel and efficient approach based on a multi-objective evolutionary algorithm. The higher efficiency of our new implementation allows us to reproduce some notable results on topic set reduction, as well as to perform new experiments that generalize and improve such results. We show that our approach is able both to reproduce the main state-of-the-art results and to analyze the effect of the collection, metric, and pool depth used for the evaluation. Finally, unlike previous studies, which have been mainly theoretical, we are also able to discuss some practical topic selection strategies, integrating the results of automatic evaluation approaches.
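To make the "few good topics" criterion concrete, the sketch below scores a topic subset by the Kendall's tau between the system ranking it induces and the ranking induced by the full topic set, and improves a fixed-size subset with a simple (1+1) evolutionary search using swap mutation. This is a deliberately simplified, single-objective stand-in: the article describes a multi-objective evolutionary algorithm, and the matrix layout `scores[system][topic]`, the mutation operator, and the function names are assumptions.

```python
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists (tied pairs count as 0)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def mean_over(scores, topics):
    """Average effectiveness (e.g., AP) of each system over the given topic indices."""
    topics = list(topics)
    return [sum(row[t] for t in topics) / len(topics) for row in scores]


def evolve_subset(scores, k, generations=500, seed=0):
    """(1+1) search for k topics whose system ranking best matches the full set."""
    n_topics = len(scores[0])
    if not 0 < k < n_topics:
        raise ValueError("k must be between 1 and n_topics - 1")
    rng = random.Random(seed)
    full = mean_over(scores, range(n_topics))

    def fitness(subset):
        return kendall_tau(mean_over(scores, subset), full)

    best = rng.sample(range(n_topics), k)
    best_fit = fitness(best)
    for _ in range(generations):
        # Mutation: swap one selected topic for one currently unused topic.
        child = best[:]
        unused = [t for t in range(n_topics) if t not in child]
        child[rng.randrange(k)] = rng.choice(unused)
        child_fit = fitness(child)
        if child_fit >= best_fit:  # accept equal fitness to drift across plateaus
            best, best_fit = child, child_fit
    return sorted(best), best_fit


data_rng = random.Random(1)
scores = [[data_rng.random() for _ in range(20)] for _ in range(8)]  # 8 systems, 20 topics (synthetic)
subset, fit = evolve_subset(scores, k=5, generations=200)
print(subset, fit)
```

A multi-objective version would instead keep a Pareto front over subset cardinality and correlation, which is what lets the best subsets of every size be found in one run.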

    Efficiency and Effectiveness of LLM-Based Summarization of Evidence in Crowdsourced Fact-Checking

    Assessing the truthfulness of information is a critical task in fact-checking, and is typically performed using binary or coarse ordinal scales (2 to 6 levels), though fine-grained scales (e.g., 100 levels) have also been explored. Magnitude Estimation (ME) takes this approach further by allowing assessors to assign any value in the range (0, +∞). However, it introduces challenges, including the need to aggregate assessments from individuals with different interpretations of the scale. Despite these challenges, its successful applications in other domains suggest its potential suitability for truthfulness assessment. We conduct a crowdsourcing study, using ME to collect assessments of claims sourced from the PolitiFact fact-checking organization. To the best of our knowledge, this is the first systematic investigation of ME in the context of truthfulness assessment. Our results show that while aggregation methods significantly impact assessment quality, optimal aggregation strategies yield accuracy and reliability comparable to traditional scales. More importantly, ME allows capturing subtle differences in truthfulness, offering richer insights than conventional coarse-grained scales.
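Because each assessor picks their own range on an open (0, +∞) scale, ME scores are usually normalized per assessor before aggregation. The sketch below shows one common choice, geometric-mean normalization (dividing each worker's values by that worker's geometric mean), followed by a per-claim geometric mean or median. These particular normalization and aggregation functions, the tuple format, and the function names are assumptions for illustration, not necessarily the strategies that performed best in this study.

```python
from collections import defaultdict
from statistics import geometric_mean, median


def normalize_per_worker(assessments):
    """Divide each worker's values by that worker's geometric mean.

    assessments: list of (worker, claim, value) tuples with value > 0
    (hypothetical format). Removes individual differences in scale usage.
    """
    by_worker = defaultdict(list)
    for worker, _claim, value in assessments:
        by_worker[worker].append(value)
    gm = {w: geometric_mean(vals) for w, vals in by_worker.items()}
    return [(w, c, v / gm[w]) for w, c, v in assessments]


def aggregate(assessments, how="gmean"):
    """Per-claim aggregate of the normalized ME scores."""
    aggregators = {"gmean": geometric_mean, "median": median}
    agg = aggregators[how]  # KeyError for unsupported strategies
    per_claim = defaultdict(list)
    for _worker, claim, value in normalize_per_worker(assessments):
        per_claim[claim].append(value)
    return {c: agg(vals) for c, vals in per_claim.items()}


# Two workers using very different ranges but the same relative judgments.
data = [("w1", "c1", 1), ("w1", "c2", 10),
        ("w2", "c1", 100), ("w2", "c2", 1000)]
print(aggregate(data))  # after normalization both workers agree on c1 and c2
```

Working on ratios (or, equivalently, on logarithms) fits ME because assessors are asked to express proportional differences, so a worker who writes 100 and 1000 means the same as one who writes 1 and 10.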

    Longitudinal Loyalty: Understanding The Barriers To Running Longitudinal Studies On Crowdsourcing Platforms

    Crowdsourcing tasks have been widely used to collect a large number of human labels at scale. While some of these tasks are deployed by requesters and performed only once by crowd workers, others require the same worker to perform the same task or a variant of it more than once, thus participating in a so-called longitudinal study. Despite the prevalence of longitudinal studies in crowdsourcing, there is a limited understanding of factors that influence worker participation in them across different crowdsourcing marketplaces. We present results from a large-scale survey of 300 workers on 3 different micro-task crowdsourcing platforms: Amazon Mechanical Turk, Prolific and Toloka. The aim is to understand how longitudinal studies are performed using crowdsourcing. We collect answers about 547 experiences and we analyze them both quantitatively and qualitatively. We synthesize 17 take-home messages about longitudinal studies together with 8 recommendations for task requesters and 5 best practices for crowdsourcing platforms to adequately conduct and support such kinds of studies. We release the survey and the data at: https://osf.io/h4du9/

    Crowdsourcing Statement Classification to Enhance Information Quality Prediction

    This paper explores the use of crowdsourcing to classify statement types in film reviews in order to assess their information quality. Employing the Argument Type Identification Procedure, which uses the Periodic Table of Arguments to categorize arguments, the study aims to connect statement types to overall argument strength and information reliability. Focusing on non-expert annotators in a crowdsourcing environment, the research assesses their reliability based on various factors, including language proficiency and annotation experience. Results indicate the importance of careful annotator selection and training to achieve high inter-annotator agreement, and highlight challenges in crowdsourcing statement classification for information quality assessment.
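Inter-annotator agreement among several crowd annotators labeling the same statements with nominal categories is often measured with Fleiss' kappa. The sketch below is a standard Fleiss' kappa computation, assuming every item receives the same number of labels; the abstract does not say which agreement coefficient the study used, and the category strings in the example are placeholders rather than actual Periodic Table of Arguments types.

```python
from collections import Counter


def fleiss_kappa(labels_per_item):
    """Fleiss' kappa for nominal labels.

    labels_per_item: list of lists; each inner list holds the labels that
    the (same number of) annotators assigned to one item.
    """
    n = len(labels_per_item[0])  # annotators per item
    if n < 2 or any(len(labels) != n for labels in labels_per_item):
        raise ValueError("every item needs the same number (>= 2) of labels")
    n_items = len(labels_per_item)
    counts = [Counter(labels) for labels in labels_per_item]
    categories = set().union(*counts)
    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    p_items = [(sum(c * c for c in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(cnt[k] for cnt in counts) / (n_items * n) for k in categories]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)  # undefined if only one category is ever used


labels = [["fact", "fact", "fact"], ["value", "value", "value"]]  # placeholder types
print(fleiss_kappa(labels))  # perfect agreement gives 1.0
```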

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than the one produced through traditional first-author co-citation counting, when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
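The difference between the two counting schemes can be made concrete in a few lines: for each citing paper, collect the authors of its cited works (only the first author, or the first five), and count every pair of distinct authors once for that citing paper. The sketch below is an assumed, simplified reading of the two schemes; details such as whether co-authors of the same cited work count as co-cited, or whether a pair is counted once or multiple times per citing paper, are design choices the abstract does not specify, and the function name and data format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def author_cocitations(citing_papers, max_authors=1):
    """Count author co-citation pairs.

    citing_papers: list of reference lists; each reference is a list of author
    names in byline order (hypothetical format). Only the first `max_authors`
    authors of each cited work are considered; each pair of distinct authors is
    counted at most once per citing paper.
    """
    counts = Counter()
    for references in citing_papers:
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


papers = [
    [["Smith", "Lee"], ["Chen"]],            # citing paper 1
    [["Lee", "Park"], ["Chen", "Smith"]],    # citing paper 2
]
print(author_cocitations(papers, max_authors=1))  # traditional first-author counting
print(author_cocitations(papers, max_authors=5))  # first-five-authors counting
```

With first-author counting this toy example yields only the pairs (Chen, Smith) and (Chen, Lee); counting the first five authors also surfaces Lee, Park, and the Lee-Smith link, showing how the non-traditional scheme feeds more relationships into the subsequent mapping.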