CaptureBias: Supporting Media Scholars with Ambiguity-Aware Bias Representation for News Videos
In this project we explore the presence of ambiguity in textual and visual media and its influence on accurately understanding and capturing bias in news. We study this topic in the context of supporting media scholars and social scientists in their media analysis. Our focus lies on racial and gender bias as well as framing and the comparison of their manifestation across modalities, cultures and languages. In this paper we lay out a human in the loop approach to investigate the role of ambiguity in detection and interpretation of bias. Accepted Author Manuscript. Web Information System
Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study
Event detection is still a difficult task due to the complexity and ambiguity of such entities. On the one hand, we observe low inter-annotator agreement among experts when annotating events, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems have a lower measured performance in terms of F1-score compared to other types of entities such as people or locations. In this paper we study the consistency and completeness of expert-annotated datasets for events and time expressions. We propose a data-agnostic methodology for validating such datasets in terms of consistency and completeness. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated datasets of events. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system. Our results show that the crowd-annotated events increase the performance of the system by at least 5.3%.
FrameNet Semantic Frame Disambiguation with CrowdTruth
<p>This repository contains a ground truth corpus for semantic frame disambiguation, acquired with crowdsourcing and processed with <strong><a href="http://crowdtruth.org/">CrowdTruth</a></strong> metrics that capture ambiguity in annotations by measuring inter-annotator disagreement.</p>
<p>The dataset contains annotations for 433 sentence-word pairs from the <a href="https://framenet.icsi.berkeley.edu/">FrameNet corpus v.1.7</a>, with each sentence-word pair annotated for frame disambiguation by 15 workers. The crowdsourced data was collected from <a href="https://www.mturk.com/">Amazon Mechanical Turk</a>.</p>
<p>The corpus has been referenced in the following paper:</p>
<ul>
<li>Anca Dumitrache, Lora Aroyo and Chris Welty: <strong><a href="https://arxiv.org/abs/1805.00270">Capturing and Interpreting Ambiguity in Crowdsourcing Frame Disambiguation</a></strong>. <a href="https://www.humancomputation.com/2018/">HCOMP 2018</a>.</li>
</ul>
<p>To replicate the data processing from the paper, use the Jupyter Notebook file <code>CrowdTruth metrics.ipynb</code>. It requires the installation of the <a href="https://github.com/CrowdTruth/CrowdTruth-core">CrowdTruth metrics</a> Python package (v >= 2.0).</p>
<p>The data aggregated with CrowdTruth metrics is available in the folder <code>data/output/</code>.</p>
<p>The raw crowdsourcing data is available in the folder <code>data/input/</code>.</p>
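<p>To illustrate how disagreement between workers can be turned into a graded score instead of a single label, here is a minimal sketch. It is not the CrowdTruth package and not the exact metrics used in the paper: it computes an unweighted cosine score between the summed worker choices and each candidate frame, whereas the CrowdTruth 2.0 metrics additionally weight by worker and annotation quality. The function name <code>frame_scores</code>, the frame names, and the assumption that a worker may select several frames are all hypothetical, and four workers are used instead of 15 for brevity.</p>

```python
import math
from collections import Counter


def frame_scores(worker_choices):
    """Score each frame by the cosine between the summed worker
    vector and that frame's unit vector (unweighted simplification)."""
    # Count how many workers selected each frame.
    counts = Counter(frame for choices in worker_choices for frame in choices)
    # Euclidean norm of the summed annotation vector.
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {frame: c / norm for frame, c in counts.items()}


# Hypothetical annotations for one sentence-word pair.
worker_choices = [
    {"Motion"},
    {"Motion"},
    {"Motion", "Travel"},
    {"Travel"},
]

scores = frame_scores(worker_choices)
for frame, score in sorted(scores.items()):
    print(f"{frame}: {score:.3f}")
```

<p>A frame chosen by every worker and no other frame would score 1.0; scores spread across several frames indicate that the sentence-word pair is ambiguous.</p>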
<p>If you find this data useful in your research, please consider citing:</p>
<pre><code>@inproceedings{dumitrache2018frames,
Author = {Anca Dumitrache and Lora Aroyo and Chris Welty},
Title = {Capturing Ambiguity in Crowdsourcing Frame Disambiguation},
Booktitle = {The sixth AAAI Conference on Human Computation and Crowdsourcing},
Year = {2018}
}
</code></pre>
Characterising and Mitigating Aggregation-Bias in Crowdsourced Toxicity Annotations
Training machine learning (ML) models for natural language processing usually requires large amounts of data, often acquired through crowdsourcing. The way this data is collected and aggregated can affect the outputs of the trained model, for example by ignoring labels that differ from the majority. In this paper we investigate how label aggregation can bias the ML results towards certain data samples and propose a methodology to highlight and mitigate this bias. Although our work is applicable to any kind of label aggregation for data subject to multiple interpretations, we focus on the effects of the bias introduced by majority voting on toxicity prediction over sentences. Our preliminary results indicate that we can mitigate the majority bias and increase prediction accuracy for minority opinions if we take into account the different labels from annotators when training adapted models, rather than relying on the aggregated labels. Accepted Author Manuscript. Web Information System
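The contrast between majority voting and keeping every annotator's label can be sketched as follows. This is only an illustration under stated assumptions, not the methodology of the paper: the helper names `majority_vote` and `label_distribution`, the sentence identifiers, and the labels are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Collapse all annotator labels into the single most frequent one."""
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Keep the fraction of annotators behind each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical data: five annotators per sentence.
annotations = {
    "sentence_a": ["toxic", "toxic", "toxic", "toxic", "toxic"],
    "sentence_b": ["toxic", "not_toxic", "not_toxic", "toxic", "not_toxic"],
}

for sentence_id, labels in annotations.items():
    hard = majority_vote(labels)
    soft = label_distribution(labels)
    print(sentence_id, hard, soft)
```

For `sentence_b`, majority voting discards the two annotators who judged the sentence toxic, while the distribution keeps that minority opinion available as a training signal.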
Impact of Algorithmic Decision Making on Human Behavior: Evidence from Ultimatum Bargaining
Recent advances in machine learning have led to the widespread adoption of ML models for decision support systems. However, little is known about how the introduction of such systems affects the behavior of human stakeholders. This pertains both to the people using the system and to those who are affected by its decisions. To address this knowledge gap, we present a series of ultimatum bargaining game experiments comprising 1178 participants. We find that users are willing to use a black-box decision support system and thereby make better decisions. This translates into higher levels of cooperation and better market outcomes. However, because users under-weigh algorithmic advice, market outcomes remain far from optimal. Explanations increase the number of unique system inquiries, but users appear less willing to follow the system’s recommendation. People who negotiate with a user who has a decision support system, but cannot use one themselves, react to its introduction by demanding a better deal for themselves, thereby decreasing overall cooperation levels. This effect is largely driven by the percentage of participants who perceive the system’s availability as unfair. Interpretability mitigates perceptions of unfairness. Our findings highlight the potential for decision support systems to further human cooperation, but also the need for regulators to consider heterogeneous stakeholder reactions. In particular, higher levels of transparency might inadvertently hurt cooperation through changes in fairness perceptions.
Trainbot: A Conversational Interface to Train Crowd Workers for Delivering On-Demand Therapy
On-demand emotional support is an expensive and elusive societal need that is exacerbated in difficult times, as witnessed during the COVID-19 pandemic. Prior work in affective crowdsourcing has examined ways to overcome technical challenges for providing on-demand emotional support to end users. This can be achieved by training crowd workers to provide thoughtful and engaging on-demand emotional support. Inspired by recent advances in conversational user interface research, we investigate the efficacy of a conversational user interface for training workers to deliver psychological support to users in need. To this end, we conducted a between-subjects experimental study on Prolific, wherein a group of workers (N=200) received training on motivational interviewing via either a conversational interface or a conventional web interface. Our results indicate that training workers in a conversational interface both yields better worker performance and improves their user experience in on-demand stress management tasks.
A layered approach towards domain authoring support
This paper presents an approach to authoring support for Web courseware based on a layered ontological paradigm. The ontology-based layers in the courseware authoring architecture serve as a basis for formal semantics and reasoning support in performing generic authoring tasks. This approach represents an extension of our knowledge classification and indexing mechanism from a previously developed system, AIMS, aimed at supporting students while completing learning tasks in a Web-based learning/training environment. We propose the addition of two vertical layers in the system architecture, the Author assisting layer and the Operational layer, with the role of facilitating the creation of the ontological layers (Course ontology and Domain ontology) and of the educational metadata layer. Here we focus on the domain ontology creation process, together with the support that the additional layers can provide within this process. We exemplify our method by presenting a set of generic tasks related to concept-based domain authoring and their ontological support.
A Human in the Loop Approach to Capture Bias and Support Media Scientists in News Video Analysis
Bias is inevitable and inherent in any form of communication. News often appears biased to citizens with different political orientations, and is understood differently by news media scholars and the broader public. In this paper we advocate the need for accurate methods for bias identification in video news items, to enable rich analytics capabilities that assist humanities media scholars and social and political scientists. We propose to analyze biases that are typical in video news (including framing, gender and racial biases) by means of a human-in-the-loop approach that combines text and image analysis with human computation techniques.
Crowdsourcing Topical Relevance with CrowdTruth
<p>This repository contains the crowdsourcing annotations for topical relevance referenced in the following paper:</p>
<ul>
<li>Oana Inel, Giannis Haralabopoulos, Dan Li, Christophe Van Gysel, Zoltán Szlávik, Elena Simperl, Evangelos Kanoulas and Lora Aroyo: Studying Topical Relevance with Evidence-based Crowdsourcing. CIKM 2018.</li>
</ul>
<p>If you find this data useful in your research, please consider citing:</p>
<pre>@inproceedings{inel2018studying,
title={Studying Topical Relevance with Evidence-based Crowdsourcing},
author={Inel, Oana and Haralabopoulos, Giannis and Li, Dan and Van Gysel, Christophe and Szl{\'a}vik, Zolt{\'a}n and Simperl, Elena and Kanoulas, Evangelos and Aroyo, Lora},
booktitle={Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
pages={1253--1262},
year={2018},
organization={ACM}
}</pre>
<p><strong>Running the notebooks</strong></p>
<p>To run the notebooks and regenerate the results, install the stable <em><strong>crowdtruth==2.0</strong></em> package from PyPI:<br>
<code>pip install crowdtruth==2.0</code><br>
</p>
- …
