1,721,079 research outputs found

    Software for "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls"

    Full text link
    This repository contains the source code for reproducing the results from the paper "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls" (see https://arxiv.org/abs/1811.03130 for a detailed description of the results). DOI: 10.5281/zenodo.2558560. The data collected and used for this study can be found here: DOI 10.5281/zenodo.2558433. Please cite the "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls" paper in any publication, of any form and kind, that uses this software: @article{zannettou2018let, title={Who let the trolls out? towards understanding state-sponsored trolls}, author={Zannettou, Savvas and Caulfield, Tristan and Setzer, William and Sirivianos, Michael and Stringhini, Gianluca and Blackburn, Jeremy}, journal={arXiv preprint arXiv:1811.03130}, year={2018} } Acknowledgments: This project has received funding from the European Union’s Horizon 2020 Research and Innovation program under the Marie Skłodowska-Curie ENCASE project (Grant Agreement No. 691025).

    Software for "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior"

    No full text
    This repository consists of the custom external platform for the CrowdFlower annotation process used for the "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior" paper, published in ICWSM 2018. Full text of the paper can be found here: Please cite the paper in any published work that uses any of these resources. @article{founta2018large, title={Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior}, author={Founta, Antigoni-Maria and Djouvas, Constantinos and Chatzakou, Despoina and Leontiadis, Ilias and Blackburn, Jeremy and Stringhini, Gianluca and Vakali, Athena and Sirivianos, Michael and Kourtellis, Nicolas}, journal={arXiv preprint arXiv:1802.00393}, year={2018} } For any further questions contact a.m.founta at gmail dot com.

    Restricted Dataset for "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior"

    No full text
    Restricted dataset for the "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior" paper, published in ICWSM 2018. The full text of the paper can be found here. The public version of the dataset can be found here. hatespeech_text_label_vote_RESTRICTED_100K.csv: contains ~100K rows with the tweet text, the associated majority label, and the number of votes for the majority label. The tweets are shuffled so that there is no connection between tweet IDs and texts (in order to comply with Twitter's T&C). retweets.csv: contains ~2K rows, where every row consists of the row number in hatespeech_text_label_vote_RESTRICTED_100K.csv of the first occurrence of a tweet text, followed by the comma-separated row numbers of all other occurrences of the same tweet text in that file. There are ~8K other occurrences due to retweets. Please cite the paper in any published work that uses any of these resources. @inproceedings{founta2018large, title={Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior}, author={Founta, Antigoni-Maria and Djouvas, Constantinos and Chatzakou, Despoina and Leontiadis, Ilias and Blackburn, Jeremy and Stringhini, Gianluca and Vakali, Athena and Sirivianos, Michael and Kourtellis, Nicolas}, booktitle={11th International Conference on Web and Social Media, ICWSM 2018}, year={2018}, organization={AAAI Press} } For any further questions contact a.m.founta at gmail dot com and markos.charalambous at eecei dot cut dot ac dot c
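The retweets.csv layout described above (first-occurrence row number, then the row numbers of the repeats) can be read with a few lines of Python. This is a minimal sketch, not code shipped with the dataset: the function names and the sample rows are hypothetical, and whether row numbers are 0- or 1-based, or whether the file has a header line, is not stated in the description and should be checked against the actual file.

```python
import csv
import io


def parse_retweet_groups(handle):
    """Map each first-occurrence row number to the row numbers of its repeats."""
    groups = {}
    for record in csv.reader(handle):
        numbers = [int(cell) for cell in record if cell.strip()]
        if not numbers:
            continue  # skip blank lines
        first, *others = numbers
        groups[first] = others
    return groups


def rows_to_drop(groups):
    """Collect every repeated row so that only first occurrences remain."""
    return {row for others in groups.values() for row in others}


# Hypothetical sample in the described layout; not real data from the dataset.
sample = io.StringIO("3,17,42\n8,99\n")
groups = parse_retweet_groups(sample)
duplicates = rows_to_drop(groups)
```

Dropping the rows in `duplicates` from the main CSV would leave one copy of each tweet text, which may matter when retweets should not be counted more than once.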

    Invalid Dataset for "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior"

    No full text
    This dataset is invalid. The updated version of this dataset is here: https://zenodo.org/record/3678559#.Xl9-Ji97FhE Dataset for the "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior" paper, published in ICWSM 2018. The full text of the paper can be found here. The dataset provided here includes an updated version of the original dataset, with ~100K tweets annotated using the CrowdFlower platform. hatespeech_labels.csv: contains ~100K rows, where every row consists of a unique tweet ID and its corresponding majority annotation. UPDATE: It has come to our attention that a number of the tweets are no longer available for download on Twitter. Therefore, upon request, we can provide one more file with the full ~100K tweet texts, their associated majority labels, and the number of votes for the majority label. The tweets are shuffled so that there is no connection between tweet IDs and texts (in order to comply with Twitter's T&C). To obtain the file, contact the authors by email. Please cite the paper in any published work that uses any of these resources. @inproceedings{founta2018large, title={Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior}, author={Founta, Antigoni-Maria and Djouvas, Constantinos and Chatzakou, Despoina and Leontiadis, Ilias and Blackburn, Jeremy and Stringhini, Gianluca and Vakali, Athena and Sirivianos, Michael and Kourtellis, Nicolas}, booktitle={11th International Conference on Web and Social Media, ICWSM 2018}, year={2018}, organization={AAAI Press} } For any further questions contact a.m.founta at gmail dot com and markos.charalambous at eecei.cut.ac.c

    A multi-modal, multi-platform, and multi-lingual approach to understanding online misinformation

    Full text link
    Due to online social media, access to information is becoming easier and easier. Meanwhile, the truthfulness of online information is often not guaranteed. Incorrect information, often called misinformation, can have several modalities, and it can spread across multiple social media platforms in different languages, which can be destructive to society. However, academia and industry do not have automated ways to assess the impact of misinformation on social media, which prevents the adoption of productive strategies to curb its prevalence. In this dissertation, I present my research on building computational pipelines that help measure and detect misinformation on social media. My work can be divided into three parts. The first part focuses on processing misinformation in text form. I first show how to group political news articles from both trustworthy and untrustworthy news outlets into stories. Then I present a measurement analysis of the spread of stories to characterize how mainstream and fringe Web communities influence each other. The second part analyzes image-based misinformation. It can be further divided into two parts: fauxtography and generic image misinformation. Fauxtography is a special type of image misinformation, where images are manipulated or used out of context. In this research, I present how to identify fauxtography on social media by using a fact-checking website (Snopes.com), and I also develop a computational pipeline to facilitate the measurement of these images at scale. I next focus on generic misinformation images related to COVID-19. During the pandemic, text misinformation has been studied from many angles. However, very little research has covered image misinformation during the COVID-19 pandemic. In this research, I develop a technique to cluster visually similar images together, which facilitates manual annotation and makes subsequent analysis possible.
    The last part is about the detection of misinformation in text form from a multi-language perspective. This research aims to detect textual COVID-19-related misinformation and the stances Twitter users take towards such misinformation in both English and Chinese. To achieve this goal, I experiment with several natural language processing (NLP) models to investigate their performance on misinformation detection and stance detection in both monolingual and multi-lingual settings. The results show that two models, COVID-Tweet-BERT v2 and BERTweet, are generally effective at detecting misinformation and stance in both settings. These two models are promising candidates for misinformation moderation on social media platforms, which depends heavily on identifying misinformation and the author's stance towards it. Overall, the results of this dissertation shed light on online misinformation, and my proposed computational tools are applicable to the moderation of social media, potentially contributing to a more wholesome online ecosystem.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field being studied than the one produced through traditional first-author co-citation counting. Reasons for these effects are discussed.
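The two counting schemes compared above can be illustrated with a small Python sketch. It is an illustration under stated assumptions rather than the study's procedure: the function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical. The sketch counts an author pair once per citing paper and, when several authors per work are included, also treats co-authors of the same cited work as co-cited, which is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs that appear together in a citing paper's references.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. max_authors=1 gives first-author counting,
    max_authors=5 considers the first five authors of every cited work.
    """
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, each citing two works.
citing_papers = [
    [["A", "B"], ["C"]],
    [["A", "D"], ["C", "E"]],
]
first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
```

On this toy input, first-author counting sees only the pair (A, C), while the inclusive scheme also records pairs involving B, D, and E. This mirrors the trade-off described above: more authors enter the analysis under inclusive counting.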

    Dataset for "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls"

    No full text
    This is the dataset used for the study "Who Let The Trolls Out? Towards Understanding State-Sponsored Trolls". Savvas Zannettou, Tristan Caulfield, William Setzer, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn. Arxiv, 2019. DOI: 10.5281/zenodo.2558560. The dataset consists of the data released by Twitter in October 2018 for Russian and Iranian state-sponsored troll accounts, which is available at https://about.twitter.com/en_us/values/elections-integrity.html#data, as well as intermediate data that we generated after processing the raw data. For instance, we include trained Word2Vec and LDA models, the output of our influence estimation experiments via Hawkes Processes, and a lot of other data necessary to reproduce the results in the paper. To use the provided data, simply download the compressed file from and make sure that the uncompressed data folder is in the same directory as the IPython Notebook. The code used for this study can be found here: https://github.com/zsavvas/trolls_analysis Please cite our paper in any publication, of any form and kind, that results from your use of this data: @article{zannettou2018let, title={Who let the trolls out? towards understanding state-sponsored trolls}, author={Zannettou, Savvas and Caulfield, Tristan and Setzer, William and Sirivianos, Michael and Stringhini, Gianluca and Blackburn, Jeremy}, journal={arXiv preprint arXiv:1811.03130}, year={2018} }

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.