
    Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response

    The crisis image benchmark dataset consists of data from several sources, such as CrisisMMD, data from AIDR, and the Damage Multimodal Dataset (DMD). The purpose of this work was to develop a consolidated dataset, create non-overlapping train/dev/test sets, and provide benchmark results for the community. We propose new datasets for disaster type detection, informativeness classification, and damage severity assessment. Moreover, we relabel existing publicly available datasets for new tasks. We identify exact- and near-duplicates to form non-overlapping data splits, and finally consolidate them to create larger datasets. In our extensive experiments, we benchmark several state-of-the-art deep learning models and achieve promising results. We release our datasets and models publicly, aiming to provide proper baselines as well as to spur further research in the crisis informatics community.

    https://crisisnlp.qcri.org/crisis-image-datasets-asonam20

    The labels in the dataset for the different tasks are as follows:

    Task 1: Disaster types
      - Earthquake
      - Fire
      - Flood
      - Hurricane
      - Landslide
      - Not disaster
      - Other disaster

    Task 2: Informativeness
      - Informative
      - Not informative

    Task 3: Humanitarian categories
      - Affected, injured, or dead people
      - Infrastructure and utility damage
      - Not humanitarian
      - Rescue volunteering or donation effort

    Task 4: Damage severity
      - Little or none
      - Mild
      - Severe

    Please cite the following papers if you use any of these resources in your research:

      - Firoj Alam, Ferda Ofli, Muhammad Imran, Tanvirul Alam, and Umair Qazi. Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response. In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020.
      - Firoj Alam, Ferda Ofli, and Muhammad Imran. CrisisMMD: Multimodal Twitter Datasets from Natural Disasters. In Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018, Stanford, California, USA.
      - Hussein Mozannar, Yara Rizk, and Mariette Awad. Damage Identification in Social Media Posts using Multimodal Deep Learning. In Proc. of ISCRAM, May 2018, pp. 529–543.

    CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing

    The CrisisBench dataset consists of data from several different sources, such as CrisisLex (CrisisLex26, CrisisLex6), CrisisNLP, SWDM2013, ISCRAM13, Disaster Response Data (DRD), Disasters on Social Media (DSM), CrisisMMD, and data from AIDR. The purpose of this work was to map the class labels, remove duplicates, and provide benchmark results for the community.

    Class labels

    Informative vs. not informative:
      - Informative
      - Not informative

    Humanitarian categories:
      - Affected individual
      - Caution and advice
      - Displaced and evacuations
      - Donation and volunteering
      - Infrastructure and utilities damage
      - Injured or dead people
      - Missing and found people
      - Not humanitarian
      - Requests or needs
      - Response efforts
      - Sympathy and support

    https://crisisnlp.qcri.org/crisis_datasets_benchmarks.html
    https://github.com/firojalam/crisis_datasets_benchmarks

    Please cite the following papers if you use any of these resources in your research:

      - Firoj Alam, Hassan Sajjad, Muhammad Imran, and Ferda Ofli. CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing. In ICWSM, 2021.
      - Firoj Alam, Ferda Ofli, and Muhammad Imran. CrisisMMD: Multimodal Twitter Datasets from Natural Disasters. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), 2018, Stanford, California, USA.
      - Muhammad Imran, Prasenjit Mitra, and Carlos Castillo. Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. In Proceedings of the 10th Language Resources and Evaluation Conference (LREC), pp. 1638-1643, May 2016, Portorož, Slovenia.
      - A. Olteanu, S. Vieweg, and C. Castillo. What to Expect When the Unexpected Happens: Social Media Communications Across Crises. In Proceedings of the ACM 2015 Conference on Computer Supported Cooperative Work and Social Computing (CSCW '15). ACM, Vancouver, BC, Canada, 2015.
      - A. Olteanu, C. Castillo, F. Diaz, and S. Vieweg. CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. In Proceedings of the AAAI Conference on Weblogs and Social Media (ICWSM'14). AAAI Press, Ann Arbor, MI, USA, 2014.
      - Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, and Patrick Meier. Extracting Information Nuggets from Disaster-Related Messages in Social Media. In Proceedings of the 10th International Conference on Information Systems for Crisis Response and Management (ISCRAM), May 2013, Baden-Baden, Germany.
      - Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, and Patrick Meier. Practical Extraction of Disaster-Relevant Information from Social Media. In Social Web for Disaster Management (SWDM'13), co-located with WWW, May 2013, Rio de Janeiro, Brazil.
      - https://appen.com/datasets/combined-disaster-response-data/
      - https://data.world/crowdflower/disasters-on-social-media
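
    The label mapping and duplicate removal mentioned for CrisisBench can be illustrated with a small sketch. It is not the authors' pipeline: the normalization rules, the `LABEL_MAP` entries (including the source label names), and the function names are all assumptions made for illustration, and the sketch only catches duplicates that become identical after normalization.

```python
import re

# Hypothetical mapping from source-specific labels to a unified label set.
# The source label names on the left are invented for this example.
LABEL_MAP = {
    "related_and_informative": "Informative",
    "informative": "Informative",
    "not_informative": "Not informative",
}


def normalize(text: str) -> str:
    """Lowercase, drop URLs, @mentions and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^\w\s#]", " ", text)
    return " ".join(text.split())


def consolidate(records):
    """Map labels and keep only the first tweet for each normalized text."""
    seen = set()
    merged = []
    for text, label in records:
        key = normalize(text)
        if not key or key in seen:
            continue  # empty after cleaning, or a duplicate of an earlier tweet
        seen.add(key)
        merged.append((text, LABEL_MAP.get(label, label)))
    return merged


# Invented sample records from two imaginary sources.
records = [
    ("Flood in the city http://t.co/x", "related_and_informative"),
    ("flood in the city!", "informative"),
    ("Nice day today", "not_informative"),
]
merged = consolidate(records)
```

    Deduplicating before splitting into train/dev/test is what keeps the same tweet from appearing on both sides of a split.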

    HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks

    The HumAID Twitter dataset consists of several thousand manually annotated tweets collected during nineteen major natural disaster events, including earthquakes, hurricanes, wildfires, and floods, which happened from 2016 to 2019 across different parts of the world. It is the largest social media dataset (~77K) for crisis informatics so far (for details, please refer to our paper). The annotations consist of the following humanitarian categories.

    Humanitarian categories
      - Caution and advice
      - Displaced people and evacuations
      - Dont know cant judge
      - Infrastructure and utility damage
      - Injured or dead people
      - Missing or found people
      - Not humanitarian
      - Other relevant information
      - Requests or urgent needs
      - Rescue volunteering or donation effort
      - Sympathy and support

    Data format and directories
    ===========================
    The data directory contains the following three sub-directories:

      - events/: contains one sub-directory per event. Each event directory contains three tab-separated (TSV) files: train, dev, and test. Each TSV file stores ground-truth annotations for the humanitarian categories listed above. The format of these files is described below.
      - event_type/: contains the combined data per event type; the training, development, and test sets of all events that belong to the same event type are combined.
      - all_combined/: contains the whole combined set.

    HumAID_ICWSM_data.jsonl: JSON objects of the tweets.

    Format of the TSV files
    -----------------------
    Each TSV file contains the following columns, separated by a tab:

      - tweet_id: the actual tweet ID from Twitter.
      - tweet_text: the tweet text.
      - class_label: the label assigned to the tweet text.

    More details can also be found at: https://crisisnlp.qcri.org/humaid_dataset
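
    As a rough illustration of the TSV layout described above, the following sketch reads the three columns and counts labels. It assumes each file starts with a header row naming `tweet_id`, `tweet_text`, and `class_label`, which the description does not state explicitly; the sample rows and the function name `read_humaid_tsv` are invented for the example.

```python
import csv
import io
from collections import Counter


def read_humaid_tsv(handle):
    """Return (tweet_id, tweet_text, class_label) tuples from a TSV stream.

    Assumes a header row with the three column names. Tweet IDs are kept
    as strings so that large IDs are never rounded.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (row["tweet_id"], row["tweet_text"], row["class_label"])
        for row in reader
    ]


# Invented rows standing in for a real train/dev/test file.
SAMPLE = (
    "tweet_id\ttweet_text\tclass_label\n"
    "1001\tBridge on the main road has collapsed\tInfrastructure and utility damage\n"
    "1002\tSending prayers to everyone affected\tSympathy and support\n"
    "1003\tPower lines are down across the county\tInfrastructure and utility damage\n"
)

rows = read_humaid_tsv(io.StringIO(SAMPLE))
label_counts = Counter(label for _, _, label in rows)
```

    For a file on disk, the same function can be given an open handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points at one of the train, dev, or test files.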

    COVID-19 Infodemic Twitter dataset

    This repository contains a dataset of tweets annotated with fine-grained labels related to disinformation about COVID-19. The labels answer seven different questions that are of interest to journalists, fact-checkers, social media platforms, policymakers, and society as a whole. There are annotations for Arabic and English. To label the dataset, we prepared comprehensive annotation guidelines (https://arxiv.org/abs/2005.00033), which can help with similar tasks in different domains. Moreover, we launched an annotation platform to label tweets, where anyone can contribute and help increase the size of the dataset, which we will be updating here periodically. Please also check the dataset in the Git repository.

    Development of Annotated Bangla Speech Corpora

    This dataset contains Bangla read speech corpora that can be used for phonetic research and for developing speech applications. Several criteria were maintained in the corpora development process, including consideration of phonetic and prosodic features during text selection. A specification was also maintained in the recording phase, as speaking style is a vital part of speech applications. We also concentrated on proper text normalization, pronunciation, alignment, and labeling. The labeling was done manually; in the present endeavor, sentence-level labeling (annotation) was completed according to a specification so that it can be expanded in the future.

    Datasets for Social Media Image Classification for Disaster Response

    During a disaster event, images shared on social media help crisis managers gain situational awareness and assess incurred damages, among other response tasks. Recent advances in computer vision and deep neural networks have enabled the development of models for real-time image classification for a number of tasks, including detecting crisis incidents, filtering irrelevant images, classifying images into specific humanitarian categories, and assessing the severity of damage. Despite several efforts, past works mainly suffer from the limited resources (i.e., labeled images) available to train more robust deep learning models. In this study, we propose new datasets for disaster type detection, informativeness classification, and damage severity assessment. Moreover, we relabel existing publicly available datasets for new tasks. We identify exact- and near-duplicates to form non-overlapping data splits, and finally consolidate them to create larger datasets. In our extensive experiments, we benchmark several state-of-the-art deep learning models and achieve promising results. We release our datasets and models publicly, aiming to provide proper baselines as well as to spur further research in the crisis informatics community.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
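
    The two counting schemes compared above can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: each citing paper is represented as a list of cited works, each cited work as a list of author names in byline order, a pair of authors is counted at most once per citing paper, and co-authors of the same cited work count as co-cited. The function name and the toy data are invented.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs that appear together in a paper's reference list.

    max_authors=1 mimics first-author counting; max_authors=5 considers
    the first five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        authors = set()
        for cited_authors in references:
            authors.update(cited_authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Toy data: two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["A", "B"], ["D", "C"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

    In this toy data, first-author counting never sees authors B or C of the multi-author works, while the five-author variant links them to the rest of the network, which hints at why the two schemes can produce different maps.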

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.

    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
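
    To make the comparison between the Pearson correlation and the cosine concrete, the sketch below computes both for two invented cocitation profiles. It illustrates one concern often raised about the Pearson correlation in this setting, namely that it changes when authors who are cocited with neither of the two are added to the profiles; this may or may not be the argument the paper itself makes. The two newly proposed measures are not named in the abstract and are not shown.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Invented cocitation counts of two authors with two other authors.
x_short, y_short = [3, 1], [1, 3]

# The same profiles after adding four authors cocited with neither of them.
x_padded = x_short + [0, 0, 0, 0]
y_padded = y_short + [0, 0, 0, 0]

cosine_short, cosine_padded = cosine(x_short, y_short), cosine(x_padded, y_padded)
pearson_short, pearson_padded = pearson(x_short, y_short), pearson(x_padded, y_padded)
```

    Here the cosine is 0.6 for both versions of the profiles, whereas the Pearson correlation moves from -1 to 5/11 (about 0.45) once the zero entries are added.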