    One Year of COVID-19 Vaccine Misinformation on Twitter: Longitudinal Study

    Background: Vaccinations play a critical role in mitigating the impact of COVID-19 and other diseases. Past research has linked misinformation to increased hesitancy and lower vaccination rates. Gaps remain in our knowledge about the main drivers of vaccine misinformation on social media and effective ways to intervene. Objective: Our longitudinal study had two primary objectives: (1) to investigate the patterns of prevalence and contagion of COVID-19 vaccine misinformation on Twitter in 2021, and (2) to identify the main spreaders of vaccine misinformation. Given our initial results, we further considered the likely drivers of misinformation and its spread, providing insights for potential interventions. Methods: We collected almost 300 million English-language tweets related to COVID-19 vaccines using a list of over 80 relevant keywords over a period of 12 months. We then extracted and labeled news articles at the source level based on third-party lists of low-credibility and mainstream news sources, and measured the prevalence of different kinds of information. We also considered suspicious YouTube videos shared on Twitter. We focused our analysis of vaccine misinformation spreaders on verified and automated Twitter accounts. Results: Our findings showed a relatively low prevalence of low-credibility information compared to the entirety of mainstream news. However, the most popular low-credibility sources had reshare volumes comparable to those of many mainstream sources, and had larger volumes than those of authoritative sources such as the US Centers for Disease Control and Prevention and the World Health Organization. Throughout the year, we observed an increasing trend in the prevalence of low-credibility news about vaccines. We also observed a considerable amount of suspicious YouTube videos shared on Twitter. 
Tweets by a small group of approximately 800 "superspreaders" verified by Twitter accounted for approximately 35% of all reshares of misinformation on an average day, with the top superspreader (@RobertKennedyJr) responsible for over 13% of retweets. Finally, low-credibility news and suspicious YouTube videos were more likely to be shared by automated accounts. Conclusions: The wide spread of misinformation around COVID-19 vaccines on Twitter during 2021 shows that there was an audience for this type of content. Our findings are also consistent with the hypothesis that superspreaders are driven by financial incentives that allow them to profit from health misinformation. Despite high-profile cases of deplatformed misinformation superspreaders, our results show that in 2021, a few individuals still played an outsized role in the spread of low-credibility vaccine content. As a result, social media moderation efforts would be better served by focusing on reducing the online visibility of repeat spreaders of harmful content, especially during public health crises
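
The "share of reshares on an average day" statistic reported above can be read as a per-day fraction averaged over days. The following is a minimal sketch under that reading; the function name, the `(day, original_author)` record layout, and the toy data are assumptions for illustration, not the study's actual pipeline.

```python
from collections import defaultdict

def mean_daily_share(reshares, accounts):
    """Average, over days, of the fraction of reshares whose original
    author belongs to `accounts`.

    `reshares` is an iterable of (day, original_author) pairs
    (hypothetical record layout).
    """
    totals = defaultdict(int)  # reshares per day
    hits = defaultdict(int)    # reshares per day authored by `accounts`
    for day, author in reshares:
        totals[day] += 1
        if author in accounts:
            hits[day] += 1
    if not totals:
        return 0.0
    return sum(hits[d] / totals[d] for d in totals) / len(totals)

# Toy data: two days, one hypothetical "superspreader" account u1.
reshares = [
    ("2021-01-01", "u1"), ("2021-01-01", "u2"),
    ("2021-01-01", "u3"), ("2021-01-01", "u1"),
    ("2021-01-02", "u2"), ("2021-01-02", "u1"),
]
share = mean_daily_share(reshares, {"u1"})
```

Averaging per-day fractions weights every day equally, which differs from pooling all reshares across the year; which of the two the study used is not stated in the abstract.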

    Graph structure in the Web - aggregated by Pay-Level Domain

    Previous research on the overall graph structure of the World Wide Web mostly focused on the page level, meaning that the graph that directly results from hyperlinks between individual web pages was analyzed. This paper aims to provide additional insights about the macroscopic structure of the World Wide Web by analyzing an aggregated version of a recent web graph. The graph covers over 3.5 billion web pages and 128 billion hyperlinks between pages. It was crawled in the first half of 2012. We aggregate this graph by pay-level domain (PLD), meaning that all pages that belong to the same pay-level domain are represented by a single node and that an arc exists between two nodes if there is at least one hyperlink between pages of the corresponding pay-level domains. The resulting PLD graph covers 43 million PLDs and contains 623 million arcs between PLDs. Analyzing this aggregated graph allows us to present findings about linkage patterns between complete websites and not only individual HTML pages. In this paper, we present basic statistics about the PLD graph, such as degree distributions, top-ranked PLDs, distances and diameter. We analyze whether the bow-tie structure introduced by Broder et al. can also be identified in our PLD graph and reveal a backbone of highly interlinked websites within the graph. We group the websites by top-level domain and report findings about the overall linkage within and between different top-level domains. In a last experiment, we use data from the Open Directory Project (DMOZ) to categorize websites by topic and report findings about linkage patterns between websites belonging to different topical categories.
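
The aggregation rule described above (one node per PLD, one arc if at least one page-level hyperlink connects two PLDs) can be sketched as follows. This is an illustration only: `naive_pld` is a hypothetical helper that approximates the PLD as the last two host labels, which is wrong for multi-label suffixes such as `co.uk`, and dropping links within a single PLD is an assumption rather than something the abstract states.

```python
from urllib.parse import urlsplit

def naive_pld(url):
    """Approximate the pay-level domain as the last two host labels.

    Assumption: ignores multi-label public suffixes (e.g. "co.uk");
    a real pipeline would consult the Public Suffix List.
    """
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    return ".".join(labels[-2:])

def aggregate_by_pld(page_links):
    """Collapse page-level hyperlinks into a set of PLD-level arcs."""
    arcs = set()
    for src, dst in page_links:
        a, b = naive_pld(src), naive_pld(dst)
        # Assumption: links inside one PLD are not kept as self-loops.
        if a and b and a != b:
            arcs.add((a, b))  # a set keeps one arc however many links exist
    return arcs

# Toy page-level graph with hypothetical URLs.
page_links = [
    ("http://www.example.org/a.html", "http://blog.example.com/post"),
    ("http://www.example.org/b.html", "http://example.com/"),
    ("http://news.example.org/x", "http://www.example.org/a.html"),
]
pld_arcs = aggregate_by_pld(page_links)
```

Here the first two hyperlinks collapse into a single arc from `example.org` to `example.com`, and the third is dropped because both pages share a PLD.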

    Agent-based Model Selection Framework for Complex Adaptive Systems

    Thesis (PhD) - Indiana University, Computer Sciences, 2006
    Human-initiated land-use and land-cover change is the most significant single factor behind global climate change. Since climate change affects human, animal and plant populations alike, and the effects are potentially disastrous and irreversible, it is equally important to understand the reasons behind land-use decisions as it is to understand their consequences. Empirical observations and controlled experimentation are not usually feasible methods for studying this change. Therefore, scientists have resorted to computer modeling, and use other complementary approaches, such as household surveys and field experiments, to add depth to their models. The computer models are not only used in the design and evaluation of environmental programs and policies, but they can also be used to educate land-owners about sustainable land management practices. Therefore, it is critical which model the decision maker trusts. Computer models can generate seemingly plausible outcomes even if the generating mechanism is quite arbitrary. On the other hand, with excess complexity the model may become incomprehensible, and proper tweaking of the parameter values may make it produce any results the decision maker would like to see. The lack of adequate tools has made it difficult to compare and choose between alternative models of land-use and land-cover change on a fair basis. Especially when the candidate models do not share a single dimension (e.g., a functional form), a criterion for selecting an appropriate model other than its face value (i.e., how well the model behavior conforms to the decision maker's ideals) may be hard to find. Due to the nature of the class of models, existing model selection methods are not applicable either. In this dissertation I propose a pragmatic method, based on algorithmic coding theory, for selecting among alternative models of land-use and land-cover change. I demonstrate the method's adequacy using both artificial and real land-cover data in multiple experimental conditions with varying error functions and initial conditions.

    Analyzing Social Big Data to Study Online Discourse and Its Manipulation

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2017
    The widespread use of social media helps people connect and share their opinions and experiences with millions of others, while simultaneously bringing new threats. This dissertation aims to provide insights into analysis of online conversations and mechanisms that might be used for their manipulation. The first part delves into the effect of geography on information dissemination and user roles in online discourse. I study trending topics on Twitter to highlight mechanisms governing the diffusion of local and national trends. My analysis points to three locally geographic regions and one cluster that contains trendsetting cities coinciding with major travel hubs. When factors limiting information spread are considered, censorship mechanisms mandated by governments are found to be ineffective and even show a correlation with increasing popularity. I also present an analysis of spatiotemporal characteristics and distinct user roles in the Gezi movement. Next, I discuss different forms of social media manipulation. Malicious entities can employ promotion campaigns and social bots. We build machine learning frameworks that exploit features extracted from network, content, and users to train accurate supervised learning models. Our system for early detection of promoted social media trends harnesses multidimensional time series signals to reveal subtle differences between promoted and organic trends. In my research on social bots, I carried out the largest study of the human-bot ecosystem to date. Our estimates suggest that between 9 and 15% of active Twitter accounts are bots. I present distinct behavioral groups and interaction strategies among human and bot accounts. This body of work contributes to a more comprehensive understanding of online user behavior and to the development of systems to detect online abuse.

    Mining for topics to suggest knowledge model extensions

    Electronic concept maps, interlinked with other concept maps and multimedia resources, can provide rich knowledge models to capture and share human knowledge. This article presents and evaluates methods to support experts as they extend existing knowledge models, by suggesting new context-relevant topics mined from Web search engines. The task of generating topics to support knowledge model extension raises two research questions: first, how to extract topic descriptors and discriminators from concept maps; and second, how to use these topic descriptors and discriminators to identify candidate topics on the Web with the right balance of novelty and relevance. To address these questions, this article first develops the theoretical framework required for a "topic suggester" to aid information search in the context of a knowledge model under construction. It then presents and evaluates algorithms based on this framework and applied in EXTENDER, an implemented tool for topic suggestion. EXTENDER has been developed and tested within CmapTools, a widely used system for supporting knowledge modeling using concept maps. However, the generality of the algorithms makes them applicable to a broad class of knowledge modeling systems, and to Web search in general.
    Affiliation: Lorenzetti, Carlos Martin. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina
    Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Leake, David. Indiana University; United States
    Affiliation: Menczer, Filippo. Indiana University; United States
    Affiliation: Reichherzer, Thomas. University of West Florida; United States

    Algorithmic computation and approximation of semantic similarity

    Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata, namely topical directories, to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.
    Affiliation: Maguitman, Ana Gabriela. Indiana University; United States. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina
    Affiliation: Menczer, Filippo. Indiana University; United States
    Affiliation: Erdinc, Fulya. Indiana University; United States
    Affiliation: Roinestad, Heather. Indiana University; United States
    Affiliation: Vespignani, Alessandro. Indiana University; United States
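
For orientation, a taxonomy-based (tree) similarity of the kind the abstract contrasts with can be sketched using Lin's information-theoretic measure, sim(t1, t2) = 2 log P(t0) / (log P(t1) + log P(t2)), where t0 is the lowest common ancestor of the two topics and P(t) is the probability that a page falls in the subtree rooted at t. Using Lin's measure as the baseline is an assumption here, and the sketch does not implement the graph-based measure the paper defines; the toy taxonomy and probabilities are invented for illustration.

```python
import math

def lin_similarity(parent, subtree_prob, t1, t2):
    """Lin's tree-based similarity between two topics.

    parent: maps each topic to its parent (root maps to None).
    subtree_prob: maps each topic to the probability mass of its subtree.
    """
    def ancestors(t):
        # Path from t up to the root, t itself included.
        path = [t]
        while parent.get(t) is not None:
            t = parent[t]
            path.append(t)
        return path

    anc1 = set(ancestors(t1))
    # First ancestor of t2 that is also an ancestor of t1.
    lca = next(t for t in ancestors(t2) if t in anc1)
    denom = math.log(subtree_prob[t1]) + math.log(subtree_prob[t2])
    if denom == 0:  # both topics are the root
        return 1.0
    return 2 * math.log(subtree_prob[lca]) / denom

# Toy taxonomy (hypothetical topics and probabilities).
parent = {"Top": None, "Science": "Top", "Arts": "Top",
          "Physics": "Science", "Biology": "Science"}
prob = {"Top": 1.0, "Science": 0.5, "Arts": 0.5,
        "Physics": 0.25, "Biology": 0.25}

sim_siblings = lin_similarity(parent, prob, "Physics", "Biology")
sim_distant = lin_similarity(parent, prob, "Physics", "Arts")
sim_self = lin_similarity(parent, prob, "Physics", "Physics")
```

Topics that only meet at the root get similarity 0 because the root's subtree has probability 1; a tree-based measure of this kind cannot see cross-links between branches, which is the limitation the abstract's graph-based measure is meant to address.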

    The Expression of Human Behavior in Online Networks

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2011
    The wide adoption of Web 2.0, in which users can interact with Web sites to generate new content, has a serendipitous side effect. All of this user-generated data provides researchers with a unique lens on the behavior of the users who created it. While instrumenting millions of users with a device that records everything they read in real life would be impossible, we can easily record the articles they read on Wikipedia. Similarly, we can use Twitter data to map the interactions between tens of thousands of people, as well as study the topics they discuss. I outline several studies taking advantage of this trove of behavioral data. Initially focusing on Wikipedia, I examine the patterns in the paths that users take when navigating from article to article, and contrast these with similar data for several other large Internet destinations. I then develop an understanding of bursty popularity dynamics, discovering that bursts in the attention to a page have dynamics similar to those observed in natural phenomena, like earthquakes and avalanches; I also present a simple model able to capture these dynamics. Next I switch gears, away from looking at users as they travel between topics, and towards looking at how topics (memes) travel between users, and how users interact with each other. I frame this research in the context of political discussion on Twitter. I first perform a general overview of the space of this discussion, examining how users connect with each other. I conclude with a case study, the Web site truthy.indiana.edu, which focuses on the case of the deceptive dissemination of ideas, or so-called astroturf.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
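
The difference between the two counting schemes can be illustrated with a small sketch. The data layout (each citing paper as a list of cited works, each cited work as an ordered author list), the function name, and the choice to count two authors as co-cited once per citing paper are assumptions made for illustration; the study's exact counting rules may differ, for instance in how coauthors of the same cited work are treated.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names (hypothetical layout).
    max_authors=1 gives first-author counting; max_authors=5 considers
    the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        # Assumption: coauthors of one cited work also count as co-cited.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

# Toy data: two citing papers, authors named A to D.
papers = [
    [["A", "B"], ["C"]],
    [["C", "D"], ["A"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting only the pair (A, C) appears; extending to the first five authors brings B and D into the co-citation matrix, which is the kind of added coverage of secondary authors that the comparison in the abstract turns on.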