
    Online discussions through the lens of interaction patterns

    Computer-mediated communication is arguably prevailing over face-to-face communication. However, many of the subtleties that make in-person communication personal, cues such as an ironic tone of voice or an effortless posture, are inherently impossible to render through a screen. The context vanishes from the conversation; what is left is therefore mostly text, enlivened by occasional multimedia. At least, this seems to be the dominant opinion of both industry and academia, which have recently focused considerable resources on a deeper understanding of natural and visual language. I argue instead that richer cues are missing from online interaction only because current applications do not acknowledge them; indeed, communication online is already infused with nonverbal codes, and the information they carry is well worth the effort needed to leverage them. This dissertation therefore focuses on what is left out of the traditional definition of content: I refer to these aspects of communication as content-agnostic. Specifically, this dissertation makes three contributions. First, I formalize what constitutes content-agnostic information in computer-mediated communication, and prove that content-agnostic information is as personal to each user as its offline counterpart. To do so, I choose the web forum, a supposedly text-based, impersonal communication environment, as a research venue, and show that it is possible to attribute a message to its author solely on the basis of its content-agnostic features; that is, without looking at the content of the message at all. Next, I show the abundance and variety of the content-agnostic information that lies untapped in current applications. To this end, I analyze the content-agnostic aspects of one type of interaction, the quote, and draw conclusions on how these may support discussion, signal user status, mark relationships between users, and characterize the discussion forum as a community. One interesting implication is that discussion platforms may not need to introduce new features for supporting social signals, and conversely social networks may better integrate discussion by enhancing its content-agnostic qualities. Finally, I demonstrate how content-agnostic information reveals user behavior. I focus specifically on trolls, malicious users who disrupt communities through deceptive or manipulative actions. In fact, the language of trolls blends in with that of civil users in heated discussions, which makes collecting irrefutable evidence of trolling difficult even for human moderators. Nonetheless, I show that a combination of content-agnostic and linguistic features sets apart discussions that will eventually be trolled, and reactions to trolling posts. This provides evidence of how content-agnostic information can offer a point of view on user behavior that is at the same time different from, and complementary to, that offered by the actual content of the contribution. Popular up-and-coming platforms, such as Snapchat, Tumblr, or Yik Yak, are increasingly abandoning persistent, threaded, text-based discussion in favor of ephemeral, loosely structured, mixed-media content. Although the results of this dissertation are mostly drawn from discussion forums, its research frame and methods should apply directly to these other venues, and to a broad range of communication paradigms. Also, this is but a preliminary step towards a fuller understanding of what additional cues can or should complement content to overcome the limitations of computer-mediated communication.

    From Volunteerism to Corporatization: Analyzing Participation in the 2015 and 2023 Reddit Blackouts

    Reddit, one of the largest global social media platforms, has undergone significant transformations since its inception in 2005. From a loosely structured, niche platform to a globally recognized company with a standardized and regulated governance system, Reddit’s evolution has been marked by a shift in the power dynamics between its owners, moderators, and users. The years 2015 and 2023 each saw a prominent protest, termed a “blackout.” Moderators of numerous subreddits, though not all, disabled public access to their subreddits, thereby protesting the company’s policies and policy changes and challenging the company’s endeavors to exert further control over the platform. Drawing on Bourdieusian theory and relational methodology, we establish a computational social science approach to investigate the structural causes behind the two blackouts and contextualize the differences between them. We argue that these blackouts signify growing tensions within the socio-technical space of Reddit and an ongoing political, cultural, and economic reconfiguration of its power structure and political economy.

    Analyzing Support for U.S. Presidential Candidates in Twitter Polls

    Polls posted on social media can provide information about public opinion on a variety of issues, from business decisions to support for presidential election candidates. However, it is largely unknown whether the information provided by social polls is useful. To enhance our understanding of social polls, we examine nearly two thousand Twitter polls gauging support for U.S. presidential candidates during the 2016 and 2020 election campaigns. First, we describe the prevalence of social polls. Second, we characterize social polls in terms of the engagement they elicit and the response options they present. Third, leveraging machine learning models, we infer and describe several characteristics, including demographics and political leanings, of the users who author and interact with social polls. Finally, we study the relationship between social poll results, their attributes, and the characteristics of users interacting with them. Our findings suggest how and to what extent polling on Twitter is biased in terms of content, authorship, and audience. The 2016 and 2020 polls were predominantly crafted by older males and manifested a pronounced bias favoring candidate Donald Trump, whereas traditional surveys favored Democratic candidates. We further identify and explore the potential reasons for such biases and discuss their repercussions.

    How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs?

    As NLP models are increasingly deployed in socially situated settings such as online abusive content detection, it is crucial to ensure that these models are robust. One way of improving model robustness is to generate counterfactually augmented data (CAD) for training models that can better learn to distinguish between core features and data artifacts. While models trained on this type of data have shown promising out-of-domain generalizability, it is still unclear what the sources of such improvements are. We investigate the benefits of CAD for social NLP models by focusing on three social computing constructs: sentiment, sexism, and hate speech. Assessing the performance of models trained with and without CAD across different types of datasets, we find that while models trained on CAD show lower in-domain performance, they generalize better out-of-domain. We unpack this apparent discrepancy using machine explanations and find that CAD reduces model reliance on spurious features. Leveraging a novel typology of CAD to analyze their relationship with model performance, we find that CAD which acts on the construct directly or a diverse set of CAD leads to higher performance.