
    Seller-buyer networks in NFT art are driven by preferential ties

    Non-Fungible Tokens (NFTs) have recently surged to mainstream attention by allowing the exchange of digital assets via blockchains. NFTs have also been adopted by artists to sell digital art. One of the promises of NFTs is broadening participation in the art market, a traditionally closed and opaque system, to sustain a wider and more diverse set of artists and collectors. A key sign of this effect would be the disappearance, or at least the reduced importance, of seller-buyer preferential ties, whereby the success of an artist strongly depends on the patronage of a single collector. We investigate NFT art seller-buyer networks across several galleries, using a large set of nearly 40,000 sales totalling over USD 230 million in volume. We find that NFT art is a highly concentrated market driven by a few successful sellers and even fewer systematic buyers. High concentration is present both in the number of sales and, even more strongly, in their priced volume. Furthermore, we show that, while a broader-participation market was present in the early phase of NFT art adoption, preferential ties have dominated during market growth, peak and recent decline. We consistently find that the top buyer accounts on average for over 80% of buys for a given seller. Similar trends apply to buyers and their top seller. We conclude that NFT art currently constitutes a highly concentrated market driven by preferential seller-buyer ties.
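    The top-buyer measure in the abstract above can be illustrated with a short sketch. It assumes that sales are available as (seller, buyer) pairs, one pair per sale, and that the average is taken with equal weight per seller; the abstract does not state how sellers are weighted. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def top_buyer_shares(sales):
    """For each seller, return the fraction of their sales bought by their single top buyer.

    `sales` is an iterable of (seller, buyer) pairs, one pair per sale
    (hypothetical input format; this counts sales, not priced volume).
    """
    buyers_by_seller = defaultdict(Counter)
    for seller, buyer in sales:
        buyers_by_seller[seller][buyer] += 1

    shares = {}
    for seller, counts in buyers_by_seller.items():
        # most_common(1) returns [(top_buyer, count)]
        top_count = counts.most_common(1)[0][1]
        shares[seller] = top_count / sum(counts.values())
    return shares


def mean_top_buyer_share(sales):
    """Average the per-seller top-buyer share, weighting every seller equally (an assumption)."""
    shares = top_buyer_shares(sales)
    if not shares:
        raise ValueError("no sales given")
    return sum(shares.values()) / len(shares)


# Toy data, invented for illustration only
sales = [
    ("seller_a", "buyer_x"),
    ("seller_a", "buyer_x"),
    ("seller_a", "buyer_x"),
    ("seller_a", "buyer_y"),
    ("seller_b", "buyer_z"),
    ("seller_b", "buyer_w"),
]
print(top_buyer_shares(sales))      # {'seller_a': 0.75, 'seller_b': 0.5}
print(mean_top_buyer_share(sales))  # 0.625
```

    Passing reversed (buyer, seller) pairs gives each buyer's share of purchases from their top seller, the mirror measure mentioned in the abstract. Adding sale prices instead of counting sales would give a volume-based variant.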

    Transfer learning for historical corpora: An assessment on post-OCR correction and named entity recognition

    Transfer learning in Natural Language Processing, mainly in the form of pre-trained language models, has recently delivered substantial gains across a range of tasks. Scholars and practitioners working with OCRed historical corpora are thus increasingly exploring the use of pre-trained language models. Nevertheless, the specific challenges posed by historical documents, including OCR quality and linguistic change, call for a critical assessment of the use of pre-trained language models in this setting. We consider two shared tasks, ICDAR2019 (post-OCR correction) and CLEF-HIPE-2020 (Named Entity Recognition, NER), and systematically assess the use of pre-trained language models with data in French, German and English. We find that pre-trained language models help with NER but less so with post-OCR correction. Pre-trained language models should therefore be used critically when working with OCRed historical corpora. We release our code base to allow replicating our results and testing other pre-trained representations.

    COVID-19 research in Wikipedia

    Wikipedia is one of the main sources of free knowledge on the Web. During the first few months of the pandemic, over 5,200 new Wikipedia pages on COVID-19 were created, accumulating over 400 million page views by mid-June 2020. At the same time, an unprecedented number of scientific articles on COVID-19 and the ongoing pandemic have been published online. Wikipedia’s content is based on reliable sources, such as scientific literature. Given its public function, it is crucial for Wikipedia to rely on representative and reliable scientific results, especially in a time of crisis. We assess the coverage of COVID-19-related research in Wikipedia via citations to a corpus of over 160,000 articles. We find that Wikipedia editors are integrating new research at a fast pace, and have cited close to 219 literature under consideration. While doing so, they are able to provide a representative coverage of COVID-19-related research. We show that all the main topics discussed in this literature are proportionally represented in Wikipedia, after accounting for article-level effects. We further use regression analyses to model citations from Wikipedia and show that Wikipedia editors on average rely on literature that is highly cited, widely shared on social media, and peer-reviewed.

    An open educational resource to introduce data analysis in Python for the Humanities

    The article presents an open educational resource (OER) to introduce humanities students to data analysis with Python. The article begins by positioning the OER within wider pedagogical debates in the digital humanities. The OER is built from our research encounters and is committed to computational thinking rather than technicalities. Furthermore, we argue that students learn best with the 'whole game' methodology: learners need to be exposed to meaningful activities as soon and as far as possible. The article presents two examples that implement our approach. The first introduces Python as a data analysis language to students of the humanities. It is distinctive in concentrating on the principles of computational thinking behind data analysis rather than on programming details. The second example takes students into the world of machine learning and, with it, the whole game of social and cultural research. Students learn useful skills such as web scraping, but also run their own machine learning algorithms to pursue concrete research questions.

    Are We Breaking the Social Contract?

    The ambition of scholarship in the humanities is to systematically understand the human condition in all its aspects and times. To this end, humanists are more apt to interpret specific phenomena than to generalize to previously unseen observations. When we consider scholarship as a collective effort, this has consequences. I argue that most of the humanities rely on a distinct social contract. This contract states that interpretive arguments are expected to be plausible and the grounds on which they are made, verifiable. This is the scholarly purpose (albeit not the rhetorical one) of most of what goes into our footnotes, especially references. Reference verification is mostly a virtual act, i.e., it all too rarely happens in practice, yet it is always possible in principle. By virtue of this contract, any individual scholar in any domain of the humanities can verify the evidence supporting any argument in a non-mediated way. This is essential, at the very least, to distinguish between solid and haphazard arguments.

    A Map of Science in Wikipedia

    In recent decades, the rapid growth of Internet adoption has offered opportunities for convenient and inexpensive access to scientific information. Wikipedia, one of the largest encyclopedias worldwide, has become a reference in this respect and has attracted widespread attention from scholars. However, a clear understanding of the scientific sources underpinning Wikipedia's contents remains elusive. In this work, we rely on an open dataset of citations from Wikipedia to map the relationship between Wikipedia articles and scientific journal articles. We find that most journal articles cited from Wikipedia belong to STEM fields, in particular biology and medicine (47.6% of citations; 46.1% of cited articles). Furthermore, Wikipedia's biographies play an important role in connecting STEM fields with the humanities, especially history. These results contribute to our understanding of Wikipedia's reliance on scientific sources and its role as a knowledge broker to the public.

    The role of blogs and news sites in science communication during the COVID-19 pandemic

    We present a brief review of the literature on blogs and news sites, with a focus on publications related to COVID-19. In particular, we examine the role of blogs and news sites in disseminating research on COVID-19 to the wider public, that is, as knowledge transfer channels. The review is for researchers and practitioners in scholarly communication and social media studies of science who would like to learn more about the role of blogs and news sites during the COVID-19 pandemic. Our review shows that blogs and news sites are widely used as scholarly communication channels and are closely related to each other: the same research might be reported in blogs and news sites at the same time. Both play a particular role in higher education and research systems, due to the increasing blogging and science communication activity of researchers and higher education institutions (HEIs). We conclude that these two media types have long played an important role in disseminating research, a role that increased further during the COVID-19 pandemic. This can be verified, for example, through knowledge graphs of COVID-19 publications, which contain a significant number of scientific publications mentioned in blogs and news sites.

    Polarization and reliability of news sources in Wikipedia

    Purpose: Wikipedia's inclusive editorial policy permits unrestricted participation, enabling individuals to contribute and disseminate their expertise while drawing upon a multitude of external sources. News media outlets constitute nearly one-third of all citations within Wikipedia. However, such a radically open approach also risks introducing biased content or viewpoints into Wikipedia. The authors investigate the integrity of knowledge within Wikipedia, focusing on two dimensions of its sources: political polarization and trustworthiness. Specifically, the authors examine whether political polarization is present in the news media citations on Wikipedia, identify the factors that may influence such polarization within the Wikipedia ecosystem, and scrutinize the correlation between political polarization in news media sources and the factual reliability of Wikipedia's content. Design/methodology/approach: The authors conduct descriptive and regression analyses relying on Wikipedia Citations, a large-scale open dataset of nearly 30 million citations from English Wikipedia, augmented with information obtained from the Media Bias Monitor (MBM) and Media Bias Fact Check (MBFC). Findings: The authors find a moderate yet significant liberal bias in the choice of news media sources across Wikipedia. Furthermore, the authors show that this effect persists when accounting for the factual reliability of the news media. Originality/value: The results contribute to Wikipedia's knowledge integrity agenda by suggesting that a systematic effort would help to better map potential biases in Wikipedia and find means to strengthen its neutral point of view policy.

    Conference Panel: The past, present and future of digital scholarship with newspaper collections

    Historical newspapers are of interest to many humanities scholars, valued as sources of information and language closely tied to a particular time, social context and place. Following library and commercial microfilming and, more recently, digitisation projects, newspapers have become an accessible and valued source for researchers. The ability to use keyword searches through more data than ever before via digitised newspapers has transformed the work of researchers (as discussed by others including Putnam, 2016; Bingham, 2010). Digitised historic newspapers are also of interest to many researchers who seek large bodies of relatively easily computationally transcribed text on which they can try new methods and tools. Intensive digitisation over the past two decades has seen smaller-scale or repository-focused projects flourish in the Anglophone and European world (Holley, 2009; King, 2005; Neudecker et al., 2014). However, just as earlier scholarship was potentially over-reliant on The Times of London and other metropolitan dailies, this reliance has been replicated and reinforced by digitisation projects (for a Canadian example, see Milligan, 2013). In recent years, several large consortium projects proposing to apply data science and computational methods to historical newspapers at scale have emerged, including NewsEye, impresso, Oceanic Exchanges and Living with Machines. This panel has been convened by members of some of these consortia to cast a critical eye on past and ongoing digital scholarship with newspaper collections, and to inform its future. Digitisation can involve both complexities and simplifications. Knowledge about the imperfections of digitisation, cataloguing, corpus construction, text transcription and mining is rarely shared outside cultural institutions or projects. How can these imperfections and absences be made visible to users of digital repositories?
Furthermore, how does the over-representation of some aspects of society, through the successive winnowing and remediation of potential sources (from creation to collection, microfilming, preservation, licensing and digitisation), affect scholarship based on digitised newspapers? How can computational methods address some of these issues? The panel proposes the following format: short papers will be delivered by existing projects working on large collections of historical newspapers, presenting their vision and results to date. Each project is at a different stage of development and will discuss its choice to work with newspapers, and reflect on what it has learnt to date about the practical, methodological and user-focused aspects of this digital humanities work. The panel is also an opportunity to consider important questions of interoperability and legacy beyond the life of each project. Two further papers will follow, given by scholars with significant experience using these collections for research, to provide the panel with critical reflections. The floor will then open for debate and discussion. This panel is a unique opportunity to bring together senior scholars with a long perspective on the uses of newspapers in scholarship and projects at formative stages. More broadly, convening this panel is an opportunity for the DH2019 community to ask its own questions of newspaper-based projects, and for researchers to map methodological similarities between projects. Our hope is that this panel will foster a community of practice around the topic and encourage discussion of the methodological and pedagogical implications of digital scholarship with newspapers.

    Quantifying Engagement with Citations on Wikipedia

    Wikipedia is one of the most visited sites on the Web and a common source of information for many users. As an encyclopedia, Wikipedia was not conceived as a source of original information, but as a gateway to secondary sources: according to Wikipedia's guidelines, facts must be backed up by reliable sources that reflect the full spectrum of views on the topic. Although citations lie at the heart of Wikipedia, little is known about how users interact with them. To close this gap, we built client-side instrumentation for logging all interactions with links leading from English Wikipedia articles to cited references during one month, and conducted the first analysis of readers' interactions with citations. We find that overall engagement with citations is low: about one in 300 page views results in a reference click (0.29% overall; 0.56% on desktop; 0.13% on mobile). Matched observational studies of the factors associated with reference clicking reveal that clicks occur more frequently on shorter pages and on pages of lower quality, suggesting that references are consulted more commonly when Wikipedia itself does not contain the information sought by the user. Moreover, we observe that recent content, open access sources, and references about life events (births, deaths, marriages, etc.) are particularly popular. Taken together, our findings deepen our understanding of Wikipedia's role in a global information economy where reliability is ever less certain, and source attribution ever more vital.