
    Structuring the world’s knowledge: socio-technical processes and data quality in Wikidata

    Wikidata is a collaborative knowledge graph by the Wikimedia Foundation which has undergone impressive growth since its launch in 2012: it has gathered a user pool of almost two hundred thousand editors, who have contributed data about more than 50 million entities. In the fashion of other Wikimedia projects, it is completely bottom-up, i.e. everything within the knowledge graph is created and maintained by its users. These features have drawn the attention of a growing number of researchers and practitioners from several fields. Nevertheless, research about collaboration processes in Wikidata is still scarce. This thesis addresses this gap by analysing the socio-technical fabric of Wikidata and how that affects the quality of its data. In particular, it makes a threefold contribution: (i) it evaluates two previously unexamined aspects of the quality of Wikidata, i.e. provenance and its ontology; (ii) it is the first to investigate the effects of algorithmic contributions, i.e. bots, on Wikidata quality; (iii) it looks at emerging editor activity patterns in Wikidata and their effects on outcome quality. Our findings show that bots are important for the quality of the knowledge graph, although their work needs to be continuously controlled, since they can introduce different sorts of errors at a large scale. Regarding human editors, a more diverse user pool (in terms of tenure and focus of activity) seems to be associated with higher quality. Finally, two roles emerge from the editing patterns of Wikidata users: leaders and contributors. Leaders perform more edits and have a more prominent role within the community. They are also more involved in the maintenance of the Wikidata schema, their activity being positively related to the growth of its taxonomy. This thesis contributes to the understanding of collaborative processes and data quality in Wikidata. Further studies should be carried out to confirm whether and to what extent its insights are generalisable to other collaborative knowledge engineering platforms.

    Who models the world?: Collaborative ontology creation and user roles in Wikidata

    Wikidata is a collaborative knowledge graph which is central to many academic and industry IT projects. Its users are responsible for maintaining the schema that organises this knowledge into classes, properties, and attributes, which together form the Wikidata ‘ontology’. In this paper, we study the relationship between different Wikidata user roles and the quality of the Wikidata ontology. To do so, we first propose a framework to evaluate the ontology as it evolves. We then cluster editing activities to identify user roles in monthly time frames. Finally, we explore how each role impacts the ontology. Our analysis shows that the Wikidata ontology has uneven breadth and depth. We identified two user roles: contributors and leaders. The latter role is positively associated with ontology depth, with no significant effect on other features. Further work should investigate other dimensions to define user profiles and their influence on the knowledge graph.

    What makes a good collaborative knowledge graph: group composition and quality in Wikidata

    Wikidata is a community-driven knowledge graph which has drawn much attention from researchers and practitioners since its inception in 2012. The large user pool behind this project has been able to produce information spanning several domains, which is openly released and can be reused to feed any information-based application. Collaborative production processes in Wikidata have not yet been explored. Understanding them is key to preventing potentially harmful community dynamics and ensuring the sustainability of the project in the long run. We performed a regression analysis to investigate how the contribution of different types of users, i.e. bots and human editors, registered or anonymous, influences outcome quality in Wikidata. Moreover, we looked at the effects of tenure and interest diversity among registered users. Our findings show that a balanced contribution of bots and human editors positively influences outcome quality, whereas higher numbers of anonymous edits may hinder performance. Tenure and interest diversity within groups also lead to higher quality. These results may be helpful to identify and address groups that are likely to underperform in Wikidata. Further work should analyse in detail the respective contributions of bots and registered users.

    Provenance information in a collaborative knowledge graph: an evaluation of Wikidata external references

    Wikidata is a collaboratively edited knowledge graph; it expresses knowledge in the form of subject-property-value triples, which can be enhanced with references to add provenance information. Understanding the quality of Wikidata is key to its widespread adoption as a knowledge resource. We analyse one aspect of Wikidata quality, provenance, in terms of relevance and authoritativeness of its external references. We follow a two-stage approach. First, we perform a crowdsourced evaluation of references. Second, we use the judgements collected in the first stage to train a machine learning model to predict reference quality at a large scale. The features chosen for the models were related to reference editing and the semantics of the triples they referred to. 61% of the references evaluated were relevant and authoritative. Bad references were often links that changed and either stopped working or pointed to other pages. The machine learning models outperformed the baseline and were able to accurately predict non-relevant and non-authoritative references. Further work should focus on implementing our approach in Wikidata to help editors find bad references.

    The Influence of Plasmodesmata Number and Opening State on Molecular Transports in Plants

    Molecular Communication (MC) studies the transport of information encoded in signaling molecules. To date, its application field is mainly restricted to health-related uses. However, MC in plants has been gaining increasing interest. The primary transport route in plant cell-to-cell communication is Plasmodesmata (PDs), pore-like structures dotting the plant cell wall. The opening state of PDs is influenced by several damaging environmental factors (e.g., plant viruses), and plant cells try to restore homeostasis through defense mechanisms. In this letter, we seek to depict the complexity of plant-based communication, and we propose a simple model that proves the influence of the number and opening state of PDs on the transport of information in plants.

    Wikidatians are born: paths to full participation in a collaborative structured knowledge base

    We investigated how participation evolves in Wikidata as its editors become established members of the community. Originally conceived to support Wikipedia, Wikidata is a collaborative structured knowledge base, created and maintained by a large number of volunteers, whose data can be freely reused in other contexts. Just like in any other online social environment, understanding its contributors’ pathways to full participation helps Wikidata improve user experience and retention. We analysed how participation changes over time under the frameworks of legitimate peripheral participation and activity theory. We found that, as they engage more with the project, “Wikidatians” acquire a higher sense of responsibility for their work, interact more with the community, take on more advanced tasks, and use a wider range of tools. Previous activity in Wikipedia has varied effects. As Wikidata is a young community, future work should focus on volunteers with little or no experience in similar projects and specify means to improve critical aspects such as engagement and data quality.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
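
The two counting schemes contrasted in this abstract can be illustrated with a minimal sketch. It is not the study's actual procedure: the function name `cocitation_counts`, the toy reference lists, and the choice to count each author pair at most once per citing paper are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often each pair of authors is co-cited.

    citing_papers: one reference list per citing paper; each cited work
    is given as a list of its author names in byline order.
    max_authors: number of leading authors of each cited work that are
    taken into account (1 corresponds to first-author counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the authors credited by this citing paper.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Assumption: a pair is counted at most once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, author names are placeholders.
toy_papers = [
    [["A", "B"], ["C"]],
    [["A"], ["C", "D"]],
]

first_author = cocitation_counts(toy_papers, max_authors=1)
first_five = cocitation_counts(toy_papers, max_authors=5)
print(first_author)
print(first_five)
```

With first-author counting only the pair (A, C) is observed, while counting up to five authors also brings in pairs involving B and D, which is the kind of difference in input data the abstract refers to.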

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
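
As a small illustration of how the Pearson correlation and the cosine can behave differently on cocitation profiles, the sketch below computes both for two hypothetical profiles, before and after appending authors with whom neither is cocited. The abstract does not state which objection to the Pearson correlation the paper raises, so treating sensitivity to added zeros as the issue is an assumption here; the profile values and the names `cosine` and `pearson` are likewise illustrative only.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred
    profiles. Undefined (division by zero) for constant profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors over four other authors.
profile_x = [3, 1, 0, 2]
profile_y = [2, 0, 1, 3]

# The same profiles after adding four authors cocited with neither.
extended_x = profile_x + [0, 0, 0, 0]
extended_y = profile_y + [0, 0, 0, 0]

print("cosine :", cosine(profile_x, profile_y), cosine(extended_x, extended_y))
print("pearson:", pearson(profile_x, profile_y), pearson(extended_x, extended_y))
```

In this toy case the cosine is unaffected by the added zeros, whereas the Pearson correlation shifts because the means of both profiles change.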