A-posteriori provenance-enabled linking of publications and datasets via crowdsourcing
This paper aims to share with the digital library community different opportunities to leverage crowdsourcing for a-posteriori capturing of dataset citation graphs. We describe a practical approach, which exploits one possible crowdsourcing technique to collect these graphs from domain experts and proposes their publication as Linked Data using the W3C PROV standard. Based on our findings from a study we ran during the USEWOD 2014 workshop, we propose a semi-automatic approach that generates metadata by leveraging information extraction as an additional step to crowdsourcing, to generate high-quality data citation graphs. Furthermore, we consider the design implications on our crowdsourcing approach when non-expert participants are involved in the process
Who models the world?: Collaborative ontology creation and user roles in Wikidata
Wikidata is a collaborative knowledge graph which is central to many academic and industry IT projects. Its users are responsible for maintaining the schema that organises this knowledge into classes, properties, and attributes, which together form the Wikidata ‘ontology’. In this paper, we study the relationship between different Wikidata user roles and the quality of the Wikidata ontology. To do so, we first propose a framework to evaluate the ontology as it evolves. We then cluster editing activities to identify user roles in monthly time frames. Finally, we explore how each role impacts the ontology. Our analysis shows that the Wikidata ontology has uneven breadth and depth. We identified two user roles: contributors and leaders. The second category is positively associated with ontology depth, with no significant effect on other features. Further work should investigate other dimensions to define user profiles and their influence on the knowledge graph
Open Data and entrepreneurship
That there is potential for entrepreneurial development of innovative, economically beneficial products and services from Open Data makes logical sense, as data is increasingly available for the creation of new insights and activities. Although innovation with open data takes place across all sizes and ages of organisations, entrepreneurs and start-up businesses are important players. This paper considers the barriers to entrepreneurship with open data, as well as the sustainability, supportive policies and impact
Volunteer engagement in short-term virtual citizen science projects
Virtual citizen science (VCS) projects have proven to be a highly effective method to analyse large quantities of data for scientific research purposes. Yet if these projects are to achieve their goals, they must attract and maintain the interest of sufficient numbers of active, dedicated volunteers. Although CSCW and HCI research has typically focussed on designing platforms to support long-term engagement, in recent years a new project format has been trialled -- using short-term crowdsourcing activities lasting as little as 48 hours. In this paper, we explore two short-term projects to understand how they influence participant engagement in the task and discussion elements of VCS. We calculate descriptive statistics to characterise project participants. Additionally, using calculation of correlation coefficients and hypothesis testing, we identify factors influencing volunteer task engagement and the effect this has on project outcomes. Our findings contribute to the understanding of volunteer engagement in VCS
Trusts, co-ops and crowd workers: could we include crowd data workers as stakeholders in data trust design?
Data trusts have been proposed as a mechanism through which data can be more readily exploited for a variety of aims, including economic development and social-benefit goals such as medical research or policy-making. Data Trusts, and similar data governance mechanisms such as Data Co-Ops, aim to facilitate the use and reuse of datasets across organisational boundaries and, in the process, to protect the interests of stakeholders such as data subjects. However, current discourse on Data Trusts does not acknowledge another common stakeholder in the data value chain – the crowd workers who are employed to collect, validate, curate and transform data. In this paper, we report on a preliminary qualitative investigation into how crowd data workers themselves feel datasets should be used and governed. We find that while overall remuneration is important to those workers, they also value public-benefit data use, but have reservations about delayed remuneration and the trustworthiness of both administrative processes and the crowd itself. We discuss the implications of our findings for how data trusts could be designed, and how data trusts could be used to give crowd workers a more enduring stake in the product of their work
The human face of the web of data: a cross-sectional study of labels
Labels in the web of data are the key element for humans to access the data. We introduce a framework to measure the coverage of information with labels. The framework is based on a set of metrics including completeness, unambiguity, multilinguality, labeled object usage, and monolingual islands. We apply this framework to seven diverse datasets, from the web of data, a collaborative knowledge base, open governmental and GLAM data. We gain an insight into the current state of labels and multilinguality on the web of data. Comparing a set of differently sourced datasets can help data publishers to understand what they can improve and what other ways of collecting data can be adopted
Collaborative ontology engineering: a survey
Building ontologies in a collaborative and increasingly community-driven fashion has become a central paradigm of modern ontology engineering. This understanding of ontologies and ontology engineering processes is the result of intensive theoretical and empirical research within the Semantic Web community, supported by technology developments such as Web 2.0. Over 6 years after the publication of the first methodology for collaborative ontology engineering, it is generally acknowledged that, in order to be useful, but also economically feasible, ontologies should be developed and maintained in a community-driven manner, with the help of fully-fledged environments providing dedicated support for collaboration and user participation. Wikis, and similar communication and collaboration platforms enabling ontology stakeholders to exchange ideas and discuss modeling decisions, are probably the most important technological components of such environments. In addition, process-driven methodologies assist the ontology engineering team throughout the ontology life cycle, and provide empirically grounded best practices and guidelines for optimizing ontology development results in real-world projects. The goal of this article is to analyze the state of the art in the field of collaborative ontology engineering. We will survey several of the most outstanding methodologies, methods and techniques that have emerged in recent years, and present the most popular development environments, which can be utilized to carry out, or facilitate, specific activities within the methodologies. A discussion of the open issues identified concludes the survey and provides a roadmap for future research and development in this lively and promising field
A comparison of dataset search behaviour of internal versus search engine referred sessions
storytelling to labelling for supervised machine learning. Previous qualitative research suggests that people use two types of search affordances to find the data they need: they either go to a data portal that probably contains the data and search there; or they start on a regular web search engine, which sometimes returns results that are datasets. For the first type of search, prior works have analysed logs from different data portals to understand basic tenets of search behaviour such as query length or topics. In this paper, we advance the state of the art in dataset search behaviour with a comprehensive transaction log analysis study (n = 236441 sessions) of an international open data portal, in which we compare sessions that start directly on a data portal (internal searches) against sessions that land on a dataset or SERP (search engine result page) through a referral from a web search engine (external). Using dataset downloads as a proxy for successful searches, we find a statistically significant, though weak, relationship between the use of keyword search and session type, and a moderate relationship between the use of search facets and session type. We also discover and discuss behavioural patterns and user profiles across session types
Data Sharing Toolkit: Lessons learned, resources and recommendations for sharing data
Data plays a major role in the European economy, and building a European data economy is one of the strategic goals of the European Commission. With the rise of data science techniques, not least Machine Learning (ML) and Artificial Intelligence (AI), the value and role of data as an asset becomes ever more crucial. This has made it more important for data to be accessible. However, much of the data that many solutions require are held within private organisations and are only available if they are shared. Data sharing in this sense means allowing third parties specifically permissioned access to datasets to generate value. This toolkit has been developed to help organisations that want to generate value by sharing data or facilitating data sharing. We explain the concept, challenges, and processes to enable successful data sharing, and provide resources and recommendations. It is derived from experience collected in the Data Pitch programme and related national and international initiatives, such as the Smart Cities Innovation Framework Implementation (SciFi), the European Data Incubator (EDI), as well as several recent pilots for data trusts in the UK
