PROV-N: The Provenance Notation
Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. PROV-DM distinguishes core structures, forming the essence of provenance information, from extended structures catering for more specific uses of provenance. PROV-DM is organized in six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) derivations of entities from entities; (3) agents bearing responsibility for entities that were generated and activities that happened; (4) a notion of bundle, a mechanism to support provenance of provenance; (5) properties to link entities that refer to the same thing; and (6) collections forming a logical structure for their members. To provide examples of the PROV data model, the PROV notation (PROV-N) is introduced: aimed at human consumption, PROV-N allows serializations of PROV instances to be created in a compact manner. PROV-N facilitates the mapping of the PROV data model to concrete syntax, and is used as the basis for a formal semantics of PROV. The purpose of this document is to define the PROV-N notation.
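The following minimal Python sketch shows what a compact PROV-N serialization can look like. It builds a small document containing two entities, one activity, and the usage, generation and derivation statements that link them. The helper `to_prov_n`, the `ex:` prefix and its namespace URI are hypothetical illustrations. The statement forms and the `document ... endDocument` wrapper follow PROV-N, where `-` marks an omitted optional time.

```python
def to_prov_n(prefixes, statements):
    """Wrap PROV-N statements in a document ... endDocument block.

    prefixes: mapping of prefix name -> namespace IRI (hypothetical values below)
    statements: PROV-N expressions given as plain strings
    """
    lines = ["document"]
    for name, iri in prefixes.items():
        lines.append(f"  prefix {name} <{iri}>")
    lines.extend(f"  {s}" for s in statements)
    lines.append("endDocument")
    return "\n".join(lines)


doc = to_prov_n(
    {"ex": "http://example.org/"},  # hypothetical namespace
    [
        "entity(ex:dataset)",
        "entity(ex:report)",
        "activity(ex:analysis)",
        "used(ex:analysis, ex:dataset, -)",          # '-' = time not recorded
        "wasGeneratedBy(ex:report, ex:analysis, -)",
        "wasDerivedFrom(ex:report, ex:dataset)",
    ],
)
print(doc)
```

Only a handful of PROV-N statement types are used here. The attribute lists, identifiers on relations and timestamps that PROV-N also supports are left out to keep the example short.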
PROV-Dictionary: Modeling Provenance for Dictionary Data Structures
Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. This document describes extensions to PROV to facilitate the modeling of provenance for dictionary data structures. [PROV-DM] specifies a Collection as an entity that provides a structure to some constituents, which are themselves entities. However, some applications may need a mechanism to give a Collection more structure in order to describe its provenance accurately. Therefore, in this document, we introduce Dictionary, a specific type of Collection with a logical structure consisting of key-entity pairs.
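To illustrate the idea of a Dictionary as key-entity pairs, here is a minimal Python sketch. In it, every insertion or removal yields a new, immutable dictionary entity that is recorded as derived from the previous one. The class and function names, the log format and the rule that an inserted key replaces an existing entry are all assumptions made for illustration. None of them is the PROV-Dictionary serialization.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Dictionary:
    """An immutable dictionary entity: an identifier plus key -> entity-id pairs."""
    id: str
    members: Mapping[str, str]


def derive_by_insertion(old: Dictionary, new_id: str,
                        pairs: Mapping[str, str], log: list) -> Dictionary:
    """Create a new dictionary entity from `old` plus `pairs`, logging the derivation."""
    merged = {**old.members, **pairs}  # assumption: an inserted key replaces an existing one
    new = Dictionary(new_id, MappingProxyType(merged))
    log.append((new_id, "insertion", old.id, sorted(pairs.items())))
    return new


def derive_by_removal(old: Dictionary, new_id: str,
                      keys: Iterable[str], log: list) -> Dictionary:
    """Create a new dictionary entity from `old` without `keys`, logging the derivation."""
    removed = set(keys)
    kept = {k: v for k, v in old.members.items() if k not in removed}
    new = Dictionary(new_id, MappingProxyType(kept))
    log.append((new_id, "removal", old.id, sorted(removed)))
    return new


log: list = []
d0 = Dictionary("ex:d0", MappingProxyType({}))  # hypothetical identifiers throughout
d1 = derive_by_insertion(d0, "ex:d1", {"k1": "ex:e1", "k2": "ex:e2"}, log)
d2 = derive_by_removal(d1, "ex:d2", ["k1"], log)
print(dict(d2.members))  # {'k2': 'ex:e2'}
```

Each state stays immutable, so `d0` and `d1` remain available and their contents can be inspected after `d2` is created. This mirrors the way each dictionary is a distinct entity in a provenance trace.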
RO-Crate: packaging metadata love notes into FAIR Digital Objects
HMC Keynote: Carole Goble, The University of Manchester
Abstract
The Helmholtz Metadata Collaboration aims to make the research data [and software] produced by Helmholtz Centres FAIR for their own and the wider science community by means of metadata enrichment [1]. Why metadata enrichment and why FAIR? Because the whole scientific enterprise depends on a cycle of finding, exchanging, understanding, validating, reproducing, integrating and reusing research entities across a dispersed community of researchers.
Metadata is not just “a love note to the future” [2], it is a love note to today’s collaborators and peers. Moreover, a FAIR Commons must cater for the metadata of all the entities of research – data, software, workflows, protocols, instruments, geo-spatial locations, specimens, samples, people (as well as traditional articles) – and their interconnectivity. That is a lot of metadata love notes to manage, bundle up and move around. Notes written in different languages at different times by different folks, produced and hosted by different platforms, yet referring to each other, and building an integrated picture of a multi-part and multi-party investigation. We need a crate!
RO-Crate [3] is an open, community-driven, and lightweight approach to packaging research entities along with their metadata in a machine-readable manner. Following its key principles, “just enough” and “developer and legacy friendliness”, RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility and citability. As a self-describing and unbounded “metadata middleware” framework, RO-Crate shows that a little bit of packaging goes a long way towards realising the goals of FAIR Digital Objects (FDO) [4], and towards not just overcoming platform diversity but celebrating it, while retaining the contextual integrity of an investigation.
In this talk I will present why and how Research Object packaging eases Metadata Collaboration, using examples in big data and mixed object exchange, mixed object archiving and publishing, mass citation, and reproducibility. Some examples come from the HMC, others from EOSC, the USA and Australia, and from different disciplines.
Metadata is a love note to the future, RO-Crate is the delivery package.
[1] https://helmholtz-metadaten.de/en
[2] Scott, Jason. “The Metadata Mania”, http://ascii.textfiles.com/archives/3181, June 2011
[3] Soiland-Reyes, Stian et al. “Packaging Research Artefacts with RO-Crate”. Data Science, 2022; 5(2):97-138, DOI: 10.3233/DS-210053
[4] De Smedt K, Koureas D, Wittenburg P. “FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units”. Publications. 2020; 8(2):21. https://doi.org/10.3390/publications802002
Handling Health Data: FAIR Research Objects for Trusted Research Environments
<p><span>Trusted Research Environments (TREs) are secure locations in which data are placed for researchers to analyse. TREs can be set up to host administrative data, hospital data or any other data that needs to remain securely isolated. It is hard for a researcher to perform an analysis across multiple TREs, requesting and gathering the data needed from each one. Federated analysis widens the scope of research and makes more effective use of data, but it requires data to be analysed across geographical or governance boundaries, for example in devolved healthcare in the UK and across national borders in Europe. </span></p>
<p><span>A federated infrastructure makes it much easier for analysis tools to access multiple TREs. Health Data Research UK (</span><u><span><a href="https://www.hdruk.ac.uk/">https://www.hdruk.ac.uk/</a></span></u><span>), through its </span><span>DARE UK programme</span><span> (</span><u><span><a href="https://dareuk.org.uk/">https://dareuk.org.uk/</a></span></u><span>), is developing a blueprint for TRE federation [1] and tools for federated data discovery.<span> </span>ELIXIR, the European Research Infrastructure for Life Science Data (</span><u><span><a href="https://elixir-europe.org/">https://elixir-europe.org/</a></span></u><span>), has developed the Federated European Genome-Phenome Archive (FEGA) [2] and services for FAIR data management and computational workflows using GA4GH standards [3].</span></p>
<p><span>TREs are well established and implemented in different ways, and many popular analysis tools are already in widespread use, so solutions need to be readily adoptable by existing systems. Moreover, the infrastructure needs to work within the “Five Safes” framework [4], which aims to protect data and enable data services to provide safe research access to data. The “Five Safes RO-Crate” [5] is a new way of packaging up the digital objects needed for research requests and results, together with the information that tools and TRE providers need to ensure that the Crates are reviewed and processed according to Five Safes principles. RO-Crate [6] is a community effort to establish a lightweight, native approach to packaging research data with their metadata (</span><u><span><a href="https://www.researchobject.org/ro-crate/">https://www.researchobject.org/ro-crate/</a></span></u><span>). Sponsored by ELIXIR and others, it has become a widely adopted framework for inter-service exchange, resource archiving, and reproducible reporting, used by digital research infrastructures and their services, including ELIXIR, the European Open Science Cloud, and the Australian </span><span>BioCommons</span><span>. It is an implementation of the FDO Forum’s FAIR Digital Objects (</span><u><span><a href="https://fairdo.org/">https://fairdo.org/</a></span></u><span>). </span></p>
<p><span>The TRE-FX project (</span><u><span><a href="https://trefx.uk/">https://trefx.uk/</a></span></u><span>) has piloted FAIR Five Safes RO-Crates for answering data queries within HDR UK TREs, using pre-approved workflows run with ELIXIR’s workflow execution technologies. Partnering with TREs from Scotland, Wales and England and with analysis toolkits (</span><span>DataSHIELD</span><span>, </span><span>BitFount</span><span>), TRE-FX </span><span>streamlines the exchange of requests and results between analysis clients and TREs while ensuring</span><span> that access is safe and the process transparent. </span><span>TELEPORT (https://dareuk.org.uk/driver-project-teleport/), a sister DARE UK project, follows a complementary federation strategy of ephemeral “pop-up” TREs for requests that are only feasible over combined TREs.<span> </span>The combination of TRE-FX and TELEPORT is a powerful hybrid capable of addressing practical federated analysis patterns within current data governance processes.</span></p>
<p><span>From March 2024, HDR UK and ELIXIR will combine forces in the Horizon Europe EOSC-ENTRUST project, which aims </span><span>to create a European network of Trusted Research Environments for sensitive data and to drive European interoperability through joint development of a common blueprint for federated data access and analysis. </span></p>
<p><span>References</span></p>
<p><span>[1] DARE UK, “Federated Architecture Blueprint”, 2023, </span><span>https://dareuk.org.uk/our-work/federated-architecture-blueprint/</span></p>
<p><span>[2] European Genome-Phenome Archive, </span><span>FederatedEGA</span><span>, https://ega-archive.org/federated</span></p>
<p><span>[3] Thorogood A et al, “International federation of genomic medicine databases using GA4GH standards”, </span><span>Cell Genomics</span><span>, 1(2), 2021, https://doi.org/10.1016/j.xgen.2021.100032</span></p>
<p><span>[4] </span><span>UK Health Data Research Alliance, & NHSX. (2021). Building Trusted Research Environments - Principles and Best Practices; Towards TRE ecosystems (1.0). </span><span>Zenodo</span><span>. https://doi.org/10.5281/zenodo.5767586</span></p>
<p><span>[5] Soiland-Reyes, S., Wheater, S., Giles, T., Goble, C., & Quinlan, P. (2023). TRE-FX Technical Documentation - Five Safes RO-Crate (0.4). </span><span>Zenodo</span><span>. https://doi.org/10.5281/zenodo.10376350</span></p>
<p><span>[6] Soiland-Reyes S, Sefton P, Crosas M, Castro LJ, Coppens F, Fernández JM, Garijo D, Grüning B, Rosa ML, Leo S, Ó </span><span>Carragáin</span><span> E, Portier M, Trisovic A, RO-Crate Community, Groth P, Goble C (2022):<br></span><span>“Packaging research artefacts with RO-Crate”</span><span> </span><span>Data Science</span><span> </span><span>5(</span><span>2), </span><u><span><a href="https://doi.org/10.3233/DS-210053">https://doi.org/10.3233/DS-210053</a></span></u></p>
<p><span>ABOUT THE AUTHOR(S)</span></p>
<p><span>Carole Goble CBE </span><span>FREng</span><span> FBCS</span></p>
<p><span>Carole Goble is a Professor of Computer Science at the University of Manchester, UK. She is a leader in Digital Research Infrastructures, translating technical innovations in distributed computing, semantic and metadata technologies, data and software sharing and computational workflows into FAIR and Open information solutions for scientists, in particular in the Life Sciences and Biodiversity. She is currently Joint Head of Node of ELIXIR-UK, the UK node of ELIXIR, the European Research Infrastructure for Life Science Data; joint lead of the Federated Analytics programme for Health Data Research UK; and a founder of the UK’s Software Sustainability Institute. Carole is an author of the seminal FAIR principles for scientific data and recipient of the Microsoft Jim Gray award for her contributions to eScience. </span></p>
What exactly happened to LSID?
What exactly happened to LSID? It was, it would seem, a technically sound approach, and one whose failure we would do well to learn more from.
CWL Viewer: The Common Workflow Language Viewer
The Common Workflow Language (CWL) project emerged from the BOSC 2014 Codefest as a grassroots, multi-vendor working group to tackle the portability of data analysis workflows. Its specification for describing workflows and command line tools aims to make them portable and scalable across a variety of computing platforms. At its heart, CWL is a set of structured text files (YAML) with various extensibility points to the format. However, the CWL syntax and multi-file collections are not conducive to workflow browsing, exchange and understanding; for this we need a visualization suite. CWL Viewer is a richly featured CWL visualization suite that graphically presents and lists the details of CWL workflows with their inputs, outputs and steps. It also packages the CWL files into a downloadable Research Object Bundle, including attribution, versioning and dependency metadata in the manifest, allowing it to be easily shared. The tool operates over any workflow held in a GitHub repository. Other features include: path visualization from parent and child nodes; support for nested workflows; workflow graph download in a range of image formats; a gallery of previously submitted workflows; and support for private git repositories and public GitHub, including live updates over versioned workflows. The CWL Viewer is the de facto CWL visualization suite and has been enthusiastically received by the CWL community.
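To show how a workflow's inputs, outputs and steps can become a graph, here is a minimal Python sketch. It extracts edges from a CWL Workflow that has already been parsed into a Python dict. Loading the YAML itself would need a YAML library and is not shown. This is not CWL Viewer's implementation. The function names are hypothetical, and the sketch assumes the map forms of `inputs`, `outputs`, `steps` and step `in`. It does not handle list forms, multiple sources per input or `#`-prefixed identifiers.

```python
def node_of(source):
    """Map a CWL source reference to a graph node.

    "step/output" refers to an output of a step; a bare name is a workflow input.
    """
    return source.split("/", 1)[0] if "/" in source else source


def workflow_edges(wf):
    """Return (from_node, to_node) edges for a CWL Workflow given as a dict."""
    edges = []
    for step_id, step in wf.get("steps", {}).items():
        for src in step.get("in", {}).values():
            source = src.get("source") if isinstance(src, dict) else src
            if source is None:  # e.g. a step input bound only to a `default` value
                continue
            edges.append((node_of(source), step_id))
    for out_id, out in wf.get("outputs", {}).items():
        edges.append((node_of(out["outputSource"]), out_id))
    return edges


example = {  # hypothetical two-step workflow
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {"reads": "File"},
    "outputs": {"report": {"type": "File", "outputSource": "qc/html"}},
    "steps": {
        "trim": {"run": "trim.cwl", "in": {"fastq": "reads"}, "out": ["trimmed"]},
        "qc": {"run": "qc.cwl", "in": {"fastq": "trim/trimmed"}, "out": ["html"]},
    },
}
print(workflow_edges(example))
# [('reads', 'trim'), ('trim', 'qc'), ('qc', 'report')]
```

A renderer could then hand these edges to any graph layout tool to draw the workflow.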
Tracking workflow execution with TavernaProv
Apache Taverna is a scientific workflow system for combining web services and local tools. Taverna records provenance of workflow runs, intermediate values and user interactions, both as an aid for debugging while designing the workflow and as a record for later reproducibility and comparison. Taverna also records provenance of the evolution of the workflow definition (including a chain of wasDerivedFrom relations), attributions and annotations; for brevity, here we focus on how Taverna's workflow run provenance extends PROV and is embedded within Research Objects.
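A chain of wasDerivedFrom relations can be walked back to recover a workflow definition's revision history. The following Python sketch does this under simplifying assumptions: each revision derives from at most one predecessor, and the relations are given as a plain mapping rather than parsed from PROV. The revision identifiers and the function name are hypothetical.

```python
def derivation_chain(start, was_derived_from):
    """Follow wasDerivedFrom links from `start` back to the earliest revision.

    was_derived_from: mapping newer revision id -> the revision it derives from.
    Assumes a linear history (one predecessor per revision).
    """
    chain = [start]
    seen = {start}
    current = start
    while current in was_derived_from:
        current = was_derived_from[current]
        if current in seen:  # guard against a malformed, cyclic record
            raise ValueError(f"cycle in wasDerivedFrom chain at {current}")
        seen.add(current)
        chain.append(current)
    return chain


links = {"wf:v3": "wf:v2", "wf:v2": "wf:v1"}  # hypothetical workflow revisions
print(derivation_chain("wf:v3", links))  # ['wf:v3', 'wf:v2', 'wf:v1']
```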
Research Object Bundle 1.0
This specification defines a file format for storage and distribution of Research Objects as a ZIP archive, called a Research Object Bundle (RO Bundle). RO Bundles allow a Research Object to be captured in a single file or byte-stream by including its manifest, annotations and some or all of its aggregated resources, for the purposes of exporting, archiving, publishing and transferring research objects. See https://w3id.org/bundle/2014-11-05
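The following Python sketch writes a minimal RO Bundle-style archive with the standard `zipfile` module. The layout reflects our reading of the specification and should be checked against it. A `mimetype` entry containing `application/vnd.wf4ever.robundle+zip` is stored first and uncompressed, and a JSON-LD manifest sits at `.ro/manifest.json`. The manifest below is a reduced subset (no annotations, provenance or timestamps), and the function name, file names and contents are hypothetical.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, files):
    """Write a minimal RO Bundle-style ZIP.

    target: a path or writable binary file object
    files: mapping of path inside the bundle -> bytes
    """
    manifest = {  # reduced manifest: aggregated resources only
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "manifest": "manifest.json",
        "aggregates": [{"uri": "/" + name} for name in files],
    }
    with zipfile.ZipFile(target, "w") as zf:
        # mimetype goes first and uncompressed so tools can identify the format
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


buf = io.BytesIO()
write_bundle(buf, {"results/summary.csv": b"sample,value\nA,1\n"})  # hypothetical file
with zipfile.ZipFile(buf) as zf:
    print(zf.namelist())  # ['mimetype', '.ro/manifest.json', 'results/summary.csv']
```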
ORE User Guide - Resource Map Implementation in JSON-LD 0.9
<p>Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. OAI-ORE introduces the notion of a <strong>Resource Map</strong>, an RDF Graph which describes the <strong>Aggregation</strong>, the aggregated Resources of which it is composed, and the relationships between them (and/or the relationships between these and other resources).</p>
<p>Since a Resource Map is an RDF Graph, it can be serialized using any RDF syntax. This document outlines the use of one such syntax for the serialization of Resource Maps: <strong>JSON-LD</strong>.</p>
<p>This document is intended for implementers who have an understanding of ORE concepts and are responsible for the development of applications which generate or process Resource Maps using JSON-LD.</p>
This document is available at http://www.openarchives.org/ore/0.9/jsonl
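The following minimal Python sketch builds a JSON-LD Resource Map that describes one Aggregation and its aggregated resources. It uses the ORE vocabulary namespace `http://www.openarchives.org/ore/terms/` with the terms `ResourceMap`, `Aggregation`, `describes` and `aggregates`. The inline `@context`, the function name and all `example.org` URIs are assumptions for illustration. The user guide itself may recommend a different context or document layout.

```python
import json

ORE = "http://www.openarchives.org/ore/terms/"


def resource_map(rem_uri, agg_uri, resources):
    """Build a JSON-LD Resource Map describing one Aggregation of `resources`."""
    return {
        "@context": {"ore": ORE},
        "@id": rem_uri,
        "@type": "ore:ResourceMap",
        "ore:describes": {  # the Resource Map describes exactly one Aggregation
            "@id": agg_uri,
            "@type": "ore:Aggregation",
            "ore:aggregates": [{"@id": r} for r in resources],
        },
    }


rem = resource_map(
    "http://example.org/rem/1",  # hypothetical URIs throughout
    "http://example.org/aggregation/1",
    ["http://example.org/article.pdf", "http://example.org/data.csv"],
)
print(json.dumps(rem, indent=2))
```

The Resource Map and the Aggregation deliberately get different URIs. ORE distinguishes the description from the thing it describes.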
A lightweight approach to research object data packaging
A Research Object (RO) provides a machine-readable mechanism to communicate the diverse set of digital and real-world resources that contribute to an item of research. The aim of an RO is to move beyond the traditional academic publication as a static PDF and instead provide a complete and structured archive of the items (such as people, organisations, funding, equipment, software, etc.) that contributed to the research outcome, including their identifiers, provenance, relations and annotations. This is of particular importance as all domains of research and science increasingly rely on computational analysis, yet we are facing a reproducibility crisis because key components are often not sufficiently tracked, archived or reported. Here we propose Research Object Crate (or RO-Crate for short), an emerging lightweight approach to packaging research data with their structured metadata, rephrasing the Research Object model as schema.org annotations to formalize a JSON-LD format that can be used independently of infrastructure, e.g. in GitHub or Zenodo archives. RO-Crate can be extended for domain-specific descriptions, aiming at a wide variety of applications and repositories to encourage FAIR sharing of reproducible datasets and analytical methods.
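The following Python sketch generates a minimal `ro-crate-metadata.json` for a crate with a single data file, to show what schema.org annotations in JSON-LD look like in practice. The structure follows our reading of RO-Crate 1.1: a metadata descriptor with `conformsTo` and `about`, and a root `Dataset` with `hasPart`. The function name, file name, title, date and licence are hypothetical, and a real crate will usually describe its entities in more detail.

```python
import json


def minimal_crate(files, name, description, date_published, license_url):
    """Return an RO-Crate 1.1-style metadata document as a Python dict."""
    graph = [
        {   # metadata descriptor: identifies the spec this file conforms to
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root data entity: the crate as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": f} for f in files],
        },
    ]
    graph += [{"@id": f, "@type": "File"} for f in files]  # one data entity per file
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = minimal_crate(
    ["data/measurements.csv"],  # hypothetical file
    "Example crate",
    "Hypothetical measurements packaged for illustration",
    "2024-01-01",
    "https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(crate, indent=2))
```

Because the output is plain JSON-LD, it can sit next to the data in a GitHub repository or a Zenodo deposit without extra infrastructure.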
- …
