492 research outputs found

    PROV-N: The Provenance Notation

    Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. PROV-DM distinguishes core structures, forming the essence of provenance information, from extended structures catering for more specific uses of provenance. PROV-DM is organized in six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) derivations of entities from entities; (3) agents bearing responsibility for entities that were generated and activities that happened; (4) a notion of bundle, a mechanism to support provenance of provenance; (5) properties to link entities that refer to the same thing; and (6) collections forming a logical structure for their members. To provide examples of the PROV data model, the PROV notation (PROV-N) is introduced: aimed at human consumption, PROV-N allows serializations of PROV instances to be created in a compact manner. PROV-N facilitates the mapping of the PROV data model to concrete syntax, and is used as the basis for a formal semantics of PROV. The purpose of this document is to define the PROV-N notation.
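    As a rough illustration of the compact, human-readable style described in this abstract, the sketch below assembles a few PROV-N statements covering an entity, an activity, an agent and the relations between them. The `ex` prefix, its namespace URL, the identifiers and both helper functions are hypothetical and chosen only for illustration; they are not part of any PROV toolkit.

```python
# Minimal sketch: build a small PROV-N document as plain text.
# The "ex" prefix, namespace and identifiers are hypothetical examples.

def statement(name, *args):
    """Format one PROV-N expression such as entity(ex:report)."""
    return f"{name}({', '.join(args)})"


def prov_n_document(prefixes, statements):
    """Wrap statements in the document ... endDocument container."""
    lines = ["document"]
    lines += [f"  prefix {prefix} <{uri}>" for prefix, uri in prefixes.items()]
    lines += [f"  {s}" for s in statements]
    lines.append("endDocument")
    return "\n".join(lines)


statements = [
    statement("entity", "ex:report"),
    statement("activity", "ex:writing"),
    statement("agent", "ex:alice"),
    # "-" marks an omitted optional argument (time or plan below).
    statement("wasGeneratedBy", "ex:report", "ex:writing", "-"),
    statement("wasAssociatedWith", "ex:writing", "ex:alice", "-"),
    statement("wasAttributedTo", "ex:report", "ex:alice"),
]

doc = prov_n_document({"ex": "http://example.org/"}, statements)
print(doc)
```

    The statements map onto the first three components listed in the abstract: entities and activities, then agents and their responsibility. Derivations, bundles and collections would follow the same pattern.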

    Sharing research artefacts as FAIR Digital Objects using RO-Crate

    Presented to Brookhaven National Laboratory, 2023-01-23. In this talk Stian Soiland-Reyes will introduce RO-Crate as a set of recommendations for sharing research artefacts along with their contextual metadata as FAIR Digital Objects (FDO). The talk will show how RO-Crate uses extensible and well-established Web standards (JSON-LD, Schema.org) and how developers and researchers can take advantage of RO-Crate as a lightweight method to build Linked Data. It will also show how RO-Crate is being used by a range of research projects, with many specializing profiles being formed, such as for plant sciences and COVID-19 datasets. Video recording: https://youtu.be/0T4FBbpgtQo

    The Archive and Package (arcp) URI scheme

    The arcp URI scheme is introduced for location-independent identifiers to consume or reference hypermedia and linked data resources bundled inside a file archive, as well as to resolve archived resources within programmatic frameworks for Research Objects. Research Object: http://s11.no/2018/arcp.html#ro Cite as: Stian Soiland-Reyes, Marcos Cáceres (2018): The Archive and Package (arcp) URI Scheme. 2018 IEEE 14th International Conference on e-Science (e-Science). https://doi.org/10.1109/eScience.2018.00018 Author-prepared preprint. Web version: http://s11.no/2018/arcp.html Publisher version: https://doi.org/10.1109/eScience.2018.0001
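    To make the idea of a location-independent identifier concrete, the sketch below builds an arcp URI from a UUID that stands for the archive and a path inside it. It assumes the UUID-based form `arcp://uuid,<uuid>/<path>`; the function name, the UUID value and the file path are hypothetical, and the other arcp forms (hash-based and name-based) are not covered here.

```python
import uuid
from urllib.parse import quote, urlsplit


def arcp_uuid(archive_id, path="/"):
    """Build an arcp URI of the assumed form arcp://uuid,<uuid>/<path>."""
    path = "/" + path.lstrip("/")
    # quote() percent-encodes characters such as spaces but keeps "/".
    return f"arcp://uuid,{archive_id}{quote(path)}"


# Hypothetical archive identifier; in practice one UUID would be minted
# per archive (for instance with uuid.uuid4()) and reused for all its files.
archive_id = uuid.UUID("32a423d6-52ab-47e3-a9cd-54f418a48571")
uri = arcp_uuid(archive_id, "data/results file.csv")
print(uri)

# The authority carries the archive identity, the path the archived resource.
parts = urlsplit(uri)
print(parts.scheme, parts.netloc, parts.path)
```

    Because the identifier does not depend on where the archive is stored, relative links between resources in the same archive resolve to the same URIs wherever it is unpacked.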

    Research Object Bundle 1.0

    This specification defines a file format for storage and distribution of Research Objects as a ZIP archive, called a Research Object Bundle (RO Bundle). RO Bundles allow capturing a Research Object to a single file or byte-stream by including its manifest, annotations and some or all of its aggregated resources for the purposes of exporting, archiving, publishing and transferring research objects. See https://w3id.org/bundle/2014-11-05
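    The packaging idea can be sketched with the Python standard library: a ZIP archive that carries a manifest next to the aggregated resources. The `mimetype` entry, the media type string, the `.ro/manifest.json` location and the manifest keys below are assumptions based on a reading of the specification and should be verified against it; `make_bundle` and the sample file are hypothetical.

```python
import io
import json
import zipfile

# Assumed media type for RO Bundles; verify against the specification.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def make_bundle(resources):
    """Return the bytes of a ZIP holding a manifest plus the given resources.

    resources maps archive paths (e.g. "data/results.csv") to bytes.
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The media type entry is written first and stored uncompressed.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path, content in resources.items():
            bundle.writestr(path, content)
    return buffer.getvalue()


# Build a bundle in memory and read it back.
bundle_bytes = make_bundle({"data/results.csv": b"a,b\n1,2\n"})
with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
    names = bundle.namelist()
    manifest = json.loads(bundle.read(".ro/manifest.json"))
    mimetype_stored = bundle.getinfo("mimetype").compress_type == zipfile.ZIP_STORED
print(names)
```

    Annotations and externally hosted resources, which the abstract also mentions, are left out of this sketch.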

    Provenance in distributed systems: a process algebraic study of provenance management and its role in establishing trust in data quality

    We aim to develop a formal framework to reason about provenance in distributed systems. We take as our starting point an extension of the asynchronous pi-calculus where processes are explicitly assigned principal identities. We enrich this basic setting with provenance-annotated data, dynamic provenance tracking and dynamically checked trust policies. We give several examples to illustrate the use of the calculus in modelling systems where principals base their trust in the quality of data on the provenance information associated with it. We consider the role of provenance in the calculus by relating the provenance tracking semantics to a plain one in which no provenance tracking or checking takes place. We further substantiate this by studying bisimulation-based behavioural equivalences for the plain and annotated versions of the calculus and contrasting the discriminating power of the equivalences obtained in each case. We also give a more denotational take on the semantics of the provenance calculus and look at notions of well-formedness and soundness for the provenance tracking semantics. We consider two different extensions of the basic calculus. The first aims to alleviate the cost of run-time provenance tracking and checking by defining a static type system which guarantees that in well-typed systems principals always receive data with provenance that matches their requirements. The second extension looks at the ramifications of provenance tracking on privacy and security policies and consists of extending the calculus with a notion we call filters. This gives principals the ability to assign different views of the provenance of a given value to different principals, thus allowing for the selective disclosure of provenance information. We study behavioural equivalences for this extension of the calculus, paying particular attention to the set of principals composing the observer and its role in discriminating between systems.

    Taverna Tutorials 2014-09-01 (Bonn)

    Taverna tutorials and training material. Presented at Bonn University MSc course on Taverna. http://www.myexperiment.org/groups/1267 Editors: Stian Soiland-Reyes & Christian Brenninkmeijer. Slideshare - http://dev.mygrid.org.uk/wiki/display/tav250/Tutorials Sources - http://github.com/taverna/taverna-tutorials

    PROV-Dictionary: Modeling Provenance for Dictionary Data Structures

    Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. This document describes extensions to PROV to facilitate the modeling of provenance for dictionary data structures. [PROV-DM] specifies a Collection as an entity that provides a structure to some constituents, which are themselves entities. However, some applications may need a mechanism to specify more structure to a Collection, in order to accurately describe its provenance. Therefore, in this document, we introduce Dictionary, a specific type of Collection with a logical structure consisting of key-entity pairs.
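    The key-entity structure can be sketched by treating a dictionary as a mapping from keys to entity identifiers and writing one membership statement per pair. The statement spellings `prov:hadDictionaryMember(...)` and `prov:type='prov:Dictionary'` are assumptions about the notation and should be checked against the PROV-Dictionary document; the function name and the `ex:` identifiers are hypothetical.

```python
def dictionary_statements(dict_id, members):
    """Describe a Dictionary and its key-entity pairs as PROV-N-style lines.

    members maps each key to the identifier of the entity stored under it.
    """
    # Declare the dictionary itself as an entity with an assumed type.
    lines = [f"entity({dict_id}, [prov:type='prov:Dictionary'])"]
    for key, entity_id in members.items():
        # Each member is an entity in its own right ...
        lines.append(f"entity({entity_id})")
        # ... and is linked to the dictionary under its key.
        lines.append(f'prov:hadDictionaryMember({dict_id}, {entity_id}, "{key}")')
    return lines


members = {"k1": "ex:e1", "k2": "ex:e2"}
lines = dictionary_statements("ex:d", members)
print("\n".join(lines))
```

    Compared with a plain Collection, which only records that an entity is a member, the key recorded with each pair is the additional structure the abstract refers to.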

    RO-Crate: packaging metadata love notes into FAIR Digital Objects

    HMC Keynote: Carole Goble, The University of Manchester. Title: RO-Crate: packaging metadata love notes into FAIR Digital Objects. Abstract: The Helmholtz Metadata Collaboration aims to make the research data [and software] produced by Helmholtz Centres FAIR for their own and the wider science community by means of metadata enrichment [1]. Why metadata enrichment and why FAIR? Because the whole scientific enterprise depends on a cycle of finding, exchanging, understanding, validating, reproducing, integrating and reusing research entities across a dispersed community of researchers. Metadata is not just “a love note to the future” [2], it is a love note to today’s collaborators and peers. Moreover, a FAIR Commons must cater for the metadata of all the entities of research – data, software, workflows, protocols, instruments, geo-spatial locations, specimens, samples, people (as well as traditional articles) – and their interconnectivity. That is a lot of metadata love notes to manage, bundle up and move around. Notes written in different languages at different times by different folks, produced and hosted by different platforms, yet referring to each other, and building an integrated picture of a multi-part and multi-party investigation. We need a crate! RO-Crate [3] is an open, community-driven, and lightweight approach to packaging research entities along with their metadata in a machine-readable manner. Following key principles (“just enough” and “developer and legacy friendliness”), RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility and citability. As a self-describing and unbounded “metadata middleware” framework, RO-Crate shows that a little bit of packaging goes a long way to realise the goals of FAIR Digital Objects (FDO) [4], and to not just overcome platform diversity but celebrate it while retaining investigation contextual integrity. In this talk I will present why and how Research Object packaging eases Metadata Collaboration, using examples in big data and mixed object exchange, mixed object archiving and publishing, mass citation, and reproducibility. Some examples come from the HMC, others from EOSC, USA and Australia, and from different disciplines. Metadata is a love note to the future, RO-Crate is the delivery package.
    [1] https://helmholtz-metadaten.de/en
    [2] Scott, Jason. The Metadata Mania, http://ascii.textfiles.com/archives/3181, June 2011
    [3] Soiland-Reyes, Stian et al. “Packaging Research Artefacts with RO-Crate”. Data Science, 2022; 5(2):97-138, DOI: 10.3233/DS-210053
    [4] De Smedt K, Koureas D, Wittenburg P. “FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units”. Publications. 2020; 8(2):21. https://doi.org/10.3390/publications802002

    Handling Health Data: FAIR Research Objects for Trusted Research Environments

    Trusted Research Environments (TREs) are secure locations in which data are placed for researchers to analyse. TREs can be set up to host administrative data, hospital data or any other data that needs to remain securely isolated. It is hard for a researcher to perform an analysis across multiple TREs, requesting and gathering the data needed from each one. Federated analysis widens the scope of research and makes more effective use of data, but that data needs to be analysed across geographical or governance boundaries, for example in devolved healthcare in the UK and across national borders in Europe.
    A federated infrastructure makes it much easier for analysis tools to access multiple TREs. Health Data Research UK (https://www.hdruk.ac.uk/) through its DARE UK programme (https://dareuk.org.uk/) is developing a blueprint for TRE federation [1] and tools for federated data discovery. ELIXIR, the European Research Infrastructure for Life Science Data (https://elixir-europe.org/), has developed Federated European Genome-Phenome Archive (FEGA) [2] and services for FAIR data management and computational workflows using GA4GH standards [3].
    There are different ways of implementing the well-established TREs, and many popular analysis tools already in widespread use, so solutions need to be readily adoptable by existing systems. Moreover, the infrastructure needs to work within the “Five Safes” framework [4] that aims to protect data and enable data services to provide safe research access to data. The “Five Safes RO-Crate” [5] is a new way of packaging up the digital objects needed for research requests and results with the information needed for the tools and TRE providers to ensure that the Crates are reviewed and processed according to Five Safes principles. RO-Crate [6] is a community effort to establish a lightweight, native approach to packaging research data with their metadata (https://www.researchobject.org/ro-crate/). Sponsored by ELIXIR and others, it has become a widely adopted framework for inter-service exchange, resource archiving, and reproducible reporting, used by digital research infrastructures and their services, including ELIXIR, the European Open Science Cloud, and the Australian BioCommons. It is an implementation of the FDO Forum’s FAIR Digital Objects (https://fairdo.org/).
    The TRE-FX project (https://trefx.uk/) has piloted FAIR Five Safes RO-Crates and answering data queries within HDR UK TREs using pre-approved workflows using ELIXIR’s workflow execution technologies. Partnering with TREs from Scotland, Wales and England and analysis toolkits (DataSHIELD, BitFount), TRE-FX streamlines the exchange of requests and results between analysis clients and TREs while ensuring that the access is safe and the process transparent. TELEPORT (https://dareuk.org.uk/driver-project-teleport/), a sister DARE UK project, follows a complementary federation strategy of ethereal “pop-up” TREs for requests that are only feasible over combined TREs. The combination of TRE-FX and TELEPORT is a powerful hybrid capable of addressing practical federated analysis patterns working within current data governance processes.
    From March 2024 HDR-UK and ELIXIR will combine forces in the Horizon Europe EOSC-ENTRUST project which aims to create a European network of Trusted Research Environments for sensitive data and to drive European interoperability by joint development of a common blueprint for federated data access and analysis.
    References
    [1] DARE UK, “Federated Architecture Blueprint”, 2023, https://dareuk.org.uk/our-work/federated-architecture-blueprint/
    [2] European Genome-Phenome Archive, FederatedEGA, https://ega-archive.org/federated
    [3] Thorogood A et al, “International federation of genomic medicine databases using GA4GH standards”, Cell Genomics, 1(2), 2021, https://doi.org/10.1016/j.xgen.2021.100032
    [4] UK Health Data Research Alliance, & NHSX. (2021). Building Trusted Research Environments - Principles and Best Practices; Towards TRE ecosystems (1.0). Zenodo. https://doi.org/10.5281/zenodo.5767586
    [5] Soiland-Reyes, S., Wheater, S., Giles, T., Goble, C., & Quinlan, P. (2023). TRE-FX Technical Documentation - Five Safes RO-Crate (0.4). Zenodo. https://doi.org/10.5281/zenodo.10376350
    [6] Soiland-Reyes S, Sefton P, Crosas M, Castro LJ, Coppens F, Fernández JM, Garijo D, Grüning B, Rosa ML, Leo S, Ó Carragáin E, Portier M, Trisovic A, RO-Crate Community, Groth P, Goble C (2022): “Packaging research artefacts with RO-Crate” Data Science 5(2), https://doi.org/10.3233/DS-210053
    ABOUT THE AUTHOR(S)
    Carole Goble CBE FREng FBCS
    Carole Goble is a Professor of Computer Science at the University of Manchester, UK. She is a leader in Digital Research Infrastructures, translating technical innovations in distributed computing, semantic and metadata technologies, data and software sharing and computational workflows into FAIR and Open information solutions for scientists, in particular the Life Sciences and Biodiversity. She is currently: Joint Head of Node of ELIXIR-UK the UK node of ELIXIR, the European Research Infrastructure for Life Science Data; joint lead of the Federated Analytics programme for Health Data Research UK and a founder of the UK’s Software Sustainability Institute. Carole is an author of the seminal FAIR principles for scientific data and recipient of the Microsoft Jim Gray award for her contributions to eScience.

    RO-Crate Metadata Specification 1.1

    This document specifies a method, known as RO-Crate (Research Object Crate), of organizing file-based data with associated metadata, using linked data principles, in both human and machine readable formats, with the ability to include additional domain-specific metadata. The core of RO-Crate is a JSON-LD file, the RO-Crate Metadata File, named ro-crate-metadata.json. This file contains structured metadata about the dataset as a whole (the Root Data Entity) and, optionally, about some or all of its files. This provides a simple way to, for example, assert the authors (e.g. people, organizations) of the RO-Crate or one of its files, or to capture more complex provenance for files, such as how they were created using software and equipment. While providing the formal specification for RO-Crate, this document also aims to be a practical guide for software authors to create tools for generating and consuming research data packages, with explanation by examples.
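    A minimal sketch of the RO-Crate Metadata File described above, generated with the Python standard library. It shows the metadata descriptor, the Root Data Entity and one file entity in a flat `@graph`. The dataset name, description, date, licence and file path are hypothetical placeholders, and `minimal_crate` is an illustrative helper rather than part of any RO-Crate library.

```python
import json


def minimal_crate(name, description, files):
    """Build the JSON-LD structure for a small RO-Crate 1.1 metadata file.

    files maps relative paths inside the crate to human-readable names.
    """
    # The descriptor identifies this file and points at the root dataset.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root Data Entity: metadata about the dataset as a whole.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": "2024-01-01",  # hypothetical value
        "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
        "hasPart": [{"@id": path} for path in files],
    }
    # Optional entities describing individual files.
    file_entities = [
        {"@id": path, "@type": "File", "name": label}
        for path, label in files.items()
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


crate = minimal_crate(
    "Example dataset",
    "A hypothetical dataset used to illustrate the metadata file.",
    {"data/results.csv": "Results table"},
)
text = json.dumps(crate, indent=2)
print(text)  # content that would be saved as ro-crate-metadata.json
```

    Authors, organizations and provenance of individual files would be added as further entities in the same `@graph`, referenced by `@id` from the root dataset or from a file entity.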