RO-Crate is a self-described container {#selfdescribed}

\label{sec:selfdescribed}

An RO-Crate is defined as a self-described Root Data Entity that describes and contains data entities, which are further described by referencing contextual entities. A data entity is either a file (i.e. a byte sequence stored on disk somewhere) or a directory (i.e. a set of named files and other directories). A file does not need to be stored inside the RO-Crate Root; it can instead be referenced via a PID/IRI. A contextual entity exists outside the information system (e.g. a Person, a workflow language) and is stored solely by its metadata. The representation of a data entity as a byte sequence makes it possible to store a variety of research artefacts including not only data but also, for instance, software and text.
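
The distinction between data entities and contextual entities can be illustrated with a small sketch. It assumes the RO-Crate convention of typing files as `File` and directories as `Dataset`, which is not spelled out in this section; all identifiers and names below are hypothetical examples, and the helper functions are not part of any RO-Crate library.

```python
from urllib.parse import urlparse

# Hypothetical entities, written as JSON-LD-style dictionaries.
local_file = {"@id": "data/results.csv", "@type": "File", "name": "Results table"}
remote_file = {"@id": "https://example.org/raw/reads.fastq", "@type": "File", "name": "Raw reads"}
directory = {"@id": "data/", "@type": "Dataset", "name": "Data directory"}
person = {"@id": "https://example.org/people/alice", "@type": "Person", "name": "Alice Example"}

# Assumption: data entities are recognised by these two types.
DATA_ENTITY_TYPES = {"File", "Dataset"}


def entity_types(entity):
    """Return the entity's @type values as a set (JSON-LD allows a string or a list)."""
    types = entity.get("@type", [])
    return {types} if isinstance(types, str) else set(types)


def is_data_entity(entity):
    """A file or directory; anything else is treated here as a contextual entity."""
    return bool(entity_types(entity) & DATA_ENTITY_TYPES)


def is_stored_in_crate(entity):
    """True for data entities whose @id is a relative path rather than an absolute IRI."""
    return is_data_entity(entity) and urlparse(entity["@id"]).scheme == ""


for entity in (local_file, remote_file, directory, person):
    kind = "data" if is_data_entity(entity) else "contextual"
    print(entity["@id"], kind, is_stored_in_crate(entity))
```

The check is deliberately simplified: it classifies entities by type alone and ignores how they are linked from the Root Data Entity.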

The Root Data Entity is a directory, the RO-Crate Root, identified by the presence of the RO-Crate Metadata File ro-crate-metadata.json (top of Figure \ref{fig:conceptual}). This file describes the RO-Crate, its content and related metadata using Linked Data in JSON-LD format [@sporny_2014]. JSON-LD is a W3C standard RDF serialisation that has become popular; it is easy for humans to read while also offering some advantages for data exchange on the Internet. As a subset of the widely supported and well-known JSON format, JSON-LD has tooling available for many programming languages.
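
Because the RO-Crate Root is recognised by the presence of ro-crate-metadata.json, and JSON-LD is plain JSON, detecting and reading a crate needs only standard JSON tooling. The following is a minimal sketch; the function names `is_crate_root` and `load_metadata` are hypothetical, and no JSON-LD processing (such as context expansion) is attempted.

```python
import json
from pathlib import Path

METADATA_FILENAME = "ro-crate-metadata.json"


def is_crate_root(directory):
    """A directory counts as an RO-Crate Root if it contains the metadata file."""
    return (Path(directory) / METADATA_FILENAME).is_file()


def load_metadata(directory):
    """Parse the metadata file as ordinary JSON and return the resulting dict."""
    with open(Path(directory) / METADATA_FILENAME, encoding="utf-8") as handle:
        return json.load(handle)
```

A caller would typically test `is_crate_root(path)` before calling `load_metadata(path)`, since the latter raises `FileNotFoundError` when the metadata file is absent.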

The minimal requirements for the Root Data Entity metadata are name, description and datePublished, as well as a contextual entity identifying its license. Additional metadata are commonly added to entities depending on the purpose of the particular RO-Crate.
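
These minimal requirements can be sketched as a metadata document plus a simple check. The sketch assumes the layout of the RO-Crate 1.1 specification (the `@context` IRI, and a metadata file descriptor whose `about` property points at the root `./`); the version is an assumption, since this section does not name one. The function names and all example values are hypothetical.

```python
import json

REQUIRED_ROOT_PROPERTIES = ("name", "description", "datePublished", "license")


def minimal_crate(name, description, date_published, license_iri, license_name):
    """Build a minimal metadata document (assumes the RO-Crate 1.1 layout)."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Descriptor for the metadata file itself; points at the root.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root Data Entity: the crate directory.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_iri},
            },
            {   # Contextual entity identifying the license.
                "@id": license_iri,
                "@type": "CreativeWork",
                "name": license_name,
            },
        ],
    }


def find_root_entity(crate):
    """Follow the descriptor's 'about' link to locate the Root Data Entity."""
    graph = crate["@graph"]
    descriptor = next(e for e in graph if e["@id"] == "ro-crate-metadata.json")
    root_id = descriptor["about"]["@id"]
    return next(e for e in graph if e["@id"] == root_id)


def missing_root_properties(crate):
    """List the minimal properties absent from the Root Data Entity."""
    root = find_root_entity(crate)
    return [prop for prop in REQUIRED_ROOT_PROPERTIES if prop not in root]


crate = minimal_crate(
    name="Example crate",
    description="A hypothetical crate used for illustration.",
    date_published="2021-01-01",
    license_iri="https://creativecommons.org/licenses/by/4.0/",
    license_name="Creative Commons Attribution 4.0 International",
)
print(json.dumps(crate, indent=2))
print(missing_root_properties(crate))
```

The check only tests for the presence of the four properties; it does not validate their values or confirm that the referenced license entity exists in the graph.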

RO-Crates can be stored, transferred or published in multiple ways, e.g. using BagIt [@doi:10.17487/rfc8493], the Oxford Common File Layout (OCFL) [@ocfl_2020], downloadable ZIP archives in Zenodo or dedicated online repositories, as well as published directly on the Web, e.g. using GitHub Pages. Combined with Linked Data identifiers, this caters for a diverse set of storage and access requirements across different scientific domains, from metagenomics workflows producing hundreds of gigabytes of genome data to cultural heritage records with access restrictions for personally identifiable data. Specific RO-Crate profiles (section \ref{sec:profiles}) may constrain serialisation and publication expectations, and require additional contextual types and properties.
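
As one illustration of the ZIP option, the sketch below packages a crate directory so that ro-crate-metadata.json sits at the top level of the archive. It uses only the Python standard library; the function name `zip_crate` is hypothetical, and placing the metadata file at the archive root is an assumption of this sketch rather than a requirement stated in this section.

```python
import zipfile
from pathlib import Path


def zip_crate(crate_root, archive_path):
    """Write every file under crate_root into a ZIP, using crate-relative paths.

    archive_path should lie outside crate_root, otherwise the archive
    being written would itself be picked up by the directory walk.
    """
    crate_root = Path(crate_root)
    if not (crate_root / "ro-crate-metadata.json").is_file():
        raise ValueError(f"{crate_root} does not contain ro-crate-metadata.json")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(crate_root.rglob("*")):
            if path.is_file():
                # Store paths relative to the crate root, with forward slashes.
                archive.write(path, path.relative_to(crate_root).as_posix())
    return archive_path
```

Storing relative paths keeps the identifiers of data entities inside the metadata file valid after the archive is unpacked elsewhere.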