Metadata (how to label and find things in DataBank)

kfletch edited this page Mar 14, 2012 · 5 revisions

Metadata

By default, DataBank automatically indexes data packages by Dublin Core (DC) terms, a minimal metadata vocabulary that can be expressed in RDF. To be indexed in this way, the metadata must be in the "manifest.rdf" file that accompanies each data package.

Users can also add metadata by zipping a separate file into the data package (e.g. to provide domain-specific ontological information, or to provide a "readme" file to accompany the dataset). The manifest.rdf file can point to these files in a machine-readable way (e.g. with "rdfs:seeAlso"). If users submit their data packages via DataStage, they can add other human-readable information using the DataStage "Description" field, but that information is not in RDF format, so it will not be indexed by DataBank or by web crawlers.
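As an illustration of the two paragraphs above, the following is a minimal sketch of how a depositor might generate a small "manifest.rdf" with a couple of Dublin Core terms and "rdfs:seeAlso" links to extra files in the package. It uses only the Python standard library. The package URI, file names, and the choice of `dcterms:title` and `dcterms:creator` are assumptions made for the example; the exact set of terms DataBank indexes is not specified here.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, RDF Schema and Dublin Core terms.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DCTERMS = "http://purl.org/dc/terms/"


def build_manifest(package_uri, title, creator, see_also):
    """Return RDF/XML text describing one data package.

    All arguments are illustrative; none of them is a DataBank API.
    """
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    ET.register_namespace("dcterms", DCTERMS)

    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": package_uri}
    )
    # Dublin Core terms: the kind of metadata indexed by default.
    ET.SubElement(desc, f"{{{DCTERMS}}}title").text = title
    ET.SubElement(desc, f"{{{DCTERMS}}}creator").text = creator
    # Machine-readable pointers to additional files zipped into the package.
    for uri in see_also:
        ET.SubElement(desc, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": uri})
    return ET.tostring(root, encoding="unicode")


# Hypothetical package URI and file names, for illustration only.
manifest_xml = build_manifest(
    package_uri="https://databank.example.org/silo/datasets/example-package",
    title="Example survey data",
    creator="A. Researcher",
    see_also=["readme.txt", "domain-ontology.rdf"],
)
print(manifest_xml)
```

The output can be saved as "manifest.rdf" alongside the referenced files before the package is zipped. Anything written only to a free-text description field would not appear in this file and so would not be indexed.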

Searching within DataBank

Within DataBank, the search function allows users to search by metadata (the information the depositor included in the manifest.rdf file). The search function can read metadata attached to the files within data packages, although it cannot parse the content of the files themselves. A top-level search can identify an individual file within a data package if that file's metadata is sufficient to allow identification.

There is also a browse function. Users can browse for silos (although this may be time-consuming in a populous repository). Once a user has identified a relevant silo, they can also browse its contents (filenames and other metadata). When a user identifies a target data package whose files are not under embargo, DataBank can unzip the package to allow access to the files themselves.

Visibility to the wider world

By default, all DataBank submissions are assigned a Digital Object Identifier (DOI). Multiple versions of the same data package are assigned separate DOIs.

By default, all data held in a non-dark instance of DataBank is visible to Google and any other web crawlers. Users can make files more visible by including richer metadata in the "manifest.rdf" file (the metadata "label" on the data package). Alternatively, administrators can add a robots file stating that they do NOT want the instance to be crawled by web indexing services.
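A minimal sketch of the robots-file option, assuming the "robots file" is a conventional robots.txt served from the root of the instance. The rules below ask all crawlers to stay away from the whole site, and Python's standard `urllib.robotparser` module is used to check how a compliant crawler would interpret them. The host name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Contents an administrator might place in robots.txt to opt out of crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical URL on a DataBank instance, for illustration only.
SILO_URL = "https://databank.example.org/silos/"

# Parse the rules directly from text, without fetching anything.
blocking = RobotFileParser()
blocking.parse(ROBOTS_TXT.splitlines())
crawl_allowed_with_rules = blocking.can_fetch("Googlebot", SILO_URL)

# For comparison: an empty robots.txt places no restrictions on crawlers.
permissive = RobotFileParser()
permissive.parse([])
crawl_allowed_without_rules = permissive.can_fetch("Googlebot", SILO_URL)

print(crawl_allowed_with_rules, crawl_allowed_without_rules)
```

A robots file is a request that well-behaved crawlers honour; it is not an access control, so it does not replace embargoes or a dark instance for data that must not be exposed.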
