Curation Manual Wiki
A home for VFB curation guidelines and SOPs.
/records
relations_spec.yaml # Specification of relations that are legal to use in curation
new_datasets/ # Curation records for adding new datasets
ds_spec.yaml # Specification of fields used in dataset curation
working/ # Records here are checked for syntax only
to_submit/ # Records here are fully checked and loaded to a test DB.
A Jenkins job is used to load passing records from here into the KB.
new_images/ # Curation records for adding new images
common_fields_spec.yaml # Specification of fields that may be used in all new_image curation
anat_spec.yaml # Specification of fields for new anatomy image curation
split_spec.yaml # Specification of fields for new split image curation
ep_spec.yaml # Specification of fields for new expression pattern image curation
working/ # Records here are checked for syntax only
to_submit/ # Records here are fully checked and loaded to a test DB.
A Jenkins job is used to load passing records from here into the KB.
new_metadata/ # Curation records for adding new metadata to existing images
common_fields_spec.yaml
newmeta_spec.yaml
working/ # Records here are checked for syntax only
to_submit/ # Records here are fully checked and loaded to a test DB.
A Jenkins job is used to load passing records from here into the KB.
archive/ # Archive submitted records here
We run a number of pipelines from external data sources. We track the progress of curation through these pipelines via reports generated nightly on VFB_reporting_results. See the accompanying README.md for details of file contents.
Curation and images are staged for public release on our staging servers, following ad hoc^ runs of the VFB pipeline. Following pipeline runs, new content should be searchable, queryable and browsable on v2a.virtualflybrain.org.
^ This may move to a regular cycle in the near future.
Curation progress is tracked via the DataSet staging board.
Curation files are named: {Type}_{DataSetName}_YYMMDD e.g. Anat_Berck2016_191015.
- The Type prefix is required because the parser uses it to determine how to process the record.
- DataSetName is required in order to attach curation to the correct dataset.
- YYMMDD is required because there may be more than one curation record per dataset.
Types:
- Expression pattern (ep): Used to load new (single driver) expression pattern images.
- Split (split): Used to load new split expression pattern images.
- Anatomy (anat): Used to load new anatomical images (e.g. a neuron o
- New Metadata (newmeta): Used to extend annotation on existing images.
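The naming convention above can be checked mechanically. The following is a minimal sketch, not the VFB parser itself: the function name `parse_curation_filename`, the case-insensitive matching of the type prefix (the example uses `Anat` while the type list uses `anat`) and the optional `.tsv`/`.yaml` extension handling are all assumptions made for illustration.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# Type prefixes listed in this document. Matching is case-insensitive here,
# which is an assumption rather than documented parser behaviour.
KNOWN_TYPES = {"ep", "split", "anat", "newmeta"}

# {Type}_{DataSetName}_YYMMDD, optionally followed by .tsv or .yaml
_NAME_RE = re.compile(
    r"^(?P<type>[A-Za-z]+)_(?P<dataset>.+)_(?P<date>\d{6})(?:\.(?:tsv|yaml))?$"
)


class CurationName(NamedTuple):
    record_type: str  # lower-cased type prefix, e.g. "anat"
    dataset: str      # DataSet short_form, e.g. "Berck2016"
    date: date        # parsed from YYMMDD


def parse_curation_filename(name: str) -> CurationName:
    """Split a curation file name into type, dataset and date."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"Not of the form Type_DataSetName_YYMMDD: {name!r}")
    record_type = match.group("type").lower()
    if record_type not in KNOWN_TYPES:
        raise ValueError(f"Unknown curation type prefix: {match.group('type')!r}")
    # strptime raises ValueError for impossible dates such as 191345
    parsed = datetime.strptime(match.group("date"), "%y%m%d").date()
    return CurationName(record_type, match.group("dataset"), parsed)
```

For example, `parse_curation_filename("Anat_Berck2016_191015")` yields the type `anat`, the dataset `Berck2016` and the date 15 October 2019.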
Curation cards/tickets on the board follow a similar naming convention:
Card/ticket Name | Example | Description | Project board SOP |
---|---|---|---|
DS: {Source} {DataSetName} | DS: L1EM Berck2016 | DataSet Epic | In DataSets column until subtasks complete |
Images: {Source} {DataSetName} | Images: L1EM Berck2016 | Image loading task for DataSet | Move through sprint columns |
Curation: {Source} {curation filename} | Curation: L1EM Berck2016_191015 | Curation task for DataSet. | Move through sprint columns |
Anatomy: {Source} {DataSetName} | Anatomy: L1EM Berck2016 | Ontology task for DataSet | Card with link to FBbt ticket. Move through sprint columns |
Features: {Source} {DataSetName} | Features: FlyLight Ito2015 | Feature curation task for DataSet | Move through sprint columns |
Sources here are large-scale projects/data providers: FlyLight; FlyCircuit; L1EM; FAFB; FlyEM...
- short_form = surname of first author + year, e.g. Berck2016. Where this would give multiple datasets the same name, extend it with a single lower-case letter (a, b, c etc.) as needed.
- DataSet label = a longer name that is descriptive of DataSet contents. Guidelines and examples TBA
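As an illustration of the short_form rule, here is a small sketch that proposes a name given the short_forms already in use. It assumes that the first dataset keeps the bare surname + year and only later clashes receive a letter; the rule as written does not say whether the first dataset should also be suffixed, so treat that choice (and the function name `propose_short_form`) as hypothetical.

```python
import string
from typing import Set


def propose_short_form(first_author_surname: str, year: int, existing: Set[str]) -> str:
    """Return surname + year, extended with a, b, c ... if already taken."""
    base = f"{first_author_surname}{year}"
    # Assumption: an unused base name is returned without a letter suffix.
    if base not in existing:
        return base
    for letter in string.ascii_lowercase:
        candidate = base + letter
        if candidate not in existing:
            return candidate
    raise ValueError(f"No single-letter suffix left for {base}")
```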
Warning - this is an overview and may be out of date. For the latest spec please see YAML spec files (linked below).
Curation files are plain .tsv (unquoted) or .yaml files. All fields may be specified in a .tsv file, but some may optionally be specified in an accompanying .yaml file instead. This is useful for fields whose content applies to all rows in a .tsv file, e.g. for images this might include dataset, imaging_type and template (see below for an example). Any accompanying .yaml file must have the same name as the .tsv file, apart from the extension (.tsv/.yaml).
Within curation files, ontology terms and FlyBase features are all specified by name (see below for details of how to cope with special characters). Where fields take multiple entries, these are separated by a '|'. VFB individuals (the structures depicted in images) are specified by internal VFB ID or external DB ID. DataSets are referred to by their short_form (e.g. Berck2016).
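To make the file format concrete, the sketch below reads an unquoted TSV and splits '|'-separated entries into lists. It is not the VFB loader: which fields accept multiple entries is defined by the YAML spec files, so the caller passes them in, and the function name and the column names used in the usage example are hypothetical.

```python
import csv
import io
from typing import Dict, List, Set, Union

Row = Dict[str, Union[str, List[str]]]


def read_curation_tsv(text: str, multi_value_fields: Set[str]) -> List[Row]:
    """Parse unquoted TSV text, splitting multi-entry fields on '|'."""
    # QUOTE_NONE: quote characters are kept as literal text, matching the
    # "unquoted" TSV convention described above.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows: List[Row] = []
    for raw in reader:
        if None in raw:
            # DictReader stores surplus cells under the key None
            raise ValueError(f"Line {reader.line_num}: more cells than header columns")
        row: Row = {}
        for field, value in raw.items():
            value = (value or "").strip()  # missing cells arrive as None
            if field in multi_value_fields:
                row[field] = [v.strip() for v in value.split("|") if v.strip()]
            else:
                row[field] = value
        rows.append(row)
    return rows
```

A call such as `read_curation_tsv("label\tsynonyms\nneuron A\tA1|A2\n", {"synonyms"})` returns one row whose `synonyms` value is the list `["A1", "A2"]`.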
Some fields are common to all images in a dataset, so specifying them individually for all rows in a data file would be inefficient. We specify these in simple YAML files.
e.g.
dataset: Berck2016
template: L1EM
imaging_type: TEM
curators: [CP, DOS, RC] # Need convention for this - all are converted to orcids in DB
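The relationship between a .tsv file and its companion .yaml file can be sketched as follows. This is an illustration under stated assumptions, not the loader's actual behaviour: the helper names are hypothetical, the common fields are passed in as an already-parsed dict (loading the YAML itself is left to whichever YAML library is in use), and a field supplied in both places is treated as an error, which this document does not specify.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def companion_yaml_path(tsv_path: str) -> Path:
    """Same name as the .tsv file, differing only in the extension."""
    return Path(tsv_path).with_suffix(".yaml")


def apply_common_fields(rows: Iterable[Dict[str, object]],
                        common: Dict[str, object]) -> List[Dict[str, object]]:
    """Copy file-level fields (e.g. dataset, template) onto every row."""
    merged = []
    for number, row in enumerate(rows, start=1):
        clash = sorted(set(row) & set(common))
        if clash:
            # Assumption: a field may live in the .tsv or the .yaml, not both.
            raise ValueError(f"Row {number}: fields given in both files: {clash}")
        merged.append({**common, **row})
    return merged
```

With the example YAML above as `common`, every row of the accompanying .tsv would gain `dataset`, `template`, `imaging_type` and `curators` entries.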
STATUS: This works - please try it!
NewMetadata (YAML spec; Example TSV; Example YAML):
Add new metadata by specifying relationships: subject, object, relation & optional comment/pub with evidence for relationship. subject may be referred to by VFB id, or using some external ID. Relation and object are referred to by name. Relation must be one specified in relations_spec.yaml.
- subject_external_db: VFB DataBase ID for external DB
- subject_external_id: External ID for subject in database referred to
- subject_id: VFB ID of subject
- subject_name: Optionally provide subject name for cross-check
- relation: The relation must be either is_a or one of a standard set agreed for curation - see relations_spec.yaml
- object: The name of an ontology term (typically FBbt) or a FlyBase feature - see relations_spec.yaml.
- object_external_db: VFB DataBase ID for external DB
- object_external_id: External ID for object in database referred to
- ind_object_id: VFB ID of object
Options for specifying object:
- specify an individual object with either an xref (object_external_db + object_external_id) or an ID (ind_object_id) plus a name (object) used for checking, OR
- specify a type object with an FBbt name field only
Order of precedence: an xref overrides a VFB ID (ind_object_id). Both override interpreting the object field as a type name.
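The order of precedence for specifying the object can be written as a small decision function. This is a sketch only: the function name, the returned labels (`"xref"`, `"vfb_id"`, `"type_name"`) and the decision to reject a half-specified xref are assumptions, not documented behaviour of the curation code.

```python
from typing import Dict, Optional, Tuple, Union

ObjectRef = Tuple[str, Union[str, Tuple[str, str]]]


def resolve_object(row: Dict[str, Optional[str]]) -> ObjectRef:
    """Decide how the object of a NewMetadata row is specified."""
    def cell(field: str) -> str:
        return (row.get(field) or "").strip()

    db, ext_id = cell("object_external_db"), cell("object_external_id")
    vfb_id, name = cell("ind_object_id"), cell("object")

    # 1. An xref wins over everything else.
    if db and ext_id:
        return ("xref", (db, ext_id))
    if db or ext_id:
        # Assumption: an xref needs both the DB and the ID.
        raise ValueError("object_external_db and object_external_id must be given together")
    # 2. Otherwise a VFB ID identifies an individual; `object` is then only
    #    a name used for cross-checking.
    if vfb_id:
        return ("vfb_id", vfb_id)
    # 3. Otherwise `object` is read as the name of a type.
    if name:
        return ("type_name", name)
    raise ValueError("No object specified")
```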
STATUS: Development still in progress
- Expression pattern (YAML spec; Example - TBA): Specify a driver using a FlyBase feature name. Submission of this curation record will create the expression pattern node if it does not already exist.
- Split (YAML spec; Example - TBA): Specify AD and DBD using FlyBase feature names. Submission of this curation record will create the appropriate split expression pattern node if it does not already exist.
- Anatomy (YAML spec; Example): Specify Classification (IS_A) and reasons for classification; Optionally specify a driver.
TBA.
Some notes:
- Preserving original names of entities is essential.
- Sometimes there are no clear original names at the individual level - only for classes. In these cases we need to make the names unique, consistent and informative. The simplest way to do this is to name for type + dataset + some number if needed for uniqueness.