When serializing executions of workflows that take directory parameters, CWLProv does not create corresponding directories in the RO bundle: rather, files are always placed in directories whose name consists of the first two characters of the file's sha1 checksum.
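As a rough illustration of the layout described above, the following sketch derives the bundle location of a file from its SHA-1 checksum. The top-level `data` directory and the use of the full checksum as the file name are assumptions made for this example, and `bundle_path` is a hypothetical helper, not part of CWLProv.

```python
import hashlib
from pathlib import PurePosixPath


def bundle_path(content: bytes) -> PurePosixPath:
    """Return an illustrative bundle path for a file with the given content.

    Assumption: files live under a top-level "data" directory and are named
    after their full SHA-1 hex digest. Only the two-character parent
    directory is stated in the text above.
    """
    digest = hashlib.sha1(content).hexdigest()
    # The parent directory is the first two characters of the checksum.
    return PurePosixPath("data") / digest[:2] / digest


path = bundle_path(b"dummy")
print(path.parent.name == path.name[:2])  # True
```

Since the location depends only on the file content, the original directory structure cannot be read back from the paths alone.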
When converting from CWLProv we recreate the original directories, giving them a name obtained by concatenating the sorted checksums of all contained files and computing the checksum of the concatenation. This means that directories with the same contents end up being mapped to the same directory in the output RO-Crate. This is especially convenient to avoid data duplication between workflow parameters and tool parameters: for instance, when a directory is an input of the workflow and also of the first step.
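The naming scheme can be sketched as follows. This is a minimal sketch, not the converter's actual code: it assumes SHA-1 hex digests for both the file checksums and the final checksum, and concatenation without a separator. `dir_name_from_contents` and the file name `dummy.txt` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha1(path: Path) -> str:
    """Return the SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def dir_name_from_contents(directory: Path) -> str:
    """Derive a directory name from the checksums of the files it contains.

    Assumption: the sorted checksums are joined with no separator and the
    result is hashed with SHA-1.
    """
    checksums = sorted(
        file_sha1(p) for p in directory.rglob("*") if p.is_file()
    )
    return hashlib.sha1("".join(checksums).encode("ascii")).hexdigest()


# Two directories with identical contents get the same derived name.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("foo", "bar"):
        (root / name).mkdir()
        (root / name / "dummy.txt").write_text("dummy")
    foo_name = dir_name_from_contents(root / "foo")
    bar_name = dir_name_from_contents(root / "bar")

print(foo_name == bar_name)  # True
```

The derived name depends only on file contents, so the original directory names play no role in it, and directories with the same contents are mapped to the same output directory.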
However, there are cases where we might not want to do that. For instance, suppose that a workflow takes an array of two directories as input:

    cwlVersion: v1.2
    class: Workflow
    requirements:
      ScatterFeatureRequirement: {}
    inputs:
      dir_array: Directory[]
    outputs: []
    steps:
      date_step:
        label: Prints date of input dirs
        scatter: dir
        in:
          dir: dir_array
        out: []
        run: dirdate.cwl

Suppose the workflow is launched with the following parameters:

    dir_array:
      - class: Directory
        location: foo
      - class: Directory
        location: bar

Where foo and bar have the same contents, e.g., they both contain a text file whose content is the string "dummy". What we currently get in the RO-Crate is:
Note that the duplicate id in the value of #pv-main/dir_array is a bug: the list should contain only one copy, since the duplicate makes no sense in the RO-Crate JSON-LD. Also, the Dataset has an alternateName of "foo", while "bar" does not appear in the metadata. Thus, in this case, the representation does not reflect the fact that the workflow took a list of two distinct directories as input.
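To illustrate only the duplicate-id point, here is a small sketch that removes repeated references from a list of JSON-LD `{"@id": ...}` objects while keeping their order. `unique_refs` is a hypothetical helper and the id value is a placeholder, not taken from real converter output. Deduplicating this way does not restore the missing "bar" entry.

```python
def unique_refs(refs):
    """Drop repeated {"@id": ...} references, keeping first occurrences."""
    seen = set()
    result = []
    for ref in refs:
        if ref["@id"] not in seen:
            seen.add(ref["@id"])
            result.append(ref)
    return result


# Placeholder id standing in for the checksum-derived directory name.
value = [{"@id": "placeholder-dir/"}, {"@id": "placeholder-dir/"}]
deduplicated = unique_refs(value)
print(len(deduplicated))  # 1
```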