-
One design decision to make here: should steps and parameter values be specified at the top level of a (recipe or step) definition, or tucked away in dedicated subsections (steps, params)? The latter case is more wordy (including in code...), i.e. you'd be saying things like:
my_recipe:
  job: Recipe
  for:
    x: 1,2,3
  steps:
    step1:
      job: Average
      params:
        foo: 0
    step2:
      job: Recipe
      name: previously_defined_recipe
      params:
        bar: {!recipe.x}
The former case will make for more compact definitions (and in code):
my_recipe:
  _job: Recipe
  _for:
    x: 1,2,3
  step1:
    _job: Average
    foo: bar
  step2:
    _job: Recipe
    _name: previously_defined_recipe
    x: {!recipe.x}
-
Dealing with recipe outputs
A recipe should have an outputs section which specifies which steps' outputs should be inherited. In the above example (https://github.com//discussions/698#discussioncomment-360728), the outputs section could look like this:
.
.
.
outputs:
  makems: all  # all outputs from this cab
  wsclean:     # another option would be to select some of the outputs
    - "{*-MFS-image.fits}"
    - "{*-MFS-model.fits}"
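A rough sketch of how such an outputs section might be applied, in plain Python. Everything here is an assumption for illustration: the function name `inherited_outputs`, the idea that each step reports a flat list of output filenames, and the reading of `{...}` as a shell-style glob pattern.

```python
from fnmatch import fnmatch
from typing import Dict, List, Union

# Hypothetical helper: pick which step outputs a recipe inherits.
# spec maps a step name to either the string "all" or a list of
# "{glob}" patterns, as in the outputs section above.
def inherited_outputs(
    step_outputs: Dict[str, List[str]],
    spec: Dict[str, Union[str, List[str]]],
) -> List[str]:
    selected: List[str] = []
    for step, rule in spec.items():
        files = step_outputs.get(step, [])
        if rule == "all":
            # inherit every output of this step
            selected.extend(files)
        else:
            # assumption: the braces only delimit the pattern, so strip them
            patterns = [p.strip("{}") for p in rule]
            selected.extend(
                f for f in files if any(fnmatch(f, p) for p in patterns)
            )
    return selected


step_outputs = {
    "makems": ["demo.ms"],
    "wsclean": ["img-MFS-image.fits", "img-MFS-model.fits", "img-MFS-residual.fits"],
}
spec = {
    "makems": "all",
    "wsclean": ["{*-MFS-image.fits}", "{*-MFS-model.fits}"],
}
selected = inherited_outputs(step_outputs, spec)
print(selected)
```

With this reading, the residual image is produced by the wsclean step but is not inherited by the recipe.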
-
OK I checked in a little mock-up with my take on the above discussions. This defines a schema and loads a sample config with a nested sub-recipe: https://github.com/ratt-ru/Stimela/blob/configuratt/stimela/tests/proto_config_oms.py. Just run
Writing this, I realized that OmegaConf's substitutions are a bit too simple for this purpose. But we can use standard Python
info: 'top level recipe definition'
vars:
  ms: demo.ms
dirs:
  input: input
  output: output
steps:
  make_ms:  # this step uses a cab
    cab: simms
    inputs:
      msname: "{recipe.vars.ms}"
      telescope: kat-7
      dtime: 1
      synthesis: 0.128
  selfcal:  # this step is a nested recipe
    inputs:
      ms: "{recipe.vars.ms}"  # 'recipe' here refers to parent recipe
    outputs:
      image: final-image.fits  # overrides output filename
    recipe:
      info: "this is a generic selfcal loop"
      vars:
        scale: 30asec
        size: 256
      _for:
        selfcal_loop: 1,2,3  # repeat three times
      steps:
        calibrate:
          cab: cubical
          inputs:
            ms: "{recipe.inputs.ms}"
          _skip: "recipe.vars.selfcal_loop < 2"  # skip on first iteration, go straight to image
        imager:
          cab: wsclean
          inputs:
            msname: "{recipe.inputs.ms}"
            name: "image-{recipe.vars.selfcal_loop}"
            scale: "{recipe.vars.scale}"
            size: "{recipe.vars.size}"
        evaluate:
          cab: aimfast
          inputs:
            image: "{prev.outputs.residual_image}"  # 'prev' refers to preceding step
          _break_on: "step.outputs.dr_achieved"  # break out of recipe based on some output value. 'step' refers to this step
      # the below formally specifies the inputs and outputs of the selfcal recipe
      parameters:
        ms:
          type: ms
          io: both
          default: null
        # maps onto the output of the wsclean step
        image:
          maps: imager.outputs.image
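As an illustration only (this is not the code in the linked mock-up), substitutions like `"{recipe.vars.ms}"` could be resolved with Python's built-in `str.format`, which already supports dotted attribute access. The helper names `to_namespace`, `substitute` and `condition` are hypothetical, and using `eval` for `_skip`-style expressions is an assumption that would need proper sandboxing in real use.

```python
from types import SimpleNamespace
from typing import Any

def to_namespace(obj: Any) -> Any:
    """Recursively turn nested dicts into attribute-accessible namespaces."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(v) for v in obj]
    return obj

def substitute(template: str, **namespaces: Any) -> str:
    """Resolve {recipe.vars.ms}-style references via str.format."""
    return template.format(**namespaces)

def condition(expression: str, **namespaces: Any) -> bool:
    """Evaluate a _skip/_break_on style expression (sketch: eval is not safe on untrusted input)."""
    return bool(eval(expression, {"__builtins__": {}}, namespaces))


recipe = to_namespace({
    "vars": {"ms": "demo.ms", "scale": "30asec", "selfcal_loop": 2},
    "inputs": {"ms": "demo.ms"},
})

msname = substitute("{recipe.inputs.ms}", recipe=recipe)
image_name = substitute("image-{recipe.vars.selfcal_loop}", recipe=recipe)
skip_calibrate = condition("recipe.vars.selfcal_loop < 2", recipe=recipe)
print(msname, image_name, skip_calibrate)
```

The same call would accept extra namespaces such as `prev=` or `step=`, which is one way the `prev.outputs...` and `step.outputs...` references above could be supported.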
-
Even cleaner would be to define
-
Should dtype not be an enum?
-
What's your type
The types can be grouped into three:
Basic
I/O
Here I start with a capital letter because these are classes and with attributes such as
List
Dict
A type can be any of the above, or a list of any combination thereof.
-
@SpheMakh check out the latest commit. Substitutions and cross-references now work more or less fully. Try
Example is
evaluate:
  cab: aimfast
  params:
    image: "{previous.restored}"
    dirty: "{steps.image.dirty}"
I have gotten rid of the unsatisfying aliases:
  msname: selfcal.ms
  telescope: makems.tel
defaults:
  telescope: kat-7
I've done it with a
-
@SpheMakh, I think this is becoming ready for action. I volunteer to implement a simple command-line runner, and start testing this with my experimental selfcal workflows. I'll leave the container runners to you. Getting on a flight now so maybe that's what I'll amuse myself with.
-
So this actually works now. For-loops, and a recipe library:
lib:
  recipes:
    cubical_image:
      name: "cubical_image"
      info: 'does one step of cubical, followed by one step of imaging'
      dirs:
        log: logs
      aliases:
        ms: [calibrate.ms, image.ms]
      steps:
        calibrate:
          cab: cubical
        image:
          cab: myclean
recipe:
  name: "test loop"
  for_loop:
    var: ms
    over: ms_list
  aliases:
    ms: ['cubical_image.ms']
  inputs:
    ms_list:
      dtype: List[str]
      required: true
  defaults:
    ms_list: [a,b,c]
  steps:
    cubical_image:
      recipe:
        _use: lib.recipes.cubical_image
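For anyone wondering what a `_use` reference amounts to, here is a minimal pure-Python sketch of expanding it against the library section. It is not the actual implementation: `lookup` and `expand_use` are hypothetical names, sibling keys are assumed to act as shallow overrides, and cyclic references are not guarded against.

```python
import copy
from typing import Any

def lookup(config: dict, dotted: str) -> Any:
    """Follow a dotted path such as 'lib.recipes.cubical_image'."""
    node: Any = config
    for key in dotted.split("."):
        node = node[key]
    return node

def expand_use(node: Any, root: dict) -> Any:
    """Replace every mapping containing _use with a copy of the referenced node."""
    if isinstance(node, dict):
        if "_use" in node:
            base = copy.deepcopy(lookup(root, node["_use"]))
            # assumption: keys given next to _use override the library definition
            base.update({k: v for k, v in node.items() if k != "_use"})
            node = base
        return {k: expand_use(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_use(v, root) for v in node]
    return node


config = {
    "lib": {"recipes": {"cubical_image": {
        "name": "cubical_image",
        "steps": {"calibrate": {"cab": "cubical"}, "image": {"cab": "myclean"}},
    }}},
    "recipe": {
        "name": "test loop",
        "steps": {"cubical_image": {"recipe": {"_use": "lib.recipes.cubical_image"}}},
    },
}
expanded = expand_use(config["recipe"], config)
print(expanded["steps"]["cubical_image"]["recipe"]["steps"]["image"]["cab"])
```

Because the library entry is deep-copied, editing the expanded recipe leaves `lib.recipes.cubical_image` untouched.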
-
@SpheMakh something you said at the Caracal meeting (recipe of recipes), plus the discussion on how to break up the selfcal worker there, got me thinking. You may have thought of some or all of this already, but maybe we should formalize a little. So here's a brain dump.
Desiderata
A recipe should be a structured config object (so it can be saved/loaded as YAML, and generally passed around, e.g. Saving recipe information for a future run #686)
Maximum correctness checking up front. So making it a structured config object in the OmegaConf sense works well. That way there's a schema based on Python @dataclasses behind it, and checking is done at construction.
Recipes should be nestable
Inputs/outputs should be well-defined, for things like Merge usecwl branch #638 to work
You pass the recipe object to a runner (e.g. the local Docker runner, but in the future e.g. a CWL scheduler), and off it goes.
Basic types
I'm going to try to give unambiguous definitions for things; in most cases it should be clear how to express them in terms of Python's typing mechanism.
A Parameter has a type, an iomode (input/output/mixed), and a default. Parameters with no default are, by definition, mandatory.
A Parset is a (possibly nested) set of named parameters, i.e. Mapping[str, Union[Parameter, Mapping[str, Union[...]]]]. Why nested? See recipe parameters below. And anyway, cabs with large numbers of parameters could be made clearer by structuring the parameters into related groups.
A Job defines a parset and a payload. Examples of jobs are: Cab, Recipe, App (#689), maybe a chunk of arbitrary callable Python. To exec a job, one must supply values for (at least) the mandatory parameters.
Recipe
A Recipe is a sequence of named steps (OrderedDict[str, Step]).
A Step is an invocation of a job with a set of (possibly incomplete) parameter assignments.
Recipe parameters
So what constitutes a recipe's parset? I think we need a default policy that makes something sensible from the contents of the recipe, plus a way to override this policy if so desired.
Suggested default policy: a recipe's parset consists of (A) all the input/mixed parameters of the first step, plus (B) all the output/mixed parameters of the final step, plus (C) any mandatory parameters not defined for the intermediate steps, nested under their step names.
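To make the policy concrete, here is a hedged sketch in Python. It is only one reading of rules (A) to (C): the `Parameter` dataclass, the `default_parset` function and the tuple layout of a step are all hypothetical, parameters already assigned inside the recipe are assumed not to be exposed, and the Average job's `ms` is assumed to be a mixed parameter. The jobs are the Transform/Average/Calibrate trio from the notional example that follows.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

MISSING = object()  # sentinel meaning "no default", i.e. the parameter is mandatory

@dataclass
class Parameter:
    iomode: str              # "input", "output" or "mixed"
    default: Any = MISSING

    @property
    def mandatory(self) -> bool:
        return self.default is MISSING

# A step is (step name, parset of its job, values assigned in the recipe)
Step = Tuple[str, Dict[str, Parameter], Dict[str, Any]]

def default_parset(steps: List[Step]) -> Dict[str, Any]:
    _, first_params, first_assigned = steps[0]
    _, last_params, last_assigned = steps[-1]
    public: Dict[str, Any] = {}
    # (A) input/mixed parameters of the first step
    for name, par in first_params.items():
        if par.iomode in ("input", "mixed") and name not in first_assigned:
            public[name] = par
    # (B) output/mixed parameters of the final step
    for name, par in last_params.items():
        if par.iomode in ("output", "mixed") and name not in last_assigned:
            public.setdefault(name, par)
    # (C) unassigned mandatory parameters of intermediate steps, nested by step name
    for step_name, params, assigned in steps[1:-1]:
        missing = {n: p for n, p in params.items()
                   if p.mandatory and n not in assigned}
        if missing:
            public[step_name] = missing
    return public


transform = {"ms": Parameter("mixed"), "field": Parameter("input"),
             "selection": Parameter("input", "")}
average = {"ms": Parameter("mixed"), "timeavg": Parameter("input", 1),
           "chanavg": Parameter("input", 1)}
calibrate = {"ms": Parameter("mixed"), "caltable": Parameter("output"),
             "solint": Parameter("input")}

recipe = [
    ("transform", transform, {}),
    ("average", average, {"ms": "{..transform.ms}"}),
    ("calibrate", calibrate, {"ms": "{..average.ms}", "solint": 60}),
]
public = default_parset(recipe)
print(sorted(public))
```

Note that, read literally, the policy says nothing about a mandatory input of the final step (here `solint`), which is why the sketch assigns it inside the recipe.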
So as a purely notional example, let's say you have a Transform job (mixed parameter ms, input parameter field, optional parameter selection=''), an Average job (ms, optional input parameters timeavg=1 and chanavg=1), and a Calibrate job (mixed parameter ms, output parameter caltable, mandatory input solint, bunch of optional input parameters), and you string the recipe together like so:
...then your recipe's "public" parset looks like this by default:
If the recipe designer does not like this default parset, they can define an explicit parameters section in their recipe, which explicitly defines the parameters of the recipe (in terms of what steps' parameters to map them onto). Stimela will then need to check, when the recipe is constructed, that every mandatory parameter of every step is mapped.
Cross-referencing parameters
There was already an example of this above ({..average.ms}), which is a standard OmegaConf construct.
We can think of extending this with something like {!previous.ms} (to refer to a parameter of the preceding step), or {!recipe.ms} (to refer to a top-level parameter of the recipe) to make recipes more composable.
Furthermore, I suggest the following rule: if a mandatory output/mixed parameter of a step is missing, but a subsequent step makes a cross-reference to this parameter, then Stimela should generate a temporary filename for it. (Thus, we can automatically plumb intermediate steps together without needing to name the intermediate file products explicitly.)
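A small sketch of the temporary-filename rule, assuming each step is a plain dict with hypothetical keys `name`, `mandatory_outputs` and `params`, and handling only the `{!previous.x}` form. The naming scheme for temporary files is likewise invented for illustration.

```python
import os
import re
import tempfile
from typing import List

PREVIOUS_REF = re.compile(r"\{!previous\.(\w+)\}")

def plumb_temporaries(steps: List[dict], tmpdir: str) -> None:
    """Give a temporary filename to any missing mandatory output that the next step references."""
    for prev, step in zip(steps, steps[1:]):
        for value in step["params"].values():
            if not isinstance(value, str):
                continue
            for name in PREVIOUS_REF.findall(value):
                if name in prev["mandatory_outputs"] and name not in prev["params"]:
                    prev["params"][name] = os.path.join(
                        tmpdir, f"{prev['name']}-{name}.tmp")

def resolve_previous(steps: List[dict]) -> None:
    """Replace {!previous.x} with the value of x in the preceding step."""
    for prev, step in zip(steps, steps[1:]):
        for key, value in step["params"].items():
            if isinstance(value, str):
                step["params"][key] = PREVIOUS_REF.sub(
                    lambda m: str(prev["params"][m.group(1)]), value)


steps = [
    {"name": "image", "mandatory_outputs": {"restored"}, "params": {"size": 256}},
    {"name": "evaluate", "mandatory_outputs": set(),
     "params": {"image": "{!previous.restored}"}},
]
tmpdir = tempfile.mkdtemp()
plumb_temporaries(steps, tmpdir)
resolve_previous(steps)
print(steps[0]["params"]["restored"], steps[1]["params"]["image"])
```

The recipe author never names the restored image, yet both steps end up agreeing on the same intermediate file.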
Enabling and conditionals
Every step should recognize an enable field (true if missing), so that it can be easily disabled with enable: false.
From this, it is but a hop and a jump to supporting conditionals: enable can cross-reference a parameter of another step, thus making this step conditional on the result of a preceding step. (We might eventually need to think about richer syntax than a {}-substitution, but this is already pretty powerful.)
Looping!
A germ of an idea, but from this, it is very easy to make a recipe that is defined as a loop or an iterator. Add something like this to the recipe conf:
...which then tells Stimela to repeat the recipe three times, with x=1, y=a, then with x=2, y=b, then with x=3, y=c. The values can be cross-referenced from steps as something like {!recipe.x}. So in the body of the recipe, there shouldn't really be any distinction between referencing a parameter, or referencing a loop variable.
You can also think of adding loop conditionals, i.e. _while: condition and _until: condition, where "condition" is as for enable above.
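A sketch of the loop semantics described here, under some assumptions: the loop spec is taken to be a mapping of variable names to comma-separated strings (as in `x: 1,2,3` elsewhere in this thread), the variables advance in lockstep, and `_until` is modelled as a Python callable on the result of each pass. `loop_iterations` and `run_loop` are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

def loop_iterations(for_spec: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield one {variable: value} mapping per pass, advancing all variables together."""
    names = list(for_spec)
    columns = [[v.strip() for v in str(for_spec[n]).split(",")] for n in names]
    for values in zip(*columns):
        yield dict(zip(names, values))

def run_loop(
    for_spec: Dict[str, str],
    body: Callable[[Dict[str, str]], Any],
    until: Optional[Callable[[Any], bool]] = None,
) -> List[Any]:
    """Run body once per iteration; stop early when the until-condition holds."""
    results: List[Any] = []
    for variables in loop_iterations(for_spec):
        result = body(variables)
        results.append(result)
        if until is not None and until(result):
            break
    return results


iterations = list(loop_iterations({"x": "1,2,3", "y": "a,b,c"}))
print(iterations)

# values arrive as strings, so the body converts them as needed
squares = run_loop({"x": "1,2,3"}, lambda v: int(v["x"]) ** 2,
                   until=lambda r: r >= 4)
print(squares)
```

Inside `body`, the loop variables arrive in the same mapping a step would use for any other recipe-level value, which matches the point that referencing a parameter and referencing a loop variable should look the same.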