-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement NoThrowDAG
#6
Conversation
Co-authored-by: Glenn Moynihan <glennmoy@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, haven't done a complete review but I've thought a bit more about my high level/spiritual concern with the "input assembler" bit: it basically throws the specification bit out the window, if you're allowed (nay, encouraged!) to go in and rename/modify fields in your inputs willy-nilly. especially when that is happening inside a (possibly anonymous) function that's basically a black box as far as readers of the code/the compiler is concerned. if a step takes two distinct kinds of input then I think we should just bite the bullet and do the refactoring to all that to be declared, or use some kind of composition to make it so that multiple outputs from shared input can be merged in a way that's declarative (something like a "multi step" that does the merging of different steps before checking the output spec).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, as we discussed on slack I'm okay with this more-or-less as-is. I had one suggestion for a possibility to use a more restrictive input assembler that just maps fields from previous output to next step input, since that seems to be what the current validation assumes is happening. but you can take or leave that!
function _validate_input_assembler(dag::NoThrowDAG, | ||
input_assembler::TransformSpecification) | ||
transform(input_assembler, dag._step_output_fields) # Will throw if any field doesn't exist | ||
return nothing | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, let me check that I'm understanding what's going on here: the idea is that in order to verify that an input assembler will not error at run time, when we construct the dag, we use a stand-in dictionary with the same structure as the workspace dict tha twill be created during actual execution (maps the names of the existing steps to Dicts that can be accessed with getindex(fields, ::Symbol)
).
if that's right, it suggests to me that the actual intent of the input assembler is actually much more restrictive than any generic anonymous function, and if someone tries to create one that doesn't follow that structure you're going to get some kind of obscure error at DAG construction time.
but it also means that we might be able to do something like this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct Field
in::Symbol
out::Symbol
end
# only really needed for showing these...
Base.show(io::IO, f::Field) = f.in == f.out ? show(io, f.in) : show(io, f.in => f.out)
Field(x::Symbol) = Field(x, x)
Field(x::Pair{Symbol,Symbol}) = Field(x...)
struct InputAssembler <: AbstractTransformSpecification
deps::Dict{String,Vector{Field}}
function InputAssembler(deps::Dict{String,Vector{Field}})
# make sure that there are no name collisions
out_fields = (f.out for f in Iterators.flatten(values(deps)))
# TODO: actually find the non-unique fields to help users
allunique(out_fields) || throw(ArgumentError("all output fields must be unique"))
new(deps)
end
end
function InputAssembler(; inputs...)
deps = Dict(string(dep) => Field.(fields) for (dep, fields) in inputs)
return InputAssembler(deps)
end
_get(x, f::Field) = f.out => x[f.in]
function transform(input::InputAssembler, upstream)
output = (;)
for (dep, fields) in input.deps
# get the upstream namedtuple-like for this dependency
updep = upstream[dep]
# create the output namedtuple with the new field names
this_output = (; [_get(updep, field) for field in fields]...)
output = merge(output, this_output)
end
return output
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed to implement in #8.
Co-authored-by: Dave Kleinschmidt <dave.f.kleinschmidt@gmail.com>
…rmSpecifications.jl into hr/chain-brain
Implement concrete
AbstractTransformationSpecification
NoThrowTransformChain
that consists of an ordered set ofNoThrowTransformation
s interleaved with transforms that create a step's input from all outputs of the upstream steps.Okay, so here's the deal with this one: there are a gazillion different design decisions that could be made here, and I had to fight not to fall into a self-sniped state (and did, in fact, change my mind about several of these decisions mid-implementation). So for the sake of "steady progress", I'm opening this for review now, and in future PRs we can iterate on things like (but not limited to):