Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement NoThrowDAG #6

Merged
merged 91 commits into from
Aug 9, 2023
Merged

Implement NoThrowDAG #6

merged 91 commits into from
Aug 9, 2023

Conversation

hannahilea
Copy link
Contributor

@hannahilea hannahilea commented Aug 1, 2023

Implement concrete AbstractTransformationSpecification NoThrowTransformChain that consists of an ordered set of NoThrowTransformations interleaved with transforms that create a step's input from all outputs of the upstream steps.


Okay, so here's the deal with this one: there are a gazillion different design decisions that could be made here, and I had to fight not to fall into a self-sniped state (and did, in fact, change my mind about several of these decisions mid-implementation). So for the sake of "steady progress", I'm opening this for review now, and in future PRs we can iterate on things like (but not limited to):

  • better/nicer error throwing during chain construction
  • whether this should actually be parameterized in some way and/or allow a non-NoThrow option
  • the type and construction of the transformation called between each step

Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, haven't done a complete review but I've thought a bit more about my high level/spiritual concern with the "input assembler" bit: it basically throws the specification bit out the window, if you're allowed (nay, encouraged!) to go in and rename/modify fields in your inputs willy-nilly. especially when that is happening inside a (possibly anonymous) function that's basically a black box as far as readers of the code/the compiler is concerned. if a step takes two distinct kinds of input then I think we should just bite the bullet and do the refactoring to all that to be declared, or use some kind of composition to make it so that multiple outputs from shared input can be merged in a way that's declarative (something like a "multi step" that does the merging of different steps before checking the output spec).

Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, as we discussed on slack I'm okay with this more-or-less as-is. I had one suggestion for a possibility to use a more restrictive input assembler that just maps fields from previous output to next step input, since that seems to be what the current validation assumes is happening. but you can take or leave that!

Comment on lines +225 to +229
function _validate_input_assembler(dag::NoThrowDAG,
input_assembler::TransformSpecification)
transform(input_assembler, dag._step_output_fields) # Will throw if any field doesn't exist
return nothing
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, let me check that I'm understanding what's going on here: the idea is that in order to verify that an input assembler will not error at run time, when we construct the dag, we use a stand-in dictionary with the same structure as the workspace dict tha twill be created during actual execution (maps the names of the existing steps to Dicts that can be accessed with getindex(fields, ::Symbol)).

if that's right, it suggests to me that the actual intent of the input assembler is actually much more restrictive than any generic anonymous function, and if someone tries to create one that doesn't follow that structure you're going to get some kind of obscure error at DAG construction time.

but it also means that we might be able to do something like this:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct Field
    in::Symbol
    out::Symbol
end

# only really needed for showing these...
Base.show(io::IO, f::Field) = f.in == f.out ? show(io, f.in) : show(io, f.in => f.out)

Field(x::Symbol) = Field(x, x)
Field(x::Pair{Symbol,Symbol}) = Field(x...)

struct InputAssembler <: AbstractTransformSpecification
    deps::Dict{String,Vector{Field}}
    function InputAssembler(deps::Dict{String,Vector{Field}})
        # make sure that there are no name collisions
        out_fields = (f.out for f in Iterators.flatten(values(deps)))
        # TODO: actually find the non-unique fields to help users
        allunique(out_fields) || throw(ArgumentError("all output fields must be unique"))
        new(deps)
    end
end

function InputAssembler(; inputs...)
    deps = Dict(string(dep) => Field.(fields) for (dep, fields) in inputs)
    return InputAssembler(deps)
end

_get(x, f::Field) = f.out => x[f.in]

function transform(input::InputAssembler, upstream)
    output = (;)
    for (dep, fields) in input.deps
        # get the upstream namedtuple-like for this dependency
        updep = upstream[dep]
        # create the output namedtuple with the new field names
        this_output = (; [_get(updep, field) for field in fields]...)
        output = merge(output, this_output)
    end
    return output
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed to implement in #8.

@hannahilea hannahilea changed the title Implement NoThrowTransformChain Implement NoThrowDAG Aug 9, 2023
hannahilea and others added 2 commits August 9, 2023 13:51
Co-authored-by: Dave Kleinschmidt <dave.f.kleinschmidt@gmail.com>
@hannahilea hannahilea merged commit 58d7a41 into main Aug 9, 2023
@hannahilea hannahilea deleted the hr/chain-brain branch August 9, 2023 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants