-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Dataflow analysis framework #1476
base: main
Are you sure you want to change the base?
Conversation
1594e6f
to
e1c49d7
Compare
718f058
to
6698b1b
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1476 +/- ##
==========================================
+ Coverage 85.51% 86.23% +0.72%
==========================================
Files 136 142 +6
Lines 25264 26535 +1271
Branches 22176 23447 +1271
==========================================
+ Hits 21605 22883 +1278
+ Misses 2455 2436 -19
- Partials 1204 1216 +12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
e54c742
to
6cac41a
Compare
* DFContext reinstate fn hugr(), drop AsRef requirement (fixes StackOverflow) * test_tail_loop_iterates_twice: use tail_loop_builder_exts, fix from #1332(?) * Fix only-one-DataflowContext asserts using Arc::ptr_eq
…text interprets load_constant
…o value_from_const
hugr-passes/src/dataflow/datalog.rs
Outdated
let init = if ins.iter().contains(&PartialValue::Bottom) { | ||
// So far we think one or more inputs can't happen. | ||
// So, don't pollute outputs with Top, and wait for better knowledge of inputs. | ||
PartialValue::Bottom | ||
} else { | ||
// If we can't figure out anything about the outputs, assume nothing (they still happen!) | ||
PartialValue::Top | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let init = if ins.iter().contains(&PartialValue::Bottom) { | |
// So far we think one or more inputs can't happen. | |
// So, don't pollute outputs with Top, and wait for better knowledge of inputs. | |
PartialValue::Bottom | |
} else { | |
// If we can't figure out anything about the outputs, assume nothing (they still happen!) | |
PartialValue::Top | |
}; | |
let init = PartialValue::Bottom; |
Note that we have no coverage here.
We should always initialise the outputs to Bottom. Initialising them to Top means interpret_leaf_op
can't do anything, anything it joins will come out Top!.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, coverage comes in #1603 .
interpret_leaf_op
can do anything, it gets an &mut [PartialValue<V>]
so can overwrite, it does not have to use join.
Note that returning Bottom
erroneously is unsafe, i.e. can lead the analysis to conclude things that are not true. Returning Top
is conservative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The analysis could be better than contains(PartialValue::Bottom)
, i.e. we could apply the Bottom
case more often, but at least this way is conservative.
(e.g. Sum(1, [Bottom, foo, bar])
also "can't happen", so must result from not-yet-computed data that'll turn up later in the analysis - that is, for a PartialSum, we can discount any variant at least one of whose value's (recursively) "contains bottom", and then the PartialSum classed as containing-bottom if it has no variants left)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it help to rename init
to default_
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection you are right, this is fine as is. Perhaps interface/names/docs could be improved. I did not realise that I should overwrite the values rather than join into them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, but, as you suggest here, maybe we can simplify.
So there are some leaf ops that can figure out properties of their outputs even when some of the inputs are completely unknown, and I had thought that that meant we couldn't do as you suggest, but actually....maybe we can; we'll call interpret_leaf_op
again later when we have non-bottom inputs. In which case we end up
interpret_leaf_op
never sees anyBottom
inputs- initial outputs passed to
interpret_leaf_op
are alwaysPartialValue::Top
- if any input is
Bottom
, all outputs are set toBottom
(without even callinginterpret_leaf_op
)
Does that actually work?!?
This is partly about being clear about when Bottom is used - i.e. only as part of initialization, as a means of bootstrapping cyclic dependencies (maybe non-cyclic depending on how "clever" ascent tries to be)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I believe that works.
Everything is initialised to Bottom. Any inputs being bottom means this op is not "reachable"; i.e. it's guaranteed to never execute. Maybe it's in an unreachable block, maybe one of it's inputs comes from a panic (which should always set all outputs to bottom).
I'm not convinced passing a mutable slice with everything initialised to Top is the best way to do this, but that is an irrelevant detail here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, interpret_leaf_op
no longer called if any input is Bottom. (I considered a default impl that checks row-contains-bottom but I think that if (a) Bottom can only occur during bootstrap/initialisation, and (b) the op doesn't execute unless it gets a valid input row, this is ok. The case of Conditionals being simplifiable because we know their predicate even if not all their inputs, never hits interpret_leaf_op, so I think this is OK.)
I also considered changing it to return Vec<PartialValue<V>>
rather than take a mutable reference, but this does mean the callee becomes responsible for allocating a vector of the correct length; there are potentially many callees in the future, and only one caller, which this way can do the length-calculation. If you feel strongly that returning Vec<PartialValue<V>>
would be better, then I'm willing to change.
} | ||
(Self::Value(h1), Self::Value(h2)) => match h1.clone().try_join(h2) { | ||
Some(h3) => { | ||
let ch = h3 != *h1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comparison can be expensive. I suggest try_join
should be able to signal whether the self argument was changed. Perhaps
enum TryJoinResult<V> { Top, Unchanged(V), MaybeChanged(V), Changed(V) }
We should verify that try_join
is not lying with debug_assert
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, first I changed AbstractValue::try_join
to return (Self, bool)
i.e. whether it's changed, thus avoiding the comparison. Rather than adding the debug_assert, though, I managed to avoid the clone
(leaving nothing to compare against) by cunning use of std::mem::swap
.
A large part of my brain here is saying "premature over-optimization" but...
|
||
// In `CFG` <Node>, basic block <Node> is reachable given our knowledge of predicates: | ||
relation bb_reachable(Node, Node); | ||
bb_reachable(cfg, entry) <-- cfg_node(cfg), if let Some(entry) = ctx.children(*cfg).next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, the entry node is reachable when cfg
has bottom inputs. I suggest we should check that cfg has no-non bottom inputs before declaring that the entry block is reachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fair. I think the same applies to conditional (if any non-predicate input is bottom, no cases are reachable; if the elements of any variant contain bottom, that case is unreachable), Call, DataflowBlock/ExitBlock (in the same way as conditional, i.e. gate every CF edge), and depending on semantics perhaps also DFG (i.e. if a nested DFG is a scheduling barrier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that we should have a reachable
lattice (unreachable is bottom, reachable is top) defined on every Dataflow Node. A node is reachable if it's parent is (or it's the root) and all it's inputs are non-bottom. the in_wire_value <-- out_wire_value
rule should require the node is reachable.
This is not required for correctness, but I expect it will be good for efficiency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done by augmenting ValueRow::unpack_first
(now ValueRow::unpack_first_no_bottom
) and cases for CFG+DFG
func_call(call, func), | ||
output_child(func, outp), | ||
in_wire_value(outp, p, v); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a placeholder than a literal suggestion, but arguments of public functions (currently just "main") do need to be TOP.
}; | |
// The arguments of public functions must be TOP | |
out_wire_value(func_i, OutgoingPort::from(p.index()), PartialValue::Top) <-- | |
node(func_n), | |
if let Some(func) = ctx.get_optype(*func_n).as_func_defn(), | |
if func.name == "main", | |
input_child(func_n, func_i), | |
for (p,_) in ctx.out_value_types(*func_i); | |
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes...depending on the optimization target perhaps, but this is another way in which inputs can be provided if it's a Module, rather than e.g. DFG, -rooted Hugr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented that this can be done via prepopulate_out_wire
for external functions. Whilst very basic, I think we should not do too much until we have settled how Hugr's are used as "libraries" (and perhaps also linked).
w: Wire, | ||
) -> Result<V2, Option<ExtractValueError<V, VE, SE>>> | ||
where | ||
V2: TryFrom<V, Error = VE> + TryFrom<Sum<V2>, Error = SE>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, and elsewhere, we should use TryInto
for constraints, for the same reason that we use Into
over From
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That took a surprising amount of fiddling but yes, done :)
*self = other; | ||
true | ||
} | ||
(Self::Value(h1), Self::Value(h2)) => match h1.clone().try_join(h2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This clone is potentially expensive and not required. We should, with enough chicanery, be able to take ownership of h1 and consume it here.
if h1 were a mutable ref we could do
=> {
let mut temp = Self::top();
std::mem::swap(h1, &mut temp);
if let Some(h3) = temp.try_join(h2) {
*h1 = h3;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding the above as a comment, not solving it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, dang, I thought this was a comment on somewhere else, I have now done this. I can undo if you'd rather but it avoids that nasty unreachable!
too....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't undo!. I was trying to communicate that I think this is an important point but that it shouldn't hold up this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, not undoing now - this enabled yet more refactoring here, it avoids all the nasty awkward bits we had before with just one swap
, I think this is both clearest and shortest yet as well as more efficient :)
This PR contains breaking changes to the public Rust API. cargo-semver-checks summary
|
Forwards analysis only ATM, parametrized over the abstract domain hence intended to support not only constant folding but (in the future) devirtualization, intergraph-edge-insertion, etc. See #1603 for an "example" use for constant-folding, where it does better than the existing code.
Much complexity is to do with "native" (irrespective of the underlying domain) treatment of Sum types necessary for proper understanding of control flow (e.g. conditionals, loops, CFGs).
Questions:
try_into_value
,try_into_sum
andtry_read_wire_value
be renamed to be more consistent (e.g. via some common terminology like "extract" or concrete"?)Machine
own theDFContext
from creation, rather than passing intorun
(and, more annoyingly, eachprepopulate
method)?BREAK_TAG
/CONTINUE_TAG
from feat: Add TailLoop::BREAK_TAG and CONTINUE_TAG #1626Module
-rooted HugrsIntended as a development of #1157, with significant changes:
Constant-folding and ValueHandle now stripped out, these will follow in a second PR
Everything is now in hugr-passes
Underlying domain of values abstracted over a trait AbstractValue (ValueHandle will implement this), which represents non-Sum values
datalog uses PartialValue wrapped around the AbstractValue to represent (Partial)Sums and make into a BoundedLattice
The old
PV
is gone (PartialValue
directly implements BoundedLattice)Interpretation of leaf (extension) ops is handled by the DFContext trait (although MakeTuple, and Untuple are handled by the framework - really prelude
MakeTuple
is just coreTag
andUntuple
is a single-Case Conditional with passthrough wires....); the framework handles routing of sums through these ops and all containers, also loading constants (with the DFContext handling non-Sum leafValue
s).Various refactoring of handling values (inc. in datalog) -
variant_values
+as_sum
+ more use of rows rather than indexing (this got rid of a bunch of unwraps and so on), significant refactoring of join/meet (and no_unsafe
).I've managed to refactor tests not to use ValueHandle etc. - they are only dealing with sum/loop/conditional routing after all.
dataflow/test.rs
uses about the simplest possibleTestContext
which provides zero information after any leaf-op - so we only get the framework-provided handling of Tag/MakeTuple/etc.propolutate_out_wires
largely superceded by passing root-node inputs intoMachine::run
, but still available for tests.