Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Dataflow analysis framework #1476

Open
wants to merge 203 commits into
base: main
Choose a base branch
from
Open

feat: Dataflow analysis framework #1476

wants to merge 203 commits into from

Conversation

acl-cqc
Copy link
Contributor

@acl-cqc acl-cqc commented Aug 28, 2024

Forwards analysis only ATM, parametrized over the abstract domain hence intended to support not only constant folding but (in the future) devirtualization, intergraph-edge-insertion, etc. See #1603 for an "example" use for constant-folding, where it does better than the existing code.

Much complexity is to do with "native" (irrespective of the underlying domain) treatment of Sum types necessary for proper understanding of control flow (e.g. conditionals, loops, CFGs).

Questions:

  • Should try_into_value, try_into_sum and try_read_wire_value be renamed to be more consistent (e.g. via some common terminology like "extract" or concrete"?)
  • Should we do Reference to HugrView is not a HugrView #1636 first, and then we'll be able to separate the DFContext from the HugrView (which would be neater)?
  • Should the Machine own the DFContext from creation, rather than passing into run (and, more annoyingly, each prepopulate method)?

Intended as a development of #1157, with significant changes:

  • Constant-folding and ValueHandle now stripped out, these will follow in a second PR

  • Everything is now in hugr-passes

  • Underlying domain of values abstracted over a trait AbstractValue (ValueHandle will implement this), which represents non-Sum values

  • datalog uses PartialValue wrapped around the AbstractValue to represent (Partial)Sums and make into a BoundedLattice

  • The old PV is gone (PartialValue directly implements BoundedLattice)

  • Interpretation of leaf (extension) ops is handled by the DFContext trait (although MakeTuple, and Untuple are handled by the framework - really prelude MakeTuple is just core Tag and Untuple is a single-Case Conditional with passthrough wires....); the framework handles routing of sums through these ops and all containers, also loading constants (with the DFContext handling non-Sum leaf Values).

  • Various refactoring of handling values (inc. in datalog) - variant_values+as_sum + more use of rows rather than indexing (this got rid of a bunch of unwraps and so on), significant refactoring of join/meet (and no _unsafe).

  • I've managed to refactor tests not to use ValueHandle etc. - they are only dealing with sum/loop/conditional routing after all. dataflow/test.rs uses about the simplest possible TestContext which provides zero information after any leaf-op - so we only get the framework-provided handling of Tag/MakeTuple/etc.

propolutate_out_wires largely superceded by passing root-node inputs into Machine::run, but still available for tests.

@acl-cqc acl-cqc force-pushed the acl/const_fold2 branch 2 times, most recently from 1594e6f to e1c49d7 Compare August 28, 2024 17:31
@acl-cqc acl-cqc requested a review from doug-q September 2, 2024 09:09
@acl-cqc acl-cqc changed the title DRAFT(v2?) Datalog-style constant-folding skeleton feat: Dataflow analysis framework and use for constant-folding Sep 2, 2024
Copy link

codecov bot commented Sep 2, 2024

Codecov Report

Attention: Patch coverage is 85.80247% with 161 lines in your changes missing coverage. Please review.

Project coverage is 86.23%. Comparing base (e63878f) to head (7040e83).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
hugr-passes/src/dataflow/datalog.rs 79.29% 60 Missing and 5 partials ⚠️
hugr-passes/src/dataflow/partial_value.rs 87.94% 40 Missing and 4 partials ⚠️
hugr-passes/src/dataflow.rs 46.80% 25 Missing ⚠️
hugr-passes/src/dataflow/value_row.rs 72.00% 14 Missing ⚠️
hugr-passes/src/dataflow/results.rs 77.96% 10 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1476      +/-   ##
==========================================
+ Coverage   85.51%   86.23%   +0.72%     
==========================================
  Files         136      142       +6     
  Lines       25264    26535    +1271     
  Branches    22176    23447    +1271     
==========================================
+ Hits        21605    22883    +1278     
+ Misses       2455     2436      -19     
- Partials     1204     1216      +12     
Flag Coverage Δ
python 92.42% <ø> (ø)
rust 85.42% <85.80%> (+0.86%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

* DFContext reinstate fn hugr(), drop AsRef requirement (fixes StackOverflow)
* test_tail_loop_iterates_twice: use tail_loop_builder_exts, fix from #1332(?)
* Fix only-one-DataflowContext asserts using Arc::ptr_eq
Comment on lines 330 to 337
let init = if ins.iter().contains(&PartialValue::Bottom) {
// So far we think one or more inputs can't happen.
// So, don't pollute outputs with Top, and wait for better knowledge of inputs.
PartialValue::Bottom
} else {
// If we can't figure out anything about the outputs, assume nothing (they still happen!)
PartialValue::Top
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let init = if ins.iter().contains(&PartialValue::Bottom) {
// So far we think one or more inputs can't happen.
// So, don't pollute outputs with Top, and wait for better knowledge of inputs.
PartialValue::Bottom
} else {
// If we can't figure out anything about the outputs, assume nothing (they still happen!)
PartialValue::Top
};
let init = PartialValue::Bottom;

Note that we have no coverage here.
We should always initialise the outputs to Bottom. Initialising them to Top means interpret_leaf_op can't do anything, anything it joins will come out Top!.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, coverage comes in #1603 .

interpret_leaf_op can do anything, it gets an &mut [PartialValue<V>] so can overwrite, it does not have to use join.

Note that returning Bottom erroneously is unsafe, i.e. can lead the analysis to conclude things that are not true. Returning Top is conservative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analysis could be better than contains(PartialValue::Bottom), i.e. we could apply the Bottom case more often, but at least this way is conservative.

(e.g. Sum(1, [Bottom, foo, bar]) also "can't happen", so must result from not-yet-computed data that'll turn up later in the analysis - that is, for a PartialSum, we can discount any variant at least one of whose value's (recursively) "contains bottom", and then the PartialSum classed as containing-bottom if it has no variants left)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help to rename init to default_ or something like that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On reflection you are right, this is fine as is. Perhaps interface/names/docs could be improved. I did not realise that I should overwrite the values rather than join into them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, but, as you suggest here, maybe we can simplify.

So there are some leaf ops that can figure out properties of their outputs even when some of the inputs are completely unknown, and I had thought that that meant we couldn't do as you suggest, but actually....maybe we can; we'll call interpret_leaf_op again later when we have non-bottom inputs. In which case we end up

  1. interpret_leaf_op never sees any Bottom inputs
  2. initial outputs passed to interpret_leaf_op are always PartialValue::Top
  3. if any input is Bottom, all outputs are set to Bottom (without even calling interpret_leaf_op)

Does that actually work?!?
This is partly about being clear about when Bottom is used - i.e. only as part of initialization, as a means of bootstrapping cyclic dependencies (maybe non-cyclic depending on how "clever" ascent tries to be)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I believe that works.

Everything is initialised to Bottom. Any inputs being bottom means this op is not "reachable"; i.e. it's guaranteed to never execute. Maybe it's in an unreachable block, maybe one of it's inputs comes from a panic (which should always set all outputs to bottom).

I'm not convinced passing a mutable slice with everything initialised to Top is the best way to do this, but that is an irrelevant detail here.

Copy link
Contributor Author

@acl-cqc acl-cqc Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, interpret_leaf_op no longer called if any input is Bottom. (I considered a default impl that checks row-contains-bottom but I think that if (a) Bottom can only occur during bootstrap/initialisation, and (b) the op doesn't execute unless it gets a valid input row, this is ok. The case of Conditionals being simplifiable because we know their predicate even if not all their inputs, never hits interpret_leaf_op, so I think this is OK.)

I also considered changing it to return Vec<PartialValue<V>> rather than take a mutable reference, but this does mean the callee becomes responsible for allocating a vector of the correct length; there are potentially many callees in the future, and only one caller, which this way can do the length-calculation. If you feel strongly that returning Vec<PartialValue<V>> would be better, then I'm willing to change.

}
(Self::Value(h1), Self::Value(h2)) => match h1.clone().try_join(h2) {
Some(h3) => {
let ch = h3 != *h1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comparison can be expensive. I suggest try_join should be able to signal whether the self argument was changed. Perhaps

enum TryJoinResult<V> { Top, Unchanged(V), MaybeChanged(V), Changed(V) }

We should verify that try_join is not lying with debug_asserts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, first I changed AbstractValue::try_join to return (Self, bool) i.e. whether it's changed, thus avoiding the comparison. Rather than adding the debug_assert, though, I managed to avoid the clone (leaving nothing to compare against) by cunning use of std::mem::swap.

A large part of my brain here is saying "premature over-optimization" but...


// In `CFG` <Node>, basic block <Node> is reachable given our knowledge of predicates:
relation bb_reachable(Node, Node);
bb_reachable(cfg, entry) <-- cfg_node(cfg), if let Some(entry) = ctx.children(*cfg).next();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the entry node is reachable when cfg has bottom inputs. I suggest we should check that cfg has no-non bottom inputs before declaring that the entry block is reachable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fair. I think the same applies to conditional (if any non-predicate input is bottom, no cases are reachable; if the elements of any variant contain bottom, that case is unreachable), Call, DataflowBlock/ExitBlock (in the same way as conditional, i.e. gate every CF edge), and depending on semantics perhaps also DFG (i.e. if a nested DFG is a scheduling barrier)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we should have a reachable lattice (unreachable is bottom, reachable is top) defined on every Dataflow Node. A node is reachable if it's parent is (or it's the root) and all it's inputs are non-bottom. the in_wire_value <-- out_wire_value rule should require the node is reachable.
This is not required for correctness, but I expect it will be good for efficiency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by augmenting ValueRow::unpack_first (now ValueRow::unpack_first_no_bottom) and cases for CFG+DFG

func_call(call, func),
output_child(func, outp),
in_wire_value(outp, p, v);
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a placeholder than a literal suggestion, but arguments of public functions (currently just "main") do need to be TOP.

Suggested change
};
// The arguments of public functions must be TOP
out_wire_value(func_i, OutgoingPort::from(p.index()), PartialValue::Top) <--
node(func_n),
if let Some(func) = ctx.get_optype(*func_n).as_func_defn(),
if func.name == "main",
input_child(func_n, func_i),
for (p,_) in ctx.out_value_types(*func_i);
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes...depending on the optimization target perhaps, but this is another way in which inputs can be provided if it's a Module, rather than e.g. DFG, -rooted Hugr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented that this can be done via prepopulate_out_wire for external functions. Whilst very basic, I think we should not do too much until we have settled how Hugr's are used as "libraries" (and perhaps also linked).

w: Wire,
) -> Result<V2, Option<ExtractValueError<V, VE, SE>>>
where
V2: TryFrom<V, Error = VE> + TryFrom<Sum<V2>, Error = SE>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, and elsewhere, we should use TryInto for constraints, for the same reason that we use Into over From.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That took a surprising amount of fiddling but yes, done :)

*self = other;
true
}
(Self::Value(h1), Self::Value(h2)) => match h1.clone().try_join(h2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clone is potentially expensive and not required. We should, with enough chicanery, be able to take ownership of h1 and consume it here.

if h1 were a mutable ref we could do

  => {
   let mut temp = Self::top();
   std::mem::swap(h1, &mut temp);
   if let Some(h3) = temp.try_join(h2) {
     *h1 = h3;
   }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding the above as a comment, not solving it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, dang, I thought this was a comment on somewhere else, I have now done this. I can undo if you'd rather but it avoids that nasty unreachable! too....

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't undo!. I was trying to communicate that I think this is an important point but that it shouldn't hold up this PR.

Copy link
Contributor Author

@acl-cqc acl-cqc Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, not undoing now - this enabled yet more refactoring here, it avoids all the nasty awkward bits we had before with just one swap, I think this is both clearest and shortest yet as well as more efficient :)

@hugrbot
Copy link
Collaborator

hugrbot commented Nov 19, 2024

This PR contains breaking changes to the public Rust API.
Please deprecate the old API instead (if possible), or mark the PR with a ! to indicate a breaking change.

cargo-semver-checks summary

--- failure trait_removed_associated_type: trait's associated type was removed ---

Description:
A public trait's associated type was removed or renamed.
      ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
     impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.36.0/src/lints/trait_removed_associated_type.ron

Failed in:
associated type HugrView::Nodes, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:44
associated type HugrView::NodePorts, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:49
associated type HugrView::Children, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:54
associated type HugrView::Neighbours, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:59
associated type HugrView::PortLinks, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:64
associated type HugrView::NodeConnections, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:69
associated type HugrView::Nodes, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:44
associated type HugrView::NodePorts, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:49
associated type HugrView::Children, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:54
associated type HugrView::Neighbours, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:59
associated type HugrView::PortLinks, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:64
associated type HugrView::NodeConnections, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:69
associated type HugrView::Nodes, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:44
associated type HugrView::NodePorts, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:49
associated type HugrView::Children, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:54
associated type HugrView::Neighbours, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:59
associated type HugrView::PortLinks, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:64
associated type HugrView::NodeConnections, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:69

--- failure trait_removed_associated_type: trait's associated type was removed ---

Description:
A public trait's associated type was removed or renamed.
      ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
     impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.36.0/src/lints/trait_removed_associated_type.ron

Failed in:
associated type CfgNodeMap::Iterator, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-passes/src/nest_cfgs.rs:72

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants