feat: Dataflow analysis framework #1476

acl-cqc · 2024-08-28T11:22:37Z

Forwards analysis only ATM, parametrized over the abstract domain hence intended to support not only constant folding but (in the future) devirtualization, intergraph-edge-insertion, etc. See #1603 for an "example" use for constant-folding, where it does better than the existing code.

Much complexity is to do with "native" (irrespective of the underlying domain) treatment of Sum types necessary for proper understanding of control flow (e.g. conditionals, loops, CFGs).

Questions:

Should try_into_value, try_into_sum and try_read_wire_value be renamed to be more consistent (e.g. via some common terminology like "extract" or "concrete")?
Should we do "Reference to HugrView is not a HugrView" (#1636) first, so that we can then separate the DFContext from the HugrView (which would be neater)?
Should the Machine own the DFContext from creation, rather than passing into run (and, more annoyingly, each prepopulate method)?

TODO: update with BREAK_TAG / CONTINUE_TAG from "feat: Add TailLoop::BREAK_TAG and CONTINUE_TAG" (#1626)
TODO: test handling of Module-rooted Hugrs
TODO: handle (+test) panic op by returning Bottom on all outputs

Intended as a development of #1157, with significant changes:

Constant-folding and ValueHandle now stripped out, these will follow in a second PR
Everything is now in hugr-passes
Underlying domain of values abstracted over a trait AbstractValue (ValueHandle will implement this), which represents non-Sum values
datalog uses PartialValue wrapped around the AbstractValue to represent (Partial)Sums and make into a BoundedLattice
The old PV is gone (PartialValue directly implements BoundedLattice)
Interpretation of leaf (extension) ops is handled by the DFContext trait, although MakeTuple and Untuple are handled by the framework (really, prelude MakeTuple is just core Tag, and Untuple is a single-Case Conditional with passthrough wires). The framework handles routing of sums through these ops and all containers, and also loads constants (with the DFContext handling non-Sum leaf Values).
Various refactoring of handling values (inc. in datalog) - variant_values+as_sum + more use of rows rather than indexing (this got rid of a bunch of unwraps and so on), significant refactoring of join/meet (and no _unsafe).
I've managed to refactor tests not to use ValueHandle etc. - they are only dealing with sum/loop/conditional routing after all. dataflow/test.rs uses about the simplest possible TestContext which provides zero information after any leaf-op - so we only get the framework-provided handling of Tag/MakeTuple/etc.
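A minimal sketch of the layering described in the list above, assuming heavily simplified definitions. The names `AbstractValue`, `PartialValue` and `try_join` come from this PR, but the signatures, the `Const` domain and the `BTreeMap` representation of sums are illustrative assumptions, not the actual hugr-passes API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the underlying (non-Sum) value domain.
trait AbstractValue: Clone + PartialEq {
    /// Join two values; `None` means they conflict, i.e. the result is Top.
    fn try_join(self, other: Self) -> Option<Self>;
}

/// About the simplest possible domain: a single known constant.
#[derive(Clone, Debug, PartialEq)]
struct Const(u64);

impl AbstractValue for Const {
    fn try_join(self, other: Self) -> Option<Self> {
        (self == other).then_some(self)
    }
}

/// The lattice the framework wraps around the domain: Bottom <= x <= Top.
#[derive(Clone, Debug, PartialEq)]
enum PartialValue<V> {
    Bottom,
    Value(V),
    /// Possible variants of a sum: tag -> one value per field of that variant.
    PartialSum(BTreeMap<usize, Vec<PartialValue<V>>>),
    Top,
}

impl<V: AbstractValue> PartialValue<V> {
    fn join(self, other: Self) -> Self {
        use PartialValue::*;
        match (self, other) {
            (Bottom, x) | (x, Bottom) => x,
            (Top, _) | (_, Top) => Top,
            (Value(a), Value(b)) => a.try_join(b).map_or(Top, Value),
            (PartialSum(mut a), PartialSum(b)) => {
                for (tag, fields) in b {
                    match a.remove(&tag) {
                        // A variant only seen on one side is kept as-is.
                        None => {
                            a.insert(tag, fields);
                        }
                        // Same variant on both sides: join field by field.
                        Some(old) if old.len() == fields.len() => {
                            let joined: Vec<_> =
                                old.into_iter().zip(fields).map(|(x, y)| x.join(y)).collect();
                            a.insert(tag, joined);
                        }
                        // Same tag but different row lengths: inconsistent.
                        Some(_) => return Top,
                    }
                }
                PartialSum(a)
            }
            // A sum joined with a non-sum value.
            _ => Top,
        }
    }
}
```

Joining two sums with different tags keeps both variants, which is what lets the analysis track, say, both the break and continue outcomes of a loop body separately.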

prepopulate_out_wires is largely superseded by passing root-node inputs into Machine::run, but is still available for tests.

codecov · 2024-09-02T14:44:48Z

Codecov Report

Attention: Patch coverage is 85.80247% with 161 lines in your changes missing coverage. Please review.

Project coverage is 86.23%. Comparing base (e63878f) to head (7040e83).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| hugr-passes/src/dataflow/datalog.rs | 79.29% | 60 Missing and 5 partials ⚠️ |
| hugr-passes/src/dataflow/partial_value.rs | 87.94% | 40 Missing and 4 partials ⚠️ |
| hugr-passes/src/dataflow.rs | 46.80% | 25 Missing ⚠️ |
| hugr-passes/src/dataflow/value_row.rs | 72.00% | 14 Missing ⚠️ |
| hugr-passes/src/dataflow/results.rs | 77.96% | 10 Missing and 3 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1476      +/-   ##
==========================================
+ Coverage   85.51%   86.23%   +0.72%     
==========================================
  Files         136      142       +6     
  Lines       25264    26535    +1271     
  Branches    22176    23447    +1271     
==========================================
+ Hits        21605    22883    +1278     
+ Misses       2455     2436      -19     
- Partials     1204     1216      +12

| Flag | Coverage Δ | |
|---|---|---|
| python | `92.42% <ø> (ø)` | |
| rust | `85.42% <85.80%> (+0.86%)` | ⬆️ |



doug-q · 2024-11-15T15:04:24Z

hugr-passes/src/dataflow/datalog.rs

+            let init = if ins.iter().contains(&PartialValue::Bottom) {
+                // So far we think one or more inputs can't happen.
+                // So, don't pollute outputs with Top, and wait for better knowledge of inputs.
+                PartialValue::Bottom
+            } else {
+                // If we can't figure out anything about the outputs, assume nothing (they still happen!)
+                PartialValue::Top
+            };


Suggested change

-            let init = if ins.iter().contains(&PartialValue::Bottom) {
-                // So far we think one or more inputs can't happen.
-                // So, don't pollute outputs with Top, and wait for better knowledge of inputs.
-                PartialValue::Bottom
-            } else {
-                // If we can't figure out anything about the outputs, assume nothing (they still happen!)
-                PartialValue::Top
-            };
+            let init = PartialValue::Bottom;

Note that we have no coverage here.
We should always initialise the outputs to Bottom. Initialising them to Top means interpret_leaf_op can't do anything: anything it joins will come out Top!

Yeah, coverage comes in #1603 .

interpret_leaf_op can do anything, it gets an &mut [PartialValue<V>] so can overwrite, it does not have to use join.

Note that returning Bottom erroneously is unsafe, i.e. can lead the analysis to conclude things that are not true. Returning Top is conservative.

The analysis could be better than contains(PartialValue::Bottom), i.e. we could apply the Bottom case more often, but at least this way is conservative.

(e.g. Sum(1, [Bottom, foo, bar]) also "can't happen", so it must result from not-yet-computed data that will turn up later in the analysis. That is, for a PartialSum, we can discount any variant at least one of whose values (recursively) "contains bottom"; the PartialSum is then classed as containing-bottom if it has no variants left.)
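The recursive "contains bottom" check described here could be sketched as follows. The names `contains_bottom` and `row_contains_bottom` appear in later commits on this PR; the cut-down `PartialValue` (with a plain `i64` as the value domain) and the function bodies are assumptions for illustration only.

```rust
use std::collections::BTreeMap;

/// Cut-down, non-generic stand-in for the framework's PartialValue.
#[derive(Clone, Debug, PartialEq)]
enum PartialValue {
    Bottom,
    Value(i64),
    /// tag -> one value per field of that variant
    PartialSum(BTreeMap<usize, Vec<PartialValue>>),
    Top,
}

impl PartialValue {
    /// True if this value "can't happen" given what is known so far.
    fn contains_bottom(&self) -> bool {
        match self {
            PartialValue::Bottom => true,
            PartialValue::Value(_) | PartialValue::Top => false,
            // A variant is discounted if any of its fields contains Bottom;
            // the sum contains Bottom if no variant survives that filter.
            PartialValue::PartialSum(variants) => variants
                .values()
                .all(|fields| fields.iter().any(PartialValue::contains_bottom)),
        }
    }
}

/// True if any value in a row of inputs contains Bottom.
fn row_contains_bottom(row: &[PartialValue]) -> bool {
    row.iter().any(PartialValue::contains_bottom)
}
```

Note that a sum with no variants at all is treated as containing Bottom, while a variant with no fields (a unit variant) is always feasible.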

Would it help to rename init to default_ or something like that?

On reflection you are right, this is fine as is. Perhaps interface/names/docs could be improved. I did not realise that I should overwrite the values rather than join into them.

Oooh, but, as you suggest here, maybe we can simplify.

So there are some leaf ops that can figure out properties of their outputs even when some of the inputs are completely unknown, and I had thought that that meant we couldn't do as you suggest, but actually... maybe we can; we'll call interpret_leaf_op again later when we have non-bottom inputs. In which case we end up with:

* interpret_leaf_op never sees any Bottom inputs
* initial outputs passed to interpret_leaf_op are always PartialValue::Top
* if any input is Bottom, all outputs are set to Bottom (without even calling interpret_leaf_op)

Does that actually work?!?
This is partly about being clear about when Bottom is used - i.e. only as part of initialization, as a means of bootstrapping cyclic dependencies (maybe non-cyclic depending on how "clever" ascent tries to be)

Yes I believe that works.

Everything is initialised to Bottom. Any inputs being bottom means this op is not "reachable"; i.e. it's guaranteed to never execute. Maybe it's in an unreachable block, maybe one of its inputs comes from a panic (which should always set all outputs to bottom).

I'm not convinced passing a mutable slice with everything initialised to Top is the best way to do this, but that is an irrelevant detail here.

Ok, interpret_leaf_op no longer called if any input is Bottom. (I considered a default impl that checks row-contains-bottom but I think that if (a) Bottom can only occur during bootstrap/initialisation, and (b) the op doesn't execute unless it gets a valid input row, this is ok. The case of Conditionals being simplifiable because we know their predicate even if not all their inputs, never hits interpret_leaf_op, so I think this is OK.)

I also considered changing it to return Vec<PartialValue<V>> rather than take a mutable reference, but this does mean the callee becomes responsible for allocating a vector of the correct length; there are potentially many callees in the future, and only one caller, which this way can do the length-calculation. If you feel strongly that returning Vec<PartialValue<V>> would be better, then I'm willing to change.
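The protocol this thread converged on can be sketched as below. This is an illustration under assumptions: `interpret_leaf_op` is named in the PR, but the cut-down `DFContext` trait, the `propagate_leaf_op` helper and the `AddOnly` context are hypothetical and do not reflect the real signatures.

```rust
#[derive(Clone, Debug, PartialEq)]
enum PartialValue {
    Bottom,
    Value(i64),
    Top,
}

/// Hypothetical, cut-down version of the context trait.
trait DFContext {
    /// `outs` arrives pre-filled with Top; the callee overwrites whichever
    /// entries it can say something about. It never sees Bottom inputs.
    fn interpret_leaf_op(&self, ins: &[PartialValue], outs: &mut [PartialValue]);
}

/// Caller side (hypothetical helper): the single caller computes the
/// output length and decides whether the op is interpreted at all.
fn propagate_leaf_op(
    ctx: &impl DFContext,
    ins: &[PartialValue],
    num_outs: usize,
) -> Vec<PartialValue> {
    if ins.contains(&PartialValue::Bottom) {
        // Some input "can't happen" yet: wait for better knowledge.
        return vec![PartialValue::Bottom; num_outs];
    }
    // Conservative default: assume nothing about the outputs.
    let mut outs = vec![PartialValue::Top; num_outs];
    ctx.interpret_leaf_op(ins, &mut outs);
    outs
}

/// Example context that understands exactly one op: integer addition.
struct AddOnly;

impl DFContext for AddOnly {
    fn interpret_leaf_op(&self, ins: &[PartialValue], outs: &mut [PartialValue]) {
        if let ([PartialValue::Value(a), PartialValue::Value(b)], [out]) = (ins, outs) {
            *out = PartialValue::Value(a + b);
        }
    }
}
```

Because the caller allocates `outs`, each of the potentially many `DFContext` implementations only has to overwrite entries, which is the trade-off against returning `Vec<PartialValue<V>>` discussed above.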

doug-q · 2024-11-18T08:15:43Z

hugr-passes/src/dataflow/partial_value.rs

+            }
+            (Self::Value(h1), Self::Value(h2)) => match h1.clone().try_join(h2) {
+                Some(h3) => {
+                    let ch = h3 != *h1;


This comparison can be expensive. I suggest try_join should be able to signal whether the self argument was changed. Perhaps

enum TryJoinResult<V> { Top, Unchanged(V), MaybeChanged(V), Changed(V) }

We should verify that try_join is not lying with debug_asserts

Well, first I changed AbstractValue::try_join to return (Self, bool) i.e. whether it's changed, thus avoiding the comparison. Rather than adding the debug_assert, though, I managed to avoid the clone (leaving nothing to compare against) by cunning use of std::mem::swap.

A large part of my brain here is saying "premature over-optimization" but...
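A small sketch of a `try_join` that reports whether the value changed, so the caller never has to compare old and new values. The return type `Option<(Self, bool)>`, the `Interval` domain, `MAX_WIDTH` and `checked_try_join` are all assumptions made for illustration; only the idea of returning an extra `bool` comes from the thread.

```rust
/// Hypothetical version of the trait method discussed above.
trait AbstractValue: Sized {
    /// Joins `other` into `self`. Returns `None` if the result is Top,
    /// otherwise the joined value and whether it differs from `self`.
    fn try_join(self, other: Self) -> Option<(Self, bool)>;
}

/// Example domain: an inclusive integer interval that gives up (Top)
/// once it grows past a width limit.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Interval {
    lo: i64,
    hi: i64,
}

const MAX_WIDTH: i64 = 100;

impl AbstractValue for Interval {
    fn try_join(self, other: Self) -> Option<(Self, bool)> {
        let lo = self.lo.min(other.lo);
        let hi = self.hi.max(other.hi);
        if hi - lo > MAX_WIDTH {
            return None;
        }
        // The domain knows cheaply whether it grew.
        let changed = lo < self.lo || hi > self.hi;
        Some((Interval { lo, hi }, changed))
    }
}

/// Debug-only check that the reported flag is honest, along the lines of
/// the debug_assert suggestion. The clone is only made in debug builds.
fn checked_try_join<V>(v: V, other: V) -> Option<(V, bool)>
where
    V: AbstractValue + Clone + PartialEq,
{
    let before = if cfg!(debug_assertions) { Some(v.clone()) } else { None };
    let res = v.try_join(other);
    if let (Some(before), Some((after, changed))) = (&before, &res) {
        debug_assert_eq!(*changed, before != after);
    }
    res
}
```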

doug-q · 2024-11-18T11:15:45Z

hugr-passes/src/dataflow/datalog.rs

+
+        // In `CFG` <Node>, basic block <Node> is reachable given our knowledge of predicates:
+        relation bb_reachable(Node, Node);
+        bb_reachable(cfg, entry) <-- cfg_node(cfg), if let Some(entry) = ctx.children(*cfg).next();


Here, the entry node is declared reachable even when cfg has bottom inputs. I suggest we should check that cfg has no bottom inputs before declaring that the entry block is reachable.

Yes, fair. I think the same applies to conditional (if any non-predicate input is bottom, no cases are reachable; if the elements of any variant contain bottom, that case is unreachable), Call, DataflowBlock/ExitBlock (in the same way as conditional, i.e. gate every CF edge), and depending on semantics perhaps also DFG (i.e. if a nested DFG is a scheduling barrier)

I suggest that we should have a reachable lattice (unreachable is bottom, reachable is top) defined on every Dataflow Node. A node is reachable if its parent is (or it's the root) and all its inputs are non-bottom. The in_wire_value <-- out_wire_value rule should require the node is reachable.
This is not required for correctness, but I expect it will be good for efficiency.

Done by augmenting ValueRow::unpack_first (now ValueRow::unpack_first_no_bottom) and cases for CFG+DFG
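For illustration, the reachability rule suggested in this thread (not the `unpack_first_no_bottom` approach that was actually adopted) could be written outside datalog roughly as follows. `Val`, `NodeInfo` and `is_reachable` are hypothetical names, and nodes are identified by their index in a slice rather than by Hugr `Node`s.

```rust
/// Two-point abstraction of a wire value: either still Bottom or not.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Val {
    Bottom,
    NonBottom,
}

/// Hypothetical per-node summary: parent in the hierarchy (None for the
/// root) and the current values on the node's input wires.
struct NodeInfo {
    parent: Option<usize>,
    inputs: Vec<Val>,
}

/// A node is reachable if all its inputs are non-bottom and its parent
/// is reachable (or it is the root). Assumes parent links form a tree.
fn is_reachable(nodes: &[NodeInfo], n: usize) -> bool {
    let info = &nodes[n];
    let inputs_ok = info.inputs.iter().all(|v| *v != Val::Bottom);
    let parent_ok = match info.parent {
        None => true,
        Some(p) => is_reachable(nodes, p),
    };
    inputs_ok && parent_ok
}
```

In a datalog setting the same condition would be a relation that other rules depend on, so that values are not propagated into nodes that cannot yet execute.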

doug-q · 2024-11-18T14:08:50Z

hugr-passes/src/dataflow/datalog.rs

+            func_call(call, func),
+            output_child(func, outp),
+            in_wire_value(outp, p, v);
+    };


This is more of a placeholder than a literal suggestion, but arguments of public functions (currently just "main") do need to be TOP.

Suggested change

-    };
+        // The arguments of public functions must be TOP
+        out_wire_value(func_i, OutgoingPort::from(p.index()), PartialValue::Top) <--
+            node(func_n),
+            if let Some(func) = ctx.get_optype(*func_n).as_func_defn(),
+            if func.name == "main",
+            input_child(func_n, func_i),
+            for (p,_) in ctx.out_value_types(*func_i);
+    };

Yes...depending on the optimization target perhaps, but this is another way in which inputs can be provided if it's a Module, rather than e.g. DFG, -rooted Hugr.

Commented that this can be done via prepopulate_out_wire for external functions. Whilst very basic, I think we should not do too much until we have settled how Hugrs are used as "libraries" (and perhaps also linked).

doug-q · 2024-11-18T14:47:08Z

hugr-passes/src/dataflow/results.rs

+        w: Wire,
+    ) -> Result<V2, Option<ExtractValueError<V, VE, SE>>>
+    where
+        V2: TryFrom<V, Error = VE> + TryFrom<Sum<V2>, Error = SE>,


Here, and elsewhere, we should use TryInto for constraints, for the same reason that we use Into over From.

That took a surprising amount of fiddling but yes, done :)
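To illustrate why a `TryInto` bound is the more permissive choice: the standard library provides `TryInto<U> for T` whenever `U: TryFrom<T>`, so a `TryInto` bound accepts everything a `TryFrom` bound does, plus types that implement `TryInto` directly. In the sketch below, `extract_from`, `extract_into`, `Handle` and `Small` are hypothetical names unrelated to the real `try_into_value` signature.

```rust
use std::convert::{TryFrom, TryInto};

/// Bound written with TryFrom: only usable when `Out: TryFrom<In>` exists.
fn extract_from<In, Out: TryFrom<In>>(v: In) -> Result<Out, Out::Error> {
    Out::try_from(v)
}

/// Bound written with TryInto: covers the TryFrom case through the std
/// blanket impl, and also types that implement TryInto directly.
fn extract_into<In: TryInto<Out>, Out>(v: In) -> Result<Out, In::Error> {
    v.try_into()
}

struct Handle(i64);
struct Small(u8);

// A direct TryInto impl (no `Small: TryFrom<Handle>` exists), so only
// `extract_into` can be used to convert a Handle into a Small.
impl TryInto<Small> for Handle {
    type Error = String;

    fn try_into(self) -> Result<Small, String> {
        u8::try_from(self.0).map(Small).map_err(|e| e.to_string())
    }
}
```

Calling `extract_from::<Handle, Small>` would be rejected by the compiler, whereas `extract_into` accepts both `Handle` and plain integer conversions such as `i64` to `u8`.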


doug-q · 2024-11-19T10:15:55Z

hugr-passes/src/dataflow/partial_value.rs

+                *self = other;
+                true
+            }
+            (Self::Value(h1), Self::Value(h2)) => match h1.clone().try_join(h2) {


This clone is potentially expensive and not required. We should, with enough chicanery, be able to take ownership of h1 and consume it here.

if h1 were a mutable ref we could do

=> { let mut temp = Self::top(); std::mem::swap(h1, &mut temp); if let Some(h3) = temp.try_join(h2) { *h1 = h3; } }

I suggest adding the above as a comment, not solving it now.

Oh, dang, I thought this was a comment on somewhere else, I have now done this. I can undo if you'd rather but it avoids that nasty unreachable! too....

Don't undo! I was trying to communicate that I think this is an important point but that it shouldn't hold up this PR.

Nah, not undoing now - this enabled yet more refactoring here, it avoids all the nasty awkward bits we had before with just one swap, I think this is both clearest and shortest yet as well as more efficient :)
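A sketch of the swap trick on a cut-down lattice: the old value is swapped out for a placeholder so it can be consumed by `try_join` without cloning. The three-variant `PartialValue`, the `Name` domain and the exact shape of `join_mut` are assumptions for illustration; the real code also handles sums.

```rust
trait AbstractValue: Sized {
    /// `None` means Top; the bool says whether the result differs from `self`.
    fn try_join(self, other: Self) -> Option<(Self, bool)>;
}

/// Deliberately not Clone, to show that no clone is needed.
#[derive(Debug, PartialEq)]
enum PartialValue<V> {
    Bottom,
    Value(V),
    Top,
}

impl<V: AbstractValue> PartialValue<V> {
    /// Joins `other` into `self`, returning whether `self` changed.
    fn join_mut(&mut self, other: Self) -> bool {
        // Take ownership of the current value, leaving Top as a placeholder.
        let mut old = PartialValue::Top;
        std::mem::swap(self, &mut old);
        let (new, changed) = match (old, other) {
            (PartialValue::Top, _) => (PartialValue::Top, false),
            (old, PartialValue::Bottom) => (old, false),
            (PartialValue::Bottom, other) => (other, true),
            (_, PartialValue::Top) => (PartialValue::Top, true),
            (PartialValue::Value(a), PartialValue::Value(b)) => match a.try_join(b) {
                Some((v, ch)) => (PartialValue::Value(v), ch),
                None => (PartialValue::Top, true),
            },
        };
        // Single assignment back, common to every case.
        *self = new;
        changed
    }
}

/// Example domain: a name that only joins with an equal name.
#[derive(Debug, PartialEq)]
struct Name(String);

impl AbstractValue for Name {
    fn try_join(self, other: Self) -> Option<(Self, bool)> {
        (self == other).then_some((self, false))
    }
}
```

Every arm produces the new value and the changed flag together, so there is no case left over that would need an `unreachable!`.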

hugrbot · 2024-11-19T11:56:57Z

This PR contains breaking changes to the public Rust API.
Please deprecate the old API instead (if possible), or mark the PR with a ! to indicate a breaking change.

cargo-semver-checks summary


--- failure trait_removed_associated_type: trait's associated type was removed ---

Description:
A public trait's associated type was removed or renamed.
      ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
     impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.36.0/src/lints/trait_removed_associated_type.ron

Failed in:
associated type HugrView::Nodes, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:44
associated type HugrView::NodePorts, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:49
associated type HugrView::Children, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:54
associated type HugrView::Neighbours, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:59
associated type HugrView::PortLinks, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:64
associated type HugrView::NodeConnections, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-core/src/hugr/views.rs:69

--- failure trait_removed_associated_type: trait's associated type was removed ---

Description:
A public trait's associated type was removed or renamed.
      ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
     impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.36.0/src/lints/trait_removed_associated_type.ron

Failed in:
associated type CfgNodeMap::Iterator, previously at /home/runner/work/hugr/hugr/BASELINE_BRANCH/hugr-passes/src/nest_cfgs.rs:72


acl-cqc force-pushed the acl/const_fold2 branch 2 times, most recently from 1594e6f to e1c49d7 Compare August 28, 2024 17:31

acl-cqc requested a review from doug-q September 2, 2024 09:09

acl-cqc changed the title ~~DRAFT(v2?) Datalog-style constant-folding skeleton~~ feat: Dataflow analysis framework and use for constant-folding Sep 2, 2024

acl-cqc force-pushed the acl/const_fold2 branch from 718f058 to 6698b1b Compare September 2, 2024 14:42

acl-cqc force-pushed the acl/const_fold2 branch from e54c742 to 6cac41a Compare September 2, 2024 15:46

acl-cqc added 23 commits September 11, 2024 16:15

Just const_fold2 + inside that partial_value (taken from hugr_core)

c7a9d89

merge/update+fmt (ValueName for ConstInt non-compiling as ConstInt no…

ac45e53

…t Hash)

Missing imports / lints. Now running, but failing w/StackOverflow

8adaa6e

Fix tests...

098c735

* DFContext reinstate fn hugr(), drop AsRef requirement (fixes StackOverflow) * test_tail_loop_iterates_twice: use tail_loop_builder_exts, fix from #1332(?) * Fix only-one-DataflowContext asserts using Arc::ptr_eq

ValueKey using MaybeHash

706c892

tag() does not refer to self.is_compound

63bc944

ValueHandle::{is_compound,num_fields,index} => {variant_values, as_sum}

5fa7edb

Rm ValueHandle::tag, use variant_values - inefficient, presume this i…

98bf94a

…s what was meant

add variant_values, rewrite one use of outputs_for_variant

295ec32

...and the other two; remove outputs_for_variant

5c8289e

Rewrite tuple rule to avoid indexing

bf173ab

GC unused (tuple,variant)_field_value, iter_with_ports

0ae4d19

Common up via ValueRow.unpack_first

8608ba9

No DeRef for ValueHandle, just add get_type()

2dca3e9

ValueKey::{Select->Field,index->field}

51e68ea

(join/meet)_mut_unsafe => try_(join/meet)_mut with Err for conflictin…

8635474

…g len

Remove ValueHandle::variant_values - just have as_sum

80d5b86

Optimize as_sum() by returning impl Iterator not Vec

1c8be99

Machine uses PV not PartialValue

b0afa54

Parametrize PartialValue+PV+Machine by AbstractValue/Into<Value>, Con…

d09a1fe

…text interprets load_constant

Move partial_value.rs inside datalog/

af8827b

Hide PartialSum/PartialValue

4b61436

refactor: ValueRow::single_among_bottoms

8b31d8c

ConstLocation is From<Node>; move partial_from_const out to toplev, n…

33c8607

…o value_from_const

doug-q reviewed Nov 15, 2024


doug-q reviewed Nov 18, 2024


Merge commit 'origin/main^' into HEAD

97496f9

acl-cqc mentioned this pull request Nov 18, 2024

DRAFT: feat: Prototype dataflow analysis for static circuit extraction #1664

Draft

doug-q reviewed Nov 18, 2024


acl-cqc added 4 commits November 18, 2024 21:56

Generalize run to deal with Module(use main), and others; add run_lib

8b76135

Shorten the got-all-required-inputs check (build got_inputs)

c18cbea

Shorten further...not as easy to follow

1b64b4b

Revert "Shorten further...not as easy to follow"

39b8df1

This reverts commit 1b64b4b.

doug-q reviewed Nov 19, 2024


acl-cqc added 2 commits November 19, 2024 11:50

doc fixes, rename to run_library

a5d987c

Add PartialValue::contains_bottom, also row_contains_bottom

3e718fd

acl-cqc added 14 commits November 19, 2024 12:02

Don't call interpret_leaf_op if row_contains_bottom

497686a

Use row_contains_bottom for CFG+DFG, and augment unpack_first(=>_no_b…

e34c7be

…ottom)

run_library => publish_function

69a69f3

Drop publish_function, pub prepopulate_wire

57ac432

ValueRow::single_known => singleton, set

f9a9f24

try_join / try_meet return extra bool

9cc368d

shorten/common-up meet_mut + join_mut

a61fbdb

try_into_value: change bounds TryFrom -> TryInto; rename =>try_into_sum

24cce0e

Avoid a clone in try_into_sum

a590766

Optimize+shorten join_mut / meet_mut via std::mem::swap

2b2c461

refactor join_mut / meet_mut again, common-up assignment

124718d

clippy

93b1f4d

doclinks

731a3b0

prepopulate_df_inputs

7040e83
