Add tests to document current behavior of post-instr in branches #435
mbarbin wants to merge 1 commit into aantron:master
Conversation
> Where there's a raise or a failwith guarded by a conditional, the
> branch itself is instrumented, but the raising expression is not
> post-instrumented...
What is the incremental value of having this test over the tests in `apply/special.t` (`bisect_ppx/test/instrument/apply/special.t`, line 293 in e302656), including if-expressions?
I don't believe there's a need for this test from the standpoint of the current
implementation. I do see its value in terms of future-proofing and documenting.
The if cond then raise pattern is quite common. As it stands, one must be
cautious with how the raising expression is written. It's important to note that
even seemingly innocuous refactoring, which shouldn't affect coverage, could
potentially introduce non-visitable coverage points. This can be quite
surprising.
Consider the following example:

```diff
-if cond then raise Not_found
+if cond then raise_s [%sexp "Key not found", { key : Key.t }]
```

This results in the insertion of non-visitable point 1:

```diff
-if cond then raise Not_found
+if cond then (___bisect_post_visit___ 1 (raise_s [%sexp "Key not found", { key : Key.t }]))
```
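To make this concrete, here is a minimal self-contained sketch of the mechanism (this is not Bisect's actual code; `post_visit` and `visited` are hypothetical stand-ins for the generated `___bisect_post_visit___` call and the visit-count state), showing why such a point can never be visited:

```ocaml
(* Hypothetical model of post-instrumentation: [post_visit] records a
   visit to [point] and returns the instrumented expression's value. *)
let visited = Array.make 2 false

let post_visit point value =
  visited.(point) <- true;
  value

(* The instrumented form of [if cond then raise Not_found else 0]. *)
let lookup cond =
  if cond then post_visit 1 (raise Not_found) else 0

let () =
  (try ignore (lookup true) with Not_found -> ());
  (* [raise Not_found] aborts evaluation before [post_visit] can record
     the visit, so point 1 is unvisitable by design. *)
  assert (not visited.(1))
```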
> For example, if you use `invalid_arg`, or a raising helper, its
> application will be post-instrumented (unless all of its arguments are
> labeled !?), resulting in a code location that can never be visited,
> by design. These are currently acknowledged limitations.
This is an acknowledged and inevitable limitation because there is no way for Bisect to know that a specific identifier refers to a function that always or never raises. In fact, it doesn't know that for the identifiers failwith or invalid_arg either, as those can be redefined and shadowed by the user, including by another module or a third-party library that is not even processed by Bisect during compilation of the user's particular project. I don't see that we need a test for this -- the actual behavior is covered by the normal apply tests.
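A tiny illustration of the shadowing point above (hypothetical code, not from the PR):

```ocaml
(* [failwith] is an ordinary binding, not syntax: any module can shadow
   it with a function that returns normally, so an instrumenter cannot
   soundly assume that an application of [failwith] never produces a
   value. *)
let failwith _msg = 0 (* shadows Stdlib.failwith *)

let () = assert (failwith "does not raise" = 0)
```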
> there is no way for Bisect to know that a specific identifier refers to a
> function that always or never raises
I agree. It's fair to say that "a strategy to limit non-visitable points, which
relies partly on identifiers, will inevitably encounter these limitations."
However, if we were to discover an alternative approach (assuming one exists),
we might still see improvements in this test. In other words, the insertion of
post-instrumentation might not be inevitable when considering the range of
possible instrumentation strategies.
> (Raises or diverges)

Btw, I didn't understand the part where you said "or diverges".
> (Raises or diverges)
>
> Btw, I didn't understand the part where you said "or diverges".
Post-instrumentation checks that a function evaluates to a value. The two ways that it doesn't in OCaml are when it raises an exception or when it diverges.
Those are the two ways that the out-edge of a `Pexp_apply` may not be followed.
Does "diverging" encompass non-termination and potentially the effects in OCaml 5? Something else?
Regarding the effects, I'm optimistic that their influence on the strategy will
be similar to that of exceptions. I hope this won't necessitate too big of an
exploration of the OCaml 5 runtime 🤯. Thank you again for your explanation!
Yes, "diverging" is non-termination. OCaml 5 effects are supposed to always be returned from exactly once, so they should not affect control flow as visible to Bisect, although this is not enforced, and for effects that correspond to async computation the "exactly once" might be later than program termination, so in that sense they behave like exceptions.
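Concretely, these are the two kinds of expressions whose out-edge is never followed (an illustrative sketch; `diverges` is defined but deliberately never called):

```ocaml
(* An application can fail to evaluate to a value in exactly two ways: *)

(* 1. It raises an exception. *)
let raises () : int = raise Exit

(* 2. It diverges (never terminates). Defined for illustration only;
   calling it would loop forever. *)
let rec diverges () : int = diverges ()

let () =
  match raises () with
  | _ -> assert false (* the out-edge is never reached *)
  | exception Exit -> print_endline "raised, out-edge not followed"
```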
> If you try to disable post-instrumentation with a `coverage off`
> directive, the branch pre-instrumentation is removed as well,
> resulting in losing the ability to check that the branch itself is
> visited.
Could you move this to `attribute.t`? That is the place for testing `[@coverage off]`. It already includes cases for the interaction of `[@coverage off]` with multiple language constructs in case.
I appreciate your suggestion to add tests to attribute.t, which I had
overlooked. I will proceed to incorporate new tests there to demonstrate how to
disable coverage in branches (same PR ok?).
Despite the presence of branch coverage testing in attribute.t, I advocate for
retaining this specific part of the control test here. It serves to illustrate
that this method does not provide a viable workaround for the issue at hand.
This was in response to your comment in #433:
> I believe in your case you should suppress instrumentation of the out-edge
> with [@coverage off], though I don't immediately recall whether the way that
> is implemented will also cause the branch instrumentation to be
> suppressed.
This might come across more as literate testing (akin to a tutorial, perhaps?)
rather than a pure behavior test. I've attempted to construct a small narrative
around the added tests. However, if you have strong reservations, I'm open to
removing this section entirely.
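A sketch of the workaround being discussed (hypothetical example; the `[@coverage off]` attribute only has an effect when the file is processed by bisect_ppx, and the limitation observed in this thread is that it currently suppresses the branch point as well as the out-edge point):

```ocaml
(* Attempt to suppress only the out-edge instrumentation of the raising
   expression. Per the discussion above, under the current
   implementation the surrounding branch's pre-instrumentation is
   removed too, so the branch visit itself is no longer tracked. *)
let find cond =
  if cond then (raise Not_found [@coverage off])
  else 0

let () = assert (find false = 0)
```

As vanilla OCaml the attribute is simply ignored, so the snippet compiles and runs unchanged outside an instrumented build.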
```ocaml
| () ->
    ___bisect_visit___ 2;
    print_endline "bar"
```
Analogous questions for the match.t changes as for the if.t changes.
Thank you for taking the time to review the PR and for your comments! I really […]

Overall, my intuition centers around the hypothesis that the […] In other words, the post-instrumentation would never be produced for each case […] I have the hunch that doing something along these lines would avoid, in more […]

In this context, yes I indeed believe in the need for dedicated tests to ensure […]

In #434, you said:
I agree. I think this is part of what I would like to challenge, for the […]

I acknowledge that I haven't fully realized the details of the concept yet, and […]

As a compelling argument in favor of these specific tests, examining the […]

Perhaps you find the cost of adding such tests outweighs the potential benefits, […]
I would suggest keeping the tests that are redundant right now locally, and including them as part of the PR if you do suggest an alternative instrumentation strategy for […]
It may be easier to address this by extending the […]
Understood. I am more accustomed to first publishing tests and then reviewing […]
Another way to do this is to open the PR with the tests in the first commit and the other changes afterwards, with the changes in test output included in the later commits. I suppose you are potentially turning this PR into that right now :) Thank you.
Hi @aantron, I hope all is well with you! I was doing a little pass over my open PRs and decided it was time to break radio silence on this one 😃 sorry for not being more active on it.

By the way, I wanted to correct something in my previous comments: a friend of mine pointed out that my use of the term […]

I have been using it for a while now, and I am still confused about the intention of the current instrumentation strategy in most cases of out-edge instrumentation. In my experience, the instrumentation frequently either adds unvisitable-by-design coverage points, or forces you to write the code in such a way that the raising calls are in tail position, which oftentimes feels arbitrary to a reviewer who is not aware that this is motivated by really non-trivial characteristics of the coverage tool. Do you know if […]

At any rate, I don't feel I understand the current behavior well enough to be proposing actual changes. I first need some way to monitor the current behavior. That said, I understand from our exchange that this may not be something that has its place directly within this repository. So I thought I would try to develop a little collection of characterization tests, in my own time, in an external repo. Maybe then I can revisit with a better understanding, or simply try to adjust my uses by following how my characterization tests are affected by upgrades to new versions of bisect_ppx.

The other thing that doesn't make me very comfortable is that I don't understand why no other user is reporting these limitations. I am guessing that the tool isn't really used a lot, or perhaps it is used in such a way that people don't pay close attention to these unvisitable points.

I'm happy to wait and see which direction the community will go and how the coverage part of the roadmap (https://ocaml.org/tools/platform-roadmap#w21-run-tests) will actually be implemented. I can revisit when more people are on board. Thanks a lot!
This PR adds new test cases to monitor the current behavior of branch
instrumentation in Bisect_ppx. The test cases focus on complex scenarios
involving if-then-else and match branches discussed in #434.
The aim is not to modify the existing behavior directly, but to establish a
baseline for future changes.
By adding these tests, we can monitor how future changes to Bisect_ppx may alter
these behaviors (TDD).