Add tests to document current behavior of post-instr in branches #435
mbarbin wants to merge 1 commit into aantron:master
Conversation
> Where there's a raise or a failwith guarded by a conditional, the
> branch itself is instrumented, but the raising expression is not
> post-instrumented...
What is the incremental value of having this test over the tests in `apply/special.t` (`bisect_ppx/test/instrument/apply/special.t`, line 293 in e302656), including if-expressions?
I don't believe there's a need for this test from the standpoint of the current
implementation. I do see its value in terms of future-proofing and documenting.
The if cond then raise pattern is quite common. As it stands, one must be
cautious with how the raising expression is written. It's important to note that
even seemingly innocuous refactoring, which shouldn't affect coverage, could
potentially introduce non-visitable coverage points. This can be quite
surprising.
Consider the following example:

```diff
-if cond then raise Not_found
+if cond then raise_s [%sexp "Key not found", { key : Key.t }]
```

This results in the insertion of non-visitable point 1:

```diff
-if cond then raise Not_found
+if cond then (___bisect_post_visit___ 1 (raise_s [%sexp "Key not found", { key : Key.t }]))
```
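To make this concrete, here is a minimal self-contained sketch of the mechanism (this is not Bisect's actual code; `post_visit` and `visited` are hypothetical stand-ins for the generated `___bisect_post_visit___` call and the visit-count state), showing why such a point can never be visited:

```ocaml
(* Hypothetical model of post-instrumentation: [post_visit] records a
   visit to [point] and returns the instrumented expression's value. *)
let visited = Array.make 2 false

let post_visit point value =
  visited.(point) <- true;
  value

(* The instrumented form of [if cond then raise Not_found else 0]. *)
let lookup cond =
  if cond then post_visit 1 (raise Not_found) else 0

let () =
  (try ignore (lookup true) with Not_found -> ());
  (* [raise Not_found] aborts evaluation before [post_visit] can record
     the visit, so point 1 is unvisitable by design. *)
  assert (not visited.(1))
```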
> For example, if you use `invalid_arg`, or a raising helper, its
> application will be post-instrumented (unless all of its arguments are
> labeled !?), resulting in a code location that can never be visited,
> by design. These are currently acknowledged limitations.
This is an acknowledged and inevitable limitation because there is no way for Bisect to know that a specific identifier refers to a function that always or never raises. In fact, it doesn't know that for the identifiers failwith or invalid_arg either, as those can be redefined and shadowed by the user, including by another module or a third-party library that is not even processed by Bisect during compilation of the user's particular project. I don't see that we need a test for this -- the actual behavior is covered by the normal apply tests.
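A tiny illustration of the shadowing point above (hypothetical code, not from the PR):

```ocaml
(* [failwith] is an ordinary binding, not syntax: any module can shadow
   it with a function that returns normally, so an instrumenter cannot
   soundly assume that an application of [failwith] never produces a
   value. *)
let failwith _msg = 0 (* shadows Stdlib.failwith *)

let () = assert (failwith "does not raise" = 0)
```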
> there is no way for Bisect to know that a specific identifier refers to a
> function that always or never raises
I agree. It's fair to say that "a strategy to limit non-visitable points, which
relies partly on identifiers, will inevitably encounter these limitations."
However, if we were to discover an alternative approach (assuming one exists),
we might still see improvements in this test. In other words, the insertion of
post-instrumentation might not be inevitable when considering the range of
possible instrumentation strategies.
> (Raises or diverges)

Btw, I didn't understand the part where you said "or diverges".
> (Raises or diverges)
>
> Btw, I didn't understand the part where you said "or diverges".
Post-instrumentation checks that a function evaluates to a value. The two ways that it doesn't in OCaml are when it raises an exception or when it diverges.
Those are the two ways that the out-edge of a `Pexp_apply` may not be followed.
Does "diverging" encompass non-termination and potentially the effects in OCaml 5? Something else?
Regarding the effects, I'm optimistic that their influence on the strategy will
be similar to that of exceptions. I hope this won't necessitate too big of an
exploration of the OCaml 5 runtime 🤯. Thank you again for your explanation!
Yes, "diverging" is non-termination. OCaml 5 effects are supposed to always be returned from exactly once, so they should not affect control flow as visible to Bisect, although this is not enforced, and for effects that correspond to async computation the "exactly once" might be later than program termination, so in that sense they behave like exceptions.
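Concretely, these are the two kinds of expressions whose out-edge is never followed (an illustrative sketch; `diverges` is defined but deliberately never called):

```ocaml
(* An application can fail to evaluate to a value in exactly two ways: *)

(* 1. It raises an exception. *)
let raises () : int = raise Exit

(* 2. It diverges (never terminates). Defined for illustration only;
   calling it would loop forever. *)
let rec diverges () : int = diverges ()

let () =
  match raises () with
  | _ -> assert false (* the out-edge is never reached *)
  | exception Exit -> print_endline "raised, out-edge not followed"
```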
> If you try to disable post-instrumentation with a `coverage off`
> directive, the branch pre-instrumentation is removed as well,
> resulting in losing the ability to check that the branch itself is
> visited.
Could you move this to `attribute.t`? That is the place for testing `[@coverage off]`. It already includes cases for the interaction of `[@coverage off]` with multiple language constructs in case.
I appreciate your suggestion to add tests to attribute.t, which I had
overlooked. I will proceed to incorporate new tests there to demonstrate how to
disable coverage in branches (same PR ok?).
Despite the presence of branch coverage testing in attribute.t, I advocate for
retaining this specific part of the control test here. It serves to illustrate
that this method does not provide a viable workaround for the issue at hand.
This was in response to your comment in #433:
> I believe in your case you should suppress instrumentation of the out-edge
> with [@coverage off], though I don't immediately recall whether the way that
> is implemented will also cause the branch instrumentation to be
> suppressed.
This might come across more as literate testing (akin to a tutorial, perhaps?)
rather than a pure behavior test. I've attempted to construct a small narrative
around the added tests. However, if you have strong reservations, I'm open to
removing this section entirely.
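A sketch of the workaround being discussed (hypothetical example; the `[@coverage off]` attribute only has an effect when the file is processed by bisect_ppx, and the limitation observed in this thread is that it currently suppresses the branch point as well as the out-edge point):

```ocaml
(* Attempt to suppress only the out-edge instrumentation of the raising
   expression. Per the discussion above, under the current
   implementation the surrounding branch's pre-instrumentation is
   removed too, so the branch visit itself is no longer tracked. *)
let find cond =
  if cond then (raise Not_found [@coverage off])
  else 0

let () = assert (find false = 0)
```

As vanilla OCaml the attribute is simply ignored, so the snippet compiles and runs unchanged outside an instrumented build.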
```ocaml
| () ->
    ___bisect_visit___ 2;
    print_endline "bar"
```
Analogous questions for the match.t changes as for the if.t changes.
Thank you for taking the time to review the PR and for your comments! I really […]

Overall, my intuition centers around the hypothesis that the […] In other words, the post-instrumentation would never be produced for each case […] I have the hunch that doing something along these lines would avoid, in more […]

In this context, yes I indeed believe in the need for dedicated tests to ensure […]

In #434, you said:
I agree. I think this is part of what I would like to challenge, for the […]

I acknowledge that I haven't fully realized the details of the concept yet, and […]

As a compelling argument in favor of these specific tests, examining the […]

Perhaps you find the cost of adding such tests outweighs the potential benefits, […]
I would suggest keeping the tests that are redundant right now locally, and including them as part of the PR if you do suggest an alternative instrumentation strategy for […]
It may be easier to address this by extending the […]
Understood. I am more accustomed to first publishing tests and then reviewing […]
Another way to do this is to open the PR with the tests in the first commit and the other changes afterwards, with the changes in test output included in the later commits. I suppose you are potentially turning this PR into that right now :) Thank you.
Hi @aantron, I hope all is well with you! I was doing a little pass over my open PRs and decided it was time to break radio silence on this one 😃 sorry for not being more active on it.

By the way, I wanted to correct something in my previous comments: a friend of mine pointed out that my use of the term […]

I have been using it for a while now, and I am still confused about the intention of the current instrumentation strategy in most cases of out-edge instrumentation. In my experience, the instrumentation frequently either adds unvisitable-by-design coverage points, or forces you to write the code in such a way that the raising calls are in tail position, which oftentimes feels arbitrary to a reviewer who is not aware that this is motivated by really non-trivial characteristics of the coverage tool. Do you know if […]

At any rate, I don't feel I understand the current behavior well enough to be proposing actual changes. I first need some way to monitor the current behavior. That said, I understand from our exchange that this may not be something that has its place directly within this repository. So I thought I would try to develop a little collection of characterization tests, in my own time, in an external repo. Maybe then I can revisit with a better understanding, or simply try to adjust my uses by following how my characterization tests are affected by upgrades to new versions of bisect_ppx.

The other thing that doesn't make me very comfortable is that I don't understand why no other user is reporting these limitations. I am guessing that the tool isn't really used a lot, or perhaps it is used in such a way that people don't pay close attention to these unvisitable points.

I'm happy to wait and see which direction the community will go and how the coverage part of the roadmap (https://ocaml.org/tools/platform-roadmap#w21-run-tests) will actually be implemented. I can revisit when more people are on board. Thanks a lot!
This PR adds new test cases to monitor the current behavior of branch
instrumentation in Bisect_ppx. The test cases focus on complex scenarios
involving if-then-else and match branches discussed in #434.
The aim is not to modify the existing behavior directly, but to establish a
baseline for future changes.
By adding these tests, we can monitor how future changes to Bisect_ppx may alter
these behaviors (TDD).