Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CONTRACTS] Detect loop locals with goto_rw in DFCC #8489

Merged
merged 1 commit into from
Nov 4, 2024

Conversation

qinheping
Copy link
Collaborator

This PR improve the loop locals detection in DFCC by also detecting variables that are not written or read outside the loop.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copy link

codecov bot commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 91.07143% with 5 lines in your changes missing coverage. Please review.

Please upload report for BASE (develop@20a1ecf). Learn more about missing BASE report.
Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
...trument/contracts/dynamic-frames/dfcc_cfg_info.cpp 69.23% 4 Missing ⚠️
...ntracts/dynamic-frames/dfcc_infer_loop_assigns.cpp 97.22% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8489   +/-   ##
==========================================
  Coverage           ?   78.86%           
==========================================
  Files              ?     1727           
  Lines              ?   198516           
  Branches           ?    18496           
==========================================
  Hits               ?   156559           
  Misses             ?    41957           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 50 to 51
/// Collect identifiers that are local to this loop only
/// (excluding nested loop).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please be more verbose to clarify what "excluding nested loop" means? Perhaps even give an example? Thank you!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When generating loop information, we want to collect locals that are only local for the given loop_id but not its nested inner loops to avoid instrument a loop instruction more than once.
I remove the condition of excluding nested inner loops from this method. And erase inner loops' locals in gen_dfcc_loop_info instead.

/// Collect identifiers that are local to this loop only
/// (excluding nested loop).
/// A target is a loop local if
/// 1. it is declared inside the loop, or
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a sufficient condition? Don't we also need to have it marked DEAD within the (syntactic) bounds of the loop?

Copy link
Collaborator Author

@qinheping qinheping Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to identify identifiers that are equivalent to loop locals in the sense that DFCC can safely ignore them for havocing and checking assignability.
There is no write/read outside the loop is sufficient for ignoring the target. However, DECL in GOTO is also a writing instruction as I commented below. What do you think could be a better conditions for identifying such identifiers? How about checking if a variable is read before the first write in the loop?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you might just not really care about declarations very much and instead focus on all reads/writes being within the set of instructions that make up the loop body.

int nondet_var;
int __VERIFIER_var;
int __CPROVER_var;
int nondet_var = nondet_int();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate as to why this is newly necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that if we only declare variables but not initialize them or use them outside the loop, they will be identified as loop locals by the new algorithm. And hence DFCC will not check if they are assignable according to the assigns clauses.

I think the issue here is that in C, declaring a variable also initializes them with a nondet value. And we usually just use declarations in harness for deterministically initializing variables. To avoid being identified as loop locals and ignored by DFCC, I explicitly initialize them with nondet.

Shouldn't a C declaration be converted to a GOTO DECL and a GOTO ASSIGN with nondet?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that if we only declare variables but not initialize them or use them outside the loop, they will be identified as loop locals by the new algorithm. And hence DFCC will not check if they are assignable according to the assigns clauses.

Is there anything wrong with taking this view? That is, if a variable is declared outside the loop yet never read outside the loop then it is indistinguishable from one that is declared inside the loop, I'd say?!

I think the issue here is that in C, declaring a variable also initializes them with a nondet value. And we usually just use declarations in harness for deterministically initializing variables. To avoid being identified as loop locals and ignored by DFCC, I explicitly initialize them with nondet.

Shouldn't a C declaration be converted to a GOTO DECL and a GOTO ASSIGN with nondet?

One could do that, and I have made such an attempt in #35. There just wasn't an obvious benefit to doing so.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything wrong with taking this view? That is, if a variable is declared outside the loop yet never read outside the loop then it is indistinguishable from one that is declared inside the loop, I'd say?!

In C program, it is distinguishable. Because declaration is also a havocing in C. In the tests I updated in this PR, they expect the variables that are only declared outside the loop but no read/write outside the loop to be checked by DFCC inside the loop. That's why I have to explicitly assign to them to avoid them being identified as loop locals.

I think adding a condition that a variable should also not be read before first write in the loop can make them really indistinguishable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the issue here is that also they are not used outside the loops, but they are used in loop contracts, which will be instrumented as R/W outside the loop. So, a loop local should also not be used in loop contracts.

regression/contracts-dfcc/invar_check_break_fail/main.c Outdated Show resolved Hide resolved
@remi-delmas-3000
Copy link
Collaborator

remi-delmas-3000 commented Nov 1, 2024

Hi,
this change is motivated by the fact that GOTO programs generated from MIR by Kani look different to that generated from C by goto-cc. The rust compiler systematically moves all place declarations to the top of a MIR function, which means we can't really know if a place stems from a loop-local declaration or not.

Now we are trying to infer if a GOTO variable, representing a MIR place, which itself represents a Rust variable, is a loop-local variable at the Rust level by checking if a GOTO variable DECL/DEAD outside of a loop is read or written outside the loop or not.

Initially the concept of loop local was that the identifier really only exists (DECL/DEAD) within the connected component that defines the loop. Excluding loop locals from frame condition checking is sound and is the the thing to do because these variables can't even be used outside of the loop instructions and hence are not part of the frame of the loop (frame: set of memory locations that exists before/after the loop executes). It kind of feels wrong to bend this (sorta) well-defined notion to make up for the destructive compilation performed by rustc.

Could we instead try to have this fixed upstream ? i.e. use the scopes info produced by rustc or storage_live/storage_dead information at the MIR level to produce GOTO code where DECL/DEAD declarations enclose the actual lifetime of the variable as tightly as possible ?

@tautschnig
Copy link
Collaborator

Could we instead try to have this fixed upstream ? i.e. use the scopes info produced by rustc or storage_live/storage_dead information at the MIR level to produce GOTO code where DECL/DEAD declarations enclose the actual lifetime of the variable as tightly as possible ?

This is what we used to do in Kani up until model-checking/kani#3164 (which became part of model-checking/kani#2995). That problem that we run into when producing declarations is that MIR also has a lot of non-linear control flow via goto statements. The original scopes at Rust level are broken up into various chunks and we end up with control-flow join points that then require merging different instantiations of a variable. We would really need to restore the original (Rust-level) code structure from a MIR that doesn't follow those rules.

@qinheping qinheping force-pushed the better_loop_locals_detect branch 2 times, most recently from 1d5351e to 656ddfa Compare November 1, 2024 17:12
@qinheping
Copy link
Collaborator Author

@tautschnig , after adding the condition that a loop local should not appear in loop contracts, the current detection algorithm is sound for C's declaration now.

In the regression tests updated in this PR, we want to enforce DFCC to check assignability of some variables even if it is sound and safe to ignore them. I added initialization/invariants in those tests so that ignoring the variables is unsound and trigger the checks.

@kroening
Copy link
Member

kroening commented Nov 2, 2024

I can see good value in a GOTO pass that tightens variable lifetimes, which might fix the MIR issue.

@feliperodri feliperodri added the Code Contracts Function and loop contracts label Nov 2, 2024
@tautschnig
Copy link
Collaborator

I can see good value in a GOTO pass that tightens variable lifetimes, which might fix the MIR issue.

Agreed, but non-trivial given all the non-linear control flow.

Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with two additional questions.

@@ -12,7 +12,7 @@ void main()
x1 = &z1;

while(y1 > 0)
__CPROVER_loop_invariant(*x1 == __CPROVER_loop_entry(*x1))
__CPROVER_loop_invariant(y1 >= 0 && *x1 == __CPROVER_loop_entry(*x1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this (and the other changes in this file) now necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is for the same reason why we add nondet_int. It is safe to ignore y1 in DFCC instrumentation. But we want to trigger the checks for y1 as described in the .desc file. Adding a new invariant clause will trigger all DFCC checks for y1.

Comment on lines 301 to 307
from_integer(0, size_type()),
false_exprt(),
false_exprt(),
false_exprt(),
false_exprt(),
true_exprt(),
true_exprt(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why did you have to change this one call to write_set_create_call?
  2. It seems that, after this change, there is no further call to the 3-parameter version of write_set_create_call. Can you please remove it?

@qinheping qinheping merged commit 018c61c into diffblue:develop Nov 4, 2024
37 of 38 checks passed
@qinheping qinheping deleted the better_loop_locals_detect branch November 4, 2024 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Contracts Function and loop contracts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants