Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Bounds declarations

David Tarditi edited this page May 8, 2018 · 10 revisions

Overview

Programmers declare bounds for variables and members with _Array_ptr, _Nt_array_ptr, or checked array types as part of variable and member declarations. The bounds for a variable or member are declared by following the declarator for the variable or member with a ':' and a bounds expression.
The bounds declaration precedes any initializer.

There are 4 kinds of bounds expressions:

  1. count(e): the number of array elements that a pointer points to.
  2. byte_count(e): the number of bytes that a pointer points to.
  3. bounds(lower, upper): the lower and upper bound for a pointer. For an _Array_ptr variable, memory at or above _lower and below upper can be accessed. For an _Nt_array_ptr variable, the element beginning at upper can be read or a null value can be stored there.
  4. bounds(unknown): the pointer has unknown bounds and cannot be used to access memory. Count and bytecount are syntactic sugar for bounds.

Bounds expressions are evaluated as part of bounds checking memory accesses, so they are not allowed to modify memory.
They cannot be or contain assignment expressions, pre/post increment/decrement expressions, or function calls.

When a bounds expression is declared for a parameter, it can refer to any other parameter in the parameter list. Type checking and other well-formedness conditions are checked after the entire parameter list has been parsed.

Programmers can also declare bounds for the return value of a function. The return bounds are declared by following the function declarator with a ':' and a bounds expression. The return bounds can refer to any parameter. They can also refer to the special symbol _Return_value, which is the return valoe of the function.

Examples

// Function taking pointer to 5 integers.
int f(_Array_ptr<int> p : count(5));

// Equivalent declaration using byte_count.
int f(_Array_ptr<int> p : byte_count(5 * sizeof(int));

// Function taking pointer to len integers.
int g(_Array_ptr<int> p : count(len), int len);

// Equivalent declaration using an array tpe.
// Parameters with array types are treateed as
// having pointer types.
int g(int p _Checked[] : count(len), int len);

// Function returning pointer to len integers.
_Array_ptr<int> h(int len) : count(len);

// Global variable pointing to 5 integers, initialized to
// null.
_Array_ptr<int> r : count(5) = 0;  

// Variable length array declaration.
struct VarArr {
    int len;
    int arr _Checked[] : count(len);
};

## Narrowing and widening of bounds

It is OK to narrow the bounds of a pointer.   This is useful when a function only
operates over a portion of data.  Consider a function that checks that integrity
of data in a buffer.  The checksum is stored in the first 4 bytes of a buffer and the
data is in the remainder of the buffer.

int convert_to_int(Array_ptr start : count(len), int len);

int checksum(_Array_ptr start : bounds(start, end), _Array_ptr end);

int verify_data(_Array_ptr buf : bounds(buf, end), _Array_ptr end) { if (buf + 4 > end) return 0;

int target = convert_to_int(buf, 4); int computed = checksum(buf + 4, end); return computed == target; }

Bounds of _Nt_array_ptr pointers can be widened if the value at the upper bound
is non-null (note that this widening is not yet implemented in the compiler!).

_Nt_array_ptr p : count(0); if (*p) { _Nt_array_ptr r : count(1) = p; ... }


## Bounds declaration checking

Bounds declarations are checked for validity at compile time.  A bounds
declaration for a variable is valid:
* If the declared bounds are within the bounds of an object at runtime.
* Or the pointer is null.

Bounds declarations are checked at variable declarations, after
assignments, and at function calls.  The validity of the
bounds declaration must follow from the constructed
bounds for expressions and additional program 
facts gathered by dataflow analysis.  The facts may
include equality facts as well as relational facts.

If the compiler cannot prove the validity of the
bounds, a programmer can insert a dynamic runtime check
to avoid a compiler warning.   The static checking
algorithm that Checked C uses is incomplete, and there
will be facts about bounds that are true but cannot 
be proved at compile-time.

For a variable declaration, the compiler constructs
the bounds for the initializer expression  It then
checks that those bounds imply declared bounds:

_Array_ptr x : count(10) = ... _Array_ptr y : count(5) = x;


For assignments, the compiler checks that the bounds
of variables and members are valid after expression
statements.  In the simplest case, the bounds of
the right-hand side expression of an assignment
imply the bounds for a variable or a member.

_Array_ptr x : count(10) = ... _Array_ptr y : count(5) = ... y = x;

C allows multiple assignments within a single
statement.  In more complex cases, a variable
with a bounds and variables used in bounds are
updated.

_Array_ptr x : count(x_len) = ... _Array_ptr y : count(y_len) = ...

x = y , x_len = y_len;


For function calls, the compiler checks that the bounds of
argument expressions imply parameter bounds.  The compiler
does this by "translating" the parameter bounds into
bounds at the call site.   It substitutes the actual
arguments into the parameter bounds.  If the arguments
have side-effects, the compiler introduces (symbolic)
temporaries.

### Reasoning about bounds expressions.

The bounds declaration checking algorithm reasons about:
1. Expressions that are equivalent (always compute the same value.)
2. Constant-sized ranges.  These are bounds that can be expressed
  in the form (e1 + c, e1 + d), where c and d compile-time constants.

For determining expression equivalence, the compiler starts with a list
of sets of expressions that are known to evaluate to equal
values (for example, given the
assignment `x = y`, the compiler knows after the assignment that x == y).
Two expressions are equivalent if:
1. They are syntactically identical.
2. They are both in the same set of expressions known to be equal.
3. They constant-fold to the same value.
4. The expressions both use the same operator, each of their
operands is equivalent, and the corresponding operators have
matching types.
5. One expression is parenthesized and its subexpression is equivalent
to the other expression.
6. One expression is a value-preserving operation and its subexpression is
equivalent to the other expression.  Value-preserving operations include:
- Parenthesis.
- C-style casts that preserve bitwise identity (such as casts between
  pointer types).   
- `*` and `&` applied to function and pointer types.

To see if one constant-sized range (e1 + c1, e1 + d1) 
implies the validity (or falseness)
of another constant-sized range (e2 + c2 + d2), the compiler checks if the base expressions
are equivalent.  If they are not equivalent, no conclusion can be
drawn about truth or falseness.  If they are equivalent, the
compiler checks that c1 <= c2 and d1 >= d2.

### Bounds declaration checking after assignments

To check bounds declarations after assignments, we begin with
bounds that are true in the program state before the
assignments (the "before state").
We then translate that information to the program state
after the assignments .  That way we have bounds in
the "after" state that we can use to reason about the
validity of bounds declarations there.

This done in 4 steps:
1. Gather the set of required bounds for any variables or members that
are assigned to in the statement.   If an assignment is made
to a variable or member with bounds, the bounds are included
in the set.  Also, if an assignment is made to any variable used
in a in-scope bounds declaration, those bounds are included
in the set.  These bounds must be true about the "after" program
state.
2. Compute the bounds for the right-hand sides of any
assignments.  These are bounds that true in the "before" 
program state.
3. Update the bounds obtained in step 2 based on the
effects of assignments in the statement.  These are now bounds
true in the "after" program state.
4. Check that the updated bounds along with dataflow
facts imply the required bounds.

Step 3 involves two steps:
1. Some assignments have right-hand sides that are
invertible expressions.  The prior value of the expression can be
calculated from the new value.  For example, given `x = x + 1`,
the prior value of `x` is `x - 1`. For those assignments,
the original variable is replaced with the inverted expression.
2. For all other assignments, any bounds using a variable
updated by those assignment are changed to `bounds(unknown)`