Skip to content

Commit

Permalink
Merge pull request #440 from anshumanmohan/patch-4
Browse files Browse the repository at this point in the history
Minor grammar and other catches
  • Loading branch information
sampsyo authored Apr 14, 2024
2 parents fe971f3 + 9d0629b commit ab5ddbf
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions content/blog/2022-05-10-compcert/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ The author provides five ways to think about a compiler from a semantic perspect
4. $ S \vDash \text{Spec} \Rightarrow C \vDash \text{Spec} $
5. $ S \mathtt{ safe} \Rightarrow C \mathtt{ safe} $

These are not interchangeable, but the author explains how they relate to each other and points out features (such as transitivity) that they have in common. For a taste, property 2 says that, if none of $S$'s possible behaviors are "wrong", then any behavior observed during the execution of $C$ must have also been observed during the execution of $S$. Property 3 additionally assumes that the programs and their execution environments are deterministic: when we say $S \Downarrow B$, the behavior $B$ is uniquely determined. Intuitively, property 3 says that, for all behaviors that are not "wrong", if $S$ demonstrates that behavior, so will $C$. This is the property that CompCert actually establishes in its proofs. The choice of property 3 makes sense:
These are not interchangeable, but the author explains how they relate to each other and points out features, such as transitivity, that they have in common. For a taste, property 2 says that, if none of $S$'s possible behaviors are "wrong", then any behavior observed during the execution of $C$ must have also been observed during the execution of $S$. Property 3 additionally assumes that the programs and their execution environments are deterministic: when we say $S \Downarrow B$, the behavior $B$ is uniquely determined. Intuitively, property 3 says that, for all behaviors that are not "wrong", if $S$ demonstrates that behavior, so will $C$. This is the property that CompCert actually establishes in its proofs. The choice of property 3 makes sense:
* CompCert does indeed work in a deterministic setting, so there is no need for it to deal with the messiness of nondeterministic behavior.
* Establishing property 3 is easier than establishing some others: in a mechanized setting like Coq, one can `intro`duce (1) some specific behavior $B$, (2) the assumption $B \notin \mathtt{wrong}$, and (3) the assumption $S \Downarrow B$, and then perform structural induction on the execution of $S$.

Expand All @@ -59,7 +59,7 @@ Additionally, we remarked in class that property 4 is a helpful way to appreciat
#### Three Ways of Getting There

Having established the overall goal of a trustworthy compiler, the the author surveys three ways to arrive at this trust.
In the discussion below, the compiler $Comp$ is a total function on source programs $S$. It either executes without going wrong, giving $\mathtt{OK}(C)$ where $C$ is the compiled program, or goes wrong and gives an $\mathtt{error}$. When we write $S \approx C$, we mean that any of the five notions of trust can be substituted in.
In the discussion below, the compiler $Comp$ is a total function on source programs $S$. It either executes without going wrong, returning $\mathtt{OK}(C)$ where $C$ is the compiled program, or goes wrong and gives an $\mathtt{error}$. When we write $S \approx C$, we mean that any of the five notions of trust can be substituted in.


1. Verify the compiler:
Expand Down Expand Up @@ -91,12 +91,12 @@ That is: if CompCert compiles program $S$ without error into program $C$, and pr

The CompCert compiler is composed of 14 passes that go through eight intermediate languages.
The source language of CompCert is Clight, a large subset of the C programming language. Although Clight omits several C features such as extended-precision arithmetic and goto statements, it turns out to be sufficient for writing critical embedded software.
At the time when this paper was written, the target language of CompCert was the PowerPC assembly language. This translation from Clight to PowerPC was formally verified. However, some parts of the compiler were not verified yet including parsing, assembling, and linking. CompCert's intermediate languages are described below in more detail.
At the time when this paper was written, the target language of CompCert was the PowerPC assembly language. This translation from Clight to PowerPC was formally verified. However, some parts of the compiler, including parsing, assembling, and linking, were not verified. CompCert's intermediate languages are described in more detail below.

* **C#minor** is a typeless variant of Clight. To go from Clight to C#minor, normal arithmetic operators are replaced by distinct arithmetic operators for integers, pointers, and floats. C loops are replaced by infinite loops (blocks and multi-level exits from enclosing blocks are added).
* **Cminor** is very similar to C#minor but omits the & operator, meaning addresses cannot be taken.
* **CminorSel** is the target language of the instruction selection pass which involves a bottom-up rewriting of Cminor expressions.
* **RTL** is a register transfer language which is an IR that is very close to assembly and represents data-flow at the register-transfer level. Control is represented as a control-flow graph (CFG), and instructions operate over pseudo-registers. Optimizations based on dataflow analyses can be implemented at this stage. Some optimizations that have been implemented and integrated include constant propagation and common subexpression elimination.
* **RTL** is a register transfer language: an IR that is very close to assembly and represents data-flow at the register-transfer level. Control is represented as a control-flow graph (CFG), and instructions operate over pseudo-registers. Optimizations based on dataflow analyses can be implemented at this stage. Some optimizations that have been implemented and integrated include constant propagation and common subexpression elimination.
* **LTL** is similar to RTL except pseudo-registers are replaced by hardware registers or abstract stack locations after register allocation is performed.
* **LTLin** is the result of CFG linearization which produces a list of instructions with labels and branches.
* **Linear** is the next IR after LTLin. Spill and reload instructions are placed around instructions that reference pseudo-registers allocated to stack locations. Move instructions are inserted to enforce calling conventions.
Expand All @@ -105,15 +105,15 @@ At the time when this paper was written, the target language of CompCert was the

### Proof of Correctness

The source, intermediate, and target languages of CompCert all have formally defined semantics. The Coq proof assistant was used to prove that the transformations between these languages preserve semantics. For example, the semantics of C#minor is typeless and features infinite loops only. The transformation from Clight to C#minor needs to not only fit the Clight program into these restricted semantics, but also ensure the semantics preservation of behavior.
The source, intermediate, and target languages of CompCert all have formally defined semantics. The Coq proof assistant was used to prove that the transformations between these languages preserve semantics. For example, the semantics of C#minor is typeless and features infinite loops only. The transformation from Clight to C#minor needs to not only fit the Clight program into this restricted semantics, but also ensure the semantic preservation of behavior.

The Coq formalization and proof amounts to 42,000 lines of Coq and approximately 3 person-years of work. 14% of these lines define compilation algorithms, 10% specify language semantics, and 76% correspond to the correctness proof.

Coq's extraction facility was used to generate an executable compiler. This process involves translating functional specifications in Coq into Caml code and then running the Caml code through the Caml compiler. Note that the correctness of CompCert relies on trust in this compilation chain since it has not been formally verified.

### Performance

CompCert's performance turns out to be adequate for critical embedded code. CompCert was benchmarked against the GCC 4.0.1 compiler at optimization levels 0, 1, and 2. CompCert generates code that is twice as fast as code generated with GCC without optimization. CompCert code is 7% slower than gcc-01 and 12% slower than gcc-02 on average. Compilation times of CompCert are within a factor of 2 of those of gcc-01. On the one hand, CompCert's compilation times are higher than GCC's, and CompCert code runs slower than GCC code on average. On the other hand, we think that in certain settings, such as compiling safety-critical software, this is a small price to pay for CompCert's correctness guarantees.
CompCert's performance turns out to be adequate for critical embedded code. CompCert was benchmarked against the GCC 4.0.1 compiler at optimization levels 0, 1, and 2. CompCert generates code that is twice as fast as code generated with GCC without optimization. On average, CompCert code is 7% slower than gcc-01 and 12% slower than gcc-02. Compilation times of CompCert are within a factor of 2 of those of gcc-01. On the one hand, CompCert's compilation times are higher than GCC's, and CompCert code runs slower than GCC code on average. On the other hand, we think that in certain settings, such as compiling safety-critical software, this is a small price to pay for CompCert's correctness guarantees.


<p align="center">
Expand Down

0 comments on commit ab5ddbf

Please sign in to comment.