From 9d0629b6e5ce21c9d4a810b6987d20047d68ddb5 Mon Sep 17 00:00:00 2001 From: Anshuman Mohan <10830208+anshumanmohan@users.noreply.github.com> Date: Thu, 11 Apr 2024 17:08:57 -0400 Subject: [PATCH] Minor grammar and other catches --- content/blog/2022-05-10-compcert/index.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/content/blog/2022-05-10-compcert/index.md b/content/blog/2022-05-10-compcert/index.md index d80af8db2..cb21b38dc 100644 --- a/content/blog/2022-05-10-compcert/index.md +++ b/content/blog/2022-05-10-compcert/index.md @@ -50,7 +50,7 @@ The author provides five ways to think about a compiler from a semantic perspect 4. $ S \vDash \text{Spec} \Rightarrow C \vDash \text{Spec} $ 5. $ S \mathtt{ safe} \Rightarrow C \mathtt{ safe} $ -These are not interchangeable, but the author explains how they relate to each other and points out features (such as transitivity) that they have in common. For a taste, property 2 says that, if none of $S$'s possible behaviors are "wrong", then any behavior observed during the execution of $C$ must have also been observed during the execution of $S$. Property 3 additionally assumes that the programs and their execution environments are deterministic: when we say $S \Downarrow B$, the behavior $B$ is uniquely determined. Intuitively, property 3 says that, for all behaviors that are not "wrong", if $S$ demonstrates that behavior, so will $C$. This is the property that CompCert actually establishes in its proofs. The choice of property 3 makes sense: +These are not interchangeable, but the author explains how they relate to each other and points out features, such as transitivity, that they have in common. For a taste, property 2 says that, if none of $S$'s possible behaviors are "wrong", then any behavior observed during the execution of $C$ must have also been observed during the execution of $S$. Property 3 additionally assumes that the programs and their execution environments are deterministic: when we say $S \Downarrow B$, the behavior $B$ is uniquely determined. Intuitively, property 3 says that, for all behaviors that are not "wrong", if $S$ demonstrates that behavior, so will $C$. This is the property that CompCert actually establishes in its proofs. The choice of property 3 makes sense: * CompCert does indeed work in a deterministic setting, so there is no need for it to deal with the messiness of nondeterministic behavior. * Establishing property 3 is easier than establishing some others: in a mechanized setting like Coq, one can `intro`duce (1) some specific behavior $B$, (2) the assumption $B \notin \mathtt{wrong}$, and (3) the assumption $S \Downarrow B$, and then perform structural induction on the execution of $S$. @@ -59,7 +59,7 @@ Additionally, we remarked in class that property 4 is a helpful way to appreciat #### Three Ways of Getting There Having established the overall goal of a trustworthy compiler, the the author surveys three ways to arrive at this trust. -In the discussion below, the compiler $Comp$ is a total function on source programs $S$. It either executes without going wrong, giving $\mathtt{OK}(C)$ where $C$ is the compiled program, or goes wrong and gives an $\mathtt{error}$. When we write $S \approx C$, we mean that any of the five notions of trust can be substituted in. +In the discussion below, the compiler $Comp$ is a total function on source programs $S$. It either executes without going wrong, returning $\mathtt{OK}(C)$ where $C$ is the compiled program, or goes wrong and gives an $\mathtt{error}$. When we write $S \approx C$, we mean that any of the five notions of trust can be substituted in. 1. Verify the compiler: @@ -91,12 +91,12 @@ That is: if CompCert compiles program $S$ without error into program $C$, and pr The CompCert compiler is composed of 14 passes that go through eight intermediate languages. The source language of CompCert is Clight, a large subset of the C programming language. Although Clight omits several C features such as extended-precision arithmetic and goto statements, it turns out to be sufficient for writing critical embedded software. -At the time when this paper was written, the target language of CompCert was the PowerPC assembly language. This translation from Clight to PowerPC was formally verified. However, some parts of the compiler were not verified yet including parsing, assembling, and linking. CompCert's intermediate languages are described below in more detail. +At the time when this paper was written, the target language of CompCert was the PowerPC assembly language. This translation from Clight to PowerPC was formally verified. However, some parts of the compiler, including parsing, assembling, and linking, were not verified. CompCert's intermediate languages are described in more detail below. * **C#minor** is a typeless variant of Clight. To go from Clight to C#minor, normal arithmetic operators are replaced by distinct arithmetic operators for integers, pointers, and floats. C loops are replaced by infinite loops (blocks and multi-level exits from enclosing blocks are added). * **Cminor** is very similar to C#minor but omits the & operator, meaning addresses cannot be taken. * **CminorSel** is the target language of the instruction selection pass which involves a bottom-up rewriting of Cminor expressions. -* **RTL** is a register transfer language which is an IR that is very close to assembly and represents data-flow at the register-transfer level. Control is represented as a control-flow graph (CFG), and instructions operate over pseudo-registers. Optimizations based on dataflow analyses can be implemented at this stage. Some optimizations that have been implemented and integrated include constant propagation and common subexpression elimination. +* **RTL** is a register transfer language: an IR that is very close to assembly and represents data-flow at the register-transfer level. Control is represented as a control-flow graph (CFG), and instructions operate over pseudo-registers. Optimizations based on dataflow analyses can be implemented at this stage. Some optimizations that have been implemented and integrated include constant propagation and common subexpression elimination. * **LTL** is similar to RTL except pseudo-registers are replaced by hardware registers or abstract stack locations after register allocation is performed. * **LTLin** is the result of CFG linearization which produces a list of instructions with labels and branches. * **Linear** is the next IR after LTLin. Spill and reload instructions are placed around instructions that reference pseudo-registers allocated to stack locations. Move instructions are inserted to enforce calling conventions. @@ -105,7 +105,7 @@ At the time when this paper was written, the target language of CompCert was the ### Proof of Correctness -The source, intermediate, and target languages of CompCert all have formally defined semantics. The Coq proof assistant was used to prove that the transformations between these languages preserve semantics. For example, the semantics of C#minor is typeless and features infinite loops only. The transformation from Clight to C#minor needs to not only fit the Clight program into these restricted semantics, but also ensure the semantics preservation of behavior. +The source, intermediate, and target languages of CompCert all have formally defined semantics. The Coq proof assistant was used to prove that the transformations between these languages preserve semantics. For example, the semantics of C#minor is typeless and features infinite loops only. The transformation from Clight to C#minor needs to not only fit the Clight program into this restricted semantics, but also ensure the semantic preservation of behavior. The Coq formalization and proof amounts to 42,000 lines of Coq and approximately 3 person-years of work. 14% of these lines define compilation algorithms, 10% specify language semantics, and 76% correspond to the correctness proof. @@ -113,7 +113,7 @@ Coq's extraction facility was used to generate an executable compiler. This proc ### Performance -CompCert's performance turns out to be adequate for critical embedded code. CompCert was benchmarked against the GCC 4.0.1 compiler at optimization levels 0, 1, and 2. CompCert generates code that is twice as fast as code generated with GCC without optimization. CompCert code is 7% slower than gcc-01 and 12% slower than gcc-02 on average. Compilation times of CompCert are within a factor of 2 of those of gcc-01. On the one hand, CompCert's compilation times are higher than GCC's, and CompCert code runs slower than GCC code on average. On the other hand, we think that in certain settings, such as compiling safety-critical software, this is a small price to pay for CompCert's correctness guarantees. +CompCert's performance turns out to be adequate for critical embedded code. CompCert was benchmarked against the GCC 4.0.1 compiler at optimization levels 0, 1, and 2. CompCert generates code that is twice as fast as code generated with GCC without optimization. On average, CompCert code is 7% slower than gcc-01 and 12% slower than gcc-02. Compilation times of CompCert are within a factor of 2 of those of gcc-01. On the one hand, CompCert's compilation times are higher than GCC's, and CompCert code runs slower than GCC code on average. On the other hand, we think that in certain settings, such as compiling safety-critical software, this is a small price to pay for CompCert's correctness guarantees.