Commit a77ef6d

Polish English Tutorial
Polish the English text of the following tutorial chapters:

* `tutorial/en/0-Preface.md`
* `tutorial/en/1-Skeleton.md`
* `tutorial/en/2-Virtual-Machine.md`

Improve markdown docs:

* Add Level 1 title to docs ("Preface", "1. Skeleton", etc.).
* Replaces smiles with markdown emojis.
* Fix titles casing using Chicago Manual of Style capitalization rules: https://capitalizemytitle.com/style/chicago#
* Convert inline-link to reference-style links (DRY!).
* Add a few links to external references.
1 parent f9f633e commit a77ef6d

3 files changed

+325
-264
lines changed

tutorial/en/0-Preface.md

Lines changed: 94 additions & 72 deletions
Original file line number | Diff line number | Diff line change
@@ -1,104 +1,126 @@
1-
This series of articles is a tutorial for building a C compiler from scratch.
1+
# Preface
22

3-
I lied a little in the above sentence: it is actually an _interpreter_ instead
4-
of _compiler_. I lied because what the hell is a "C interpreter"? You will
5-
however, understand compilers better by building an interpreter.
3+
This is a multi-part tutorial on how to build a C compiler from scratch.
64

7-
Yeah, I wish you can get a basic understanding of how a compiler is
8-
constructed, and realize it is not that hard to build one. Good Luck!
5+
Well, I lied a little in the previous sentence: it's actually an _interpreter_,
6+
not a _compiler_. I had to lie, because what on earth is a "C interpreter"?
7+
You will, however, gain a better understanding of compilers by building an
8+
interpreter.
99

10-
Finally, this series is written in Chinese in the first place, feel free to
11-
correct me if you are confused by my English. And I would like it very much if
12-
you could teach me some "native" English :)
10+
Yeah, I want to provide you with a basic understanding of how a compiler is
11+
constructed, and to help you realize that it's not that hard to build one, after all.
12+
Good Luck!
1313

14-
We won't write any code in this chapter, feel free to skip it if you are
15-
desperate to see some code...
14+
This tutorial was originally written in Chinese, so feel free to correct me if
15+
you're confused by my English. Also, I would really appreciate it if you could
16+
teach me some "native" English. :smile:
1617

17-
## Why you should care about compiler theory?
18+
We won't be writing any code in this chapter, so if you're eager to see some code, feel free to skip it.
1819

19-
Because it is **COOL**!
2020

21-
And it is very useful. Programs are built to do something for us, when they
22-
are used to translate some forms of data into another form, we can call them
23-
a compiler. Thus by learning some compiler theory we are trying to master a very
24-
powerful technique of solving problems. Isn't that cool enough to you?
21+
## Why Should I Care about Compiler Theory?
22+
23+
Because it's **COOL**!
24+
25+
And it's also very useful. Programs are designed to do something for us; when
26+
they are used to translate some form of data into another form, we can call
27+
them compilers. Thus, by learning some compiler theory, we are trying to
28+
master a very powerful problem solving technique. Doesn't this sound cool
29+
enough to you?
30+
31+
People used to say that understanding how a compiler works would help you to
32+
write better code. Some would argue that modern compilers are so good at
33+
optimizing that you shouldn't care any more. Well, that's true, most people
34+
don't need to learn compiler theory to improve code performance — and by "most
35+
people" I mean _you_!
2536

26-
People used to say understanding how a compiler works would help you to write
27-
better code. Some would argue that modern compilers are so good at
28-
optimization that you should not care any more. Well, that's true, most people
29-
don't need to learn compiler theory only to improve the efficency of the code.
30-
And by most people, I mean you!
3137

3238
## We Don't Like Theory Either
3339

34-
I have always been in awe of compiler theory because that's what makes
35-
programing easy. Anyway can you imaging building a web browser in only
36-
assembly language? So when I got a chance to learn compiler theory in college,
37-
I was so excited! And then... I quit, not understanding what that it.
40+
I've always been in awe of compiler theory because that's what makes programming
41+
easy. Anyway, can you imagine building a web browser entirely in assembly
42+
language? So when I got a chance to learn compiler theory in college, I was so
43+
excited! And then ... I quit! And left without understanding what it's all
44+
about.
3845

39-
Normally a course of compiler will cover:
46+
Normally, a compiler course covers the following topics:
4047

41-
1. How to represent syntax (such as BNF, etc.)
42-
2. Lexer, with somewhat NFA(Nondeterministic Finite Automata),
43-
DFA(Deterministic Finite Automata).
44-
3. Parser, such as recursive descent, LL(k), LALR, etc.
48+
1. How to represent syntax (e.g. BNF).
49+
2. Lexers, using NFA (Nondeterministic Finite Automata) and
50+
DFA (Deterministic Finite Automata).
51+
3. Parsers, such as recursive descent, LL(k), LALR, etc.
4552
4. Intermediate Languages.
4653
5. Code generation.
4754
6. Code optimization.
4855

49-
Perhaps more than 90% students will not care anything beyond the parser, and
50-
what's more, we still don't know how to build a compiler! Even after all the
51-
effort learning the theories. Well the main reason is that what "Compiler
52-
Thoery" trys to teach is "How to build a parser generator", namely a tool that
53-
consumes syntax gramer and generates a compiler for you. lex/yacc or
54-
flex/bison or things like that.
56+
Perhaps more than 90% of the students won't really care about any of that,
57+
except for the parser, and what's more, we still won't know how to actually
58+
build a compiler, even after all the effort of learning the theory. Well, the
59+
main reason is that what "Compiler Theory" tries to teach is "how to build a
60+
parser generator" — i.e. a tool that consumes a syntax grammar and generates a
61+
compiler for you, like lex/yacc or flex/bison, or similar tools.
62+
63+
These theories try to teach us how to solve the general challenges of
64+
generating compilers automatically. Once you've mastered them, you're able to
65+
deal with all kinds of grammars. They are indeed useful in the industry.
66+
Nevertheless, they are too powerful and too complicated for students and most
67+
programmers. If you try to read lex/yacc's source code you'll understand what
68+
I mean.
5569

56-
These theories try to teach us how to solve the general problems of generating
57-
compilers automatically. That means once you've mastered them, you are able to
58-
deal with all kinds of grammars. They are indeed useful in industry.
59-
Nevertheless they are too powerful and too complicated for students and most
60-
programmers. You will understand that if you try to read lex/yacc's source
61-
code.
70+
The good news is that building a compiler can be much simpler than you ever
71+
imagined. I won't lie, it's not easy, but definitely not hard.
6272

63-
Good news is building a compiler can be much simpler than you ever imagined.
64-
I won't lie, not easy, but definitely not hard.
6573

66-
## Birth of this project
74+
## How This Project Began
6775

68-
One day I came across the project [c4](https://github.com/rswier/c4) on
69-
Github. It is a small C interpreter which is claimed to be implemented by only
70-
4 functions. The most amazing part is that it is bootstrapping (that interpret
71-
itself). Also it is done with about 500 lines!
76+
One day I came across the project [c4] on GitHub, a small C interpreter
77+
claiming to be implemented with only 4 functions. The most amazing part is
78+
that it's [bootstrapping] (i.e. it can interpret itself). Furthermore, it's
79+
done in around 500 lines of code!
7280

73-
Meanwhile I've read a lot of tutorials about compiler, they are either too
74-
simple(such as implementing a simple calculator) or using automation
75-
tools(such as flex/bison). c4 is however implemented all from scratch. The
76-
sad thing is that it try to be minimal, that makes the code quite a mess, hard
77-
to understand. So I started a new project to:
81+
Meanwhile, I've read many tutorials on compiler design, and found them to be
82+
either too simple (such as implementing a simple calculator) or using
83+
automation tools (such as flex/bison). [c4], however, is implemented entirely
84+
from scratch. The sad thing is that it aims to be "an exercise in minimalism,"
85+
which makes the code quite messy and hard to understand. So I started a new
86+
project, in order to:
7887

79-
1. Implement a working C compiler(interpreter actually)
80-
2. Write a tutorial of how it is built.
88+
1. Implement a working C compiler (an interpreter, actually).
89+
2. Write a step-by-step tutorial on how it was built.
8190

82-
It took me 1 week to re-write it, resulting 1400 lines including comments. The
83-
project is hosted on Github: [Write a C Interpreter](https://github.com/lotabout/write-a-C-interpreter).
91+
It took me one week to re-write it, resulting in 1400 lines of code (including
92+
comments). The project is hosted on GitHub: [Write a C Interpreter].
8493

85-
Thanks rswier for bringing us a wonderful project!
94+
Thanks to [@rswier] for sharing [c4] with us; it's such a wonderful project!
8695

87-
## Before you go
8896

89-
Implementing a compiler could be boring and it is hard to debug. So I hope you
90-
can spare enough time studying, as well as type the code. I am sure that you
91-
will feel a great sense of accomplishment just like I do.
97+
## Before You Begin
98+
99+
Implementing a compiler can be boring and hard to debug. So I hope you can
100+
spare enough time studying, and typing code. I'm sure that you will feel a
101+
great sense of accomplishment, just like I do.
102+
92103

93104
## Good Resources
94105

95-
1. [Let’s Build a Compiler](http://compilers.iecc.com/crenshaw/): a very good
96-
tutorial of building a compiler for fresh starters.
97-
2. [Lemon Parser Generator](http://www.hwaci.com/sw/lemon/): the parser
98-
generator that is used in SQLite. Good to read if you want to understand
99-
compiler theory with code.
106+
1. _[Let’s Build a Compiler]_: a very good tutorial on building a compiler,
107+
written for beginners.
108+
2. [Lemon Parser Generator]: the parser generator used by SQLite.
109+
Good to read if you want to understand compiler theory with code.
110+
111+
In the end, I am just a person with a general level of expertise, so there
112+
will inevitably be some mistakes in my articles and code (and also in my
113+
English). Feel free to correct me!
114+
115+
I hope you'll enjoy it.
100116

101-
In the end, I am human with a general level, there will be inevitably wrong
102-
with the articles and codes(also my English). Feel free to correct me!
117+
<!-----------------------------------------------------------------------------
118+
REFERENCE LINKS
119+
------------------------------------------------------------------------------>
103120

104-
Hope you enjoy it.
121+
[@rswier]: https://github.com/rswier "Visit @rswier's GitHub profile"
122+
[bootstrapping]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers) "Wikipedia » Bootstrapping (compilers)"
123+
[c4]: https://github.com/rswier/c4 "Visit the c4 repository on GitHub"
124+
[Lemon Parser Generator]: http://www.hwaci.com/sw/lemon/ "Visit Lemon homepage"
125+
[Let’s Build a Compiler]: http://compilers.iecc.com/crenshaw/ "15-part tutorial series, by Jack Crenshaw"
126+
[Write a C Interpreter]: https://github.com/lotabout/write-a-C-interpreter "Visit the 'Write a C Interpreter' repository on GitHub"

tutorial/en/1-Skeleton.md

Lines changed: 72 additions & 57 deletions
Original file line number | Diff line number | Diff line change
@@ -1,66 +1,69 @@
1-
In this chapter we will have an overview of the compiler's structure.
1+
# 1. Skeleton
22

3-
Before we start, I'd like to restress that it is **interperter** that we want
4-
to build. That means we can run a C source file just like a script. It is
5-
chosen mainly for two reasons:
3+
In this chapter we'll present an overview of the compiler's structure.
64

7-
1. Interpreter differs from Compiler only in code generation phase, thus we'll
8-
still learn all the core techniques of building a compiler(such as lexical
9-
analyzing and parsing).
10-
2. We will build our own virtual machine and assembly instructions, that would
11-
help us to understand how computers work.
5+
Before we start, let me stress again that we will be building an **interpreter**.
6+
This means we'll be able to run a C source file as if it was a script. The main
7+
reasons behind this choice are twofold:
128

13-
## Three Phases
9+
1. An interpreter differs from a compiler only in the code generation phase,
10+
thus we'll still learn all the core techniques of building a compiler
11+
(such as lexical analyzing and parsing).
12+
2. We will build our own virtual machine and [assembly instruction set];
13+
this will help us understand how computers work.
1414

15-
Given a source file, normally the compiler will cast three phases of
16-
processing:
1715

18-
1. Lexical Analysis: converts source strings into internal token stream.
19-
2. Parsing: consumes token stream and constructs syntax tree.
20-
3. Code Generation: walk through the syntax tree and generate code for target
21-
platform.
16+
## The Three Phases of Compiling
2217

23-
Compiler Construction had been so mature that part 1 & 2 can be done by
24-
automation tools. For example, flex can be used for lexical analysis, bison
25-
for parsing. They are powerful but do thousands of things behind the scene. In
26-
order to fully understand how to build a compiler, we are going to build them
27-
all from scratch.
18+
Given a source file, the compiler usually carries out three processing phases:
2819

29-
Thus we will build our interpreter in the following steps:
20+
1. **Lexical Analysis**:
21+
converts source strings into an internal stream of tokens.
22+
2. **Parsing**: consumes the token stream and constructs a syntax tree.
23+
3. **Code Generation**:
24+
walks through the syntax tree and generates code for the target platform.
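
To make the three phases concrete, here is a minimal sketch for a toy language that only knows sums of non-negative integers, such as `1 + 23`. Everything in it is an assumption made for illustration: the token kinds, the `Node` layout, the two-instruction target (`OP_PUSH`, `OP_ADD`) and the `compile_sum()` helper are hypothetical names, not part of this tutorial's code.

```c
#include <ctype.h>
#include <stddef.h>

/* Phase 1, lexical analysis: "1 + 23" becomes NUM(1) PLUS NUM(23) END. */
enum { TOK_NUM, TOK_PLUS, TOK_END };
typedef struct { int kind; int value; } Token;

/* Fills out[] with at most max tokens (END included). Returns the number of
   real tokens, or -1 on an unknown character or too many tokens.
   Note: no overflow check on very large numbers. */
int lex(const char *src, Token *out, int max) {
    int n = 0;
    while (*src && n < max - 1) {
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (isdigit((unsigned char)*src)) {
            int v = 0;
            while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
            out[n].kind = TOK_NUM; out[n].value = v; n++;
        } else if (*src == '+') {
            out[n].kind = TOK_PLUS; out[n].value = 0; n++; src++;
        } else {
            return -1;
        }
    }
    if (*src) return -1;               /* input left over: too many tokens */
    out[n].kind = TOK_END; out[n].value = 0;
    return n;
}

/* Phase 2, parsing: build a syntax tree. A node is a leaf when lhs is NULL. */
typedef struct Node { int value; struct Node *lhs, *rhs; } Node;

Node *new_node(Node *pool, int *used, int value, Node *lhs, Node *rhs) {
    Node *n = &pool[(*used)++];
    n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* Parses NUM ('+' NUM)* into a left-leaning tree; returns NULL on error. */
Node *parse(const Token *t, Node *pool, int *used) {
    if (t->kind != TOK_NUM) return NULL;
    Node *tree = new_node(pool, used, t->value, NULL, NULL);
    t++;
    while (t->kind == TOK_PLUS) {
        t++;
        if (t->kind != TOK_NUM) return NULL;
        Node *rhs = new_node(pool, used, t->value, NULL, NULL);
        tree = new_node(pool, used, 0, tree, rhs);
        t++;
    }
    return t->kind == TOK_END ? tree : NULL;
}

/* Phase 3, code generation: post-order walk emitting stack-machine code. */
enum { OP_PUSH, OP_ADD };

int gen(const Node *n, int *code, int pos) {
    if (n->lhs == NULL) {              /* leaf: push the number */
        code[pos++] = OP_PUSH;
        code[pos++] = n->value;
        return pos;
    }
    pos = gen(n->lhs, code, pos);
    pos = gen(n->rhs, code, pos);
    code[pos++] = OP_ADD;              /* inner node: add the two operands */
    return pos;
}

/* Runs all three phases. code[] must hold at least 64 ints.
   Returns the number of ints written, or -1 on a lexing or parsing error. */
int compile_sum(const char *src, int *code) {
    Token tokens[32];
    Node pool[64];                     /* 32 tokens yield at most 31 nodes */
    int used = 0;
    if (lex(src, tokens, 32) < 0) return -1;
    Node *tree = parse(tokens, pool, &used);
    if (tree == NULL) return -1;
    return gen(tree, code, 0);
}
```

Calling `compile_sum("1 + 23", code)` writes the five values `OP_PUSH, 1, OP_PUSH, 23, OP_ADD` into `code`. Each phase only sees the output of the previous one, which is why the phases can be built and tested separately.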
3025

31-
1. Build our own virtual machine and instruction set. This is the target
32-
platform that will be using in our code generation phase.
33-
2. Build our own lexer for C compiler.
34-
3. Write a recusion descent parser on our own.
26+
Compiler Construction is so mature that phases one and two can be done by
27+
automation tools. For example, flex can be used for lexical analysis, bison for
28+
parsing. These are powerful tools, which do thousands of things behind the
29+
scenes. In order to fully understand how to build a compiler, we're going to
30+
handcraft all three phases, from scratch.
3531

36-
## Skeleton of our compiler
32+
Therefore, we'll build our interpreter in the following steps:
3733

34+
1. Build our own virtual machine and instruction set.
35+
This will be our target platform in the code generation phase.
36+
2. Build our own lexer for C source code.
37+
3. Write a [recursive descent parser] on our own.
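
As a rough idea of what "our own virtual machine and instruction set" can mean, the sketch below runs a tiny stack machine. It is not the machine built in this tutorial: the opcodes `OP_PUSH`, `OP_ADD`, `OP_MUL` and `OP_HALT`, the stack size and the `run_vm()` signature are all assumptions chosen to keep the example short.

```c
#include <stddef.h>

/* Hypothetical instruction set: OP_PUSH is followed by one operand. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_SIZE 64

/* Executes code[] until OP_HALT and stores the top of the stack in *result.
   Returns 0 on success, -1 on a malformed program (unknown opcode, stack
   overflow or underflow). The program must end with OP_HALT. */
int run_vm(const long *code, long *result) {
    long stack[STACK_SIZE];
    size_t sp = 0;                      /* number of values on the stack */
    const long *pc = code;              /* program counter */

    for (;;) {
        switch (*pc++) {
        case OP_PUSH:
            if (sp == STACK_SIZE) return -1;
            stack[sp++] = *pc++;        /* operand follows the opcode */
            break;
        case OP_ADD:
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            if (sp == 0) return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            return -1;
        }
    }
}
```

For example, the program `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` computes `(2 + 3) * 4` and leaves 20 in `*result`. A code generator then only has to emit such an array; it never needs to know about real hardware.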
3838

39-
Modeling after c4, our compiler includes 4 main functions:
4039

41-
1. `next()` for lexical analysis; get the next token; will ignore spaces tabs
42-
etc.
43-
2. `program()` main entrance for parser.
44-
3. `expression(level)`: parser expression; level will be explained in later
45-
chapter.
46-
4. `eval()`: the entrance for virtual machine; used to interpret target
47-
instructions.
40+
## The Skeleton of Our Compiler
4841

49-
Why would `expression` exist when we have `program` for parser? That's because
50-
the parser for expressions is relatively independent and complex, so we put it
51-
into a single module(function).
42+
Modeled after [c4], our compiler includes four main functions:
5243

53-
The code is as following:
44+
1. `next()`
45+
for lexical analysis; fetches the next token; ignores spaces, tabs, etc.
46+
2. `program()` — parser main entry point.
47+
3. `expression(level)`
48+
expression parser; `level` will be explained in a later chapter.
49+
4. `eval()`
50+
virtual machine entry point; used to interpret target instructions.
51+
52+
Why do we need `expression()` when we already have `program()` for the parser?
53+
That's because the expressions parser is relatively independent and complex,
54+
so we put it into a single module (function).
55+
56+
The code is as follows:
5457

5558
```c
5659
#include <stdio.h>
5760
#include <stdlib.h>
5861
#include <memory.h>
5962
#include <string.h>
60-
#define int long long // work with 64bit target
63+
#define int long long // work with 64-bit target
6164

6265
int token; // current token
63-
char *src, *old_src; // pointer to source code string;
66+
char *src, *old_src; // pointer to source code string
6467
int poolsize; // default size of text/data/stack
6568
int line; // line number
6669

@@ -119,34 +122,46 @@ int main(int argc, char **argv)
119122
}
120123
```
121124
122-
That's quite some code for the first chapter of the article. Nevertheless it
123-
is actually simple enough. The code tries to reads in a source file, character
124-
by character and print them out.
125+
That's quite some code for the first chapter of the tutorial. Nevertheless it's
126+
actually quite simple. The code tries to read a source file, character by
127+
character, and print them out.
125128
126-
Currently the lexer `next()` does nothing but returning the characters as they
127-
are in the source file. The parser `program()` doesn't take care of its job
128-
either, no syntax trees are generated, no target codes are generated.
129+
Currently, the lexer function `next()` does nothing except return the
130+
characters as they are encountered in the source file. The parser's `program()`
131+
doesn't take care of its job either — it doesn't generate any syntax trees, nor
132+
target code.
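
One way to picture how the four stubs hook together is the sketch below. It is an illustration under stated assumptions, not a copy of the chapter's code: it reads from an in-memory string instead of a file, and the `run()` helper and the token count returned by `program()` are hypothetical additions that make the behavior easy to check.

```c
#include <stdio.h>

static int token;            /* current token; for now just one character */
static const char *src;      /* cursor into the source text */

/* Lexer stub: hands the source back one character at a time. */
void next(void) {
    token = *src++;
}

/* Expression parser stub: nothing to parse yet. */
void expression(int level) {
    (void)level;
}

/* Parser stub: pulls tokens until the terminating NUL and prints each one.
   Returns how many tokens were seen (hypothetical, for illustration).
   Assumes plain ASCII input, since the loop stops at any value <= 0. */
int program(void) {
    int count = 0;
    next();
    while (token > 0) {
        printf("token is: %c\n", token);
        count++;
        next();
    }
    return count;
}

/* Virtual machine stub: there are no instructions to run yet. */
int eval(void) {
    return 0;
}

/* Hypothetical driver: parse first, then hand over to the virtual machine. */
int run(const char *text) {
    src = text;
    int seen = program();
    eval();
    return seen;
}
```

Calling `run("int")` prints one `token is:` line for each of the three characters and returns 3. The call order is the part worth remembering: `program()` drives `next()`, and `eval()` runs only after parsing is done.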
129133
130134
The important thing here is to understand the meaning of these functions and
131-
how they are hooked together as they are the skeleton of our interpreter.
132-
We'll fill them out step by step in later chapters.
135+
how they are hooked together, since they constitute the skeleton of our
136+
interpreter. We'll fill them out step by step, in the upcoming chapters.
133137
134-
## Code
138+
139+
## Source Code
135140
136141
The code for this chapter can be downloaded from
137-
[Github](https://github.com/lotabout/write-a-C-interpreter/tree/step-0), or
138-
clone by:
142+
[GitHub](https://github.com/lotabout/write-a-C-interpreter/tree/step-0),
143+
or cloned via:
139144
140145
```
141146
git clone -b step-0 https://github.com/lotabout/write-a-C-interpreter
142147
```
143148
144-
Note that I might fix bugs later, and if there is any incosistance between the
145-
artical and the code branches, follow the article. I would only update code in
146-
the master branch.
149+
> **NOTE** — I might fix bugs later; if you notice any inconsistencies between
150+
the tutorial and the code branches, follow the tutorial. I will only update
151+
code in the master branch.
152+
147153
148154
## Summary
149155
150-
After some boring typing, we have the simplest compiler: a do-nothing
151-
compiler. In next chapter, we will implement the `eval` function, i.e. our own
156+
After some boring typing, we now have the simplest compiler: a do-nothing
157+
compiler. In the next chapter, we'll implement the `eval` function, i.e. our own
152158
virtual machine. See you then.
159+
160+
161+
<!-----------------------------------------------------------------------------
162+
REFERENCE LINKS
163+
------------------------------------------------------------------------------>
164+
165+
[assembly instruction set]: https://en.wikipedia.org/wiki/Instruction_set_architecture "Wikipedia » Instruction set architecture"
166+
[c4]: https://github.com/rswier/c4 "Visit the c4 repository on GitHub"
167+
[recursive descent parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser "Wikipedia » Recursive descent parser"
