- Introduction
- Features
- Project Structure
- Installation & Setup
- Language Specification
- Quad Language Specification
- Compiler Details -Compilation Process
- Code Examples
- Error Handling
- Optimization Techniques
- Testing
A custom compiler built with Flex & Bison (C++), generating Three-Address Code (TAC) for intermediate representation in a custom language (Quad).
- Lexer & Parser: Built using Flex and Bison.
- AST Representation: A structured Abstract Syntax Tree (AST) for syntax analysis.
- TAC Generation: Generates three-address code (Quad Code) using
QuadGenerator(private implementation). - Error Handling: Reports syntax and semantic errors with structured messages.
- Optimizations: Implements temporary variable reuse for efficiency.
MyCompiler/
│── src/
│ ├── cpq.cpp # Compiler entry point
│ ├── lexer.l # Flex lexer definition
│ ├── parser.y # Bison grammar definition
│ ├── global_scope.cpp # Global scope implementations
│ ├── symbol_table.cpp # Symbol table implementations
│ ├── AST/
│ │ ├── Base/ # Base AST node class
│ │ │ ├── ASTNode.cpp # Main node-type implementation.
│ │ │ ├── QuadGenerator.cpp # TAC Generator
│ │ │ ├── SemanticChecker.cpp # Semantic Analysis
│ │ ├── ControlFlow/ # ControlFlow nodes
│ │ │ ├── ASTBreakNode.cpp # AST Node for Break statements
│ │ │ ├── ASTCaseListNode.cpp # AST Node for Case statements
│ │ │ ├── ASTIfNode.cpp # AST Node for If-Else statements
│ │ │ ├── ASTSwitchNode.cpp # AST Node for Switch statements
│ │ │ ├── ASTWhileNode.cpp # AST Node for While statements
│ │ ├── Expressions/ # Expression nodes
│ │ │ ├── ASTBinaryExprNode.cpp # Binary expression nodes (2 or more operands)
│ │ │ ├── ASTCastExprNode.cpp # Casting nodes
│ │ │ ├── ASTIdentifier.cpp # ID nodes
│ │ │ ├── ASTLiteralNode.cpp # Literal (Immediate) nodes
│ │ │ ├── ASTUnaryExprNode.cpp # Unary expression nodes
│ │ ├── Statements/ # Statement nodes
│ │ │ ├── ASTAssignNode.cpp # Assignment nodes
│ │ │ ├── ASTBlockNode.cpp # Code-Block nodes
│ │ │ ├── ASTInputNode.cpp # Input nodes
│ │ │ ├── ASTOutputNode.cpp # Output nodes
│ │ │ ├── ASTProgramRoot.cpp # Program Root node
│ │ │ ├── ASTStatementListNode.cpp # Statement list node (Inside a code block)
│── include/ # Header files
│── tests/ # Sample test cases
│── docs/ # Documentation
│── Makefile # Build automation
│── README.md # Project documentation
- C++17 or later
- GNU Make
- Flex & Bison Installable from Here Or via MSYS2 (Windows)
pacman -S mingw-w64-x86_64-flex mingw-w64-x86_64-bisonmake./cpq input_fileYou can run your .qud file (TAC Result code) with the interperter.
python interperter/quadinter.py [file path].qud
break case default else float if input int output switch while
( ) { } , : ; =
RELOP: == | != | < | > | >= | <=
ADDOP: + | -
MULOP: * | /
OR: ||
AND: &&
NOT: !
CAST: cast<int> cast<float>
ID: letter (letter|digit)*
NUM: digit+ | digit+.digit*
digit:0 | 1 | ... | 9letter:a | b | ... | z | A | B | ... | Z
- Whitespace characters such as spaces (
), tabs (\t), and newlines (\n) may appear between tokens. They are mandatory where needed for separation (e.g., between a keyword and an identifier) but optional elsewhere. - Comments are written between
/* ... */and do not support nesting. - The language is case-sensitive.
- Integer constants (without a decimal point) are of type
int. - Floating-point constants (with a decimal point) are of type
float.
- If at least one operand is
float, the result isfloat. - If both operands are
int, the result isint. - When an operation involves mixed types (
intandfloat), theintoperand is converted tofloatbefore execution.
- Variables must be declared before use.
- Integer division results in the integer quotient (like C).
- Assignments are valid if both sides have the same type or the left-hand side is
floatand the right-hand side isint(implicit conversion tofloat).
- The
breakstatement is only valid inside awhileloop orswitchcase (same as C). - If a
breakis omitted in aswitch-case, execution continues to the nextcase. - The expression following
switchmust be anint, and allcaselabels must be integers.
cast<int>converts a value toint.cast<float>converts a value tofloat.
/* Finding minimum between two numbers */
a,b : float;
{
input(a);
input(b);
if (a < b)
output(a)
else
output(b)
}A- Integer variableB, C- Integer variables or constantsD- Real (floating-point) variableE, F- Real variables or constantsL- Jump target (line number)
Note: The actual variable names in the program should be written in lowercase, and may include letters, digits, and underscores (
_).
| Opcode | Arguments | Description |
|---|---|---|
| IASN | A B |
A := B (Assign B to A) |
| IPRT | B |
Print the value of B |
| IINP | A |
Read an integer into A |
| IEQL | A B C |
If B = C then A := 1 else A := 0 |
| INQL | A B C |
If B ≠ C then A := 1 else A := 0 |
| ILSS | A B C |
If B < C then A := 1 else A := 0 |
| IGRT | A B C |
If B > C then A := 1 else A := 0 |
| IADD | A B C |
A := B + C |
| ISUB | A B C |
A := B - C |
| IMLT | A B C |
A := B * C |
| IDIV | A B C |
A := B / C |
| RASN | D E |
D := E (Assign E to D) |
| RPRT | E |
Print the value of E |
| RINP | D |
Read a real (floating-point) number into D |
| REQL | A E F |
If E = F then A := 1 else A := 0 |
| RNQL | A E F |
If E ≠ F then A := 1 else A := 0 |
| RLSS | A E F |
If E < F then A := 1 else A := 0 |
| RGRT | A E F |
If E > F then A := 1 else A := 0 |
| RADD | D E F |
D := E + F |
| RSUB | D E F |
D := E - F |
| RMLT | D E F |
D := E * F |
| RDIV | D E F |
D := E / F |
| ITOR | D B |
Convert B (integer) to D (real) |
| RTOI | A E |
Convert E (real) to A (integer) |
| JUMP | L |
Jump to instruction L |
| JMPZ | L A |
If A = 0, jump to instruction L, else continue execution |
| HALT | - | Stop execution immediately |
This Quad Instruction Set defines all available intermediate representation (TAC) operations used in the compiler.
Lexing (also called scanning) is the first stage of compilation, where the source code is transformed into tokens. The Flex lexer reads the input and classifies words based on the defined lexical rules.
if (x > 5) output(x);IF ( ID > NUM ) OUTPUT ( ID ) ;
Parsing is the process of analyzing the token sequence to ensure that it follows the grammar rules of the language. This phase constructs a parse tree or an AST (Abstract Syntax Tree).
IfStmt
├── Condition: >
│ ├── ID: x
│ └── NUM: 5
└── Body:
└── OutputStmt
└── ID: x
The Abstract Syntax Tree (AST) is a structured, tree-like representation of the program after parsing. It simplifies later stages of compilation.
Assign
├── ID: y
└── BinaryExpr: +
├── ID: x
└── BinaryExpr: *
├── NUM: 3
└── NUM: 2
Three-Address Code (TAC) is an intermediate representation of the source code that simplifies later translation to machine code. TAC instructions use a compact format, often relying on temporary variables.
IASN t0 3
IASN t1 2
IMUL t3 t0 t1
IADD t4 x t3
IASN y t4Code optimizations help improve efficiency by reducing redundancy, minimizing memory usage, and enhancing execution speed.
x : int;
{
x = 5;
if (x > 2){
output(x)
}
else {}
}ID INT;
ID = NUM;
IF ( ID > NUM ) {
OUTPUT ( ID );
}
else {}
IfStmt
├── Condition: >
│ ├── ID: x
│ └── NUM: 2
└── Body:
└── OutputStmt
└── ID: x
IASN t0 5
IASN x t0
IASN t0 2
IGRT t1 x t0
JMPZ 8 t1
IPRT x
JUMP 8
HALT 5
The following error codes are defined in the compiler:
enum class ErrorCode {
ERR_NO_FILE_FOUND,
ERR_FILE_CANT_BE_OPENED,
ERR_FILE_CANT_BE_READ,
ERR_NESTED_COMMENTS,
ERR_UNTERMINATED_COMMENT,
ERR_TOKEN_INVALID,
ERR_NO_SUCH_VARIABLE,
ERR_MISSING_CONDITION,
ERR_MISSING_MAIN_CASE,
ERR_CASE_ISNT_INT,
ERR_MISSING_CASE_NUMBER,
ERR_SWITCH_NO_DEFAULT,
ERR_OP_REQUIRES_TWO,
ERR_COMPARING_FI,
ERR_MISSING_EXPRESSION,
ERR_BREAK_OUTSIDE_WHILE,
ERR_PARSER,
ERR_MISSING_INPUT,
};In addition to errors, the compiler issues warnings for operations involving mixed int and float types.
int x;
x = 3.5 + 2;Warning Output:
Warning: Performing action between int and float (Will be assigned FLOAT automatically).
Errors that occur due to incorrect structure in the source code.
if x > 5) output(x);Error Output:
Syntax Error: Unexpected token 'x'. Expected '('.
Replaces constant expressions with precomputed values.
y = 3 + 2;Optimized TAC:
IASN y, 5Removes unused or unreachable code.
x = 5;
x = 6;Optimized TAC:
IASN x, 6Minimizes the number of memory accesses by keeping variables in registers.
Predefined tests can be found in dir/tests folder of the project. In order to run one test you can use the command :
cpq [file name]In order to run all tests in the folder, you can use the command :
make tests- Code Optimizations: Constant folding, dead code elimination, and register allocation
- Backend Integration: Convert TAC to machine code
- More Language Features: Support for arrays and functions