A comprehensive SysY language compiler with advanced debugging capabilities and an extensive testing framework.
- Modular debugging options: Separate flags for token, AST, IR, and IR execution
- Dual comparison modes: Backend assembly testing vs IR-level correctness checking
- Performance profiling: Runtime measurement and analysis with detailed logging
- Colored terminal output: Clear visual feedback with success/failure indicators
- Comprehensive cleaning: Automated cleanup of all generated artifacts
- Granular debug controls: Fine-tuned debugging via debug.h macros
- Multi-level IR execution: Brief and detailed execution tracing
- Variable analysis: Offset mapping and type analysis debugging
- Branch tracking: Detailed branch result monitoring
- TOML-based configuration: Structured test case organization
- Multiple test categories: Custom, functional, performance, and extended tests
- Automated test discovery: Recursive scanning and configuration generation
- Cross-platform support: QEMU-based RISC-V emulation with native fallback
This project is tested on Ubuntu 24.04. We recommend using this environment for optimal compatibility.
Python 3.10+ is required with the following packages:
sudo apt install python3-toml python3-colorama
Or using pip in a virtual environment:
pip install toml colorama
Install clang for building the compiler and RISC-V GCC for cross-compilation:
sudo apt install clang gcc-13-riscv64-linux-gnu
For running RISC-V test cases:
sudo apt install qemu-user
# Build the compiler using the provided task
make
# Or use VS Code task: "build compiler"
Before running tests, build the RISC-V library:
make lib
Our enhanced test.py script provides comprehensive testing capabilities with multiple output levels and debugging options. Each parameter serves a specific purpose for different stages of compiler development and debugging.
- Generate Test Configuration:
python test.py --generate-config
This scans your test directory structure and creates test/config.toml with all discovered test cases organized by category.
- Build Required Libraries:
make lib
Each test case follows this standardized structure:
test_case_folder/
├── case_name.sy # Source file (SysY language)
├── case_name.in # Input data (optional)
└── case_name.out # Expected output
Generated Files During Testing:
test_case_folder/
├── case_name.tk # Tokenization output (--token)
├── case_name.json # AST output (--ast)
├── case_name.ir # IR representation (--ir)
├── case_name.ir_res # IR execution results (--ir_exec)
├── case_name.s # Assembly output (default)
├── case_name.res # Final execution results
├── case_name.time # Runtime measurements
└── diff.log # Differences when test fails
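The layout above lends itself to automated discovery. The following is a minimal sketch of that idea, not the actual logic of test.py: the function name `discover_cases` and the rule that a case is any `.sy` file with a sibling `.out` file are assumptions made for illustration.

```python
from pathlib import Path


def discover_cases(root):
    """Return a list of dicts describing each test case found under root.

    Assumption: a case is any .sy file that has a sibling .out file with the
    same stem; a sibling .in file is optional.
    """
    cases = []
    for source in sorted(Path(root).rglob("*.sy")):
        expected = source.with_suffix(".out")
        if not expected.is_file():
            continue  # no expected output, so there is nothing to compare against
        cases.append({
            "name": source.stem,
            "folder": str(source.parent),
            "has_input": source.with_suffix(".in").is_file(),
        })
    return cases
```

Calling `discover_cases("./test")` would return one entry per runnable case, which could then be grouped by its parent directory to mirror the categories in test/config.toml.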
The debug system provides granular control over compiler internals through debug.h. Edit this file to enable specific debugging features before building.
- Edit debug.h to enable desired debugging features:
// Example debug.h configuration
#define DEBUG_EXEC_BRIEF 1 // Enable brief IR execution trace
#define DEBUG_EXEC_DETAIL 0 // Disable detailed execution info
#define DEBUG_HYM_SEE_VAR2OFFSET_TABLE 1 // Show variable offset mapping
- Rebuild the compiler after changes:
make clean && make
- Run tests with debugging enabled:
python test.py -c ./test/custom/hello_world --ir --ir_exec --verbose
1. Run Single Test Case
python test.py -c ./test/custom/hello_world
- Purpose: Test a specific case during development
- Output: Basic pass/fail result with runtime
- Use When: Debugging a specific algorithm or feature
2. Run All Configured Tests
python test.py -f ./test/config.toml
- Purpose: Full regression testing
- Output: Summary of all test results
- Use When: Before committing changes or releases
3. Verbose Output (--verbose)
python test.py -c ./test/custom/array_test --verbose
- Purpose: See detailed error messages and compilation output
- Shows: Compiler stderr/stdout, detailed failure reasons
- Use When: Investigating compilation failures or unexpected behavior
Example Output:
[DEBG] Running test: array_test
[DEBG] Compile stdout: IR generation completed
[DEBG] Compile stderr: Warning: unused variable 'temp'
[PASS] array_test 45.23 ms 12.34 ms
4. Token Analysis (--token)
python test.py -c ./test/custom/lexer_test --token --verbose
- Purpose: Debug lexical analysis and tokenization
- Generates: case_name.tk with token stream
- Use When: Fixing lexer bugs, adding new language features
Example .tk Output:
KEYWORD int
IDENTIFIER main
LPAREN (
RPAREN )
LBRACE {
KEYWORD return
NUMBER 0
SEMICOLON ;
RBRACE }
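To make the listing above concrete, here is a small regex-based tokenizer that reproduces the same token stream for `int main() { return 0; }`. It is only an illustrative sketch: the token class names are copied from the sample, while `TOKEN_SPEC`, `KEYWORDS`, and `tokenize` are hypothetical names, and the compiler's real lexer may be structured quite differently and covers far more of the language.

```python
import re

# Hypothetical token classes, chosen only to reproduce the sample listing.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"int", "return"}  # assumption: only the keywords used in the sample
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"  # keywords are matched as identifiers, then reclassified
        tokens.append((kind, text))
    return tokens


for kind, text in tokenize("int main() { return 0; }"):
    print(kind, text)
```

Reclassifying identifiers after matching keeps the pattern list short and avoids treating a name such as `integer` as the keyword `int` followed by an identifier.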
5. AST Generation (--ast)
python test.py -c ./test/custom/parser_test --ast --verbose
- Purpose: Debug syntax analysis and AST construction
- Generates: case_name.json with structured AST
- Use When: Fixing parser bugs, verifying syntax tree structure
Example .json Output:
{
  "type": "CompUnit",
  "functions": [{
    "type": "FuncDef",
    "name": "main",
    "returnType": "int",
    "body": {
      "type": "Block",
      "statements": [...]
    }
  }]
}
6. IR Generation (--ir)
python test.py -c ./test/custom/semantic_test --ir --verbose
- Purpose: Debug semantic analysis and IR generation
- Generates: case_name.ir with intermediate representation
- Use When: Fixing semantic analysis, optimizing IR generation
Example .ir Output:
int main()
0: call t_0, global()
1: return 3
end
void global()
0: return null
end
GVT:
// Global Variable Table if Exists
7. IR Execution (--ir_exec)
python test.py -c ./test/custom/execution_test --ir --ir_exec --verbose
- Purpose: Test IR interpreter and execution correctness
- Generates: case_name.ir_res with execution results
- Use When: Verifying IR semantics before backend compilation
8. IR-Level Testing (--ir_diff)
python test.py -c ./test/custom/ir_correctness --ir --ir_exec --ir_diff
- Purpose: Compare IR execution results instead of assembly output
- Generates: ir_diff.log instead of backend_diff.log
- Use When: Testing frontend correctness independent of backend
9. Clean Generated Files (--clean)
python test.py --clean
- Purpose: Remove all generated test artifacts
- Removes: .ir, .ir_res, .json, .tk, .s, .log, .res, .exe, .time
- Use When: Starting fresh, disk space management
10. Configuration Generation (--generate-config)
python test.py --generate-config
- Purpose: Auto-discover and configure all test cases
- Creates: test/config.toml with organized test categories
- Use When: Adding new test cases, reorganizing test structure
11. Custom Compiler Path
python test.py -c ./test/performance/matrix_mult --compiler ./my_optimized_compiler
- Use When: Testing different compiler builds or versions
12. Custom Emulator
python test.py -c ./test/custom/riscv_specific --emulator qemu-riscv64-static
- Use When: Using different QEMU versions or native execution
13. Combined Debugging
python test.py -c ./test/custom/complex_case --token --ast --ir --ir_exec --verbose
- Purpose: Full pipeline debugging from tokens to execution
- Use When: Comprehensive analysis of compiler pipeline
# 1. Test lexer
python test.py -c ./test/custom/new_syntax --token --verbose
# 2. Test parser
python test.py -c ./test/custom/new_syntax --ast --verbose
# 3. Test semantic analysis
python test.py -c ./test/custom/new_syntax --ir --verbose
# 4. Test IR execution
python test.py -c ./test/custom/new_syntax --ir --ir_exec --ir_diff
# 1. Run performance tests
python test.py -f ./test/config.toml | grep performance
# 2. Analyze runtime (check test/summary.log)
cat test/summary.log
# 3. Profile specific slow cases
python test.py -c ./test/performance/slow_case --verbose
# 1. Clean and run failing test
python test.py --clean
python test.py -c ./test/failing_case --verbose
# 2. Check compilation artifacts
python test.py -c ./test/failing_case --ir --ir_exec --token --ast --verbose
# 3. Compare IR vs backend results
python test.py -c ./test/failing_case --ir_diff
Download formal test cases from our Latest Release and extract to the test directory:
test/
├── custom/ # Your custom test cases
├── formal/ # Official test cases (compiler2025, bisheng cup)
│ ├── performance/ # Performance benchmarks
│ └── basic/
│ ├── functional/ # Basic functionality tests
│ └── h_functional/ # Extended functionality tests
└── config.toml # Test configuration
Terminal Output Format:
[PASS] hello_world 12.45 ms 8.32 ms
[FAIL] matrix_mult 156.78 ms 89.12 ms
[PASS] fibonacci 5.23 ms 3.45 ms
- Status: PASS/FAIL indicator with color coding
- Test Name: Case name from folder
- Compilation Time: Time to compile the test case
- Execution Time: Time to run the generated code
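When results need to be post-processed (for example, to list only failures or find the slowest case), lines in this format can be parsed with a short script. The sketch below assumes the color escape codes have already been stripped and that test names contain no spaces; `RESULT_LINE` and `parse_result` are hypothetical names, not part of test.py.

```python
import re

# Matches lines such as "[PASS] hello_world 12.45 ms 8.32 ms".
RESULT_LINE = re.compile(
    r"^\[(?P<status>PASS|FAIL)\]\s+(?P<name>\S+)\s+"
    r"(?P<compile_ms>\d+(?:\.\d+)?) ms\s+(?P<run_ms>\d+(?:\.\d+)?) ms$"
)


def parse_result(line):
    """Parse one result line into a dict, or return None if it does not match."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "passed": match.group("status") == "PASS",
        "name": match.group("name"),
        "compile_ms": float(match.group("compile_ms")),
        "run_ms": float(match.group("run_ms")),
    }


sample = [
    "[PASS] hello_world 12.45 ms 8.32 ms",
    "[FAIL] matrix_mult 156.78 ms 89.12 ms",
    "[PASS] fibonacci 5.23 ms 3.45 ms",
]
results = [r for r in map(parse_result, sample) if r is not None]
failures = [r["name"] for r in results if not r["passed"]]
slowest = max(results, key=lambda r: r["run_ms"])["name"]
print("failed:", failures, "slowest:", slowest)
```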
Generated Log Files:
- test/summary.log: All results sorted by runtime performance
- case_folder/diff.log: Detailed differences when a test fails
- case_folder/ir_diff.log: IR-level differences (with --ir_diff)
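As an illustration of what a difference log can contain, the sketch below compares an expected output against an actual result using Python's standard `difflib` module. The function name `render_diff` and the default file labels are assumptions; the exact format that test.py writes to diff.log may differ.

```python
import difflib


def render_diff(expected_text, actual_text,
                expected_name="case_name.out", actual_name="case_name.res"):
    """Return a unified diff as one string; empty when the texts are identical."""
    diff = difflib.unified_diff(
        expected_text.splitlines(keepends=True),
        actual_text.splitlines(keepends=True),
        fromfile=expected_name,
        tofile=actual_name,
    )
    return "".join(diff)


print(render_diff("1\n2\n", "1\n3\n"))
```

An empty return value means the two texts match line for line, so a caller could use it directly as the pass/fail signal.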
# Essential commands
python test.py --generate-config # Setup test configuration
python test.py --clean # Clean all generated files
python test.py -c ./test/custom/case_name # Test single case
python test.py -f ./test/config.toml # Test all cases
# Debugging specific stages
python test.py -c ./test/case --token # Debug lexer
python test.py -c ./test/case --ast # Debug parser
python test.py -c ./test/case --ir # Debug semantic analysis
python test.py -c ./test/case --ir_exec # Debug IR execution
# Advanced debugging
python test.py -c ./test/case --ir_diff # Compare IR results
python test.py -c ./test/case --verbose # Show detailed output
- Mem2Reg (Memory to Register Promotion): Converts local variable memory accesses into SSA-form register operations, eliminating redundant memory accesses.
- Sparse Conditional Constant Propagation (SCCP): Propagates and folds constant values at compile time.
- InstCombine (Instruction Combination): Merges and simplifies adjacent instructions, such as constant folding and algebraic simplification.
- GVN (Global Value Numbering): Identifies and eliminates redundant computations by optimizing based on value equivalence.
- LICM (Loop Invariant Code Motion): Moves loop-invariant code outside of loops to reduce computations inside loops.
- Function Inlining: Inlines small function calls at the call site to reduce function call overhead.
- LCSSA (Loop-Closed SSA Form): Converts SSA form to loop-closed form to simplify loop analysis and optimization.
- Loop Unrolling: Duplicates the loop body multiple times to reduce loop control overhead and improve instruction-level parallelism.
- Dead Loop Elimination: Identifies and removes loops that will never execute.
- DCE (Dead Code Elimination): Removes unreachable code and unused instructions.
- CFG Simplification: Removes empty jump blocks and unreachable basic blocks to simplify the control flow graph.
- PHI Elimination: Converts SSA-form PHI nodes into conventional register assignments.
- SSA Destruction: Converts SSA form into traditional register assignment form.
- Register Allocation: Linear scan and graph coloring register assignment strategies
- Advanced Loop Optimizations: Loop vectorization, loop fusion, and loop interchange
- Block Ordering: Reorders basic blocks based on dominator tree order to improve cache locality.
- Advanced Register Allocation: Graph coloring with spill optimization
- Instruction Scheduling: Pipeline-aware instruction reordering
- Peephole Optimization: Local instruction sequence optimization
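To give a feel for how one of the passes listed above operates, here is a toy version of the "unused instructions" half of dead code elimination. It runs on a hypothetical straight-line three-address IR written as `(dest, op, operands)` tuples, not on this compiler's actual IR, and it does not handle branches or unreachable blocks.

```python
# Assumption: these opcodes have effects beyond producing a value.
SIDE_EFFECTS = {"return", "call", "store"}


def eliminate_dead_code(instructions):
    """Drop instructions whose results are never used.

    Walks the list backwards, tracking which names are still needed. An
    instruction is kept if it has a side effect or defines a needed name.
    """
    live = set()
    kept = []
    for dest, op, operands in reversed(instructions):
        if op in SIDE_EFFECTS or dest in live:
            kept.append((dest, op, operands))
            live.discard(dest)  # this definition satisfies later uses
            live.update(o for o in operands if isinstance(o, str))
    kept.reverse()
    return kept


program = [
    ("t0", "const", (2,)),
    ("t1", "const", (3,)),
    ("t2", "add", ("t0", "t1")),
    ("t3", "mul", ("t0", "t0")),  # result is never used
    (None, "return", ("t2",)),
]
for instruction in eliminate_dead_code(program):
    print(instruction)
```

The backward walk means a single pass also removes chains of dead instructions, since an operand only becomes live once some kept instruction uses it.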
This project is part of the compiler design coursework and follows academic usage guidelines.
