A custom 5-stage pipelined CPU implemented on FPGA and based on the VeSPA ISA, designed for educational exploration of computer architecture and digital design.
- VeSPA - System on Chip
➡️ This project was developed by a team of 20 students as part of the Computer Architecture course.
➡️ The class was organized into three groups, which rotated roles throughout the semester to cover the different stages of the development process.
➡️ The roles were:
- CPU Pipeline (SoC Team) - Current Repository
- VESPA C Compiler Frontend
- VESPA C Compiler Backend
The goal of this project was to enhance a pre-existing version of the VeSPA CPU.
That version, a single-cycle implementation developed in a previous semester, is documented in the report VeSPA Single-Cycle CPU.
The single-cycle datapath that this project builds upon is presented below.
- Implementation of a 5-stage pipelined datapath (IF, ID, EX, MEM, WB) — Main Goal
- Mitigation of structural hazards by adopting a Harvard architecture, replacing the previous Von Neumann design
- Introduction of pipeline registers between stages to enable instruction-level parallelism
- Extension of the control unit to generate and manage new pipeline control signals
- Integration of hazard detection units to identify data and control dependencies
- Implementation of data hazard mitigation mechanisms, including data forwarding and pipeline stalls
- Addition of branch prediction mechanisms to minimize control hazards
- Implementation of pipeline flush procedures to handle branch mispredictions
- Addition of shift instructions supported by a barrel shifter
- Fix, test, and integrate previously developed peripherals with the CPU, including:
- Interrupt Controller
- GPIO
- PS/2 Keyboard Controller
- Timer
- UART
- Implement and integrate a VGA Controller peripheral
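The forwarding and stall mechanisms in the list above can be illustrated with a small behavioural model. This is only a sketch of the classic textbook scheme written in Python, not the project's Verilog Hazard Unit; every signal name (`ex_mem_rd`, `id_ex_mem_read`, and so on) is hypothetical. It relies on one fact from this README: R0 is hardwired to zero, so it never needs forwarding.

```python
# Textbook-style hazard logic, modelled in Python for illustration only.
# All signal names are hypothetical; the real Hazard Unit may differ.

def forward_select(src_reg, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Choose where an EX-stage source operand should come from."""
    # R0 is hardwired to zero, so a write to it is never forwarded.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"   # newest result, from the immediately preceding instruction
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"   # older result, about to be written back
    return "REGFILE"      # no dependency: use the value read in ID


def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """Return True when the instruction in ID must wait one cycle for a load in EX."""
    return bool(
        id_ex_mem_read
        and id_ex_rd != 0
        and id_ex_rd in (if_id_rs1, if_id_rs2)
    )
```

The EX/MEM check comes first so that, when two in-flight instructions write the same register, the most recent value wins. A load followed immediately by a dependent instruction cannot be solved by forwarding alone, which is why a one-cycle stall is still needed in that case.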
- Verilog
- Xilinx Vivado
- Zybo Board Z7-10
To achieve full integration of the peripherals, a data bus was implemented to connect the CPU (Master) with all peripheral modules (Slaves).
The following diagrams illustrate the bus interconnection architecture and the corresponding peripheral memory mapping within the system’s address space.
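As a rough illustration of how a memory-mapped bus selects a slave, the sketch below decodes an address into a peripheral name and a local offset. The base addresses and region sizes are placeholders invented for this example; they are not the project's real memory map, which is given in the diagrams.

```python
# Hypothetical memory map: (name, base address, size in bytes).
# These values are placeholders, NOT the addresses used by the VeSPA SoC.
MEMORY_MAP = [
    ("DATA_MEM", 0x0000_0000, 0x1000),
    ("GPIO",     0x0000_1000, 0x0100),
    ("TIMER",    0x0000_1100, 0x0100),
    ("UART",     0x0000_1200, 0x0100),
]


def decode(address):
    """Return (slave name, offset inside the slave) or None if unmapped."""
    for name, base, size in MEMORY_MAP:
        if base <= address < base + size:
            return name, address - base
    return None
```

With these placeholder values, `decode(0x1204)` selects the UART region at offset 4, and an address outside every region returns `None`, which a real bus would typically treat as an error or ignore.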
The diagram below presents the full pipelined datapath implemented in this project.
For better visibility and detail, it is also available in:
- PNG image → View Datapath Image
- Final report → View final report
- Presentation slides → View final presentation
The Application Binary Interface (ABI) defines how the code generated by the compiler interacts with the VeSPA hardware.
It acts as an agreement between the CPU architecture and the compiler, ensuring that both interpret function calls, register usage, data formats and memory structures in a consistent manner.
This ABI specifies:
- Memory organization
- Register usage conventions
- Function calling conventions and stack frame layout
- Data representation (endianness, data sizes and alignment)
- Interrupt handling rules
Full ABI specification here.
| Register | Purpose | Caller-Saved |
|---|---|---|
| R0 | Constant zero (hardwired) | – |
| R1 | Return address for function calls | Yes |
| R2 | Frame Pointer (FP) | Yes |
| R3 | Stack Pointer (SP) | Yes |
| R4–R5 | Return registers and special registers used for Mul/Div operations | No |
| R6–R31 | General-purpose temporary registers | No |
To test or deploy this project on hardware or simulation, you will need the development tools and hardware referenced (or similar).
You will first need to generate a .coe file (memory initialization file), which can be done in one of the following ways.
After generating the .coe file, you will need to load it manually into the VeSPA code memory in Vivado.
You can compile C or Assembly code into VeSPA machine code using the following repositories:
- C Compiler (C to VeSPA Assembly and Binary/COE): RocketC – VeSPA C Compiler
- Assembler / Backend (Assembly to Binary/COE): RocketC – VeSPA C Compiler
The Python script is located here: Python COE Script
- Write the VeSPA Assembly program inside the `code.txt` file
- Run the script using: `python3 VeSPA_binary.py`
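For readers unfamiliar with the .coe format, the sketch below shows what such a file looks like once the instructions have been encoded. It is not the project's `VeSPA_binary.py`; the function name `words_to_coe` is hypothetical, the example assumes 32-bit instruction words, and the sample words are arbitrary values rather than real VeSPA encodings.

```python
def words_to_coe(words):
    """Format already-encoded 32-bit words as Xilinx COE text (hex radix)."""
    if not words:
        raise ValueError("at least one word is required")
    # One 8-digit hex word per line, comma-separated, terminated by ';'.
    body = ",\n".join(f"{w & 0xFFFFFFFF:08X}" for w in words)
    return (
        "memory_initialization_radix=16;\n"
        "memory_initialization_vector=\n"
        f"{body};\n"
    )


# Arbitrary placeholder words, not real VeSPA machine code.
coe_text = words_to_coe([0x00000000, 0xDEADBEEF])
print(coe_text)
```

The two header keywords, `memory_initialization_radix` and `memory_initialization_vector`, are what Vivado's Block Memory Generator expects when a memory is initialized from a file.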
Because deploying software directly to the Zybo Z7 FPGA can be time-consuming, and debugging capabilities on hardware are limited, a virtual runtime environment was created.
This virtual environment allows developers to run and debug VeSPA programs, without requiring immediate execution on the FPGA.
RocketSim details and implementation can be found here.
Functional tests can be found in the Report and the Final Presentation.
From the timing table, the longest delay occurs in the Execute stage of the SUB instruction, with a latency of 6.93 ns. This makes it the critical path of the pipeline and determines the maximum clock frequency.
The CPU clock frequency is limited by the longest stage delay: f_max = 1 / 6.93 ns
→ Maximum clock frequency: ≈ 144 MHz
The clock frequency of the single-cycle version was 100 MHz.
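The conversion from critical-path delay to clock frequency can be checked with a few lines of Python. The only input is the 6.93 ns figure quoted above; the helper name `max_clock_mhz` is introduced here purely for illustration.

```python
def max_clock_mhz(critical_path_ns):
    """Upper bound on clock frequency for a given critical-path delay."""
    # 1 / (1 ns) = 1 GHz = 1000 MHz, so f[MHz] = 1000 / delay[ns].
    return 1e3 / critical_path_ns


f_max = max_clock_mhz(6.93)  # Execute stage of SUB
print(f"{f_max:.1f} MHz")    # prints "144.3 MHz"
```

This is a theoretical ceiling: it ignores clock skew, setup margins, and routing effects that the tools account for in a real timing report.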
Even though the CPI (cycles per instruction) was not experimentally measured, theoretical analysis already allows meaningful conclusions. The single-cycle VeSPA processor runs at 100 MHz, while the pipelined version can operate at 144 MHz due to a shorter critical path. In an ideal scenario (CPI = 1), this represents a 44% increase in instruction throughput.
In practice, pipeline hazards (data and control hazards) introduce occasional stalls and flushes, increasing CPI slightly above 1. However, since the clock frequency is significantly higher, the overall throughput of the pipelined CPU is still superior to the single-cycle implementation. Additionally, the single-cycle CPU suffers CPI > 1 in load/store instructions, while the pipeline can sustain close to one instruction per cycle under normal execution.
Therefore, even without experimental CPI values, it is reasonable to conclude that the pipelined VeSPA CPU offers higher throughput than the single-cycle version.
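The argument above can be made concrete with a small calculation. The sketch below assumes throughput is simply clock frequency divided by CPI and, for simplicity, that the single-cycle CPU has a CPI of 1; the pipelined CPI of 1.2 used in the example is an arbitrary illustrative value, not a measurement.

```python
def throughput_mips(clock_mhz, cpi):
    """Millions of instructions per second for a given clock and CPI."""
    return clock_mhz / cpi


def speedup(pipe_mhz, pipe_cpi, single_mhz, single_cpi=1.0):
    """Throughput of the pipelined CPU relative to the single-cycle CPU."""
    return throughput_mips(pipe_mhz, pipe_cpi) / throughput_mips(single_mhz, single_cpi)


ideal = speedup(144, 1.0, 100)        # ideal pipeline, CPI = 1
with_stalls = speedup(144, 1.2, 100)  # hypothetical CPI of 1.2 due to hazards
break_even_cpi = 144 / 100            # pipelined CPI at which both designs tie

print(round(ideal, 2), round(with_stalls, 2), break_even_cpi)
```

Under these assumptions the pipelined CPU stays ahead as long as its average CPI remains below 1.44, which leaves room for a fair number of stalls and flushes before the frequency advantage is lost.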
- Played a major role in designing and implementing the pipelined datapath
- Actively participated in team discussions on data, control, and structural hazard mitigation strategies
- Contributed to the design and development of the Hazard Unit
- Integrated and tested the UART and GPIO peripherals with the CPU
- Assisted in writing the technical documentation and final report
Skills Learned
- Hardware Description with Verilog
- Hardware Simulation & Testing (Vivado)
- Digital Design & Processor Architecture
- Teamwork and Problem Solving





