Build and Benchmarking Suite for Deterministic Real-Time Systems
Award-winning project: CATERPILLAR TECH CHALLENGE 2025 Winners
- Overview
- System Specifications
- Quick Start
- Components
- Installation & Setup
- Running Benchmarks
- Performance Results
- Troubleshooting
- Project Structure
This repository provides a complete framework for building and validating a PREEMPT_RT real-time kernel on Raspberry Pi 5, along with a comprehensive benchmarking suite to measure and characterize deterministic system behavior.
The RTOS kernel enables:
- Microsecond-level latency (<200 µs under full stress)
- Deterministic scheduling with bounded jitter
- Suitable for real-time AI/ML workloads including monocular depth estimation, robotics, and safety-critical edge computing
✅ Custom PREEMPT_RT kernel (v6.15.y branch, native compilation on Pi 5)
✅ Multi-scenario cyclictest benchmarking (idle, light, moderate, heavy, thermal stress)
✅ End-to-end latency measurement for inference pipelines
✅ Comprehensive statistical analysis (percentiles, jitter, WCET)
✅ Thermal and power consumption tracking
✅ CPU isolation and frequency scaling support
| Parameter | Value |
|---|---|
| Operating System | Raspberry Pi OS (64-bit), Debian Bookworm |
| Kernel Version | 6.15.0-rc7-v8-16k-NTP+ |
| Architecture | aarch64 (64-bit) |
| Build Method | Native compilation on Raspberry Pi 5 |
| Real-Time Model | PREEMPT_RT (Full Real-Time Preemption) |
| Configuration | Setting | Purpose |
|---|---|---|
| CONFIG_PREEMPT_RT | y | Full kernel preemption for real-time scheduling |
| CONFIG_HZ_1000 | y | High-resolution timer (1000 Hz tick rate) |
| CONFIG_NO_HZ_FULL | y | Tickless kernel on isolated cores |
| CONFIG_NTP_PPS | y | Kernel PPS (Pulse Per Second) timing support |
| CONFIG_PPS_CLIENT_GPIO | y | GPIO-based precise time synchronization |
| CPU Governor | performance | Disabled frequency scaling during tests |
| Component | Specification |
|---|---|
| CPU | Raspberry Pi 5 (4-core ARM Cortex-A76 @ 2.4 GHz) |
| RAM | 8GB LPDDR4X-4267 SDRAM (minimum) |
| Storage | 64GB microSD card (minimum) |
| Cooling | Official Raspberry Pi Active Cooler (recommended) |
| Power | 27W USB-C PSU (Pi 5 recommended supply) |
| Optional | 128GB USB 3.2 for logging/datasets, GPIO test hardware |
chmod +x build_rt_kernel.sh
./build_rt_kernel.sh
The script handles:
- Dependency installation
- Kernel source cloning
- Configuration & compilation
- Boot directory setup
- System reboot into RT kernel
⏱️ Estimated time: 45–90 minutes (native compilation on Pi 5)
Follow the step-by-step instructions in Kernel Build Process.
Purpose: Compile a PREEMPT_RT kernel optimized for Raspberry Pi 5.
Files:
- build_rt_kernel.sh: Automated build script
- README.md: This file (detailed instructions)
Key Optimizations:
- -O3 compiler optimization plus -march=native for Pi 5 CPU features
- -j6 parallel compilation (1.5× CPU cores for stability)
- Native compilation avoids cross-compilation overhead
- Device tree blobs (DTBs) tailored for BCM2712 (Pi 5 SoC)
Output:
- RT kernel binary: /boot/firmware/kernel_2712-NTP.img
- Modules: /lib/modules/$(uname -r)/
- Device trees: /boot/firmware/NTP/
Purpose: Comprehensive validation of real-time determinism under multiple load scenarios.
File: Enhanced_rtos_benchmark_v2.1.sh
What It Measures:
| Metric | Tool | Purpose |
|---|---|---|
| Scheduling Latency | cyclictest | RT timer interrupt response time under load |
| Thermal Profile | vcgencmd | Temperature and throttling behavior |
| System Stress | stress-ng | CPU, memory, I/O load simulation |
| Jitter Distribution | Statistical analysis | Percentile latencies (50th–99.99th) |
| Power Consumption | VCM monitoring | Voltage, frequency, throttle events |
Benchmark Scenarios:
- Idle — No load, baseline performance (~2 µs latency)
- Light — 1 CPU core + light I/O (~1 µs latency)
- Moderate — 2 CPU cores + medium I/O (~5–10 µs latency)
- Heavy — 3 CPU cores + high memory pressure + I/O (~50–100 µs latency)
- Thermal — Sustained load to trigger thermal throttling
Output Structure:
rtos_benchmark_YYYYMMDD_HHMMSS/
├── system_info.txt # Hardware/kernel configuration
├── thermal_power_log.csv # Continuous monitoring (5s intervals)
├── power_thermal_stats.txt # Power analysis & throttle events
├── statistical_summary.txt # Complete percentile statistics
├── statistical_summary.json # Machine-readable results
├── cyclictest_idle/
│ ├── cyclictest.json
│ ├── cyclictest_raw.txt
│ ├── min_latency.txt
│ ├── avg_latency.txt
│ └── max_latency.txt
├── cyclictest_light/
├── cyclictest_moderate/
├── cyclictest_heavy/
└── cyclictest_thermal/
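For automated analysis it can help to locate the most recent run directory programmatically. The following is a minimal sketch, not part of the repository: the helper names `latest_run`, `list_scenarios`, and `load_summary` are hypothetical, and the code assumes only the directory layout shown above. It makes no assumption about the keys inside statistical_summary.json.

```python
import json
from pathlib import Path


def latest_run(base_dir="."):
    """Return the newest rtos_benchmark_YYYYMMDD_HHMMSS directory, or None.

    The timestamp format sorts chronologically when sorted as text.
    """
    runs = sorted(p for p in Path(base_dir).glob("rtos_benchmark_*") if p.is_dir())
    return runs[-1] if runs else None


def list_scenarios(run_dir):
    """Return scenario names taken from the cyclictest_<scenario>/ subdirectories."""
    return sorted(
        p.name.removeprefix("cyclictest_")
        for p in Path(run_dir).glob("cyclictest_*")
        if p.is_dir()
    )


def load_summary(run_dir):
    """Load statistical_summary.json as-is (schema is not assumed here)."""
    with open(Path(run_dir) / "statistical_summary.json", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("No rtos_benchmark_* directory found")
    else:
        print("Latest run:", run)
        print("Scenarios:", list_scenarios(run))
```

Sorting by name rather than by modification time keeps the result stable if older result directories are copied or touched later.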
Key Features:
- ✅ Pre-flight validation (RT kernel, tools, CPU isolation)
- ✅ Background thermal monitoring (continuous)
- ✅ Per-scenario stress load management
- ✅ Configurable test duration (default: 600s)
- ✅ JSON export for automated analysis
Purpose: Measure complete latency pipeline for AI/ML inference (e.g., depth estimation).
File: e2e_inference_benchmark.py
What It Measures:
Frame Capture → Preprocessing → Inference → Postprocessing → Decision
↓ ↓ ↓ ↓ ↓
tflite runtime + OpenCV benchmarking
↓
Total E2E Latency + Component Breakdown
Breakdown Components:
- Preprocessing: Frame crop, resize, normalization (about 5.1 ms mean in the results reported in this file)
- Inference: TFLite model inference on CPU (about 111.1 ms mean)
- Postprocessing: Depth alignment, ROI extraction, threshold decision (about 0.3 ms mean)
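A per-stage breakdown like this can be collected by wrapping each stage with a monotonic timer. The sketch below is illustrative only and is not taken from e2e_inference_benchmark.py: `time_stages` and the three placeholder stage functions are hypothetical stand-ins for the real OpenCV and TFLite calls.

```python
import time


def time_stages(stages, frame):
    """Run (name, function) stages in order.

    Returns the final result and a dict of per-stage durations in
    microseconds, plus a "total" entry that sums the stages.
    """
    timings_us = {}
    data = frame
    for name, fn in stages:
        start = time.perf_counter_ns()
        data = fn(data)
        timings_us[name] = (time.perf_counter_ns() - start) / 1_000
    timings_us["total"] = sum(timings_us.values())
    return data, timings_us


# Placeholder stages (assumptions): the real pipeline would crop/resize a
# frame, invoke the TFLite interpreter, and threshold the depth map.
def preprocess(frame):
    return [value / 255.0 for value in frame]


def infer(tensor):
    return [value * 2.0 for value in tensor]


def postprocess(depth):
    return min(depth) < 0.5


if __name__ == "__main__":
    stages = [("preproc", preprocess), ("inference", infer), ("postproc", postprocess)]
    decision, timings = time_stages(stages, list(range(256)))
    print("decision:", decision)
    for name, value in timings.items():
        print(f"{name}: {value:.2f} us")
```

`time.perf_counter_ns()` is used because it is monotonic and avoids floating-point rounding for short intervals.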
Configuration:
MODEL_PATH = "ADALITE_TFLITE.tflite"
MODEL_INPUT_HEIGHT = 256
MODEL_INPUT_WIDTH = 256
SAMPLE_COUNT = 1000
Output:
- e2e_inference_latency.csv: Per-sample breakdown (Sample, Total, Preproc, Inference, Postproc)
- e2e_inference_stats.txt: Statistical summary with percentiles & throughput
Example Usage:
python3 e2e_inference_benchmark.py
# Outputs: e2e_inference_latency.csv, e2e_inference_stats.txt
Hardware:
- Raspberry Pi 5 (8GB RAM minimum)
- 64GB microSD with Raspberry Pi OS (Bookworm, 64-bit)
- Active cooling (fan or heatsink)
Software Dependencies (Auto-installed by scripts):
# Kernel build dependencies
git bc bison flex libssl-dev make libncurses5-dev raspberrypi-kernel-headers
# Benchmarking dependencies
rt-tests # Contains cyclictest
stress-ng # Load generation
python3 # Analysis scripts
python3-opencv # Frame processing (for e2e_inference_benchmark.py)
tflite-runtime # For inference benchmarking
sudo apt update && sudo apt upgrade -y
# Install essential build tools
sudo apt install -y git bc bison flex libssl-dev make libncurses5-dev
# Benchmarking tools
sudo apt install -y rt-tests stress-ng
# Python dependencies
pip3 install --break-system-packages \
  numpy opencv-python tflite-runtime pandas
# Clone from GitHub (if available)
git clone https://github.com/ShekharShwetank/RTOS.git
cd RTOS
# Or extract if provided as ZIP
unzip RTOS.zip && cd RTOS
chmod +x build_rt_kernel.sh
./build_rt_kernel.sh
What Happens:
- Downloads Raspberry Pi Linux v6.15.y
- Applies BCM2712 (Pi 5) default configuration
- Prompts for manual config (or use defaults)
- Compiles kernel with -O3 -march=native -j6
- Installs modules
- Copies kernel + DTBs to /boot/firmware/NTP/
- Reboots into RT kernel
Troubleshooting:
- If menuconfig appears: press Escape → Save → Exit (to use defaults)
- Build fails? Ensure /boot/firmware/ has >500MB free
- Reboot hangs? Hold Ctrl+C, insert old SD, rebuild
After reboot:
uname -a
# Expected: ...PREEMPT_RT...
# Check RT config
cat /boot/config-$(uname -r) | grep CONFIG_PREEMPT_RT
# Expected: CONFIG_PREEMPT_RT=y
# Check timer frequency
cat /boot/config-$(uname -r) | grep CONFIG_HZ
# Expected: CONFIG_HZ_1000=y
./Enhanced_rtos_benchmark_v2.1.sh [DURATION_SECONDS]
Examples:
# Default: 600 seconds (10 minutes)
./Enhanced_rtos_benchmark_v2.1.sh
# Extended run: 3600 seconds (1 hour) for statistical significance
./Enhanced_rtos_benchmark_v2.1.sh 3600
# Quick test: 300 seconds (5 minutes)
./Enhanced_rtos_benchmark_v2.1.sh 300
Output:
Enhanced RTOS Benchmarking Suite v2.1
✓ PREEMPT_RT kernel detected
✓ Pre-flight checks complete
[1/7] Capturing System Baseline...
[2/7] Starting System Health Monitoring...
[3/7] Preparing GPIO End-to-End Latency Test...
[4/7] Running Multi-Scenario Cyclictest...
[5/7] Power Consumption Analysis...
[6/7] Computing Jitter and Percentile Statistics...
[7/7] Generating Final Report...
Results saved in: rtos_benchmark_*/
Interpreting Results:
# View statistical summary
cat rtos_benchmark_*/statistical_summary.txt
# View thermal profile
cat rtos_benchmark_*/power_thermal_stats.txt
# Analyze per-scenario latencies
cat rtos_benchmark_*/cyclictest_idle/min_latency.txt
cat rtos_benchmark_*/cyclictest_heavy/max_latency.txt
python3 e2e_inference_benchmark.py
Prerequisites:
# Ensure TFLite model is in current directory
ls -lh ADALITE_TFLITE.tflite
# Or provide test video
ls -lh input_road.mp4
Output Example:
End-to-End ADALITE Inference Latency Benchmark
============================================================
Model: ADALITE_TFLITE.tflite
Samples: 1000
Resolution: 256x256
Metric Total Preproc Inference Postproc
------------------------------------------------------
Mean 95.32 μs 2.15 μs 89.42 μs 3.75 μs
99th %ile 156.00 μs 5.20 μs 148.30 μs 8.90 μs
99.9th %ile 201.00 μs 8.10 μs 195.20 μs 12.50 μs
✓ Detailed statistics saved to: e2e_inference_stats.txt
✓ Raw data saved to: e2e_inference_latency.csv
| Scenario | Mean (µs) | Median (µs) | 95th %ile (µs) | 99th %ile (µs) | 99.9th %ile (µs) | WCET (µs) | Jitter (µs) |
|---|---|---|---|---|---|---|---|
| Idle | 2.01 | 2.00 | 2.00 | 3.00 | 4.00 | 16.00 | 0.20 |
| Light | 1.02 | 1.00 | 1.00 | 2.00 | 4.00 | 23.00 | 0.23 |
| Moderate | 1.16 | 1.00 | 2.00 | 3.00 | 9.00 | 109.00 | 0.79 |
| Heavy | 1.34 | 1.00 | 3.00 | 5.00 | 11.00 | 76.00 | 1.01 |
| Thermal | 1.55 | 2.00 | 2.00 | 2.00 | 5.00 | 20.00 | 0.55 |
Table Notes:
- WCET: Worst-Case Execution Time, reported here as the maximum observed latency (an empirical value, not a proven bound)
- Jitter: Standard deviation of scheduling latencies
- All measurements conducted on Raspberry Pi 5 with PREEMPT_RT Linux kernel v6.15
- Each scenario tested with 3–6 million samples
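The columns in the table above can be reproduced from raw latency samples. The sketch below is one way to do it and may differ from what the benchmark script does internally: the function names are hypothetical, the percentile uses the nearest-rank method (an assumption), and jitter is the population standard deviation, following the table note.

```python
import math
import statistics


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an ascending list (pct between 0 and 100)."""
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(latencies_us):
    """Summarize scheduling latencies given in microseconds."""
    data = sorted(latencies_us)
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "p95": percentile(data, 95),
        "p99": percentile(data, 99),
        "p99_9": percentile(data, 99.9),
        "max": data[-1],                      # reported as WCET in the table
        "jitter": statistics.pstdev(data),    # standard deviation
    }


if __name__ == "__main__":
    # Synthetic samples for illustration only (not measured data).
    samples = [1] * 90 + [2] * 9 + [16]
    for key, value in summarize(samples).items():
        print(f"{key}: {value:.2f}")
```

With millions of samples per scenario, a single outlier barely moves the mean or the 99th percentile, which is why the maximum is reported separately.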
| Metric | Latency (µs) | Latency (ms) |
|---|---|---|
| Mean | 116,436 | 116.4 |
| Median | 115,899 | 115.9 |
| 95th percentile | 118,885 | 118.9 |
| 99th percentile | 140,788 | 140.8 |
| 99.9th percentile | 192,077 | 192.1 |
| Maximum (WCET) | 221,341 | 221.3 |
Component Breakdown (Mean):
| Component | Latency (µs) | Latency (ms) | % of Total |
|---|---|---|---|
| Preprocessing | 5,066 | 5.1 | 4.4% |
| Inference | 111,076 | 111.1 | 95.4% |
| Postprocessing | 282 | 0.3 | 0.2% |
Table Notes:
- Measured over 1,000 samples processing KITTI road scenes
- Inference component dominates at 95.4% of total latency
- End-to-end latency suitable for real-time robotics and autonomous systems
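As a worked check of the tables above, the component shares and an implied frame rate can be derived from the reported means. This is a small sketch using only the numbers in this file. The throughput figure assumes frames are processed strictly one after another, which is an assumption rather than a measured result.

```python
# Mean values copied from the tables above (microseconds).
MEAN_TOTAL_US = 116_436
COMPONENTS_US = {
    "Preprocessing": 5_066,
    "Inference": 111_076,
    "Postprocessing": 282,
}


def component_shares(total_us, components_us):
    """Return each component's share of the total, in percent (1 decimal)."""
    return {name: round(100 * us / total_us, 1) for name, us in components_us.items()}


def throughput_fps(mean_total_us):
    """Frames per second if frames are processed sequentially (assumption)."""
    return 1_000_000 / mean_total_us


if __name__ == "__main__":
    for name, share in component_shares(MEAN_TOTAL_US, COMPONENTS_US).items():
        print(f"{name}: {share}%")
    print(f"Implied throughput: {throughput_fps(MEAN_TOTAL_US):.2f} frames/s")
    # The components sum to slightly less than the mean total; the small
    # remainder is time spent outside the three timed stages.
    print("Unattributed:", MEAN_TOTAL_US - sum(COMPONENTS_US.values()), "us")
```

At a mean of about 116 ms per frame, the implied rate is roughly 8.6 frames per second, so inference time is the first place to look for speed-ups.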
uname -a
# Should show: PREEMPT_RT
# If not shown:
cat /boot/config-$(uname -r) | grep CONFIG_PREEMPT_RT
# Should output: CONFIG_PREEMPT_RT=y
Solution:
- Verify boot configuration: cat /boot/firmware/config.txt
- Check kernel copy: ls -lh /boot/firmware/kernel_2712-NTP.img
- Rebuild if needed: ./build_rt_kernel.sh
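The uname check can also be scripted, for instance as a guard at the top of an analysis script. The sketch below is hypothetical and not part of the repository. It looks for the PREEMPT_RT token in the kernel version string, which is what `uname -v` prints; the sample strings in the comments are illustrative, not captured output.

```python
import platform


def is_preempt_rt(version_string):
    """Return True if a `uname -v` style string contains the PREEMPT_RT token.

    Matching whole tokens keeps plain "PREEMPT" or "PREEMPT_DYNAMIC"
    kernels from being reported as real-time.
    """
    return "PREEMPT_RT" in version_string.split()


if __name__ == "__main__":
    # platform.version() returns the same string as `uname -v` on Linux,
    # e.g. something like "#1 SMP PREEMPT_RT <build date>" (illustrative).
    version = platform.version()
    if is_preempt_rt(version):
        print("PREEMPT_RT kernel detected:", version)
    else:
        print("Not an RT kernel:", version)
```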
Causes: Frequency scaling, CPU interrupts, kernel debugging
Solutions:
# Check frequency governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Should be: performance
# Force performance governor if needed
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Verify CPU isolation
cat /sys/devices/system/cpu/isolated
# Should include: 3 (isolated core)
# Add to /boot/firmware/cmdline.txt if missing (keep everything on one line):
# isolcpus=3 nohz_full=3 rcu_nocbs=3
# Install missing dependencies
sudo apt install -y rt-tests stress-ng python3 python3-pip
# Install Python packages
pip3 install --break-system-packages \
  numpy pandas opencv-python tflite-runtime
Throttle Events: 12 (ARM frequency capped)
Solutions:
- Attach active cooler to Pi 5
- Improve airflow (case ventilation)
- Reduce test duration temporarily
- Disable heavy load scenarios (thermal test)
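To see which kind of throttling occurred, the bitmask printed by `vcgencmd get_throttled` can be decoded. The sketch below is an illustration and is not claimed to be how the benchmark script counts throttle events. The function name `decode_throttled` is hypothetical; the bit meanings follow the Raspberry Pi documentation for that command.

```python
import shutil
import subprocess

# Bit positions reported by `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Decode a line such as 'throttled=0x50005' into a list of flag labels."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [label for bit, label in THROTTLE_FLAGS.items() if value & (1 << bit)]


if __name__ == "__main__":
    if shutil.which("vcgencmd") is None:
        print("vcgencmd not found (not running on a Raspberry Pi?)")
    else:
        result = subprocess.run(
            ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
        )
        flags = decode_throttled(result.stdout)
        print(flags if flags else "No throttling flags set")
```

Bits 16 to 19 are sticky since boot, so they show whether throttling happened at any point during a long run even if the board has cooled down since.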
RTOS/
├── README.md # This file
├── build_rt_kernel.sh # Automated kernel build & deployment
├── Enhanced_rtos_benchmark_v2.1.sh # Multi-scenario benchmarking suite
├── e2e_inference_benchmark.py # End-to-end inference latency measurement
├── benchmarking/ # Example benchmark results
│ ├── system_info.txt
│ ├── statistical_summary.txt
│ ├── statistical_summary.json
│ ├── power_thermal_stats.txt
│ ├── cyclictest_idle/
│ ├── cyclictest_light/
│ ├── cyclictest_moderate/
│ ├── cyclictest_heavy/
│ ├── cyclictest_thermal/
│ └── publication_results/
│ ├── ANALYSIS_SUMMARY.txt
│ ├── figures/ # PNG/PDF plots
│ └── latex_tables/ # Publication-ready LaTeX tables
├── assets/ # Supporting files (if any)
└── .git/ # Version control
To dedicate core 3 entirely to RT tasks:
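Edit /boot/firmware/cmdline.txt (its location on Raspberry Pi OS Bookworm) and append the following to the existing single line:
isolcpus=3 nohz_full=3 rcu_nocbs=3 kthread_cpus=0-2 irqaffinity=0-2
Then reboot; a kernel rebuild is not needed for command-line changes. This prevents: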
- Kernel threads from running on core 3
- IRQ handling on core 3
- Timer ticks on core 3
Expected Latency Improvement: 5–15% reduction in jitter under stress
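After rebooting, the isolation can be confirmed from a script by reading the kernel's isolated-CPU list, and a measurement process can then be pinned to that core. This is a minimal sketch with hypothetical helper names. It assumes core 3 is the isolated core, as configured above, and uses only the standard library.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel cpulist string such as '3' or '0-2,5' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file means no isolated CPUs
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """Return the set of isolated CPUs, or an empty set if the file is absent."""
    try:
        with open(path, encoding="ascii") as fh:
            return parse_cpu_list(fh.read())
    except FileNotFoundError:
        return set()


if __name__ == "__main__":
    isolated = isolated_cpus()
    print("Isolated CPUs:", sorted(isolated) or "none")
    if 3 in isolated:
        # Pin this process to the isolated core (Linux only).
        os.sched_setaffinity(0, {3})
        print("Now running on:", os.sched_getaffinity(0))
    else:
        print("Core 3 is not isolated; check the kernel command line")
```

Tasks are never scheduled onto an isolated core automatically, so a real-time workload has to be placed there explicitly, as the affinity call above does.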
For synchronized clock with external PPS source:
# Install PPS tools
sudo apt install pps-tools gpsd
# Connect GPIO pin 17 to PPS source
# Verify detection
sudo ppstest /dev/pps0
Then configure NTP:
# Edit /etc/ntp.conf
# Add: server 127.127.8.0 minpoll 4 maxpoll 4
| Tuning | Expected Gain | Difficulty |
|---|---|---|
| CPU isolation (isolcpus) | 5–15% lower latency | Easy |
| Performance governor | 10% lower latency | Easy |
| Disable USB hub scanning | 5% lower jitter | Easy |
| Move IRQs off core | 2–5% improvement | Medium |
| Disable unused CPUs | 3% lower idle latency | Medium |
| Custom cyclictest priority | Varies | Hard |
- Linux Kernel Documentation: https://www.kernel.org/doc/html/latest/
- RT-Tests (cyclictest): https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/rt-tests
- Raspberry Pi Linux: https://github.com/raspberrypi/linux
- PREEMPT_RT Wiki: https://rt.wiki.kernel.org/
Contributions, bug reports, and performance improvements are welcome. Submit via:
- GitHub Issues/PRs
- Email: shwetankshekharcode@gmail.com
This project documentation and scripts are provided as-is for research and educational purposes.
Caterpillar Tech Challenge 2025 Winners — Complete RTOS implementation and validation framework for real-time edge AI on embedded systems.



