A high-performance Node.js system that streams and analyzes multiple arbitrarily large log files concurrently in real time while maintaining bounded memory usage.
Built using native Node.js primitives, this project demonstrates systems-level thinking around streaming, backpressure, CPU parallelism, and non-blocking architecture.
👉 Read the Engineering Deep Dive
- Streams multi-GB log files without buffering them into memory
- Processes data across multiple CPU cores using worker threads
- Applies automatic backpressure to stabilize throughput
- Provides live processing metrics via a real-time dashboard (SSE)
- Keeps the event loop responsive during CPU-heavy workloads
Many file-processing services load entire files into memory, making them unsuitable for large-scale workloads.
This project explores how far the native Node.js runtime can be pushed to build a truly streaming, CPU-aware system capable of handling arbitrarily large inputs without memory growth.
Client Upload
↓
HTTP Stream (Native Node)
↓
BatchMaker → splits buffers into line batches
↓
ErrorCounter → delegates parsing to worker pool
↓
Worker Threads → parallel log analysis
↓
StatusUpdater → emits progress events
↓
Server-Sent Events → Live Dashboard
Constant Memory Streaming
Files are processed as a stream and never written to disk or fully buffered.
Worker Thread Pool
CPU-intensive parsing runs off the main thread and scales across available cores.
Backpressure-Aware Pipeline
Streams automatically pause when the worker queue reaches capacity.
Native HTTP Server
Built without frameworks to demonstrate direct mastery of Node.js runtime primitives.
Real-Time Observability
Clients receive live updates on processing progress via Server-Sent Events.
- Node.js: v22.12.0
- Native Node modules only (no backend frameworks)
- Node.js 22.12.0
- Modern browser with SSE support (Chrome v144.0.7559.133 recommended)
Verify:
node -v

Clone the repository and start the server:

git clone https://github.com/shwetank-dev/real-time-log-analyzer.git
cd real-time-log-analyzer
node src/server.js

You should see:
Server listening on 13099
Then open http://localhost:13099 in your browser.
Create a session and upload a log file to begin real-time analysis.
GET /health
POST /sessions
Returns a unique sessionId.
POST /upload/:sessionId
Streams the file directly into the processing pipeline.
GET /status/:sessionId
Establishes a Server-Sent Events connection for live progress updates.
The server logs memory usage periodically:
[MEM] rss=XXMB heapUsed=XXMB heapTotal=XXMB external=XXMB
Memory should remain stable regardless of file size, confirming the streaming design.
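A line like this can be produced from `process.memoryUsage()`; a small sketch (the interval and exact formatting are assumptions):

```javascript
// Format process.memoryUsage() into a single [MEM] log line.
const toMB = (bytes) => Math.round(bytes / 1024 / 1024);

function memLine() {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  return `[MEM] rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB ` +
         `heapTotal=${toMB(heapTotal)}MB external=${toMB(external)}MB`;
}

console.log(memLine());
// In the server this would run on a timer, e.g.:
// setInterval(() => console.log(memLine()), 5000);
```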
src/
├── server.js
├── sessionManager.js
├── worker/
│   ├── parserWorker.js
│   └── workerPool.js
└── transformStreams/
    ├── batchMaker.js
    ├── errorCounter.js
    └── statusUpdater.js
- Deep understanding of Node.js streams
- Event loop protection via worker threads
- Concurrency and task scheduling
- Backpressure management
- Real-time data delivery
- Production-style error handling
- Persistent session storage (Redis)
- Authentication & rate limiting
- Advanced log classification
- Containerization and autoscaling
MIT