A real-time log processing system in Node.js, built to handle massive (multi-GB) concurrent file uploads with stable memory usage using streams and worker threads.

shwetank-dev/real-time-log-analyzer

Real-Time Log Analysis & Monitoring Dashboard

A high-performance Node.js system that streams and analyzes multiple arbitrarily large log files concurrently in real time while maintaining bounded memory usage.

Built using native Node.js primitives, this project demonstrates systems-level thinking around streaming, backpressure, CPU parallelism, and non-blocking architecture.

👉 Read the Engineering Deep Dive


What This Application Does

  • Streams multi-GB log files without buffering them into memory
  • Processes data across multiple CPU cores using worker threads
  • Applies automatic backpressure to stabilize throughput
  • Provides live processing metrics via a real-time dashboard (SSE)
  • Keeps the event loop responsive during CPU-heavy workloads

Why This Project Exists

Many file-processing services load entire files into memory, making them unsuitable for large-scale workloads.

This project explores how far the native Node.js runtime can be pushed to build a fully streaming, CPU-aware system whose memory usage stays bounded no matter how large the input is.


Architecture Overview

Client Upload
     ↓
HTTP Stream (Native Node)
     ↓
BatchMaker → splits buffers into line batches
     ↓
ErrorCounter → delegates parsing to worker pool
     ↓
Worker Threads → parallel log analysis
     ↓
StatusUpdater → emits progress events
     ↓
Server-Sent Events → Live Dashboard
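
The BatchMaker stage in the diagram above could look roughly like the following. This is a minimal sketch, not the repository's code: the class name comes from the diagram, but the batch size, the use of StringDecoder, and the carry-over logic for partial lines are assumptions made for illustration.

```javascript
const { Transform } = require('node:stream');
const { StringDecoder } = require('node:string_decoder');

// Hypothetical BatchMaker: turns raw byte chunks into arrays of complete lines.
class BatchMaker extends Transform {
  constructor(batchSize = 1000) {
    // Input is bytes, output is objects (arrays of strings).
    super({ readableObjectMode: true });
    this.batchSize = batchSize;
    this.decoder = new StringDecoder('utf8'); // keeps multi-byte characters intact across chunks
    this.tail = '';  // partial line carried over from the previous chunk
    this.batch = [];
  }

  _transform(chunk, _encoding, callback) {
    const lines = (this.tail + this.decoder.write(chunk)).split('\n');
    this.tail = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      this.batch.push(line);
      if (this.batch.length >= this.batchSize) {
        this.push(this.batch);
        this.batch = [];
      }
    }
    callback();
  }

  _flush(callback) {
    // Emit whatever is left once the upload ends.
    const rest = this.tail + this.decoder.end();
    if (rest.length > 0) this.batch.push(rest);
    if (this.batch.length > 0) this.push(this.batch);
    callback();
  }
}
```

Only the current batch and one partial line are held at any time, which is what keeps memory independent of file size in this sketch.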

Key Engineering Highlights

Constant Memory Streaming
Files are processed as a stream and never written to disk or fully buffered.

Worker Thread Pool
CPU-intensive parsing runs off the main thread and scales across available cores.
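
As a rough illustration of moving parsing off the main thread, the sketch below sends one batch of lines to a worker and resolves with the result. The function names, the "contains ERROR" rule, and the inline worker source are all assumptions; the repository's parserWorker.js and workerPool.js may work differently, and a real pool would reuse long-lived workers rather than start one per batch.

```javascript
const { Worker } = require('node:worker_threads');
const os = require('node:os');

// Hypothetical parsing rule: a line counts as an error if it contains "ERROR".
function countErrors(lines) {
  let errors = 0;
  for (const line of lines) {
    if (line.includes('ERROR')) errors += 1;
  }
  return errors;
}

// Worker source is built from the function above so the sketch fits in one file.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  ${countErrors.toString()}
  parentPort.on('message', (lines) => parentPort.postMessage(countErrors(lines)));
`;

// Sends one batch to a fresh worker and resolves with the error count.
function analyzeInWorker(lines) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', (count) => {
      resolve(count);
      worker.terminate();
    });
    worker.once('error', reject);
    worker.postMessage(lines);
  });
}

// One plausible pool size: leave a core free for the event loop (an assumption).
const poolSize = Math.max(1, os.availableParallelism() - 1);
```

Calling `analyzeInWorker(batch)` from a stream stage keeps the main thread free to accept uploads and push SSE updates while the batch is parsed.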

Backpressure-Aware Pipeline
Streams automatically pause when the worker queue reaches capacity.
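
One way to get this behavior with native streams is to hold back the Transform callback while too many tasks are in flight; upstream stages then stop receiving requests for more data. The sketch below assumes a hypothetical `runTask(batch)` function that returns a promise (for example, a worker call) and a made-up `maxInFlight` limit. It is not taken from the repository's ErrorCounter.

```javascript
const { Transform } = require('node:stream');

// Hypothetical dispatcher that caps the number of concurrently running tasks.
class BoundedDispatcher extends Transform {
  constructor(runTask, maxInFlight = 4) {
    super({ objectMode: true });
    this.runTask = runTask;       // must return a promise resolving to a non-null value
    this.maxInFlight = maxInFlight;
    this.inFlight = 0;
    this.heldCallback = null;     // transform callback withheld while at capacity
    this.flushCallback = null;    // flush callback withheld until all tasks settle
  }

  _transform(batch, _encoding, callback) {
    this.inFlight += 1;
    this.runTask(batch).then(
      (result) => this._settle(null, result),
      (err) => this._settle(err),
    );
    if (this.inFlight < this.maxInFlight) {
      callback();                   // capacity left: accept the next batch
    } else {
      this.heldCallback = callback; // at capacity: upstream waits
    }
  }

  _settle(err, result) {
    this.inFlight -= 1;
    if (err) {
      this.destroy(err);
      return;
    }
    this.push(result);
    if (this.heldCallback) {
      const resume = this.heldCallback;
      this.heldCallback = null;
      resume();                     // a slot opened up: let data flow again
    }
    if (this.inFlight === 0 && this.flushCallback) {
      const done = this.flushCallback;
      this.flushCallback = null;
      done();
    }
  }

  _flush(callback) {
    if (this.inFlight === 0) callback();
    else this.flushCallback = callback;
  }
}
```

Results may be pushed out of order in this sketch, which is acceptable when downstream stages only aggregate counts.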

Native HTTP Server
Built without frameworks to demonstrate direct mastery of Node.js runtime primitives.
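
Without a framework, routing comes down to inspecting `req.method` and `req.url`. The sketch below shows one possible shape for the endpoints listed in the API Overview; the route names, the `handlers` object, and the 404 body are hypothetical and not taken from server.js.

```javascript
const http = require('node:http');

// Maps a request onto the documented endpoints; route names are hypothetical.
function matchRoute(method, url) {
  const { pathname } = new URL(url, 'http://localhost');
  const parts = pathname.split('/').filter(Boolean);
  if (method === 'GET' && pathname === '/health') return { name: 'health' };
  if (method === 'POST' && pathname === '/sessions') return { name: 'createSession' };
  if (method === 'POST' && parts.length === 2 && parts[0] === 'upload') {
    return { name: 'upload', sessionId: parts[1] };
  }
  if (method === 'GET' && parts.length === 2 && parts[0] === 'status') {
    return { name: 'status', sessionId: parts[1] };
  }
  return null;
}

// Builds a native HTTP server that dispatches to the given handler functions.
function createServer(handlers) {
  return http.createServer((req, res) => {
    const route = matchRoute(req.method, req.url);
    if (!route || !handlers[route.name]) {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
      return;
    }
    handlers[route.name](req, res, route);
  });
}
```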

Real-Time Observability
Clients receive live updates on processing progress via Server-Sent Events.
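
On the server side, Server-Sent Events need only a long-lived response with the `text/event-stream` content type and frames terminated by a blank line. A minimal sketch follows, assuming a hypothetical event name and JSON payload; the actual event names and fields used by the dashboard are not documented here.

```javascript
// Formats one SSE frame. The event name and payload shape are assumptions.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Prepares a native http.ServerResponse for SSE and returns a send function.
function openSseStream(res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  return (eventName, payload) => res.write(formatSseEvent(eventName, payload));
}
```

A status stage could call the returned function each time a batch completes, for example `send('progress', { linesProcessed })`, where `linesProcessed` is an illustrative field name.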


Tech Stack

  • Node.js: v22.12.0
  • Native Node modules only (no backend frameworks)

Requirements

  • Node.js 22.12.0
  • Modern browser with SSE support (Chrome v144.0.7559.133 recommended)

Verify your Node.js version:

node -v

Getting Started

1. Clone the Repository

git clone https://github.com/shwetank-dev/real-time-log-analyzer.git
cd real-time-log-analyzer

2. Start the Server

node src/server.js

You should see:

Server listening on 13099

3. Open the Dashboard

http://localhost:13099

Create a session and upload a log file to begin real-time analysis.


API Overview

Health Check

GET /health

Create Upload Session

POST /sessions

Returns a unique sessionId.
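
A session endpoint of this kind can be sketched with `crypto.randomUUID()` and an in-memory map. Everything below is an assumption for illustration: the stored fields, the 201 status code, and the JSON response shape may not match sessionManager.js.

```javascript
const { randomUUID } = require('node:crypto');

// In-memory session registry; the stored fields are illustrative assumptions.
const sessions = new Map();

function createSession() {
  const sessionId = randomUUID();
  sessions.set(sessionId, { createdAt: Date.now(), linesProcessed: 0 });
  return sessionId;
}

// Hypothetical handler for POST /sessions on a native http server.
function handleCreateSession(_req, res) {
  const sessionId = createSession();
  res.writeHead(201, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ sessionId }));
}
```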


Upload Log File

POST /upload/:sessionId

Streams the file directly into the processing pipeline.
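
For testing from a script rather than the dashboard, a file can be streamed to this endpoint without loading it into memory on the client either. The sketch below assumes the server accepts the raw file bytes as the request body with an `application/octet-stream` content type; that is a guess, and the function names are hypothetical.

```javascript
const fs = require('node:fs');
const http = require('node:http');
const { pipeline } = require('node:stream/promises');

// Builds the upload URL for a session (plain HTTP assumed, as in the local setup).
function uploadUrl(baseUrl, sessionId) {
  return new URL(`/upload/${encodeURIComponent(sessionId)}`, baseUrl);
}

// Streams a local file to POST /upload/:sessionId and resolves with the status code.
function uploadLogFile(baseUrl, sessionId, filePath) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      uploadUrl(baseUrl, sessionId),
      { method: 'POST', headers: { 'Content-Type': 'application/octet-stream' } },
      (res) => {
        res.resume(); // drain the response so the socket can be released
        res.on('end', () => resolve(res.statusCode));
      },
    );
    req.on('error', reject);
    // pipeline ends the request when the file is exhausted and surfaces read errors.
    pipeline(fs.createReadStream(filePath), req).catch(reject);
  });
}
```

With no Content-Length header set, Node sends the body using chunked transfer encoding, so the file size does not need to be known up front.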


Real-Time Status Stream

GET /status/:sessionId

Establishes a Server-Sent Events connection for live progress updates.
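
In a browser, `EventSource` handles this stream natively. To watch it from a Node.js script instead, the frames can be parsed by hand, as in the sketch below. It assumes LF line endings and `event:`/`data:` fields only, and the function names are hypothetical.

```javascript
// Splits accumulated SSE text into complete events plus the unfinished remainder.
function parseSseFrames(buffered) {
  const frames = buffered.split('\n\n');
  const rest = frames.pop(); // trailing frame that has not been terminated yet
  const events = frames.map((frame) => {
    let event = 'message'; // default event type when no "event:" field is present
    const dataLines = [];
    for (const line of frame.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
    }
    return { event, data: dataLines.join('\n') };
  });
  return { events, rest };
}

// Follows GET /status/:sessionId and calls onEvent for every complete frame.
async function watchStatus(baseUrl, sessionId, onEvent) {
  const res = await fetch(`${baseUrl}/status/${sessionId}`, {
    headers: { Accept: 'text/event-stream' },
  });
  const decoder = new TextDecoder();
  let buffered = '';
  for await (const chunk of res.body) {
    buffered += decoder.decode(chunk, { stream: true });
    const { events, rest } = parseSseFrames(buffered);
    buffered = rest;
    events.forEach(onEvent);
  }
}
```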


Observing Memory Behavior

The server logs memory usage periodically:

[MEM] rss=XXMB heapUsed=XXMB heapTotal=XXMB external=XXMB

Memory should stay roughly flat regardless of file size. If it does, the streaming design is working as intended.
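
A log line in the [MEM] format shown above can be produced from `process.memoryUsage()`. The sketch below is one possible way to do it; the rounding to whole megabytes and the five-second interval are assumptions, not values taken from the repository.

```javascript
// Formats a memory snapshot in the [MEM] layout; values are rounded to whole MB.
function formatMemoryUsage(usage = process.memoryUsage()) {
  const mb = (bytes) => `${Math.round(bytes / 1024 / 1024)}MB`;
  return `[MEM] rss=${mb(usage.rss)} heapUsed=${mb(usage.heapUsed)} ` +
    `heapTotal=${mb(usage.heapTotal)} external=${mb(usage.external)}`;
}

// Logs a snapshot every intervalMs; unref() stops the timer from keeping the process alive.
function startMemoryLogger(intervalMs = 5000) {
  const timer = setInterval(() => console.log(formatMemoryUsage()), intervalMs);
  timer.unref();
  return timer;
}
```

When comparing runs, `rss` and `external` are worth watching alongside the heap numbers, since stream buffers are allocated outside the V8 heap.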


Project Structure

src/
 ├── server.js
 ├── sessionManager.js
 ├── worker/
 │    ├── parserWorker.js
 │    └── workerPool.js
 └── transformStreams/
      ├── batchMaker.js
      ├── errorCounter.js
      └── statusUpdater.js

What This Project Demonstrates

  • Deep understanding of Node.js streams
  • Event loop protection via worker threads
  • Concurrency and task scheduling
  • Backpressure management
  • Real-time data delivery
  • Production-style error handling

Future Enhancements

  • Persistent session storage (Redis)
  • Authentication & rate limiting
  • Advanced log classification
  • Containerization and autoscaling

License

MIT
