# Intelligent Document Processing

An enterprise-grade intelligent document processing system built with a robust Spring Boot backend and React frontend. The platform automates the ingestion, validation, and processing of legal documents through a scalable, event-driven architecture powered by Apache Kafka.
## Table of Contents

- Features
- Architecture
- Technology Stack
- Project Structure
- Getting Started
- Configuration
- Deployment
- API Reference
- Contributing
- License
## Features

- Secure Document Ingestion: Upload documents with rigorous format and content validation using the Chain of Responsibility pattern.
- Event-Driven Processing: Asynchronous processing pipeline utilizing Apache Kafka for high scalability and throughput.
- Intelligent Categorization: Automatic document classification (Contracts, Legal Briefs, etc.) using content analysis.
- Real-Time Updates: WebSocket integration providing live progress tracking to the frontend (Observer Pattern).
- Advanced Extraction: Pluggable strategies for extracting text and metadata from PDF and DOCX files.
- Secure Architecture: Complete RBAC system with JWT authentication and Spring Security.
- Observability: Integrated Prometheus metrics and ELK stack logging for production-grade monitoring.
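The Chain of Responsibility validation mentioned above can be sketched as follows. This is a minimal illustration of the pattern, not the project's actual API: the validator names and the 10 MB size cap are assumptions.

```java
import java.util.List;

// Each validator is one link in the chain; it returns an error message,
// or null when its check passes.
interface DocumentValidator {
    String validate(String filename, byte[] content);
}

// Hypothetical format check: accept only the extensions the pipeline handles.
class FormatValidator implements DocumentValidator {
    public String validate(String filename, byte[] content) {
        if (!filename.endsWith(".pdf") && !filename.endsWith(".docx")) {
            return "Unsupported format: " + filename;
        }
        return null;
    }
}

// Hypothetical size check; the 10 MB limit is an assumed default.
class SizeValidator implements DocumentValidator {
    private static final int MAX_BYTES = 10 * 1024 * 1024;
    public String validate(String filename, byte[] content) {
        return content.length > MAX_BYTES ? "File too large" : null;
    }
}

// Runs each validator in order and stops at the first failure,
// so new checks can be added or reordered without touching callers.
class ValidationChain {
    private final List<DocumentValidator> steps;
    ValidationChain(List<DocumentValidator> steps) { this.steps = steps; }

    String run(String filename, byte[] content) {
        for (DocumentValidator step : steps) {
            String error = step.validate(filename, content);
            if (error != null) return error;
        }
        return null; // all checks passed
    }
}
```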
## Architecture

The system follows a reactive microservices-style architecture, decoupling ingestion from processing to handle high loads efficiently.
```mermaid
graph TD
    Client[React Frontend] -->|REST/WebSocket| API[Backend API Gateway]
    subgraph "Infrastructure"
        API -->|Auth/Data| DB[(PostgreSQL)]
        API -->|Cache| Redis[(Redis)]
        API -->|Publish Events| Kafka{{Apache Kafka}}
    end
    subgraph "Processing Pipeline"
        Kafka -->|Topic: doc-categorizer| Consumer1[Categorizer Service]
        Kafka -->|Topic: doc-tokenizer| Consumer2[Tokenizer Service]
        Consumer1 -->|Updates| API
        Consumer2 -->|Updates| API
    end
    style Client fill:#61dafb,stroke:#20232a,stroke-width:2px
    style API fill:#6db33f,stroke:#20232a,stroke-width:2px,color:white
    style Kafka fill:#20232a,stroke:#e91e63,stroke-width:2px,color:white
```
### Design Patterns

- Chain of Responsibility: For flexible, ordered validation steps.
- Composite Pattern: To represent complex, hierarchical document structures.
- Observer Pattern: To push real-time status updates to clients.
- Strategy Pattern: To swap extraction logic based on file type (PDF vs DOCX).
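The Strategy pattern for extraction can be sketched like this. The class names and registry are illustrative assumptions; in the real backend the strategies would wrap Apache PDFBox and Apache POI, which the stubs below only stand in for.

```java
import java.util.Map;

// One strategy per file type; callers depend only on this interface.
interface ExtractionStrategy {
    String extractText(byte[] content);
}

// Stand-in for a PDFBox-based extractor (PDFTextStripper would go here).
class PdfExtraction implements ExtractionStrategy {
    public String extractText(byte[] content) {
        return "pdf-text";
    }
}

// Stand-in for a POI-based extractor (XWPFWordExtractor would go here).
class DocxExtraction implements ExtractionStrategy {
    public String extractText(byte[] content) {
        return "docx-text";
    }
}

// Picks a strategy by file extension, so adding a new format means
// registering one more strategy rather than editing dispatch logic.
class ExtractorRegistry {
    private final Map<String, ExtractionStrategy> byExtension = Map.of(
        "pdf", new PdfExtraction(),
        "docx", new DocxExtraction());

    ExtractionStrategy forFile(String filename) {
        String ext = filename.substring(filename.lastIndexOf('.') + 1).toLowerCase();
        ExtractionStrategy strategy = byExtension.get(ext);
        if (strategy == null) {
            throw new IllegalArgumentException("No strategy for ." + ext);
        }
        return strategy;
    }
}
```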
## Technology Stack

### Backend

- Framework: Spring Boot 3.1.0
- Language: Java 17
- Security: Spring Security, JWT (JJWT)
- Messaging: Spring Kafka
- Persistence: Spring Data JPA, PostgreSQL
- Caching: Redis
- Processing: Apache PDFBox, Apache POI
### Frontend

- Framework: React 18
- Language: TypeScript
- State/Network: Axios, React Hooks
- Styling: CSS Modules / Standard CSS
### Infrastructure & DevOps

- Containerization: Docker, Docker Compose
- Orchestration: Kubernetes Manifests included
- Monitoring: Spring Actuator, Prometheus (Micrometer)
- Logging: ELK Stack ready
## Project Structure

```
intelligent-document-processing/
├── backend/                # Spring Boot application
│   ├── src/main/java/      # Source code
│   └── pom.xml             # Maven dependencies
├── frontend/               # React application
│   ├── src/                # Components and logic
│   └── package.json        # NPM dependencies
├── docker/                 # Dockerfiles for Prod/Dev
├── k8s/                    # Kubernetes deployment configuration
├── scripts/                # Utility scripts (load testing, etc.)
└── docker-compose.yml      # Local development stack
```

## Getting Started

### Prerequisites

- Java 17+
- Node.js 16+ & npm
- Docker & Docker Compose
- Maven 3.6+
### Quick Start (Docker Compose)

The easiest way to run the entire stack is via Docker Compose:

- Clone the repository:

  ```bash
  git clone https://github.com/your-org/intelligent-document-processing.git
  cd intelligent-document-processing
  ```

- Start the services:

  ```bash
  docker-compose up -d
  ```

This starts Postgres, Redis, Kafka, Zookeeper, and the Backend API.

### Manual Setup

Run the supporting services (ensure ports 5432, 6379, 9092 are free):

```bash
docker-compose up -d postgres redis kafka zookeeper
```

Then build and start the backend:

```bash
cd backend
mvn clean install
mvn spring-boot:run
```

The server runs at http://localhost:8080.

In a separate terminal, start the frontend:

```bash
cd frontend
npm install
npm start
```

The client runs at http://localhost:3000.
## Configuration

The application uses standard Spring Boot configuration. Key environment variables:

| Variable | Description | Default |
|---|---|---|
| `SPRING_DATASOURCE_URL` | PostgreSQL URL | `jdbc:postgresql://localhost:5432/docdb` |
| `KAFKA_BOOTSTRAP_SERVERS` | Kafka brokers | `localhost:9092` |
| `REDIS_HOST` | Redis host | `localhost` |
| `JWT_SECRET` | Security key | `Change_Me_In_Production` |
| `APP_CORS_ORIGINS` | Allowed origins | `http://localhost:3000` |
## API Reference

### Authentication

- `POST /api/auth/register` - Create a new account
- `POST /api/auth/login` - Authenticate and receive a token

### Documents

- `POST /api/documents/upload` - Multipart file upload
- `GET /api/documents/{id}/status` - Check processing status
- `GET /api/documents/download/{id}` - Retrieve processed file
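A typical client flow is to log in, then use the returned token on document endpoints. The sketch below builds those requests with the JDK's `HttpClient` API; the `demo` credentials, the JSON field names, and document id `42` are placeholders, and actually sending the requests requires the backend to be running at `localhost:8080`.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds requests for the endpoints listed above; send them with
// HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).
class ApiRequests {
    static final String BASE = "http://localhost:8080";

    // POST /api/auth/login with a JSON credential body.
    static HttpRequest login(String username, String password) {
        String body = String.format(
            "{\"username\":\"%s\",\"password\":\"%s\"}", username, password);
        return HttpRequest.newBuilder()
            .uri(URI.create(BASE + "/api/auth/login"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // GET /api/documents/{id}/status, authorized with the JWT from login.
    static HttpRequest status(long documentId, String token) {
        return HttpRequest.newBuilder()
            .uri(URI.create(BASE + "/api/documents/" + documentId + "/status"))
            .header("Authorization", "Bearer " + token)
            .GET()
            .build();
    }
}
```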
## Contributing

- Fork the project
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

Distributed under the MIT License. See `LICENSE` for more information.