🎯 DocClassify

Advanced AI-Powered Document Classification Platform


🚀 An intelligent document classification platform that uses machine learning to categorize documents and presents the results through a modern web interface.

📋 Table of Contents

Overview
Features
Demo
Tech Stack
Installation
Usage
API Documentation
Project Structure
Model Training
Configuration
Contributing
License
Authors
Acknowledgments

🌟 Overview

DocClassify is a document classification platform that uses a Support Vector Machine (SVM) with TF-IDF vectorization to categorize documents, currently reporting about 85% accuracy (see Model Performance Metrics). Designed for businesses, researchers, and organizations, it combines this ML pipeline with a modern, responsive user interface.

Why Choose DocClassify?

Feature	Benefit
🎯 High Accuracy	SVM model with TF-IDF achieves reliable document categorization
⚡ Real-time Processing	Instant classification with confidence scoring
📄 Multi-format Support	Handles PDF, DOCX, and TXT files seamlessly
🎨 Modern UI/UX	Clean design with smooth animations and responsive layout
🔧 FastAPI Backend	Scalable and fast API for document processing
📊 Confidence Metrics	Detailed classification results with probability scores

✨ Features

🧠 Intelligent Core

SVM Classification

Support Vector Machine algorithm
TF-IDF vectorization for text features
Label encoding for categories
Confidence scoring system

Smart Analytics

Real-time classification results
Confidence level assessment
Processing time metrics
File format detection

🎨 Visual Experience

Design Excellence

Clean Interface: Modern design with intuitive navigation
Dark Theme: Professional aesthetic with high contrast
Responsive Layout: Mobile to desktop support
Smooth Animations: Framer Motion transitions

Interactive Elements

File Upload: Drag & drop interface
Progress Indicators: Real-time processing feedback
Results Display: Clear classification output
Error Handling: User-friendly error messages

📄 Document Processing

Multi-format Support: PDF, DOCX, TXT file processing
Text Extraction: Advanced parsing for different formats
Preprocessing Pipeline: Cleaning, tokenization, lemmatization
Single-file Processing: One document is classified per request
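
The cleaning, tokenization, and lemmatization steps listed above could look roughly like the sketch below. It uses only the standard library. The function names and the tiny stop-word list are illustrative assumptions, not the project's actual `preprocessing.py`, and the lemmatizer is passed in as a callable (for example NLTK's `WordNetLemmatizer().lemmatize`) so the sketch runs without extra downloads.

```python
import re
from typing import Callable, List

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are"}


def clean_text(text: str) -> str:
    """Lowercase, replace non-letter characters with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str, lemmatize: Callable[[str], str] = lambda tok: tok) -> List[str]:
    """Clean, tokenize on whitespace, remove stop words, then lemmatize each token."""
    tokens = clean_text(text).split()
    return [lemmatize(tok) for tok in tokens if tok not in STOP_WORDS]
```

By default the lemmatizer is the identity function, so `preprocess` only cleans, tokenizes, and filters until a real lemmatizer is supplied.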

🎥 Demo

Classification Interface

Results Dashboard

File Upload Interface

🛠 Tech Stack

Frontend Technologies

Technology	Purpose	Version
React	UI Framework	19.x
Vite	Build Tool	Latest
Tailwind CSS	Styling	Latest
Framer Motion	Animations	Latest
	HTTP Client	Latest
	Routing	Latest

Backend Technologies

Technology	Purpose	Version
Python	Language	3.9+
FastAPI	API Framework	Latest
Scikit-learn	ML Library	Latest
NLTK	NLP Processing	Latest
	Data Processing	Latest
	PDF Processing	Latest

📦 Installation

Prerequisites

Before you begin, ensure you have the following installed:

# Check Node.js version (v18 or higher required)
node --version

# Check Python version (3.9 or higher required)
python --version

# Check Git
git --version

Required Software:

Node.js (v18+)
Python (3.9+)
Git

Quick Start Guide

Step 1: Clone the Repository

git clone https://github.com/yourusername/docclassify.git
cd docclassify

Step 2: Frontend Setup

# Navigate to client directory
cd client

# Install dependencies
npm install

# Start development server
npm run dev

✅ Frontend will be running at http://localhost:5173

Step 3: Backend Setup

Open a new terminal window:

# Navigate to server directory
cd server

# Create virtual environment (recommended)
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start FastAPI server
python app.py

✅ Backend API will be running at http://localhost:8000

Step 4: Verify Installation

Visit http://localhost:5173 in your browser. You should see the DocClassify landing page.

Alternative: Concurrent Execution

Run both frontend and backend simultaneously with a single command:

# From root directory
npm install
npm run dev

This requires the root package.json to define a dev script that starts both servers using the concurrently package.

🚀 Usage

Making Your First Classification

1. Navigate to Classification Page

Open your browser and go to http://localhost:5173/

2. Upload a Document

Upload a document file:

Supported Formats:
├── PDF files (.pdf)
├── Word documents (.docx)
├── Text files (.txt)
└── Maximum file size: 10MB
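
The format and size limits above can be checked before a file is sent for classification. The following is a minimal sketch, assuming the limit means 10 × 1024 × 1024 bytes; `validate_upload` and both constants are hypothetical names, not taken from the project's code.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # 10MB, assuming binary megabytes


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size falls outside the documented limits."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError("File exceeds the 10MB limit")
```

The check only looks at the file extension, so it does not guarantee that the content really is a PDF, DOCX, or TXT document.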

3. Process & Analyze

Click "Classify Document" to receive:

✅ Document Category (Predicted class)
📊 Confidence Score (0-100%)
🎯 Classification Details
⏱️ Processing Time

📡 API Documentation

Base URL

http://localhost:8000

Endpoints

1. Health Check

GET /

Response:

{
  "status": "healthy",
  "service": "Document Classification API"
}
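
A script can use the health check to confirm the backend is up before uploading anything. This sketch assumes the response shape shown above; `check_health` and its `fetch` parameter are hypothetical helpers introduced so the logic can be exercised without a running server.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen


def check_health(base_url: str = "http://localhost:8000",
                 fetch: Optional[Callable[[str], bytes]] = None) -> bool:
    """Return True when GET / reports a "healthy" status."""
    if fetch is None:
        # Default: perform a real HTTP GET with a short timeout.
        fetch = lambda url: urlopen(url, timeout=5).read()
    payload = json.loads(fetch(base_url.rstrip("/") + "/"))
    return payload.get("status") == "healthy"
```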

2. Classify Document

POST /classify
Content-Type: multipart/form-data

Request Body:

file: [UploadFile] - The document file to classify

Response:

{
  "filename": "document.pdf",
  "prediction": "Category A",
  "confidence": 0.87,
  "message": "Classification successful"
}
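
Because the endpoint expects `multipart/form-data` with a field named `file`, a client that uses only the Python standard library has to encode the body itself. The sketch below shows one way to do that; `build_multipart` and `classify_file` are hypothetical helpers, and the default base URL is the local development address from this README.

```python
import json
import uuid
from pathlib import Path
from typing import Tuple
from urllib.request import Request, urlopen


def build_multipart(filename: str, content: bytes,
                    mime_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Encode one file under the form field "file"; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def classify_file(path: str, base_url: str = "http://localhost:8000") -> dict:
    """POST a document to /classify and return the decoded JSON response."""
    file_path = Path(path)
    body, content_type = build_multipart(file_path.name, file_path.read_bytes())
    request = Request(f"{base_url}/classify", data=body,
                      headers={"Content-Type": content_type}, method="POST")
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

With the backend running, `classify_file("test_doc.txt")` would return a dictionary with the `filename`, `prediction`, `confidence`, and `message` keys shown above.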

Error Responses

{
  "detail": "Error message here",
  "status_code": 400
}

Common Status Codes:

200 - Success
400 - Bad Request (Invalid file or processing error)
500 - Internal Server Error
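
On the client side, the success and error payloads above can be handled in one place. This is a sketch under the assumption that error bodies carry a `detail` field as shown; `ClassificationResult` and `parse_response` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    filename: str
    prediction: str
    confidence: float  # fraction between 0 and 1, as in the sample response

    @property
    def confidence_percent(self) -> str:
        """Format the confidence for display, e.g. 0.87 becomes "87%"."""
        return f"{self.confidence:.0%}"


def parse_response(status_code: int, payload: dict) -> ClassificationResult:
    """Turn a decoded JSON payload into a result, or raise on a non-200 status."""
    if status_code != 200:
        detail = payload.get("detail", "unknown error")
        raise RuntimeError(f"HTTP {status_code}: {detail}")
    return ClassificationResult(
        filename=payload["filename"],
        prediction=payload["prediction"],
        confidence=float(payload["confidence"]),
    )
```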

📂 Project Structure

docclassify/
│
├── 📁 client/                          # React Frontend
│   ├── 📁 public/                      # Static assets
│   │   ├── favicon.ico
│   │   └── logo.png
│   │
│   ├── 📁 src/
│   │   ├── 📁 components/              # Reusable components
│   │   │   ├── Footer.jsx              # Footer component
│   │   │   ├── Navbar.jsx              # Navigation bar
│   │   │   └── [Other components]
│   │   │
│   │   ├── 📁 pages/                   # Page components
│   │   │   ├── Home.jsx                # Home page
│   │   │   ├── Layout.jsx              # Layout wrapper
│   │   │   └── NotFound.jsx            # 404 page
│   │   │
│   │   ├── 📁 hooks/                   # Custom React hooks
│   │   │   └── [Custom hooks]
│   │   │
│   │   ├── 📁 utils/                   # Utility functions
│   │   │   └── [Utility files]
│   │   │
│   │   ├── App.jsx                     # Main app component
│   │   ├── main.jsx                    # Entry point
│   │   └── [Other files]
│   │
│   ├── package.json                    # Frontend dependencies
│   ├── vite.config.js                  # Vite configuration
│   └── [Other config files]
│
├── 📁 server/                          # FastAPI Backend
│   ├── 📁 models/                      # Trained ML models
│   │   ├── svm_model.pkl               # SVM model
│   │   ├── tfidf_vectorizer.pkl        # TF-IDF vectorizer
│   │   └── label_encoder.pkl           # Label encoder
│   │
│   ├── 📁 training/                    # ML training scripts
│   │   ├── eda.py                      # Exploratory data analysis
│   │   ├── preprocessing.py            # Data preprocessing
│   │   ├── model.py                    # Model training
│   │   └── test.py                     # Model testing
│   │
│   ├── app.py                          # Main FastAPI app
│   ├── requirements.txt                # Python dependencies
│   └── [Other files]
│
├── 📁 docs/                            # Documentation
│   ├── API.md                          # API documentation
│   └── [Other docs]
│
├── .gitignore                          # Git ignore rules
├── LICENSE                             # MIT License
├── README.md                           # This file
└── package.json                        # Root package (scripts)

🧪 Model Training

Training Your SVM Model

Prepare Your Dataset

Ensure your CSV file has the following columns:

text, category
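
Before running the training scripts, it can help to confirm that the CSV really has both columns. The following standard-library sketch is not part of the repository; `check_dataset` is a hypothetical helper that also counts how many rows have both fields filled in.

```python
import csv
import io
from typing import TextIO

REQUIRED_COLUMNS = {"text", "category"}


def check_dataset(handle: TextIO) -> int:
    """Return the number of usable rows; raise ValueError if a required column is missing."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing column(s): {', '.join(sorted(missing))}")
    # Count only rows where both the text and the category are non-empty.
    return sum(
        1 for row in reader
        if (row.get("text") or "").strip() and (row.get("category") or "").strip()
    )


# Example with an in-memory file; for a real dataset use
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO("text,category\nInvoice due in March,finance\n,legal\n")
usable_rows = check_dataset(sample)
```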

Run Training Scripts

cd server/training

# Perform EDA
python eda.py --dataset ../data/your_dataset.csv

# Preprocess data
python preprocessing.py --input ../data/your_dataset.csv --output ../data/processed_data.csv

# Train model
python model.py --dataset ../data/processed_data.csv --output ../models/

# Test model
python test.py --model ../models/svm_model.pkl --data ../data/test_data.csv

Available Parameters

# EDA
--dataset       # Path to training data CSV

# Preprocessing
--input         # Input CSV file
--output        # Output processed CSV file

# Model Training
--dataset       # Path to processed data CSV
--output        # Directory to save trained models
--test-size     # Test split ratio (default: 0.2)

# Testing
--model         # Path to trained model
--data          # Path to test data

Model Performance Metrics

Current SVM Performance

Metric	Value
Accuracy	85.2%
Precision	83.1%
Recall	84.5%
F1-Score	83.8%
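
To make the four metrics concrete, the sketch below computes them from true and predicted labels in plain Python. It assumes macro averaging across categories, which this README does not specify, and `macro_metrics` is a hypothetical helper rather than the project's `test.py`.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (inputs must be non-empty)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }


scores = macro_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```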

⚙️ Configuration

Frontend Configuration

API Endpoint (`client/src/utils/api.js`)

const API_BASE_URL = import.meta.env.VITE_API_URL || 'http://localhost:8000';

Backend Configuration

Server Config (`server/app.py`)

# API Configuration
HOST = "0.0.0.0"
PORT = 8000

# Model Configuration
MODEL_PATH = "models/svm_model.pkl"
VECTORIZER_PATH = "models/tfidf_vectorizer.pkl"
ENCODER_PATH = "models/label_encoder.pkl"

# CORS Settings
ALLOWED_ORIGINS = ["http://localhost:5173"]
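
The three model paths above point at the artifacts the server needs at startup. A minimal loading sketch follows, assuming the files were written with the standard `pickle` module (if they were saved with joblib, load them with joblib instead); `load_artifacts` is a hypothetical helper.

```python
import pickle
from pathlib import Path
from typing import Any, Tuple

ARTIFACT_NAMES = ("svm_model.pkl", "tfidf_vectorizer.pkl", "label_encoder.pkl")


def load_artifacts(model_dir: str = "models") -> Tuple[Any, ...]:
    """Load (model, vectorizer, encoder) in that order; fail early if a file is missing."""
    loaded = []
    for name in ARTIFACT_NAMES:
        path = Path(model_dir) / name
        if not path.is_file():
            raise FileNotFoundError(f"Missing model artifact: {path}")
        with path.open("rb") as handle:
            loaded.append(pickle.load(handle))  # only unpickle files you trust
    return tuple(loaded)
```

Loading once at startup, rather than on every request, keeps classification latency down and surfaces a missing file immediately.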

Environment Variables

Create .env files:

Frontend (.env):

VITE_API_URL=http://localhost:8000

Backend (.env):

DEBUG=True
MODEL_PATH=./models
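
On the backend, these variables can be read from the process environment. The sketch below treats `MODEL_PATH` as a directory, as in the `.env` example, and `load_settings` is a hypothetical helper. Python does not read `.env` files on its own, so this assumes the variables have already been exported or loaded by a tool such as python-dotenv.

```python
import os
from pathlib import Path
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read DEBUG and MODEL_PATH, falling back to safe defaults when unset."""
    debug = env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"}
    model_dir = Path(env.get("MODEL_PATH", "./models"))
    return {"debug": debug, "model_dir": model_dir}
```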

🎨 Customization

Changing Color Scheme

Update Theme Variables

Edit client/src/styles/theme.css:

:root {
  /* Primary Colors */
  --primary: #6366f1;
  --secondary: #8b5cf6;
  --accent: #ec4899;

  /* Background */
  --background: #0f0f0f;
  --surface: #1a1a1a;

  /* Text */
  --text-primary: #ffffff;
  --text-secondary: #a0a0b0;
}

Adjusting Model Parameters

Edit server/training/model.py:

# SVM Configuration
C = 1.0                    # Regularization parameter
kernel = 'linear'          # Kernel type
gamma = 'scale'            # Kernel coefficient (ignored by the linear kernel)

# TF-IDF Configuration
max_features = 5000        # Maximum features
ngram_range = (1, 2)       # N-gram range
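
As a rough illustration of where these values end up, the sketch below wires them into a scikit-learn pipeline. It assumes scikit-learn is installed and that the project uses `SVC` with `TfidfVectorizer` and `LabelEncoder`; the toy texts and `build_pipeline` are made up for the example and are not the project's training code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC


def build_pipeline() -> Pipeline:
    """TF-IDF features followed by an SVM, using the parameter values shown above."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
        ("svm", SVC(C=1.0, kernel="linear", gamma="scale")),
    ])


# Toy training data, invented purely for illustration.
texts = [
    "invoice payment due",
    "invoice total amount payment",
    "court contract clause",
    "contract legal agreement clause",
]
categories = ["finance", "finance", "legal", "legal"]

encoder = LabelEncoder()
labels = encoder.fit_transform(categories)  # category names become integer codes
pipeline = build_pipeline().fit(texts, labels)
predicted = encoder.inverse_transform(pipeline.predict(["payment for the invoice"]))[0]
```

A larger `C` penalizes training errors more heavily, and `ngram_range=(1, 2)` adds word pairs to the vocabulary alongside single words.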

🤝 Contributing

We welcome contributions from the community! Here's how you can help:

Getting Started

Fork the Repository
```
# Click the 'Fork' button on GitHub
```

Clone Your Fork

git clone https://github.com/YOUR_USERNAME/docclassify.git
cd docclassify

Create a Branch
```
git checkout -b feature/AmazingFeature
```
Make Your Changes
- Write clean, documented code
- Follow existing code style
- Add tests if applicable

Commit Your Changes

git add .
git commit -m 'Add some AmazingFeature'

Push to Your Fork
```
git push origin feature/AmazingFeature
```
Open a Pull Request
- Go to the original repository
- Click 'New Pull Request'
- Describe your changes

Development Guidelines

✅ Follow the existing code style and conventions
✅ Write meaningful commit messages
✅ Add comments for complex logic
✅ Update documentation as needed
✅ Test your changes thoroughly
✅ Ensure all tests pass before submitting PR

Code Style

JavaScript/React:

Use functional components with hooks
Follow Airbnb JavaScript Style Guide
Use meaningful variable names
Add JSDoc comments for functions

Python:

Follow PEP 8 style guide
Use type hints
Add docstrings to functions
Keep functions small and focused

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2025 Nimit Gupta & Sanjeevni Dhir

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

👨‍💻 Authors

Nimit Gupta
Lead Developer

Sanjeevni Dhir
Co-Developer

🙏 Acknowledgments

We would like to thank the following projects and communities:

Scikit-learn - Machine learning algorithms and utilities
NLTK - Natural language processing toolkit
React - UI library for building interactive interfaces
FastAPI - Modern, fast web framework for Python
Tailwind CSS - Utility-first CSS framework
Framer Motion - Production-ready animation library
Vite - Next generation frontend tooling

Special Thanks

The open-source community for incredible tools and libraries
Contributors who help improve this project
Users who provide valuable feedback

📊 Roadmap

Upcoming Features

Long-term Vision

Enterprise document management platform
Integration with cloud storage services
Advanced NLP features (sentiment analysis, keyword extraction)
Real-time collaboration features
Custom model training interface
API marketplace for document processing

📞 Support

Getting Help

If you need help with DocClassify, here are your options:

📧 Email: Contact the development team directly
- Nimit: guptanimit062@gmail.com
- Sanjeevni: sanjeevnidhir05@gmail.com
🐛 Bug Reports: Open an issue on GitHub Issues
💡 Feature Requests: Submit your ideas on GitHub Discussions
📖 Documentation: Check out our comprehensive guides in the /docs folder

FAQ

Q: What file formats are supported?
A: DocClassify currently supports PDF, DOCX, and TXT files.

Q: What's the maximum file size?
A: The current limit is 10MB per file, but this can be configured.

Q: How accurate is the classification?
A: Our SVM model achieves around 85% accuracy, but this depends on your training data.

Q: Can I train my own model?
A: Yes, use the training scripts in the server/training/ directory.

Q: Is my data secure?
A: Files are processed locally and not stored permanently. For production use, implement proper security measures.

🌐 Community

Join our growing community of developers and contributors:

⭐ Star this repo if you find it helpful
🍴 Fork and contribute to make it better
📣 Share with others who might benefit
💬 Join discussions to share ideas and feedback

Made with ❤️ by Nimit Gupta & Sanjeevni Dhir

DocClassify - Transforming Document Management with AI
