A state-of-the-art deepfake detection system built with PyTorch and EfficientNet-B0, featuring a user-friendly web interface for real-time image and video analysis.
- 👨‍💻 T RAHUL SINGH
- 🧑‍💻 Mallikarjun Macherla
- 🧑‍💻 Prakash Madasu
- Deep Learning Model: EfficientNet-B0 architecture fine-tuned for deepfake detection
- Multi-format Support: Analyze both images (.jpg, .jpeg, .png) and videos (.mp4, .mov)
- Web Interface: Interactive Gradio-based web application for easy testing
- Real-time Analysis: Processes the first frame of a video for quick deepfake detection
- Training Pipeline: Complete PyTorch Lightning training infrastructure
- Model Export: Support for PyTorch (.pt) and ONNX format exports
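The supported formats listed above suggest a simple dispatch on file extension before any inference runs. A minimal sketch, assuming a hypothetical helper named `media_kind` that is not part of the repository:

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the feature list; the helper itself is hypothetical.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}


def media_kind(path: str) -> Optional[str]:
    """Return "image", "video", or None for an unsupported extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive: ".JPG" matches ".jpg"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

Rejecting unsupported files early like this avoids passing them to a decoder that may fail with a less helpful error.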
- Python 3.8 or higher
- CUDA-compatible GPU (optional, but recommended for training)
- Clone the repository:
  git clone https://github.com/TRahulsingh/DeepfakeDetector.git
  cd DeepfakeDetector
- Install dependencies:
  pip install -r requirements.txt
- Download a pre-trained model (or train your own):
  - Place your model file as models/best_model-v3.pt
Launch the interactive web interface:
python web-app.py
The web app will open in your browser, where you can:
- Drag and drop images or videos
- View real-time predictions with confidence scores
- See a preview of the analyzed content
Classify individual images:
python classify.py --image path/to/your/image.jpg
Process videos frame by frame:
python inference/video_inference.py --video path/to/your/video.mp4
This deepfake detection system supports various popular deepfake datasets. Below are the recommended datasets for training and evaluation:
- Description: FaceForensics++, one of the most comprehensive deepfake datasets, covering 4 manipulation methods
- Size: ~1,000 original videos, ~4,000 manipulated videos
- Manipulations: Deepfakes, Face2Face, FaceSwap, NeuralTextures
- Quality: Raw, c23 (light compression), c40 (heavy compression)
- Download: GitHub Repository
- Usage: Excellent for training robust models across different manipulation types
- Description: Celeb-DF (v2), a high-quality celebrity deepfake dataset
- Size: 590 real videos, 5,639 deepfake videos
- Quality: High-resolution with improved visual quality
- Download: Official Website
- Usage: Great for testing model performance on high-quality deepfakes
- Description: DFDC (Deepfake Detection Challenge), Facebook's large-scale deepfake detection dataset
- Size: ~100,000 videos (real and fake)
- Diversity: Multiple actors, ethnicities, and ages
- Download: Kaggle Competition
- Usage: Large-scale training and benchmarking
- Description: DeepFakeDetection (DFD), the Google/Jigsaw deepfake dataset
- Size: ~3,000 deepfake videos
- Quality: High-quality with various compression levels
- Download: FaceForensics++ repository
- Usage: Additional training data for model robustness
- Description: 140k Real and Fake Faces, a large collection of real and AI-generated face images
- Size: ~140,000 images
- Source: StyleGAN-generated faces vs real faces
- Download: Kaggle Dataset
- Usage: Perfect for image-based deepfake detection training
- Description: CelebA-HQ, a high-quality celebrity face dataset
- Size: 30,000 high-resolution images
- Quality: 1024×1024 resolution
- Download: GitHub Repository
- Usage: Real face examples for training
- Download your chosen dataset from the links above
- Extract it to the data/ folder
- Organize as shown in the training section below
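Before training, it can help to confirm that the extracted data matches the expected layout, with data/train and data/validation each holding real/ and fake/ subfolders. A minimal sketch, assuming a hypothetical helper named `missing_dataset_dirs` that is not part of the repository:

```python
from pathlib import Path
from typing import List


def missing_dataset_dirs(root: str = "data") -> List[str]:
    """List the expected split/label folders that do not exist under root."""
    expected = [
        Path(root) / split / label
        for split in ("train", "validation")
        for label in ("real", "fake")
    ]
    return [str(p) for p in expected if not p.is_dir()]
```

An empty list means all four folders are present; anything else names the folders still to be created or filled.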
Use our built-in tools to prepare datasets:
# Split video dataset into frames
python tools/split_video_dataset.py --input_dir raw_videos --output_dir data
# Split dataset into train/validation
python tools/split_train_val.py --input_dir data --train_ratio 0.8
# General dataset splitting
python tools/split_dataset.py --input_dir your_dataset --output_dir data
- For Beginners: Start with 140k Real and Fake Faces (image-based, easy to work with)
- For Research: Use FaceForensics++ (comprehensive, multiple manipulation types)
- For Production: Combine DFDC + Celeb-DF (large scale, diverse)
- For High-Quality Testing: Use Celeb-DF v2 (challenging, high-quality deepfakes)
- Ethical Use: These datasets are for research purposes only
- Legal Compliance: Ensure compliance with dataset licenses and terms of use
- Privacy: Respect privacy rights of individuals in the datasets
- Citation: Properly cite the original dataset papers when publishing research
Organize your training data in the data folder as follows:
data/
├── train/
│   ├── real/
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── fake/
│       ├── fake1.jpg
│       └── fake2.jpg
└── validation/
    ├── real/
    └── fake/
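To illustrate how files could end up in this layout, here is a rough sketch of an 80/20 train/validation split. It is not the implementation of tools/split_train_val.py. The function name, the `seed` parameter, and the assumed input layout (`input_dir/real` and `input_dir/fake`) are all assumptions made for the example.

```python
import random
import shutil
from pathlib import Path


def split_train_val(input_dir: str, output_dir: str,
                    train_ratio: float = 0.8, seed: int = 0) -> dict:
    """Copy input_dir/{real,fake} into output_dir/{train,validation}/{real,fake}."""
    counts = {}
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    for label in ("real", "fake"):
        files = sorted(p for p in (Path(input_dir) / label).iterdir() if p.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * train_ratio)
        for split, subset in (("train", files[:n_train]),
                              ("validation", files[n_train:])):
            target = Path(output_dir) / split / label
            target.mkdir(parents=True, exist_ok=True)
            for src in subset:
                shutil.copy2(src, target / src.name)  # copy, leave the source intact
            counts[(split, label)] = len(subset)
    return counts
```

Splitting each class separately keeps the real/fake proportion the same in both splits. For video-derived frames, splitting by source video rather than by frame is usually safer, since frames from one video in both splits can inflate validation scores.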
Update config.yaml with your dataset paths:
train_paths:
  - data/train
val_paths:
  - data/validation
lr: 0.0001
batch_size: 4
num_epochs: 10
Then start training:
python main_trainer.py
or
python model_trainer.py
The training will:
- Use PyTorch Lightning for efficient training
- Save the best model based on validation loss
- Log metrics to TensorBoard
- Apply early stopping to prevent overfitting
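The interplay between best-model checkpointing and early stopping in the list above can be sketched in plain Python. This is an illustration of the general technique, not the project's or PyTorch Lightning's code. The function name and the `patience` value are assumptions, since the README does not state the settings used.

```python
from typing import List, Tuple


def run_early_stopping(val_losses: List[float], patience: int = 3) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    stop_epoch = len(val_losses) - 1  # default: training ran to completion
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: this is where a checkpoint would be saved.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                stop_epoch = epoch  # no improvement for `patience` epochs in a row
                break
    return best_epoch, stop_epoch
```

For example, with losses `[0.9, 0.7, 0.6, 0.65, 0.64, 0.66, 0.5]` and a patience of 3, the best checkpoint comes from epoch 2 and training stops at epoch 5, so the late improvement at epoch 6 is never reached. That trade-off is why the patience value matters.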
View training progress with TensorBoard:
tensorboard --logdir lightning_logs
The repository is organized as follows:
├── web-app.py                    # Main web application
├── main_trainer.py               # Primary training script
├── classify.py                   # Image classification utility
├── realeval.py                   # Real-world evaluation script
├── config.yaml                   # Training configuration
├── requirements.txt              # Python dependencies
├── README.md                     # Project documentation
├── LICENSE                       # MIT License
├── .gitignore                    # Git ignore rules
├── data/                         # Dataset storage (not tracked by git)
│   ├── train/                    # Training data
│   └── validation/               # Validation data
├── datasets/
│   └── hybrid_loader.py          # Custom dataset loader
├── lightning_modules/
│   └── detector.py               # PyTorch Lightning module
├── models/
│   └── best_model-v3.pt          # Trained model weights
├── tools/                        # Dataset preparation utilities
│   ├── split_dataset.py
│   ├── split_train_val.py
│   └── split_video_dataset.py
└── inference/
    ├── export_onnx.py            # ONNX export
    └── video_inference.py        # Video processing
- Backbone: EfficientNet-B0 (pre-trained on ImageNet)
- Classifier: Custom 2-class classifier with dropout (0.4)
- Input Size: 224x224 RGB images
- Output: Binary classification (Real/Fake) with confidence scores
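A 2-class classifier head typically produces two raw scores (logits), and a softmax turns them into the label and confidence score described above. A minimal sketch in plain Python, where the function name and the class order in `LABELS` are assumptions and should be checked against the training code:

```python
import math
from typing import Sequence, Tuple

LABELS = ("Real", "Fake")  # assumed class order; verify against the training code


def logits_to_prediction(logits: Sequence[float]) -> Tuple[str, float]:
    """Apply a softmax to the logits and return (label, confidence)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

With two classes the confidence always lies between 0.5 and 1.0, so a value near 0.5 signals an uncertain prediction rather than a weak "Real" or "Fake".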
The model achieves:
- High accuracy on diverse deepfake datasets
- Real-time inference capabilities
- Robust performance on compressed/low-quality media
Convert PyTorch model to ONNX format:
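python inference/export_onnx.py
Process multiple files programmatically:
import importlib

# "web-app.py" contains a hyphen, so `from web-app import ...` is a syntax error.
# Run from the repository root; this assumes importing web-app.py does not
# launch the Gradio app as a side effect.
web_app = importlib.import_module("web-app")

image_paths = ["path/to/your/image.jpg"]  # replace with your own files
results = []
for file_path in image_paths:
    prediction, confidence, preview = web_app.predict_file(file_path)
    results.append({
        'file': file_path,
        'prediction': prediction,
        'confidence': confidence
    })
- Fork the repository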
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- EfficientNet architecture by Google Research
- PyTorch Lightning for training infrastructure
- Gradio for web interface framework
- The research community for deepfake detection advances
This project is licensed under the MIT License.
⭐ Star this repository if you found it helpful!