████████╗ █████╗ ██████╗ ██╗████████╗
╚══██╔══╝██╔══██╗██╔════╝ ██║╚══██╔══╝
██║ ███████║██║ ███╗██║ ██║
██║ ██╔══██║██║ ██║██║ ██║
██║ ██║ ██║╚██████╔╝██║ ██║
╚═╝ ╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═╝
TAGIT — Smart Labels for Smart Money
A hybrid AI system that classifies financial transactions using a TF‑IDF + Logistic Regression baseline and a DistilBERT Transformer, with a Streamlit UI for testing.
TAGIT intelligently categorizes messy transaction strings like:
"UPI/ROHAN@OKHDFC/9823"
"AMZN MUMBAI 4093"
"POS 42342 CAFE COFFEE DAY"
"ZOMATO*ONLINE ORDER"
"HPCL/FUEL/PUNE"
It uses a two‑stage hybrid pipeline:
- Baseline Model (Fast): TF‑IDF + Logistic Regression
- Transformer Model (Accurate): DistilBERT + Tabular Features
- Hybrid Router: if the baseline's confidence is at least 0.70, use its prediction; otherwise fall back to the Transformer
TAGIT also includes a sleek Streamlit interface for real-time testing and CSV batch predictions.
┌────────────────────────────┐
│ RAW INPUT │
│ (UPI / POS / CARD / etc.) │
└────────────────────────────┘
│
▼
┌────────────────────────────┐
│ PREPROCESSOR │
│ Clean text, numbers, dates │
│ Extract merchant token │
└────────────────────────────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────────┐ ┌────────────────────┐ ┌────────────────────┐
│ BASELINE MODEL │ │ TRANSFORMER MODEL │ │ RULE ENGINE │
│ TF-IDF + LR │ │ DistilBERT Hybrid │ │ (optional) │
└───────────────────┘ └────────────────────┘ └────────────────────┘
│ │ │
└──────────────┬──────┴──────────────┬───────┘
▼ ▼
┌────────────────────────────────────┐
│ TAGIT HYBRID ENGINE │
│ Baseline if conf ≥ 0.70 │
│ Else Transformer │
└────────────────────────────────────┘
▼
┌────────────────────────────┐
│ FINAL CATEGORY │
└────────────────────────────┘
python -m venv .venv
source .venv/bin/activate # macOS/Linux
.venv\Scripts\activate # Windows
pip install -r requirements.txt
python generate_synthetic.py
TAGIT uses a hybrid Transformer architecture that merges DistilBERT embeddings with numeric features (amount, amount_bucket, weekday, month) for superior classification accuracy.
python preprocess.py data/transactions.csv data/preprocessed.csv
This generates:
merchant_clean
merchant_token
amount
amount_bucket
weekday
month
label
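The preprocessing step can be pictured with a small sketch. This is not the code in preprocess.py: the function names, the noise-word list, and the bucket edges below are all assumptions chosen for illustration, and the `label` column is omitted because it comes from the input data.

```python
import re
from datetime import date

# Hypothetical bucket edges; the real preprocess.py may use different ones.
AMOUNT_BUCKETS = [(100, "low"), (1000, "medium"), (10000, "high")]

# Hypothetical channel/noise words that are not merchant names.
NOISE_TOKENS = {"upi", "pos", "card", "online", "order"}


def clean_merchant(raw: str) -> str:
    """Lowercase, replace digits and punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def merchant_token(cleaned: str) -> str:
    """Return the first token that is not a channel/noise word."""
    for tok in cleaned.split():
        if tok not in NOISE_TOKENS:
            return tok
    return ""


def amount_bucket(amount: float) -> str:
    """Map an amount to a coarse bucket name."""
    for upper, name in AMOUNT_BUCKETS:
        if amount < upper:
            return name
    return "very_high"


def preprocess_row(raw: str, amount: float, when: date) -> dict:
    """Build the feature columns for one transaction (label not included)."""
    cleaned = clean_merchant(raw)
    return {
        "merchant_clean": cleaned,
        "merchant_token": merchant_token(cleaned),
        "amount": amount,
        "amount_bucket": amount_bucket(amount),
        "weekday": when.weekday(),  # Monday = 0
        "month": when.month,
    }


row = preprocess_row("POS 42342 CAFE COFFEE DAY", 250.0, date(2024, 3, 15))
print(row)
```

Under these assumed rules, the example row gets `merchant_clean` "pos cafe coffee day" and `merchant_token` "cafe".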
Run:
python train_transformer.py
This script will:
- Load preprocessed data
- Tokenize merchant text using DistilBERT
- Train hybrid encoder (Transformer + Tabular MLP)
- Save all required model files
| File | Purpose |
|---|---|
| models/transformer_best.pt | Best model weights |
| models/transformer_label_encoder.joblib | Encodes label strings |
| models/transformer_scaler.joblib | Scales numeric features |
| models/tokenizer/ | DistilBERT tokenizer |
| models/transformer_metadata.joblib | Model metadata |
python predict_transformer.py
python smart_predict.py
Logic:
if baseline_confidence >= 0.70:
    use baseline
else:
    use transformer
Results saved to:
data/predictions_hybrid.csv
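The routing rule above can be sketched as a small function. This is only an illustration under stated assumptions: `hybrid_predict`, the two stand-in models, and the category names "Food", "Fuel", and "Other" are hypothetical and are not taken from smart_predict.py. Only the 0.70 threshold comes from this README.

```python
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence)

CONFIDENCE_THRESHOLD = 0.70  # threshold stated in this README


def hybrid_predict(
    text: str,
    baseline: Callable[[str], Prediction],
    transformer: Callable[[str], Prediction],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, float, str]:
    """Use the baseline when it is confident, otherwise the transformer."""
    label, conf = baseline(text)
    if conf >= threshold:
        return label, conf, "baseline"
    # The slower model only runs when the baseline is unsure.
    label, conf = transformer(text)
    return label, conf, "transformer"


# Stand-in models with hard-coded outputs (hypothetical, for illustration only).
def fake_baseline(text: str) -> Prediction:
    return ("Food", 0.91) if "zomato" in text.lower() else ("Other", 0.40)


def fake_transformer(text: str) -> Prediction:
    return ("Fuel", 0.88)


print(hybrid_predict("ZOMATO*ONLINE ORDER", fake_baseline, fake_transformer))
print(hybrid_predict("HPCL/FUEL/PUNE", fake_baseline, fake_transformer))
```

Returning the name of the model that produced each prediction makes it easy to check how often the router falls back to the Transformer.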
python eval.py
Outputs macro/weighted F1 and per‑label metrics.
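To make the difference between macro and weighted F1 concrete, here is a minimal standard-library sketch. It is not the code in eval.py; `f1_report` and the toy labels are hypothetical. Macro F1 averages the per-label scores equally, while weighted F1 weights each label by its number of true examples.

```python
from collections import Counter
from typing import Dict, List, Tuple


def f1_report(y_true: List[str], y_pred: List[str]) -> Tuple[Dict[str, float], float, float]:
    """Return (per-label F1, macro F1, weighted F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per label
    per_label: Dict[str, float] = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_label[label] = 2 * precision * recall / denom if denom else 0.0
    macro = sum(per_label.values()) / len(labels)
    weighted = sum(per_label[l] * support[l] for l in labels) / len(y_true)
    return per_label, macro, weighted


# Toy data with hypothetical category names.
y_true = ["Food", "Food", "Food", "Fuel"]
y_pred = ["Food", "Food", "Fuel", "Fuel"]
per_label, macro_f1, weighted_f1 = f1_report(y_true, y_pred)
print(per_label, round(macro_f1, 3), round(weighted_f1, 3))
```

On imbalanced transaction data the two averages can diverge noticeably, which is a reason to report both.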
Install CUDA‑enabled torch:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Verify:
import torch
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))
python train_baseline.py
Produces:
models/baseline_pipe.joblib
models/label_encoder.joblib
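A baseline of this kind can be sketched with scikit-learn. This is an assumption-laden illustration, not the contents of train_baseline.py: the character n-gram settings, the toy training rows, the category names, and the helper `predict_with_confidence` are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data with hypothetical category names.
texts = [
    "ZOMATO*ONLINE ORDER",
    "POS 42342 CAFE COFFEE DAY",
    "HPCL/FUEL/PUNE",
    "AMZN MUMBAI 4093",
]
labels = ["Food", "Food", "Fuel", "Shopping"]

# Character n-grams are an assumed choice; they tolerate noisy merchant strings.
pipe = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
pipe.fit(texts, labels)


def predict_with_confidence(model, text: str):
    """Return (label, probability of that label) for one transaction string."""
    probs = model.predict_proba([text])[0]
    best = probs.argmax()
    return model.classes_[best], float(probs[best])


label, conf = predict_with_confidence(pipe, "ZOMATO*ONLINE ORDER")
print(label, round(conf, 2))
```

The probability returned here is the kind of confidence score the hybrid router compares against the 0.70 threshold.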
python predict.py
python eval.py
streamlit run app2.py
Visit:
http://localhost:8501
- Real-time baseline inference
- High-accuracy Transformer model
- Smart hybrid confidence routing
- Beautiful Streamlit dashboard
- Clean architecture & modular design
- Easy to extend
- Professional metrics (macro/weighted F1)
Made with ❤️ for innovation.