
A hybrid log classification system integrating Regex ✅, BERT + Logistic Regression 🤖, and LLMs 📚 to efficiently classify log messages via a FastAPI interface.


🚀 Log Classification System

This project implements a hybrid log classification system using three complementary approaches to handle varying complexity in log patterns. It integrates Regular Expressions (Regex) ✅, Sentence Transformer + Logistic Regression 🤖, and Large Language Models (LLMs) 📚 to ensure flexibility and accuracy in classifying log messages.

✨ Features

  • ⚡ FastAPI Interface: Provides an API endpoint for classifying log messages from CSV files.
  • Three-Tier Classification:
    • Regex-based classification ✅ for structured patterns.
    • BERT + Logistic Regression 🤖 for complex, labeled data.
    • LLM fallback 📚 for handling unknown or insufficiently labeled patterns.
  • 📂 Efficient Model Handling: Uses a pre-trained model (log_classifier.joblib) for inference.
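
As a rough illustration of the regex tier listed above, the following minimal sketch maps patterns to labels and returns None when nothing matches. The patterns, the labels, and the function name classify_with_regex are assumptions made for this example; the project's actual rules live in regex_helper.py and are not shown in this README.

```python
import re
from typing import Optional

# Hypothetical pattern -> label table, for illustration only.
# The real rules are defined in regex_helper.py and may look different.
REGEX_RULES = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|completed|ended)", re.IGNORECASE), "System Notification"),
]


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if no rule matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(log_message):
            return label
    return None
```

Returning None rather than a placeholder label lets the caller distinguish "no rule matched" from a real class and hand the message to the next tier.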

🔄 Classification Flow

  1. 📥 Log Message Input
  2. Regex Classification
    • If a valid class is found, return it.
    • If the pattern is unknown, proceed to step 3.
  3. 🧠 BERT-based Classification (if enough training samples exist)
    • If confident, return the predicted class.
    • If uncertain, proceed to step 4.
  4. LLM-based Classification 📚
    • Uses a large language model to predict the class for unknown patterns.
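
The flow above can be sketched as a simple cascade in which each tier is tried in order and the first usable answer wins. This is a minimal sketch, not the project's classify.py: the tier signatures, the min_confidence cut-off of 0.5, and the stub tiers are all assumptions for illustration, and the "enough training samples" condition from step 3 is not modelled.

```python
from typing import Callable, Optional, Tuple

# Assumed tier signatures (hypothetical; the project's helpers may differ).
RegexTier = Callable[[str], Optional[str]]      # label, or None if no pattern matched
ModelTier = Callable[[str], Tuple[str, float]]  # (label, confidence in [0, 1])
LlmTier = Callable[[str], str]                  # always returns some label


def classify_log(
    log_message: str,
    regex_tier: RegexTier,
    model_tier: ModelTier,
    llm_tier: LlmTier,
    min_confidence: float = 0.5,  # assumed cut-off, not taken from the project
) -> Tuple[str, str]:
    """Return (label, tier_name) from the first tier that gives a usable answer."""
    label = regex_tier(log_message)
    if label is not None:
        return label, "regex"

    label, confidence = model_tier(log_message)
    if confidence >= min_confidence:
        return label, "model"

    return llm_tier(log_message), "llm"


# Stub tiers so the sketch runs without a trained model or an LLM API key.
def demo_regex(message: str) -> Optional[str]:
    return "User Action" if message.startswith("User ") else None


def demo_model(message: str) -> Tuple[str, float]:
    return ("Error", 0.9) if "failed" in message else ("Unclassified", 0.1)


def demo_llm(message: str) -> str:
    return "Workflow Error"
```

Passing the tiers in as callables keeps the cascade testable on its own, and returning the tier name alongside the label makes it easy to see how often the more expensive LLM fallback is reached.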

🎯 Decision Flow

[Diagram: decision_flow]

📂 File Structure

├── models
│   ├── log_classifier.joblib
├── testing
│   ├── test.csv
│   ├── output.csv
├── training
│   ├── dataset
│   │   ├── data.csv
│   ├── train.ipynb
├── bert_helper.py
├── classify.py
├── llm_helper.py
├── main.py
├── regex_helper.py

🌐 API Usage

📌 Endpoint: /classify/

  • 📤 Method: POST
  • 📥 Request: Upload a CSV file with source and log_message columns.
  • 📄 Response: A classified CSV file with an additional target_label column.
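
The request and response contract above (source and log_message columns in, the same rows plus target_label out) can be sketched with the standard library alone. This is only an illustration of the CSV handling, not the code in main.py: the function name label_csv and the classify(source, log_message) callback signature are assumptions.

```python
import csv
import io
from typing import Callable

REQUIRED_COLUMNS = ("source", "log_message")


def label_csv(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return csv_text with a target_label column appended to every row.

    `classify` is a hypothetical callback taking (source, log_message).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])

    # Reject uploads that lack the columns the endpoint expects.
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(missing)}")

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames + ["target_label"],
        extrasaction="ignore",  # skip stray cells beyond the header width
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
    return out.getvalue()
```

Validating the header before classifying any row means a malformed upload fails fast, which an API handler could translate into a 400 response.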

📌 Example Request (Python)

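import requests

url = "http://localhost:8000/classify/"

# Open the CSV in a context manager so the file handle is closed after the upload.
with open("test.csv", "rb") as csv_file:
    response = requests.post(url, files={"file": csv_file})

if response.status_code == 200:
    # The response body is the classified CSV.
    with open("classified_output.csv", "wb") as f:
        f.write(response.content)
    print("✅ Classified file saved as classified_output.csv")
else:
    # The error body may not be JSON, so print the status code and raw text.
    print("❌ Error:", response.status_code, response.text)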

βš™οΈ Setup & Installation

1️⃣ Clone the Repository

git clone https://github.com/ArchitJ6/Log-Classification-System.git
cd Log-Classification-System

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Run FastAPI Server

fastapi run main.py

πŸ‹οΈ Model Training

To train the classification model, run the Jupyter notebook:

jupyter notebook training/train.ipynb

The model will be saved as models/log_classifier.joblib.

πŸ§‘β€πŸ’» Contributing

Contributions are welcome! Fork the project and submit your pull requests.

📜 License

This project is licensed under the MIT License.
