This project implements a hybrid log classification system using three complementary approaches to handle varying complexity in log patterns. It integrates Regular Expressions (Regex), Sentence Transformer + Logistic Regression, and Large Language Models (LLMs) to ensure flexibility and accuracy in classifying log messages.
- FastAPI Interface: Provides an API endpoint for classifying log messages from CSV files.
- Three-Tier Classification:
  - Regex-based classification for structured patterns.
  - BERT + Logistic Regression for complex, labeled data.
  - LLM fallback for handling unknown or insufficiently labeled patterns.
- Efficient Model Handling: Uses a pre-trained model (`log_classifier.joblib`) for inference.
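
As a rough illustration of the regex tier, the sketch below maps a few patterns to labels and returns `None` when nothing matches, so a later tier can take over. The patterns, the label names, and the function name `classify_with_regex` are assumptions made for this example; the actual rules live in `regex_helper.py` and may differ.

```python
import re
from typing import Optional

# Hypothetical patterns and labels, for illustration only.
REGEX_RULES = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|completed|ended)", re.IGNORECASE), "System Notification"),
]


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if no rule matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(log_message):
            return label
    return None
```

Returning `None` rather than a placeholder label makes the "unknown pattern" case explicit for whatever code calls this tier.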
1. Log Message Input
2. Regex Classification
   - If a valid class is found, return it.
   - If the pattern is unknown, proceed to step 3.
3. BERT-based Classification (if enough training samples exist)
   - If confident, return the predicted class.
   - If uncertain, proceed to step 4.
4. LLM-based Classification
   - Uses a large language model to predict the class for unknown patterns.
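
The fallback order described above can be sketched as a single function that tries each tier in turn. This is a minimal sketch under stated assumptions, not the code in `classify.py`: the tiers are passed in as plain callables, the `CONFIDENCE_THRESHOLD` value is hypothetical, and the "enough training samples" check is left out. The `demo_*` functions are stand-ins so the example runs without a model or an LLM.

```python
from typing import Callable, Optional, Tuple

# Hypothetical cutoff; the project's actual threshold is not stated in this README.
CONFIDENCE_THRESHOLD = 0.5


def classify_log(
    log_message: str,
    regex_tier: Callable[[str], Optional[str]],
    embedding_tier: Callable[[str], Tuple[str, float]],
    llm_tier: Callable[[str], str],
) -> str:
    """Try the regex tier, then the embedding classifier, then the LLM."""
    # Regex tier: return immediately when a rule matches.
    label = regex_tier(log_message)
    if label is not None:
        return label

    # Embedding + logistic regression tier: accept only confident predictions.
    label, confidence = embedding_tier(log_message)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label

    # LLM tier: last resort for unknown or uncertain patterns.
    return llm_tier(log_message)


# Stand-in tiers (assumptions) so the sketch is runnable on its own.
def demo_regex(msg: str) -> Optional[str]:
    return "User Action" if msg.startswith("User ") else None


def demo_embedding(msg: str) -> Tuple[str, float]:
    return ("Error", 0.9) if "failed" in msg else ("Unknown", 0.1)


def demo_llm(msg: str) -> str:
    return "Workflow Error"


print(classify_log("User bob logged in", demo_regex, demo_embedding, demo_llm))
print(classify_log("Disk write failed", demo_regex, demo_embedding, demo_llm))
print(classify_log("Odd legacy message", demo_regex, demo_embedding, demo_llm))
```

Passing the tiers as arguments keeps the ordering logic separate from the models, so each tier can be replaced or tested independently.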
├── models
│   └── log_classifier.joblib
├── testing
│   ├── test.csv
│   └── output.csv
├── training
│   ├── dataset
│   │   └── data.csv
│   └── train.ipynb
├── bert_helper.py
├── classify.py
├── llm_helper.py
├── main.py
└── regex_helper.py
- Method: POST
- Request: Upload a CSV file with `source` and `log_message` columns.
- Response: A classified CSV file with an additional `target_label` column.
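
To make the request and response shapes concrete, here is a minimal sketch of the CSV transformation using only the standard library. It is an assumption about the behavior, not the code in `main.py` (which may well use pandas): the helper name `classify_csv` and the two-argument `classify` callable are hypothetical.

```python
import csv
import io
from typing import Callable


def classify_csv(csv_text: str, classify: Callable[[str, str], str]) -> str:
    """Return the input CSV with a target_label column appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])

    # Reject uploads that lack the two required columns.
    missing = {"source", "log_message"} - set(fieldnames)
    if missing:
        raise ValueError(f"CSV is missing required columns: {sorted(missing)}")

    output = io.StringIO()
    writer = csv.DictWriter(
        output, fieldnames=fieldnames + ["target_label"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # The classifier receives the source system and the raw message.
        row["target_label"] = classify(row["source"], row["log_message"])
        writer.writerow(row)
    return output.getvalue()
```

Validating the column names up front lets the endpoint return a clear error for a malformed upload instead of failing partway through classification.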

Example request using Python's `requests` library:

    import requests

    url = "http://localhost:8000/classify/"

    # Open the CSV in a context manager so the file handle is closed after the upload.
    with open("test.csv", "rb") as csv_file:
        response = requests.post(url, files={"file": csv_file})

    if response.status_code == 200:
        # Save the classified CSV returned by the API.
        with open("classified_output.csv", "wb") as f:
            f.write(response.content)
        print("Classified file saved as classified_output.csv")
    else:
        print("Error:", response.json())

Clone the repository:

    git clone https://github.com/ArchitJ6/Log-Classification-System.git
    cd Log-Classification-System

Install the dependencies:

    pip install -r requirements.txt

Start the API server:

    fastapi run main.py

To train the classification model, run the Jupyter notebook:

    jupyter notebook training/train.ipynb

The model will be saved as `models/log_classifier.joblib`.

Contributions are welcome! Fork the project and submit your pull requests.
This project is licensed under the MIT License.
