An end-to-end NLP pipeline for classifying operational aircraft delay logs. This project uses fine-tuned transformer models to predict the delay cause, operational phase, and predictability, enabling structured understanding of free-text reports from airline operations.
Input:
Late pushback due to hydraulic pump issue discovered during final check.
Output:
- 🧠 Category:
TECHNICAL_FAILURE
- 🧭 Phase:
final check
- 🔮 Predictability:
UNPREDICTABLE
Airlines generate thousands of unstructured delay reports from pilots, crew, and ground staff. This tool transforms those into structured, actionable insights — useful for analytics, automation, and training.
Task | Description |
---|---|
Text Classification | Predicts 6 delay types (e.g., TECHNICAL_FAILURE, WEATHER) |
Phase Prediction | Predicts operational phase (e.g., boarding, pushback) |
Predictability Estimation | Heuristic logic: certain delays are predictable |
Interpretability | Attention-based token attribution |
Streamlit App | Interactive interface for real-time predictions |
A synthetic dataset of 300+ logs was created using domain-aware GPT prompting, with the following fields:
log_text
(description)label
(category)phase
(operational step)severity
(low, medium, high)
📁 File: data/synthetic_logs.csv
bert-base-uncased
fine-tuned using Hugging Face Transformers- Separate models for:
- Delay category:
delay_classifier/
- Operational phase:
delay_phase_classifier/
- Delay category:
Launch locally:
cd streamlit_app
streamlit run app.py
Task | Metric | Score |
---|---|---|
Delay Category Classification | Accuracy | 1.00 |
Precision (weighted) | 1.00 | |
Recall (weighted) | 1.00 | |
F1 Score (weighted) | 1.00 | |
Phase Prediction | Accuracy | 1.00 |
Precision (weighted) | 1.00 | |
Recall (weighted) | 1.00 | |
F1 Score (weighted) | 1.00 |
Altough the model achieves F1 = 1.00 on delay category and phase classification, this is expected due to the structured nature of operational delay logs. In real airline operations: Reports are written in concise, consistent templates Specific keywords (e.g., “crew rest”, “catering truck”) directly imply certain delay types GPT-generated synthetic data reflects this domain consistency Thus, the task is highly learnable — and the model's performance mirrors how easily human dispatchers or operations analysts could classify these entries
Flight held at gate due to ATC departure slot congestion.
→ Category: ATC_RESTRICTION | Phase: pre-departure | UNPREDICTABLE
Fueling truck arrived late, delaying pushback.
→ LOGISTICS_ISSUE | pre-departure | PREDICTABLE
Crew rest period exceeded; replacement crew dispatched.
→ CREW_DELAY | pre-departure | PREDICTABLE
aircraft-delay-nlp/
├── data/ # CSV + label maps
├── model/ # Saved models
├── notebooks/ # Training + inference
├── streamlit_app/ # Streamlit interface
├── requirements.txt
└── README.md