This repository is part of the course 2147334 MACHINE LEARNING OR DEEP LEARNING.
Google Colab:
Link
Jupyter notebook:
docker-compose -f docker-compose.dev.yml up jupyter --build
Jupyter notebook with PyTorch GPU support (takes longer to load):
docker-compose -f docker-compose.dev.yml up jupyter-pytorch --build
Objectives:
- Predict the study program a course belongs to from its course description.
- (Optional) Use embeddings from the best predictor to calculate course similarity and, ultimately, build a content-based recommender system.
Idea:
- Get course and study program data from the Academic Chula website.
- There are about 500 study programs, 20,000 courses, and 26 faculties at Chulalongkorn University.
- Use models that can embed course descriptions, e.g. TF-IDF or neural-network-based models.
- Use SHAP to interpret the models' results.
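As a rough illustration of the TF-IDF idea above, here is a minimal standard-library sketch that turns descriptions into weighted term vectors. It is not the code used in the notebooks: the function name `tfidf_vectors`, the smoothed IDF formula, and the sample descriptions are all assumptions made for the example. It also splits on whitespace, which would not work for Thai text; real Thai descriptions would need a word tokenizer first.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed inverse document frequency (an assumed variant), so rarer
    # terms get larger weights and no term gets a zero weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical course descriptions, for illustration only.
descriptions = [
    "introduction to machine learning and neural networks",
    "organic chemistry laboratory techniques",
    "deep learning with neural networks",
]
vocab, vectors = tfidf_vectors(descriptions)
```

Each row of `vectors` has one weight per vocabulary term, so it can be fed to a classifier such as an SVM or compared with another row to estimate how similar two courses are.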
Roadmap:
- Train models to predict study programs and compare TF-IDF + SVM, LSTM + embedding layer, and Transformer-based models. Results were very poor; stopped early.
- LSTM
- TF-IDF + SVM
- Thai2Fit + SVM
- Attention based model
- Add model notebook.
- Add scraper notebook.
- Get study program data.
- Get course description data.
- Preprocess course description data.
- Train model to predict faculty
- LSTM
- TF-IDF + SVM
- Thai2Fit + SVM
- Pretrained attention-based model (WangchanBERTa)
- Create recommender system proof of concept from trained model
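The recommender proof of concept could be sketched along these lines: given one embedding vector per course, rank the other courses by cosine similarity to a query course. This is only an illustration under assumptions, not the repository's implementation. The helper names `cosine_similarity` and `recommend`, the course IDs, and the three-dimensional vectors are all hypothetical; in practice the vectors would come from the trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(course_id, embeddings, top_k=2):
    """Return the top_k (course_id, similarity) pairs most similar to course_id."""
    query = embeddings[course_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != course_id  # never recommend the query course itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical course embeddings, for illustration only.
embeddings = {
    "COURSE_A": [0.9, 0.1, 0.0],
    "COURSE_B": [0.8, 0.2, 0.1],
    "COURSE_C": [0.0, 0.1, 0.9],
}
top = recommend("COURSE_A", embeddings, top_k=2)
```

With these made-up vectors, `COURSE_B` ranks above `COURSE_C` for `COURSE_A` because its vector points in nearly the same direction.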