The required packages are listed in requirements.txt. You can build the environment for running the code by executing the following command in the project folder:
pip install -r requirements.txt
python train.py --dataset train
python inference.py --dataset train
Then you can find the answers in data/testlabel.txt.
This project is based on the Adult Census Income dataset, which can be downloaded from Kaggle.
For simplicity, we've placed the downloaded data in the data/full folder. Another version of the dataset (the same data, split differently), provided by the CS311 project, is placed in data/train.
- For preprocessing the dataset:
cd data
python preprocess.py --dataset [train | full] --sep
cd ..
- For training the model:
python train.py --dataset [train | full]
- For evaluating the model (note that only the full dataset is available here, as we don't have the answers to the train dataset):
python evaluate.py --dataset full
- For making inference:
python inference.py --dataset [train | full]
Then you can find the predicted labels in data/testlabel.txt (or data/testlabel_full if you use the full dataset). Each line in the text file is an answer predicted from the given information.
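Since the output file holds one predicted answer per line, a quick sanity check is to count how often each label occurs. The following is only a minimal sketch, not part of the project: the helper names `load_labels` and `summarize_labels` are hypothetical, and the labels are treated as opaque strings because their exact format is not specified here.

```python
from collections import Counter
from pathlib import Path


def load_labels(path):
    """Read one predicted label per line, skipping blank lines.

    Hypothetical helper: labels are kept as plain strings because
    the exact label format is an assumption, not documented here.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def summarize_labels(labels):
    """Count how often each distinct label occurs."""
    return Counter(labels)


if __name__ == "__main__":
    # Path taken from the instructions above.
    label_file = Path("data/testlabel.txt")
    if label_file.exists():
        labels = load_labels(label_file)
        print(f"{len(labels)} predictions")
        for label, count in sorted(summarize_labels(labels).items()):
            print(f"{label}: {count}")
    else:
        print(f"{label_file} not found; run inference.py first")
```

A heavily skewed count, or a total that does not match the number of test records, usually suggests that preprocessing or inference did not run as expected.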
The official checkpoints (weights) can be found in the checkpoints folder.