Source code for the Shopee Code League - Product Detection Challenge (Rank 4 on the private leaderboard, Rank 6 on the public leaderboard, as team "Citebok") (Kaggle Link).
The task was to classify each product image into the correct category, given labeled training data (supervised learning).
To solve the problem, we tried:
- multiple publicly available pretrained models from torchvision.models and EfficientNets (Tan and Le, ICML 2019) (in model.py).
- several loss functions (in losses.py)
- several image augmentation techniques (in data.py, randaug.py)
- contrastive representation learning, both supervised and self-supervised (Khosla et al., NeurIPS 2020; Chen et al., ICML 2020) (in train_con.py)
- transfer learning regularization and freezing (in train.py)
- fixing the train-test resolution discrepancy (Touvron et al., NeurIPS 2019).
To run:
- Prepare the dataset in a label/image_file directory structure.
- Set correct paths in config.py.
- Use split.py to hold out part of the training data for validation.
- Optionally run train_con.py for contrastive pretraining, then run train.py to fine-tune a pretrained model in a supervised manner. See the arguments/options inside the file.
- Run test.py to obtain the test labels.
To reproduce the best-performing setting, see run.sh.
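The role of split.py can be sketched as follows (the actual script and its options may differ): move a fraction of each class's images into a validation folder, preserving the label/image_file structure described above.

```python
import os
import random
import shutil

def split_train_val(train_dir, val_dir, val_frac=0.1, seed=0):
    """Move val_frac of each class's images from train_dir to val_dir."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    for label in os.listdir(train_dir):
        src = os.path.join(train_dir, label)
        dst = os.path.join(val_dir, label)
        os.makedirs(dst, exist_ok=True)
        files = sorted(os.listdir(src))
        rng.shuffle(files)
        n_val = max(1, int(len(files) * val_frac))
        for f in files[:n_val]:
            shutil.move(os.path.join(src, f), os.path.join(dst, f))
```

This keeps the held-out images in the same directory layout, so the same data-loading code works for both splits.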
We achieved the best performance with EfficientNet B5, fixing the resolution from 456 (train) to 600 (test).
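The 456→600 trick follows the FixRes idea (Touvron et al.): train at a lower resolution, then evaluate (and optionally fine-tune the classifier head) at a higher one. A minimal sketch of the test-time resizing, with illustrative names:

```python
import torch
import torch.nn.functional as F

TRAIN_RES, TEST_RES = 456, 600  # resolutions reported above for EfficientNet B5

def to_test_resolution(images: torch.Tensor) -> torch.Tensor:
    """Bilinearly resize an NCHW batch to the (higher) test resolution."""
    return F.interpolate(images, size=(TEST_RES, TEST_RES),
                         mode='bilinear', align_corners=False)

batch = torch.randn(2, 3, TRAIN_RES, TRAIN_RES)
test_batch = to_test_resolution(batch)  # shape: (2, 3, 600, 600)
```

In practice the resize is done in the test-time transform pipeline; the network itself is fully convolutional apart from global pooling, so it accepts the larger input unchanged.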
In the experiments, we found that:
- Using a strong model pretrained on high-resolution images, e.g., EfficientNet B5-B7, improved performance significantly more than any other modification.
- Cross-entropy achieved the best performance among the loss functions we tried; some losses drastically decreased performance.
- Image augmentation had very little effect on performance (less than 2%).
- Contrastive learning provided a small performance increase but took much longer to train.
- Transfer learning regularization and freezing yielded no improvement.
- Simply fixing the train-test resolution discrepancy increased performance by up to 5%.
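The supervised contrastive objective is, in spirit, the SupCon loss of Khosla et al.; a minimal plain-PyTorch sketch is below (the actual implementation in losses.py/train_con.py may differ):

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: features (N, D), labels (N,)."""
    features = F.normalize(features, dim=1)          # cosine similarities
    sim = features @ features.T / temperature        # (N, N)
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool)
    # positives: other samples with the same label
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # log-softmax over all samples except the anchor itself
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)  # avoid -inf * 0 = nan
    # mean negative log-probability of positives (anchors without positives skipped)
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0
    loss = -(log_prob * pos_mask).sum(1)[valid] / pos_counts[valid]
    return loss.mean()
```

A projection head is normally applied before this loss during contrastive pretraining and discarded for the supervised fine-tuning stage.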
Nuttapong Chairatanakul, Nontawat Charoenphakdee, Pannavat Terdchanakul, Zhenghang (Henry) Cui