This project aims to improve YOLO's performance at segmenting surgical instruments in real-time surgical video. It is an implementation of the VQGAN-based version of BigDatasetGAN: I rearranged some of the code from Taming Transformers and implemented a segmentation head for VQGAN modeled on the segmentation head from BigDatasetGAN.
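BigDatasetGAN-style heads work by pooling intermediate generator features, upsampling them to a common resolution, concatenating them, and classifying each pixel. The following is a minimal NumPy sketch of that fusion step only; the feature shapes, class count, and function names are illustrative assumptions, not the project's actual architecture:

```python
import numpy as np

def upsample_nearest(feat, target_hw):
    """Nearest-neighbor upsample of a (C, H, W) feature map."""
    c, h, w = feat.shape
    th, tw = target_hw
    assert th % h == 0 and tw % w == 0
    return feat.repeat(th // h, axis=1).repeat(tw // w, axis=2)

def segmentation_head(features, weights, bias):
    """Fuse multi-scale generator features and classify each pixel.

    features: list of (C_i, H_i, W_i) arrays from different decoder layers.
    weights:  (num_classes, sum(C_i)) matrix acting as a 1x1 convolution.
    """
    target_hw = max((f.shape[1], f.shape[2]) for f in features)
    fused = np.concatenate(
        [upsample_nearest(f, target_hw) for f in features], axis=0
    )  # (sum(C_i), H, W)
    c, h, w = fused.shape
    logits = weights @ fused.reshape(c, h * w) + bias[:, None]
    return logits.reshape(-1, h, w)  # (num_classes, H, W)

# Toy features at two scales (channel counts are made up for the sketch)
rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, 16, 16)), rng.standard_normal((4, 32, 32))]
W = rng.standard_normal((2, 12))  # 2 classes: instrument vs. background
b = np.zeros(2)
masks = segmentation_head(feats, W, b)
print(masks.shape)  # (2, 32, 32)
```

In the real model the 1x1 classifier would typically be a small learned MLP over the fused channels rather than a single fixed matrix.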
I am currently working on improving image and segmentation-mask quality by improving the training data and using transfer learning: VQGAN is trained on a subset of the SurgVu dataset (900k images), fine-tuned on the SARAS-MEAD dataset (23k images), and then further fine-tuned on a smaller private dataset specific to Transorbital Robotic Surgery (2k images). The idea is to train on a large dataset of porcine tissue (SurgVu), fine-tune on a medium-sized dataset of human tissue (SARAS-MEAD), and then specialize the model on a small dataset of domain-specific human tissue (TORS).
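The three-stage schedule can be expressed as a simple loop over dataset configs, where each stage resumes from the previous stage's checkpoint. This is only an illustrative sketch: the dataset sizes come from the text above, but the learning rates, epoch counts, and function names are invented for the example:

```python
# Hypothetical staged transfer-learning schedule. Image counts match the
# datasets described above; lr/epochs are illustrative guesses.
STAGES = [
    {"name": "surgvu",     "images": 900_000, "lr": 4.5e-6, "epochs": 10},
    {"name": "saras_mead", "images": 23_000,  "lr": 1.0e-6, "epochs": 20},
    {"name": "tors",       "images": 2_000,   "lr": 5.0e-7, "epochs": 40},
]

def run_schedule(stages, train_fn):
    """Run each stage, resuming from the checkpoint of the previous one."""
    checkpoint = None
    history = []
    for stage in stages:
        checkpoint = train_fn(stage, resume_from=checkpoint)
        history.append((stage["name"], checkpoint))
    return history

# Stub trainer, just to show the checkpoint hand-off between stages.
def fake_train(stage, resume_from=None):
    return f"ckpt_{stage['name']}"

log = run_schedule(STAGES, fake_train)
print(log)
```

Shrinking the learning rate at each stage is a common way to avoid catastrophically forgetting the general features learned on the large dataset while adapting to the small domain-specific one.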
The VQDatasetGAN model generated these images at 256 x 256 resolution; they were then upsampled to 512 x 512.
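The 256 to 512 step simply doubles each spatial dimension. A minimal example of 2x upsampling with nearest-neighbor interpolation in NumPy (the actual pipeline may use bilinear or a learned upsampler instead):

```python
import numpy as np

def upsample_2x(img):
    """Double H and W of an (H, W, C) image by nearest-neighbor repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

img = np.zeros((256, 256, 3), dtype=np.uint8)
big = upsample_2x(img)
print(big.shape)  # (512, 512, 3)
```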



