This repository is mainly about the deep learning part of the project, which consists of four modules (main, data, train, test).
The data pre-processing part can be checked on this GitHub page.
- Environment Setting
Setting up a deep learning environment requires attention to many details, such as the versions of CUDA, the NVIDIA driver, and the deep learning framework. It is therefore highly recommended to use Docker, and the experiment environment for this work was also built with Docker. The fundamental environment for the experiments is listed below.
- Ubuntu (Linux OS for using NVIDIA Docker)
- PyTorch v1.11.0
- CUDA 11.3
- cuDNN 8
Installing these separately can be a little tricky, but you don't need to worry about it: check the linked Dockerfile and use it.
Dockerfile
You can also download the Docker image from Docker Hub.
The basic usage of this file consists of two steps: build and run. Each command is executed in a shell prompt.
- Build example
```bash
docker build . -t octa3d
```
- Run example
```bash
docker run -d -it \
  -v /data:/root/Share/OCTA3d/data \
  -v /home/Project/OCTA3d:/root/Share/OCTA3d \
  --name "octa3d" \
  --rm --gpus all octa3d:latest
```
The main function describes the overall process. The input data for training is set up using the Data_Handler class in data.py. All arguments of the argparser are documented in the main.py script.
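A rough sketch of how such an entry point is typically wired together is shown below; the argument names and the `get_loaders` helper are illustrative assumptions, not the exact interface of main.py.

```python
# Hypothetical sketch of how main.py could wire the argparser to Data_Handler;
# the argument names and get_loaders() are illustrative, not the exact script.
import argparse
from data import Data_Handler

def parse_args():
    parser = argparse.ArgumentParser(description="OCTA3d training entry point")
    parser.add_argument("--dimension", choices=["2d", "3d"], default="3d",
                        help="train on 2D patch images or 3D volumes")
    parser.add_argument("--model", default="resnet50", help="backbone name")
    parser.add_argument("--batch-size", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=100)
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    handler = Data_Handler(args)                        # builds the datasets
    train_loader, test_loader = handler.get_loaders()   # assumed helper
    # train(...) and test(...) would then consume these loaders
```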
The data.py script handles the dataset, from pre-processing (splitting patch images for 2D; clipping and normalizing for 3D) to customizing PyTorch's Dataset class. This had to be done separately for each dimension. The concrete details are described in the comments in the script.
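As an illustration only, a minimal custom Dataset for the 3D case might look like the sketch below; the class name, clipping range, and normalization are assumptions, not the exact code in data.py.

```python
# Minimal sketch of a custom 3D Dataset: intensity clipping and normalization,
# then a (C, D, H, W) tensor. Names and defaults are illustrative.
import numpy as np
import torch
from torch.utils.data import Dataset

class OCTAVolumeDataset(Dataset):
    def __init__(self, volumes, labels, clip_range=(0, 255)):
        self.volumes = volumes          # list of numpy arrays, shape (D, H, W)
        self.labels = labels
        self.clip_range = clip_range

    def __len__(self):
        return len(self.volumes)

    def __getitem__(self, idx):
        vol = self.volumes[idx].astype(np.float32)
        vol = np.clip(vol, *self.clip_range)              # clipping
        vol = (vol - vol.mean()) / (vol.std() + 1e-8)     # normalizing
        vol = torch.from_numpy(vol).unsqueeze(0)          # add channel dim
        return vol, torch.tensor(self.labels[idx], dtype=torch.long)
```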
- Classification and Autoencoder Pre-training (by customizing the ClinicaDL method)
Basically, we utilize pre-existing CNN models, as their performance has already been proven. The point is that, combined with our pre-processing method, we could obtain higher inference scores. The models we used are listed in the table below.
| Dimension | VGGNet | ResNet | Inception V3 | EfficientNet | Vision Transformer |
| --- | --- | --- | --- | --- | --- |
| 2D | 16, 19 | 50, 152 | O | O | O |
| 3D | 16 | 18, 50 | O | O | X |
There are several libraries that provide these models, and they are downloaded automatically by the provided Dockerfile. For the paper, we used VGG19, ResNet-50/152, and Inception V3 for 2D, and ResNet-18/50 and Inception V3 for 3D, because these models have been shown to be useful for retinal disease classification in previous research. After performing binary classification, we were able to verify that retaining the volumetric information yields higher performance.
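For the 2D backbones, the usual pattern is to load a pre-trained model and swap its final layer for our number of classes; a hedged example using torchvision (PyTorch 1.11 / torchvision 0.12 style API) is sketched below. The helper name and defaults are illustrative.

```python
# Hedged example: load a pre-trained 2D backbone from torchvision and replace
# the final layer for binary classification.
import torch.nn as nn
from torchvision import models

def build_2d_classifier(name="resnet50", num_classes=2):
    if name == "resnet50":
        model = models.resnet50(pretrained=True)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    elif name == "vgg19":
        model = models.vgg19(pretrained=True)
        model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, num_classes)
    else:
        raise ValueError(f"unsupported backbone: {name}")
    return model
```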
To leverage transfer learning, we adopt an autoencoder structure for pre-training and use the encoder part, together with a fully connected layer, for classification. Conventional transfer learning reuses model parameters learned from natural images; to match the given medical data and overcome this limitation, this architecture is applied.
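The sketch below illustrates this idea in simplified form: an autoencoder is trained for reconstruction, and its encoder is then reused with a fully connected head for classification. The layer sizes and class names are illustrative assumptions, not the repository's actual ClinicaDL-derived implementation.

```python
# Simplified sketch of autoencoder pre-training followed by encoder reuse.
import torch.nn as nn

class ConvAutoencoder3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, kernel_size=2, stride=2),
        )

    def forward(self, x):
        # trained with a reconstruction loss (e.g. MSE) against the input
        return self.decoder(self.encoder(x))

class EncoderClassifier(nn.Module):
    def __init__(self, pretrained_encoder, num_classes=2):
        super().__init__()
        self.encoder = pretrained_encoder      # weights copied from the autoencoder
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, num_classes),        # fully connected classification head
        )

    def forward(self, x):
        return self.head(self.encoder(x))
```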
Currently, the multi-class classification module has been tested; it will be merged with the binary classification module into a single classification module, since their only difference is the scoring method. They will be integrated soon.
The test set, about 30% of the total data, was held out and used to evaluate the extracted best models.
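For reference, a plain 70/30 hold-out split could be produced as below; the actual split in this repository may be done differently (e.g. per patient), so treat this only as a sketch with placeholder data.

```python
# Illustrative 70/30 hold-out split with placeholder sample ids and labels.
from sklearn.model_selection import train_test_split

sample_ids = list(range(100))           # stand-ins for the real volumes
labels = [i % 2 for i in sample_ids]    # stand-ins for the real labels

train_ids, test_ids, train_y, test_y = train_test_split(
    sample_ids, labels, test_size=0.3, stratify=labels, random_state=42)
```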
To explain the classification process of the extracted model, we visualize it with Grad-CAM (by customizing M3d-CAM).
Since 3D volumetric data is used, Grad-CAM was customized to expand the dimension from 2D to 3D.
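A compact, generic version of 3D Grad-CAM is sketched below for illustration: a target Conv3d layer is hooked, its activations are weighted by the gradients of the predicted class pooled over the volume, and the result is upsampled to the input size. The repository's customized M3d-CAM code may differ in detail.

```python
# Generic 3D Grad-CAM sketch via forward/backward hooks on a Conv3d layer.
import torch
import torch.nn.functional as F

def grad_cam_3d(model, volume, target_layer):
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output
    def bwd_hook(_, grad_in, grad_out):
        gradients["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    logits = model(volume)                              # volume: (1, 1, D, H, W)
    score = logits[0, logits.argmax(dim=1).item()]      # predicted-class score
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    weights = gradients["value"].mean(dim=(2, 3, 4), keepdim=True)  # pooled grads
    cam = F.relu((weights * activations["value"]).sum(dim=1))       # (1, d, h, w)
    cam = F.interpolate(cam.unsqueeze(1), size=volume.shape[2:],
                        mode="trilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam.squeeze()                                # (D, H, W) attention map
```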
The overall process is shown below.
After this process, improved retinal lesion detection was observed. When the volumetric information is utilized, the retinopathy is detected quite accurately, and the results are shown in figures [D] and [E] below. Because a 3D attention map is extracted, we can simultaneously observe the z-axis information of the retinal lesion. In contrast, with a 2D image, only x-y information can be acquired, as in figures [B] and [C].
[A] is an X-Z image (= B-scan image) of the OCT volume data. [B] is the projected X-Y image (= en-face image). [C] shows the 2D Grad-CAM result from [B]. [D] is an X-Y slice image extracted from the 3D OCTA volume (position: red dot). [E] shows the 3D Grad-CAM result from [D].