This is an implementation of U-Net architecture for semantic segmentation of water bodies. Semantic segmentation is a computer vision task that involves classifying each pixel in an image as belonging to a particular class. For more information, refer to report.pdf.
The expected directory structure is displayed below. The images directory should contain all images from your dataset in jpg format, and the masks directory should contain the corresponding binary masks in png format. An example dataset is provided in dataset_example.
.
└── dataset
├── images
│ ├── 007100.jpg
│ ├── 007110.jpg
│ ├── 007120.jpg
│ └── ...
├── masks
│ ├── 007100.png
│ ├── 007110.png
│ ├── 007120.png
│ └── ...
├── test.csv
├── train.csv
└── val.csv
Masks are expected to be in the format generated by the WaSR algorithm (example). If you intend to use a different type of mask, modify the DatasetFolder.get_item method in the dataset loader to load it properly. If you intend to use different image file formats, modify the DatasetFolder.make_dataset method in the dataset loader to generate the filenames properly.
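To illustrate the filename-pairing responsibility of make_dataset, here is a minimal, self-contained sketch. The helper name pair_images_and_masks and its extension arguments are illustrative, not the project's actual API; swap in the real method when adapting the loader.

```python
from pathlib import Path

def pair_images_and_masks(root, image_ext=".jpg", mask_ext=".png"):
    """Pair each image in root/images with its mask in root/masks.

    Returns a sorted list of (image_path, mask_path) tuples for every
    image that has a mask with the same stem. Changing image_ext or
    mask_ext is enough to support other file formats.
    """
    root = Path(root)
    pairs = []
    for image_path in sorted((root / "images").glob(f"*{image_ext}")):
        mask_path = root / "masks" / (image_path.stem + mask_ext)
        if mask_path.exists():  # skip images without a matching mask
            pairs.append((image_path, mask_path))
    return pairs
```

Images without a matching mask are silently skipped here; the real loader may instead want to raise an error so missing masks are noticed early.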
The train-val-test split is defined in the corresponding csv files. An example of such a file is displayed below. Entries refer to files in the images directory, hence the jpg extension.
007550.jpg
007560.jpg
007570.jpg
007580.jpg
007590.jpg
007600.jpg
...
Clone the project, create a virtual environment and install required dependencies:
git clone https://github.com/kristjansoln/unet-segmentation.git
cd unet-segmentation
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Install torch and torchvision as described in the PyTorch documentation.
Run python3 train.py --help for help on all available arguments. A few usage examples are shown below.
python3 train.py --train --test # Run both training and testing with default arguments
python3 train.py --train --batchsize 4 --imagesize 288 512 --epochs 150 # Run training only, with modified batch size, image size and number of epochs
python3 train.py --test --testcsv ./dataset2/test.csv --imagesize 288 512 # Run test only. Weights are loaded from ./output/weights.pth
Logs are stored in ./output/training.log. When training, some graphs and model weights are saved to ./output.
Training can be safely interrupted with Ctrl+C.
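Safe interruption typically means catching KeyboardInterrupt around the epoch loop and saving a checkpoint before exiting. A framework-agnostic sketch of that pattern is shown below; run_epoch and save_checkpoint are stand-ins for the project's actual training and saving code, not its real API.

```python
def train_loop(num_epochs, run_epoch, save_checkpoint):
    """Run up to num_epochs epochs; on Ctrl+C, save and stop cleanly.

    run_epoch(epoch) performs one epoch of training; save_checkpoint()
    persists the model state. Both are placeholders for the real code.
    Returns the number of fully completed epochs.
    """
    completed = 0
    try:
        for epoch in range(num_epochs):
            run_epoch(epoch)
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt; fall through to save.
        print("Interrupted; saving checkpoint before exit.")
    save_checkpoint()
    return completed
```

Catching the exception at the epoch boundary keeps the saved weights consistent: a partially finished epoch is discarded rather than checkpointed mid-batch.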
During testing, weights are loaded from the default location. Generated output masks are stored in ./output/generated_masks.
Dataset augmentations can be modified. By default, images in the train dataset are randomly flipped horizontally, rotated around the center by up to 5 degrees, normalized, and randomly altered in brightness and contrast. See the Albumentations documentation for more.
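The essential constraint when augmenting a segmentation dataset is that geometric transforms must be applied identically to the image and its mask (Albumentations handles this when the mask is passed alongside the image). A library-free sketch of a joint random horizontal flip illustrates the idea; the function name and default probability are illustrative.

```python
import random

def random_hflip(image, mask, p=0.5, rng=random):
    """Flip image and mask together with probability p.

    image and mask are 2-D sequences (lists of rows). A real pipeline
    would operate on numpy arrays, but the key point is the same: one
    random draw decides the transform for both inputs.
    """
    if rng.random() < p:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    return image, mask
```

Photometric transforms such as brightness and contrast changes, by contrast, are applied to the image only, since they do not move pixels and the mask's labels are unaffected.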
The initial learning rate is set to 1e-4. A ReduceLROnPlateau scheduler reduces the learning rate to 1e-5 once the validation loss plateaus. When the validation loss plateaus again, an early-stopping mechanism ends the training. The patience in both cases is set to 5 epochs.
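The scheduling logic described above amounts to a small state machine: reduce the learning rate at the first plateau, stop training at the second. A minimal framework-free sketch is given below; the class name is illustrative and the learning-rate values mirror the defaults stated above, while the actual implementation relies on PyTorch's ReduceLROnPlateau.

```python
class PlateauController:
    """Track validation loss across epochs.

    Signals an LR reduction at the first plateau and an early stop at
    the second. patience is the number of consecutive epochs without
    improvement before a plateau is declared.
    """

    def __init__(self, lr=1e-4, min_lr=1e-5, patience=5):
        self.lr = lr
        self.min_lr = min_lr
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Return 'continue', 'reduced' or 'stop' for this epoch."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            return "continue"
        self.bad_epochs += 1
        if self.bad_epochs <= self.patience:
            return "continue"
        self.bad_epochs = 0          # plateau reached: reset the counter
        if self.lr > self.min_lr:
            self.lr = self.min_lr    # first plateau: reduce the LR
            return "reduced"
        return "stop"                # second plateau: early stop
```

Resetting the counter after the LR drop gives the smaller learning rate a full patience window to improve the loss before early stopping triggers.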