Author: Tianyi Miao
Modified Sep 12, 2019
SlideSeg3 is an enhanced Python 3 version of the original SlideSeg.
Author: Brendan Crabb brendancrabb8388@pointloma.edu
Created August 1, 2017
Welcome to SlideSeg3, a Python module modified from SlideSeg that allows you to segment whole slide images into usable image chips for deep learning. Image masks for each chip are generated from the associated markup and annotation files.
If you use this code for research purposes, please cite the following in your paper:
Brendan Crabb, Niels Olson, "SlideSeg: a Python module for the creation of annotated image repositories from whole slide images", Proc. SPIE 10581, Medical Imaging 2018: Digital Pathology, 105811C (6 March 2018); doi: 10.1117/12.2300262; https://doi.org/10.1117/12.2300262
Go to the main directory and create the conda environment:
conda env create -f environment_slideseg3.yml
Creating the environment might take a few minutes. Once finished, issue the following command to activate the environment:
- Windows:
activate SlideSeg3
- macOS and Linux:
source activate SlideSeg3
If the environment was activated successfully, you should see (SlideSeg3) at the beginning of the command prompt.
OpenSlide and OpenCV are native C/C++ libraries; as a result, they have to be installed separately from the conda environment, which contains all of the Python dependencies.
Create a folder called 'images/' in the main directory and copy all of the slide images into this folder. Create a folder called 'xml/' in the main directory and copy the markup and annotation files (in .xml format) into this folder. Each annotation file must have the same file name as the slide it is associated with.
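Because slides and annotation files are paired by file name, it can help to check the two folders before starting a long run. The following is a minimal sketch and not part of SlideSeg3; the helper name find_unmatched_slides is hypothetical, and it assumes that "same file name" means the same name without the extension.

```python
from pathlib import Path


def find_unmatched_slides(slide_dir, xml_dir):
    """Return the sorted names of slide files that have no same-named .xml file.

    Assumption: a slide and its annotation file share the name before the
    final extension (for example slide1.svs and slide1.xml).
    """
    xml_stems = {p.stem for p in Path(xml_dir).glob("*.xml")}
    return sorted(
        p.name
        for p in Path(slide_dir).iterdir()
        # Skip sub-folders and hidden files such as .DS_Store
        if p.is_file() and not p.name.startswith(".") and p.stem not in xml_stems
    )
```

Calling find_unmatched_slides("images", "xml") from the main directory returns the slides that still need an annotation file; an empty list means every slide is paired.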
Set parameters in Parameters.txt
slide_path: Path to the folder of slide images
xml_path: Path to the folder of xml files
output_dir: Path to the output folder where image_chips, image_masks, and text_files will be saved
format: Output format of the image_chips and image_masks (png or jpg only)
quality: Output quality: JPEG quality setting if the output format is 'jpg' (100 recommended; JPEG compression artifacts will distort image segmentation)
size: Size of image_chips and image_masks in pixels
overlap: Pixel overlap between image chips
key: The text file containing annotation keys and color codes
save_all: True saves every image_chip, False only saves chips containing an annotated pixel
save_ratio: Ratio of image_chips containing annotations to image_chips not containing annotations (use 'inf' if only annotated chips are desired; only applicable if save_all == False)
level: Choose from highest (highest magnification), all, lowest (lowest magnification), 40.0, 20.0, 10.0, 5.0, 2.5, 1.25
If the slide file does not contain the requested magnification, the next lower magnification is used (e.g. 40x -> 20x).
cpus: Number of CPUs used to process multiple WSIs in parallel. When processing all levels, fewer than 4 CPUs is recommended to avoid running out of memory.
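To make the size, overlap and level parameters more concrete, here is a minimal sketch of the arithmetic they imply. It is not taken from the SlideSeg3 source: the function names are hypothetical, the stride is assumed to be size minus overlap, partial chips at the slide edge are assumed to be skipped, and the level fallback is assumed to pick the highest stored magnification that does not exceed the requested one.

```python
def chip_origins(length, size, overlap):
    """Top-left coordinates of the chips along one axis of the slide.

    Assumption: consecutive chips are size - overlap pixels apart and
    chips that would extend past the edge are not produced.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    stride = size - overlap
    return list(range(0, length - size + 1, stride))


def resolve_magnification(requested, available):
    """Pick the magnification to use when the requested one may be missing.

    Assumption: fall back to the highest available magnification that is
    not greater than the requested one (e.g. 40x -> 20x).
    """
    candidates = [m for m in available if m <= requested]
    if not candidates:
        raise ValueError("no magnification at or below the requested one")
    return max(candidates)
```

For example, chip_origins(1000, 256, 56) gives [0, 200, 400, 600], so a 1000-pixel-wide region yields four 256-pixel chips per row with 56 pixels shared between neighbours.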
The main directory should already contain an Annotation_Key.txt file. If no Annotation_Key file is present, one will be generated automatically from the annotation files in the xml folder.
The Annotation_Key file contains every annotation key with its associated color code. In all image masks, annotations with that key will have the specified pixel value. If an unknown key is encountered, it will be given a pixel value and added to the Annotation_Key automatically.
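The behaviour described above, where a known key keeps its pixel value and an unknown key is assigned one on first use, can be sketched as follows. This is an illustration only: the function name is hypothetical, the key is held in a plain dictionary rather than read from Annotation_Key.txt, and the rule for choosing a new value (smallest unused value from 1 to 255, with 0 left for background) is an assumption, not the rule SlideSeg3 necessarily uses.

```python
def pixel_value_for(key, annotation_key):
    """Return the pixel value for an annotation key, registering unknown keys.

    annotation_key maps annotation names to mask pixel values and is
    updated in place when a new key is encountered.
    """
    if key not in annotation_key:
        used = set(annotation_key.values())
        # Assumption: take the smallest unused value in 1..255 (0 = background)
        annotation_key[key] = next(v for v in range(1, 256) if v not in used)
    return annotation_key[key]
```

With annotation_key = {"tumor": 255}, a call for "tumor" returns 255, while a first call for "stroma" adds it to the dictionary and returns the newly assigned value.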
Once in the SlideSeg3 environment, run the Python script 'main.py'. Jupyter notebook support will be added later.
https://github.com/btcrabb/SlideSeg
Go to the SlideSeg3 directory
The example below assumes your data is located at "/data/$USER/images" and "/data/$USER/xml"
Output will be created at "/data/$USER/SlideSeg3-Job id" after the job is done
sbatch \
--gres=lscratch:200 \
--cpus-per-task=8 \
--mem=200g \
--time=1440 \
process.sh \
/data/$USER/images/ \
/data/$USER/xml/
Setuptools version 46 causes an error because it is not compatible with OpenSlide. Setuptools 45 will be used until OpenSlide is updated.