This repository gathers a comprehensive collection of datasets used in Computer Vision and Image Segmentation.
It covers various domains such as semantic segmentation, instance segmentation, medical imaging, urban scenes, and interactive segmentation.
The goal is to provide a consolidated reference containing essential information (number of images, mask availability, resolution, dataset type, number of classes, description, and download links) to help researchers and developers choose suitable datasets for Deep Learning, Active Learning, Object Detection, and Scene Understanding tasks.
The full list, shown in the table below, contains about 40 datasets.
| Dataset Name | # Images | Masks | Size | Resolution | Kind of Dataset | # Classes | Description | Year | Link | Public? |
|---|---|---|---|---|---|---|---|---|---|---|
| VOC 2012 | 17,000 | Yes | 4 GB | 500×375 | Object Segmentation | 20 | Includes training/validation/test splits with per-pixel annotations and object labels. | 2012 | Kaggle | β |
| CityScapes | 25,000 | Yes | 25 GB | 2048×1024 | Urban Segmentation | 30 | 50 different cities with pixel-level annotations for 30 classes. | 2016 | Official Site | β |
| COCO | 330,000 | Yes | 50 GB | Variable | Object Segmentation | 80 | Complex scenes with multiple object masks. | 2014 | COCO | β |
| LVIS | 164,000 | Yes | 25 GB | Variable | Instance Segmentation | 1,203 | Long-tail instance segmentation benchmark. | 2019 | LVIS | β |
| ADE20K | 27,000 | Yes | 3 GB | Variable | Scene Parsing | 150 | Complete scene segmentation benchmark. | 2016 | MIT CSAIL | β |
| GTA V Synthetic | 25,000 | Yes | 180 GB | 1914×1052 | Synthetic Semantic Segmentation | 19 | Synthetic urban scenes from GTA V with perfect pixel annotations. | 2016 | VISINF | β |
| BraTS | 3,000 (3D) | Yes | 200 GB | 240×240×155 | 3D Medical Segmentation | 3 | Brain tumor dataset with edema, necrosis, and active tumor labels. | 2012 | CBICA | β |
| LiTS | 130 CT (3D) | Yes | 80 GB | 512×512×Z | 3D Medical Segmentation | 2 | 3D liver and lesion segmentation dataset. | 2017 | CodaLab | β |
| Kvasir-SEG | 1,000 | Yes | 2 GB | 576×720 | Medical Segmentation | 1 | Colorectal polyp dataset with binary masks. | 2020 | Simula | β |
| Nuclei | 30,000 patches | Yes | 100 MB | 50×50 | Biomedical Segmentation | 1 | Cell nuclei dataset with binary masks. | 2018 | Kaggle | β |
| CVC-ClinicDB | 612 | Yes | 50 MB | 384×288 | Medical Segmentation | 1 | Colonoscopy frames for polyp detection. | 2015 | Kaggle | β |
| REFUGE2 | 1,200 | Yes | 3.8 GB | Variable | Medical Segmentation | 2 | Retinal disc and cup segmentation for glaucoma screening. | 2020 | Challenge | β |
| ISIC | 1,203,225 | Yes | Variable | Variable | Medical (Dermatology) | 2–7 | Massive dataset for skin lesion segmentation. | 2016 | ISIC Archive | β |
| BrainMRI | 3,929 | Yes | 350 MB | 256×256 | Medical Segmentation | 1 | Brain tumor segmentation dataset. | 2020 | Kaggle | β |
| LiverCT | 131 CT (3D) | Yes | 80 GB | 512×512×Z | 3D Medical Segmentation | 2 | CT scans for liver injury segmentation. | 2017 | CodaLab | β |
| RESC | 110 scans | Yes | 500 MB | Variable | Medical Segmentation | 3 | Retinal edema segmentation dataset. | 2018 | GitHub | β |
| TN3K | 3,500 | Yes | 200 MB | 400×400 | Medical Segmentation | 1 | Thyroid nodule ultrasound segmentation dataset. | 2022 | Kaggle | β |
| DDTI | 5,000 | Yes | 1.5 GB | Variable | Medical Segmentation | 1 | Panoramic dental x-rays for teeth segmentation. | 2022 | Kaggle | β |
| TG3K | 3,100 | Yes | 250 MB | 400×400 | Medical Segmentation | 1 | Ultrasound thyroid gland segmentation dataset. | 2022 | OpenMedLab | β |
| BUSI | 780 | Yes | 250 MB | 500×500 | Medical Segmentation | 3 | Breast ultrasound segmentation dataset. | 2019 | Dataset Page | β |
| CHAOS | 80 scans (3D) | Yes | 20 GB | 512×512×Z | 3D Medical Segmentation | 4 | MRI and CT scans for liver, kidneys, and spleen segmentation. | 2019 | CHAOS | β |
| ROCO | 81,000 | No | 8 GB | Variable | Medical Captioning | - | Radiology images paired with textual captions. | 2018 | GitHub | β |
| MedPix | 59,000 | No | Variable | Variable | Medical Image Database | - | Clinical and diagnostic image archive. | 1999 | MedPix | β |
| NLPR | 1,000 pairs | Yes | 998 MB | 640×480 | Salient Object Detection | 1 | Captured by Microsoft Kinect with indoor and outdoor scenes. | - | HyperAI | β |
| PaviaU | 1 image | No | 100 MB | 610×340×103 | Spectral Classification | 9 | Hyperspectral image captured over Pavia, Italy. | - | Kaggle | β |
| BSDS500 | 500 | Yes | 100 MB | Variable | Contour Detection | - | Human-annotated segmentation and contour detection benchmark. | - | Kaggle | β |
| NYUV2 | 1,449 | Yes | 5.5 GB | 640×480 | Indoor Scene Segmentation | 40 | RGB-D dataset captured using Microsoft Kinect. | 2012 | NYU | β |
| SUNRGBD | 10,335 | Yes | 60 GB | Variable | 2D/3D Segmentation | 37 | Densely annotated 3D indoor scenes. | 2015 | Princeton | β |
| CamVid | 701 frames | Yes | 570 MB | 960×720 | Video Semantic Segmentation | 12 | First video dataset with pixel-level annotations for urban scenes. | 2008 | CamVid | β |
| 300W-LP | 122,450 | No | 4 GB | Variable | Landmark Detection | 68 | Augmented version of 300W with rotated facial images. | 2016 | TensorFlow | β |
| Visual Genome | 108,000 | No | 12 GB | Variable | Image Captioning | - | Object relationships and natural language annotations. | 2016 | VG | β |
| ISPRS Vaihingen | 33 | Yes | 2 GB | ~2500×2000 | Aerial Image Segmentation | 6 | UHD aerial imagery with semantic labels. | 2012 | ISPRS | β |
| NJU2K | 1,985 | Yes | 1.5 GB | Variable | Salient Object Detection | 1 | RGB image pairs for salient object detection. | 2014 | HyperAI | β |
| STERE | 1,000 | Yes | 100 MB | 1024×768 | Object Detection | 1 | Stereo image pairs for object detection. | 2015 | KITTI | β |
| GrabCut | 50 | Yes | 5 MB | Variable | Interactive Segmentation | 1 | Small dataset for interactive segmentation experiments. | 2004 | GitHub | β |
| Awesome Medical Datasets | - | Yes | - | - | Medical Image Segmentation | - | A collection of multiple open medical datasets. | - | OpenMedLab | β |
| USPS | 9,298 | No | 10 MB | 16×16 | Classification | 10 | Handwritten digit dataset from postal codes. | 1990 | LibSVM | β |
| MNIST | 70,000 | No | 15 MB | 28×28 | Classification | 10 | Classic handwritten digit dataset. | 1998 | Kaggle | β |
| BioID | 1,521 | No | 150 MB | 384×288 | Face Detection | 1 | Grayscale face localization dataset. | 1999 | BioID | β |
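If you would rather filter the table in code than scan it by eye, here is a minimal sketch using only the Python standard library. It assumes the table is available as pipe-delimited Markdown text. The helper names `parse_markdown_table` and `filter_by_kind` are hypothetical and not part of this repository, and the sample uses only three of the table's columns and four of its rows.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited Markdown table into a list of dicts keyed by header."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def filter_by_kind(rows, keyword):
    """Return dataset names whose 'Kind of Dataset' contains the keyword (case-insensitive)."""
    return [
        row["Dataset Name"]
        for row in rows
        if keyword.lower() in row["Kind of Dataset"].lower()
    ]


# A small excerpt of the table above, reduced to three columns for brevity.
SAMPLE = """
| Dataset Name | Kind of Dataset | # Classes |
|---|---|---|
| VOC 2012 | Object Segmentation | 20 |
| BraTS | 3D Medical Segmentation | 3 |
| Kvasir-SEG | Medical Segmentation | 1 |
| MNIST | Classification | 10 |
"""

rows = parse_markdown_table(SAMPLE)
medical = filter_by_kind(rows, "medical")
print(medical)
```

The same pattern extends to other columns, for example keeping only rows whose `# Classes` value is numeric and below a chosen threshold.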
- β Public datasets are freely available for research and educational use.
- β Non-public datasets may require registration, challenge participation, or access requests.
- Some datasets (e.g., LiTS, BraTS) are 3D volumetric and require preprocessing pipelines before use.
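As a rough illustration of what preprocessing a 3D volume can involve, the sketch below clips intensities to a window, rescales them to [0, 1], and splits the volume into 2D slices, keeping only slices that contain labeled voxels. It is an assumption-laden example, not the pipeline used by any of the listed datasets: the function name `preprocess_volume`, the window values, and the synthetic input are all hypothetical, and loading real scan files is left out.

```python
import numpy as np


def preprocess_volume(volume, mask, window=(-100.0, 400.0)):
    """Clip to an intensity window, rescale to [0, 1], and keep labeled slices.

    volume, mask: arrays of shape (H, W, Z), the layout listed for LiTS above.
    window: illustrative (low, high) clipping range; tune it per modality.
    """
    lo, hi = window
    clipped = np.clip(volume.astype(np.float32), lo, hi)
    normalized = (clipped - lo) / (hi - lo)
    # Move the slice axis first so that each item is one 2D image.
    slices = np.moveaxis(normalized, -1, 0)
    mask_slices = np.moveaxis(mask, -1, 0)
    # Keep only slices whose mask has at least one foreground voxel.
    keep = mask_slices.reshape(mask_slices.shape[0], -1).any(axis=1)
    return slices[keep], mask_slices[keep]


# Synthetic stand-in for a scan: random intensities and a small labeled cube.
rng = np.random.default_rng(0)
volume = rng.uniform(-1000, 1000, size=(64, 64, 10))
mask = np.zeros((64, 64, 10), dtype=np.uint8)
mask[20:30, 20:30, 3:6] = 1  # foreground only in slices 3, 4, and 5

images, labels = preprocess_volume(volume, mask)
print(images.shape, labels.shape)
```

Dropping unlabeled slices is a design choice that suits 2D training on sparse annotations. A 3D model would instead keep the volume whole and typically resample it to a common voxel spacing.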
You can:
- Explore datasets to benchmark segmentation models (e.g., U-Net, DeepLab, Mask R-CNN).
- Use them in Active Learning or Continual Learning pipelines.
- Combine multiple datasets to improve model generalization.
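For the benchmarking use case, segmentation results are commonly reported with overlap metrics such as Dice and IoU. The sketch below computes both for a pair of binary masks, which matches the single-class datasets in the table. The function name `dice_and_iou` is hypothetical, and treating two empty masks as a perfect match is an assumed convention that varies between benchmarks.

```python
import numpy as np


def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2.0 * intersection / total if total > 0 else 1.0
    iou = intersection / union if union > 0 else 1.0
    return float(dice), float(iou)


# Toy example: a 4-pixel ground truth and a 6-pixel prediction that covers it.
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1

dice, iou = dice_and_iou(pred, target)
print(f"Dice={dice:.3f} IoU={iou:.3f}")
```

For multi-class datasets, the same function can be applied once per class on masks such as `pred == c` and `target == c`, then averaged.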
If you use this list or parts of it, please cite this repository:
@misc{segmentation_datasets_collection,
author = {Galetti, Daniel Martins},
title = {Image Segmentation and Computer Vision Datasets Collection},
year = {2025},
url = {https://github.com/Danielgaletti/Datasets},
note = {Comprehensive list of datasets for segmentation, detection, and scene understanding.}
}