Dataset size and class distribution
Performing sanity checks
- Checking for corrupted files
- Checking file types
- Checking image channels
Visualizing the dataset
Visualizing the distribution of channel pixels by class
Identifying very dark and very light images and removing them
Identifying duplicates in our data
Transforming the images into a feature matrix
Using KMeans to cluster our images
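
As an illustration of the duplicate-identification step listed above, here is a minimal sketch that groups files with byte-for-byte identical contents by hashing them. The outline does not say which method is used, so this is an assumption: the helper names `file_digest` and `find_exact_duplicates` are hypothetical, and exact hashing will not catch resized or re-encoded copies of the same image.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large images are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(paths):
    """Group paths whose file contents are byte-for-byte identical.

    Hypothetical helper: returns a list of groups, each a sorted list
    of two or more paths that share the same digest.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(Path(p))
    # Keep only digests shared by two or more files.
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

A typical call might pass every image under the dataset folder, for example `find_exact_duplicates(Path("data").rglob("*.png"))`, where `data` is a placeholder directory name. Keeping the first path of each group and dropping the rest is one simple way to deduplicate.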