- Dataset size and class distribution
- Performing sanity checks
  - Checking for corrupted files
  - Checking file types
  - Checking image channels
- Visualizing the dataset
- Visualizing the per-channel pixel-value distribution by class
- Identifying and removing very dark and very light images
- Identifying duplicates in our data
- Transforming the images into a feature matrix
- Using KMeans to cluster our images