A tool for exploratory image clustering, using the unsupervised machine learning algorithms provided by scikit-learn.
The result is visualised on a matplotlib frame as clusters of image thumbnails with coloured frames indicating the cluster membership. The plot is two-dimensional, but the viewing direction onto the higher-dimensional data can be selected via scrollbars.
The clusters can be exported, i.e. the original images are copied to different subfolders according to their cluster membership.
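
The export step can be pictured roughly as follows. This is a minimal sketch, not the tool's actual code: the function name `export_clusters`, the export folder name `clusters`, and the `cluster_<label>` subfolder naming are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def export_clusters(image_paths, labels, export_name="clusters"):
    """Copy each image into <image folder>/<export_name>/cluster_<label>/.

    Hypothetical helper: the folder names are illustrative assumptions.
    The original files are copied, never moved.
    """
    copied = []
    for path, label in zip(image_paths, labels):
        path = Path(path)
        # One subfolder per cluster label, created on demand.
        target_dir = path.parent / export_name / f"cluster_{label}"
        target_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(path, target_dir)))
    return copied
```

Calling `export_clusters(paths, labels)` with one label per image leaves the source folder untouched and returns the paths of the copies.
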
For closer inspection, a cluster can be singled out and re-plotted on its own by clicking on one of its representatives. Clicking on an image in this view opens the original image in a separate window. (The view of all clusters is restored by re-applying the cluster analysis with the “apply” button.)
The GUI is invoked by running image_clustering_app.py.
- Scaler
  - Select the scaler, which is applied to the data first.
- Decomposer
  - Select the method for reducing the dimensionality of the data.
- components
  - Enter the number of dimensions for the reduced data (irrelevant for decomposer “TSNE”).
- Clusterer
  - Select the clustering method.
- n_clusters
  - Enter the desired number of resulting clusters (irrelevant for clusterer “DBSCAN”).
- dbscan_min
  - Enter the minimal cluster size for DBSCAN (irrelevant for other clusterers).
- dbscan_eps
  - Enter the eps parameter (“cluster density”) for DBSCAN (irrelevant for other clusterers).
- select folder
  - Select a folder with image files for analysis.
- apply
  - Perform and display the clustering analysis on the currently selected images according to the current inputs.
- export
  - Copy the original images into a subfolder of their current folder, distributed over further subfolders according to their cluster membership.
- scrollbars
  - Select the dimension of the reduced data to be displayed on the respective axis of the plot.
- slider “image size”
  - Set the size of the image thumbnails that are loaded and used as the data for clustering. For a value n, the images are resized to (n, 2n/3) regardless of their original format.
- slider “image display scaling”
  - Scale the size of the thumbnails shown in the plot.
- check box “on release”
  - Rescale the thumbnails shown in the plot only when the slider is released (less demanding on performance).
- button “axis off/on”
  - Toggle axis visibility. (Displaying the axes can be useful for estimating the eps parameter for DBSCAN.)
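
The chain of scaler, decomposer, and clusterer selected above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the tool's actual code: the helper names `make_clusterer` and `cluster_features` are hypothetical, `StandardScaler`, `PCA`, and `KMeans` stand in for whatever is selected in the GUI, clustering is assumed to run on the reduced data, and `dbscan_min` and `dbscan_eps` are assumed to map to scikit-learn's `min_samples` and `eps`.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def make_clusterer(name, n_clusters=4, dbscan_min=5, dbscan_eps=0.5):
    """Build a clusterer from GUI-style inputs (assumed parameter mapping)."""
    if name == "DBSCAN":
        # DBSCAN finds the number of clusters itself, so n_clusters is unused.
        return DBSCAN(min_samples=dbscan_min, eps=dbscan_eps)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0)


def cluster_features(features, components=3, clusterer="KMeans", **kwargs):
    """Scale, reduce, and cluster an array of shape (n_images, n_features)."""
    scaled = StandardScaler().fit_transform(features)
    reduced = PCA(n_components=components, random_state=0).fit_transform(scaled)
    labels = make_clusterer(clusterer, **kwargs).fit_predict(reduced)
    return reduced, labels


# Synthetic stand-in for 40 flattened RGB thumbnails of size (30, 20).
rng = np.random.default_rng(0)
features = rng.random((40, 30 * 20 * 3))
reduced, labels = cluster_features(features, components=3, n_clusters=4)
print(reduced.shape, sorted(set(labels.tolist())))
```

Each column of `reduced` corresponds to one dimension that could be chosen for a plot axis. With DBSCAN, points that belong to no cluster receive the label -1.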

