This repository contains a PyTorch implementation of the paper:
"Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering" by Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, and Mingyi Hong (original paper).
The code is originally from @xuyxu and @guenthereder.
I have merged the code into a single Jupyter notebook and made some minor changes.
Deep Clustering Network (DCN) is a method that jointly optimizes a deep autoencoder and K-means clustering objective. The goal is to learn a feature space where K-means performs well, combining representation learning and clustering in a unified framework.
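For reference, the joint objective can be written as below, following the notation of the DCN paper: f is the encoder with weights W, g is the decoder with weights Z, ℓ is the reconstruction loss, the columns of M are the K centroids, s_i is the one-hot cluster assignment of sample x_i, and λ weights the clustering term. The notebook may parameterize these terms differently.

```latex
\min_{\mathcal{W},\, \mathcal{Z},\, M,\, \{s_i\}} \;
\sum_{i=1}^{N} \left(
  \ell\big(g(f(x_i; \mathcal{W}); \mathcal{Z}),\, x_i\big)
  + \frac{\lambda}{2} \left\| f(x_i; \mathcal{W}) - M s_i \right\|_2^2
\right)
\quad \text{s.t.} \quad
s_{j,i} \in \{0, 1\}, \;\; \mathbf{1}^{\top} s_i = 1 \;\; \forall\, i, j
```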
This implementation includes:
- A configurable deep autoencoder
- Joint training with K-means loss
- Evaluation metrics: NMI and ARI
- Comparisons with vanilla K-means (on raw data and autoencoder features)
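The joint training with a K-means loss listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function names `dcn_loss` and `assign_clusters`, the `lam` weight and its default, and the choice of mean squared error for reconstruction are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def assign_clusters(z, centroids):
    """Assign each latent code to its nearest centroid (hypothetical helper)."""
    # torch.cdist returns the (batch, n_clusters) matrix of pairwise distances.
    return torch.cdist(z, centroids).argmin(dim=1)


def dcn_loss(x, x_recon, z, centroids, assignments, lam=1.0):
    """Reconstruction loss plus a K-means penalty on the latent codes.

    x, x_recon  : (batch, features) input and autoencoder reconstruction
    z           : (batch, latent_dim) encoder output
    centroids   : (n_clusters, latent_dim) current cluster centers
    assignments : (batch,) long tensor with the centroid index of each sample
    lam         : weight of the clustering term (assumed default, tune as needed)
    """
    recon = F.mse_loss(x_recon, x)
    # Mean squared distance between each code and its assigned centroid.
    cluster = ((z - centroids[assignments]) ** 2).sum(dim=1).mean()
    return recon + 0.5 * lam * cluster
```

In a training loop, one would typically compute `assignments` with the centroids held fixed, then backpropagate `dcn_loss` through the autoencoder, and update the centroids in a separate step.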
The dependencies can be installed with pip:
- Scikit-learn: `pip install -U scikit-learn`
- PyTorch: `pip install torch torchvision` (without CUDA) or `pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126` (with CUDA 12.6)
- Pandas and Matplotlib: `pip install pandas`, `pip install matplotlib`
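After installing, a quick check along these lines shows which device PyTorch will use. The variable name `device` is only illustrative; the notebook may select its device differently.

```python
import torch

# Use the GPU when a CUDA-enabled build and a compatible device are present,
# otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"PyTorch {torch.__version__}, running on: {device}")
```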
The dataset used for the experiments is the MNIST dataset:
The reconstruction loss:
The ARI and NMI scores during the training:
The NMI, ARI, and ACC scores (in %) on the test set:
NMI | ARI | ACC |
---|---|---|
84.22 | 75.76 | 83.34 |
In the original paper:
NMI | ARI | ACC |
---|---|---|
81.- | 75.- | 83.- |
The NMI, ARI, and ACC scores (in %) of vanilla K-means:
NMI | ARI | ACC |
---|---|---|
43.01 | 39.89 | 49.00 |
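Scores like those in the tables above can be computed along the following lines. This is a sketch under assumptions rather than the notebook's evaluation code: `clustering_accuracy` and `evaluate` are hypothetical helper names, labels are assumed to be non-negative integers, and ACC is taken to be the usual best-match accuracy obtained by mapping clusters to classes with the Hungarian algorithm. The functions return fractions in [0, 1]; multiply by 100 to compare with the tables.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    # counts[p, t] = number of samples in cluster p whose true label is t
    counts = np.zeros((n, n), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        counts[p, t] += 1
    # Hungarian algorithm: pick the cluster-to-label mapping with most matches.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


def evaluate(y_true, y_pred):
    """Return NMI, ARI and ACC as fractions in a dictionary."""
    return {
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        "ARI": adjusted_rand_score(y_true, y_pred),
        "ACC": clustering_accuracy(y_true, y_pred),
    }
```

Because cluster ids are arbitrary, all three metrics are invariant to relabeling the clusters, so `evaluate([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect clustering.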