dimitri009/Clustering-with-DNN

DCN: Deep Clustering Network

This repository contains a PyTorch implementation of the paper:

"Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering" by Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, and Mingyi Hong, ICML 2017 (original paper)

The code is originally from @xuyxu and @guenthereder.

I merged their code into a single Jupyter notebook and made some minor changes.

Overview

Deep Clustering Network (DCN) is a method that jointly optimizes a deep autoencoder and a K-means clustering objective. The goal is to learn a feature space in which K-means performs well, so that representation learning and clustering are combined in a single framework.
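For reference, the joint objective can be written roughly as below. The notation is an assumption chosen for this README and does not mirror the variable names in the notebook: f is the encoder, g the decoder, ℓ the reconstruction loss, M the matrix whose columns are the K centroids, s_i the one-hot assignment vector of sample x_i, and λ the weight that trades reconstruction against clustering.

```latex
\min_{\theta_f,\ \theta_g,\ M,\ \{s_i\}} \;
\sum_{i=1}^{N} \Big( \ell\big(g(f(x_i)),\, x_i\big)
  + \frac{\lambda}{2}\, \big\| f(x_i) - M s_i \big\|_2^2 \Big)
\quad \text{s.t.} \quad s_{j,i} \in \{0, 1\}, \;\; \mathbf{1}^{\top} s_i = 1 \;\; \forall\, i, j
```

The first term keeps the latent code informative enough to reconstruct the input, and the second pulls each latent code towards its assigned centroid.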

[Figure: DCN] (image credits)

This implementation includes:

  • A configurable deep autoencoder

  • Joint training with K-means loss

  • Evaluation metrics: NMI and ARI

  • Comparisons with vanilla K-means (on raw data and autoencoder features)
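The first two items can be sketched as follows. This is a minimal illustration, not the notebook's code: the class name `AutoEncoder`, the function `dcn_loss`, the layer sizes, and the weight `lam` are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    """Fully connected autoencoder; layer sizes here are illustrative assumptions."""

    def __init__(self, input_dim=784, hidden_dims=(500, 500, 2000), latent_dim=10):
        super().__init__()
        # Encoder: input -> hidden layers (with ReLU) -> linear latent layer
        dims = [input_dim, *hidden_dims]
        enc = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            enc += [nn.Linear(d_in, d_out), nn.ReLU()]
        enc.append(nn.Linear(dims[-1], latent_dim))
        self.encoder = nn.Sequential(*enc)

        # Decoder: mirror image of the encoder
        rev = [latent_dim, *reversed(hidden_dims)]
        dec = []
        for d_in, d_out in zip(rev[:-1], rev[1:]):
            dec += [nn.Linear(d_in, d_out), nn.ReLU()]
        dec.append(nn.Linear(rev[-1], input_dim))
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)


def dcn_loss(x, x_rec, z, centroids, assignments, lam=1.0):
    """Reconstruction error plus (lam / 2) * squared distance to the assigned centroid."""
    rec = ((x_rec - x) ** 2).sum(dim=1)
    dist = ((z - centroids[assignments]) ** 2).sum(dim=1)
    return (rec + 0.5 * lam * dist).mean()


torch.manual_seed(0)
model = AutoEncoder(input_dim=784, hidden_dims=(64, 32), latent_dim=10)
x = torch.rand(16, 784)             # stand-in for a batch of flattened MNIST images
z, x_rec = model(x)
centroids = torch.randn(10, 10)     # 10 clusters in a 10-dimensional latent space
# Assign each sample to its nearest centroid (no gradient through the assignment)
assignments = torch.cdist(z.detach(), centroids).argmin(dim=1)
loss = dcn_loss(x, x_rec, z, centroids, assignments, lam=0.5)
loss.backward()
```

In this sketch the centroids are a plain tensor and only the network receives gradients. How the centroids are updated between network steps is left out here.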

Install requirements

  • Scikit-learn:

pip install -U scikit-learn

  • PyTorch:

pip install torch torchvision (without CUDA) or

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126 (with CUDA 12.6)

  • pandas and Matplotlib:

pip install pandas matplotlib
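After installing, a quick check like the one below can confirm which device PyTorch will use. It is only a convenience snippet and is not part of the notebook.

```python
import torch

# Fall back to the CPU when no CUDA-capable GPU (or CUDA build of PyTorch) is present
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"PyTorch {torch.__version__}, using device: {device}")
```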

Experiment

Dataset

The experiments use the MNIST dataset:

[Figure: MNIST]

Pre-training

Reconstruction loss during pre-training:

[Figure: RECLOSS, reconstruction loss curve]

Training

ARI and NMI scores during training:

[Figure: ARI_NMI, ARI and NMI curves]

Test

NMI, ARI, and ACC scores on the test set (in %):

NMI ARI ACC
84.22 75.76 83.34
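The three scores above can be computed along the following lines. NMI and ARI come straight from scikit-learn. Clustering accuracy (ACC) needs a one-to-one matching between cluster ids and class labels, which this sketch finds with the Hungarian algorithm from SciPy. The helper name `clustering_accuracy` is an assumption, and the notebook may compute ACC differently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to class labels.

    Assumes both label arrays hold non-negative integers.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    # counts[p, t] = number of samples in cluster p whose true class is t
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    # Hungarian algorithm: pick the cluster-to-class matching with the most hits
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


# Toy labels: cluster ids are permuted relative to the classes, with one mistake
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 0]

acc = clustering_accuracy(y_true, y_pred)
nmi = normalized_mutual_info_score(y_true, y_pred)
ari = adjusted_rand_score(y_true, y_pred)
print(f"NMI {100 * nmi:.2f}  ARI {100 * ari:.2f}  ACC {100 * acc:.2f}")
```

All three metrics are invariant to how the cluster ids are numbered, which is why they suit unsupervised evaluation.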

In the original paper:

NMI ARI ACC
81.- 75.- 83.-

NMI, ARI, and ACC scores of vanilla K-means:

NMI ARI ACC
43.01 39.89 49.00
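A vanilla K-means baseline of this kind can be reproduced in a few lines of scikit-learn. To stay self-contained, the sketch below uses synthetic blobs in place of MNIST. The data, the number of clusters, and the `n_init` setting are assumptions for illustration, not the notebook's configuration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Synthetic stand-in for the feature matrix (raw pixels or autoencoder features)
X, y = make_blobs(n_samples=300, centers=3, n_features=8, random_state=0)

# Plain K-means, restarted 10 times to reduce sensitivity to initialisation
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

nmi = 100 * normalized_mutual_info_score(y, labels)
ari = 100 * adjusted_rand_score(y, labels)
print(f"NMI {nmi:.2f}  ARI {ari:.2f}")
```

Swapping `X` for flattened MNIST images or for the encoder outputs would give the two kinds of baseline mentioned in the overview.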

Visualisation of the latent feature space

[Figure: TSNE, t-SNE projection of the latent features]
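A plot of this kind can be produced roughly as follows. The sketch runs t-SNE on random stand-in latent codes rather than real encoder outputs. The perplexity, the colour map, and the output file name `tsne_latent.png` are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for encoder outputs: three Gaussian groups in a 10-dimensional latent space
rng = np.random.default_rng(0)
latent = np.concatenate(
    [rng.normal(loc=c, scale=0.5, size=(50, 10)) for c in (0.0, 3.0, 6.0)]
)
labels = np.repeat([0, 1, 2], 50)

# Project the latent codes to 2-D for plotting
embedding = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(latent)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=8)
ax.set_title("t-SNE of latent features")
fig.savefig("tsne_latent.png", dpi=150)
```

With real data, `latent` would hold the encoder outputs for the test images and `labels` either the true digits or the predicted cluster ids.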
