A Python implementation of the Directed Batch Growing Self-Organizing Map

SandroMartens/DBGSOM

DBGSOM

DBGSOM is short for Directed Batch Growing Self-Organizing Map. A self-organizing map (SOM) is a type of artificial neural network that produces a low-dimensional representation of a higher-dimensional data set while preserving the topological structure of the data. It can be used for supervised and unsupervised vector quantization, classification, and many data visualization tasks.

Features

  • Compatible with scikit-learn's API and can be used as a drop-in replacement for other clustering and classification algorithms
  • Can handle high-dimensional and non-uniform data distributions
  • Good results without parameter tuning
  • Better topology preservation and faster training time than classical SOMs
  • Interpretability of the results through plotting

How it works

The DBGSOM algorithm constructs a two-dimensional map of prototypes (neurons) in which each neuron is connected to its neighbors. The first neurons on the map are initialized with weights drawn at random from the input data. The input data is then presented to the SOM, and each sample is assigned to its nearest neuron. The weights of each neuron are then updated toward the samples that were mapped to it. Neighboring neurons affect each other's updates, so the low-dimensional ordering of the map is preserved. DBGSOM uses a growing mechanism to expand the map as needed: new neurons are added at the edge of the map wherever the quantization error of a boundary neuron is above a given growing threshold.
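The steps described above can be sketched in a few lines of NumPy. This is only an illustration and not the code used by dbgsom. The function names `batch_step` and `growth_candidates`, the Gaussian neighborhood, and the `sigma` parameter are assumptions made for this example. The sketch also only lists the free grid positions next to a boundary neuron; it does not reproduce how the published algorithm chooses the direction in which to grow.

```python
import numpy as np


def batch_step(X, weights, positions, sigma=1.0):
    """One batch update: assign samples to neurons, then move the prototypes."""
    # Distance from every sample to every prototype, shape (n_samples, n_neurons).
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    winners = dists.argmin(axis=1)  # nearest neuron for each sample

    n_neurons = len(weights)
    counts = np.bincount(winners, minlength=n_neurons)
    sums = np.zeros_like(weights)
    np.add.at(sums, winners, X)  # sum of the samples mapped to each neuron

    # Assumed Gaussian neighborhood on the 2-D grid: neurons that are close
    # on the map share their samples, which keeps the map ordered.
    grid_d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-grid_d2 / (2 * sigma**2))
    new_weights = (h @ sums) / (h @ counts)[:, None]

    # Accumulated quantization error per neuron, measured before the update.
    sample_err = dists[np.arange(len(X)), winners]
    errors = np.bincount(winners, weights=sample_err, minlength=n_neurons)
    return new_weights, winners, errors


def growth_candidates(positions, errors, threshold):
    """Boundary neurons whose error exceeds the threshold, with free grid cells."""
    occupied = {(int(px), int(py)) for px, py in positions}
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    candidates = []
    for idx, (px, py) in enumerate(positions):
        x, y = int(px), int(py)
        free = [(x + dx, y + dy) for dx, dy in offsets
                if (x + dx, y + dy) not in occupied]
        if free and errors[idx] > threshold:
            candidates.append((idx, free))
    return candidates


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
positions = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # 2x2 starting map
weights = X[rng.choice(len(X), size=4, replace=False)].copy()

weights, winners, errors = batch_step(X, weights, positions)
candidates = growth_candidates(positions, errors, threshold=errors.mean())
```

Each entry in `candidates` pairs a neuron index with the empty grid cells next to it, which is where a new neuron could be placed.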

How to install

DBGSOM can be installed from PyPI via pip.

pip install DBGSOM

Usage

dbgsom implements the scikit-learn API. It provides SomClassifier for classification and SomVQ for clustering and vector quantization.

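from dbgsom import SomVQ, SomClassifier
from sklearn.datasets import load_digits

# Load the digits data set as a feature matrix and a label vector
digits_X, digits_y = load_digits(return_X_y=True)

quantizer = SomVQ()
classifier = SomClassifier()

# Unsupervised: fit on the samples only
quantizer.fit_predict(X=digits_X)
# Supervised: fit on the samples and their labels
classifier.fit_predict(X=digits_X, y=digits_y)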

Examples

Here are a few example use cases for DBGSOM.

  • example: With a two-dimensional input we can clearly see how the prototypes (red) approximate the input distribution (white) while still preserving the square topology to their neighbors.
  • The Fashion-MNIST dataset: After training the SOM on the Fashion-MNIST dataset we can plot the nearest neighbor of each prototype. The SOM ordered the prototypes so that neighboring prototypes are pairwise similar.
  • digits: We can show the majority class each prototype represents. Samples from the same class are clustered together. The SOM was trained on the MNIST digits.
  • darknet_pca: We can use linear transformations like PCA to color-code relative distances between prototypes in the input space. See the darknet example notebook.
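One way the PCA color coding mentioned above might be done is to project the prototypes onto their first three principal components and use the rescaled coordinates as RGB values. The sketch below is an assumption for illustration, not the code from the example notebook. The helper `pca_colors` is hypothetical, the PCA is computed with NumPy's SVD instead of a library class, and random data stands in for trained prototypes.

```python
import numpy as np


def pca_colors(prototypes):
    """Map each prototype to an RGB triple from its first three principal components."""
    centered = prototypes - prototypes.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T
    # Rescale every component to [0, 1] so it can be used as a color channel.
    lo = projected.min(axis=0)
    span = projected.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant components
    return (projected - lo) / span


rng = np.random.default_rng(0)
prototypes = rng.normal(size=(25, 10))  # stand-in for trained SOM prototypes
colors = pca_colors(prototypes)
```

Prototypes that are close in the input space get similar colors, so abrupt color changes between neighboring neurons on the map hint at large distances in the input space.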

Dependencies

  • Python > 3.7
  • NumPy
  • NetworkX
  • tqdm
  • scikit-learn
  • seaborn
  • pandas

References

  • A directed batch growing approach to enhance the topology preservation of self-organizing map, Mahdi Vasighi and Homa Amini, 2017, http://dx.doi.org/10.1016/j.asoc.2017.02.015
  • Reference implementation by the authors in Matlab: https://github.com/mvasighi/DBGSOM
  • Statistics-enhanced Direct Batch Growth Self-organizing Mapping for efficient DoS Attack Detection, Xiaofei Qu et al., 2019, https://doi.org/10.1109/ACCESS.2019.2922737
  • Entropy-Defined Direct Batch Growing Hierarchical Self-Organizing Mapping for Efficient Network Anomaly Detection, Xiaofei Qu et al., 2021, https://doi.org/10.1109/ACCESS.2021.3064200
  • Self-Organizing Maps, 3rd Edition, Teuvo Kohonen, 2003
  • MATLAB Implementations and Applications of the Self-Organizing Map, Teuvo Kohonen, 2014
  • Smoothed self-organizing map for robust clustering, P. D’Urso, L. De Giovanni and R. Massari, 2019, https://doi.org/10.1016/j.ins.2019.06.038

License

dbgsom is licensed under the MIT license.