Datasets similarity via Tensorization

In this work we implement algorithms based on the preprint "Determining whether two datasets cluster similarly without determining the clusters" by Van Eeghem et al. [1].

This work was an initial part of a research project by Maxence Giraud on "higher order clustering", supervised by Remy Boyer.

Usage

import dataset_similarity_tensor as dst

# Load two datasets V and W, then tensorize them with one of the two options below

## 1. Using the Kronecker product
VV = dst.tensorize_kr(V)
WW = dst.tensorize_kr(W)

## 2. Using the third-order moment
# Reshape because the principal angles are computed on an unfolded tensor (i.e. a matrix)
VV = dst.tensorize_thirdordermoment(V).reshape(V.shape[1], -1)
WW = dst.tensorize_thirdordermoment(W).reshape(W.shape[1], -1)

## Compute the principal angle
angle = dst.principal_angles_tensors(VV, WW)
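
To make the reshape in option 2 concrete, here is a minimal NumPy sketch of an empirical third-order moment tensor and its unfolding. It is an illustration only, not the package's code: the helper name `third_order_moment` is hypothetical, and it assumes that rows are samples, columns are features, and the moment is uncentered. `dst.tensorize_thirdordermoment` may center or normalize differently.

```python
import numpy as np

def third_order_moment(X):
    """Uncentered empirical third-order moment of X with shape (n_samples, n_features).

    Hypothetical helper for illustration; not part of dataset_similarity_tensor.
    """
    n_samples = X.shape[0]
    # Average over samples of the triple outer product x (x) x (x) x.
    return np.einsum("ni,nj,nk->ijk", X, X, X) / n_samples

rng = np.random.default_rng(0)
V = rng.standard_normal((200, 4))        # 200 samples, 4 features

M = third_order_moment(V)                # tensor of shape (4, 4, 4)
M_unfolded = M.reshape(V.shape[1], -1)   # unfolded into a (4, 16) matrix
```

Under these assumptions the tensor has one mode per feature dimension, so reshaping with `V.shape[1]` as the first dimension yields a matrix with one row per feature.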

The algorithm computes a principal angle, so the output lies between 0 and π/2: the closer it is to 0, the more similar the two datasets are.
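
For readers unfamiliar with principal angles, the following sketch shows the standard QR-plus-SVD computation between the column spaces of two matrices. It is not the implementation of `dst.principal_angles_tensors`: the function name `principal_angles` is hypothetical, the inputs are assumed to have full column rank, and which subspace and which angle the package actually reports is not specified here.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Hypothetical helper for illustration; assumes A and B have the same
    number of rows and full column rank.
    """
    # Orthonormal bases for the two column spaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))

xy_plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
xz_plane = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

same = principal_angles(xy_plane, xy_plane)   # identical subspaces
diff = principal_angles(xy_plane, xz_plane)   # planes sharing only the x-axis
```

Identical subspaces give angles near 0, while the two planes share one direction (angle 0) and are orthogonal in the other (angle π/2), which matches the range stated above. Note that taking `arccos` of the cosines loses precision for very small angles, so this form is meant for exposition rather than high-accuracy use.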

References

[1] Van Eeghem F., De Lathauwer L. (2020). Determining whether two datasets cluster similarly without determining the clusters.
