Co-clustering algorithms seek homogeneous sub-matrices in a dyadic data matrix, such as a document-word matrix.

Saeidhoseinipour/NMTFcoclust

Saeid Hoseinipour

NMTFcoclust decomposes a data matrix 𝐗 (for example a document-word count matrix, a movie-viewer rating matrix, or a product-customer purchase matrix) into three matrices:

  • 𝐅 (membership of rows in the row clusters)
  • 𝐆 (membership of columns in the column clusters)
  • 𝐒 (summary matrix linking row clusters and column clusters)

𝐗 is then approximated by the low-rank product

$$\mathbf{X} \approx \mathbf{FSG}^{\top}$$
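The shapes involved in this approximation can be illustrated with a small NumPy sketch. It does not use the NMTFcoclust API; the sizes, the random factors, and the argmax rule for reading hard cluster labels off 𝐅 and 𝐆 are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, n_cols = 6, 5          # e.g. documents x words (toy sizes)
g, s = 2, 3                    # number of row clusters and column clusters

# Random non-negative factors with the shapes implied by X ≈ F S Gᵀ
F = rng.random((n_rows, g))    # row membership matrix
S = rng.random((g, s))         # summary matrix between row and column clusters
G = rng.random((n_cols, s))    # column membership matrix

X_hat = F @ S @ G.T            # low-rank approximation, same shape as X
print(X_hat.shape)             # (6, 5)

# One common convention (an assumption here): assign each row/column
# to the cluster with the largest membership value
row_labels = F.argmax(axis=1)
col_labels = G.argmax(axis=1)
print(row_labels, col_labels)
```

Because the inner dimensions are g and s, the rank of the approximation is at most min(g, s), which is what makes it low-rank when the number of clusters is small.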


Brief description of models

NMTFcoclust implements the proposed algorithm (OPNMTF) and several other NMTF variants. Their objective functions are listed below, starting with OPNMTF:

$$D_{\alpha}(\mathbf{X}||\mathbf{FSG}^{\top})+ \lambda \; D_{\alpha}(\mathbf{I}_{g}||\mathbf{F}^{\top}\mathbf{F})+ \mu \; D_{\alpha}(\mathbf{I}_{s}||\mathbf{G}^{\top}\mathbf{G})$$

$$0.5||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}+0.5 \tau \; Tr(\mathbf{F} \Psi_{g}\mathbf{F}^{\top})+0.5 \eta \; Tr(\mathbf{G} \Psi_{s}\mathbf{G}^{\top})+ 0.5 \gamma \; Tr(\mathbf{S}^{\top}\mathbf{S})$$

$$0.5 ||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}$$

$$||\mathbf{X}-\mathbf{FSG}^{\top}||^{2}$$

$$||\mathbf{X}-\mathbf{F}\mathbf{S}\mathbf{G}^{\top}||^{2}+ Tr(\Lambda (\mathbf{F}^{\top}\mathbf{F}-\mathbf{I}_{s}))+ Tr(\Gamma (\mathbf{G}^{\top}\mathbf{G}-\mathbf{I}_{g}))$$

$$||\mathbf{X}-\mathbf{FF^{\top}XGG}^{\top}||^{2}+ Tr(\Lambda \mathbf{F}^{\top})+ Tr( \Gamma \mathbf{G}^{\top})$$

$$||\mathbf{X}-\mathbf{FF^{\top}XGG}^{\top}||^{2}$$
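To make the first objective concrete, here is a minimal sketch of how it could be evaluated with NumPy. It assumes the standard α-divergence of Cichocki et al., $D_{\alpha}(\mathbf{A}||\mathbf{B}) = \frac{1}{\alpha(1-\alpha)}\sum_{ij}(\alpha a_{ij} + (1-\alpha) b_{ij} - a_{ij}^{\alpha} b_{ij}^{1-\alpha})$ for α outside {0, 1}. The function names `alpha_divergence` and `opnmtf_objective` and the small `eps` guard are hypothetical and are not taken from the NMTFcoclust code.

```python
import numpy as np

def alpha_divergence(A, B, alpha, eps=1e-12):
    """Alpha-divergence D_alpha(A || B) for non-negative arrays, alpha not in {0, 1}.

    eps is a numerical guard against zero entries (an assumption of this sketch).
    """
    A = np.asarray(A, dtype=float) + eps
    B = np.asarray(B, dtype=float) + eps
    terms = alpha * A + (1.0 - alpha) * B - A**alpha * B**(1.0 - alpha)
    return terms.sum() / (alpha * (1.0 - alpha))

def opnmtf_objective(X, F, S, G, lam, mu, alpha):
    """Fit term plus the two orthogonality penalties of the first objective."""
    fit = alpha_divergence(X, F @ S @ G.T, alpha)
    row_penalty = alpha_divergence(np.eye(F.shape[1]), F.T @ F, alpha)
    col_penalty = alpha_divergence(np.eye(G.shape[1]), G.T @ G, alpha)
    return fit + lam * row_penalty + mu * col_penalty

# Toy check: factors with orthonormal columns and an exact reconstruction
F = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / np.sqrt(2)
G = np.eye(2)
S = np.array([[2.0, 1.0], [1.0, 3.0]])
X = F @ S @ G.T
print(opnmtf_objective(X, F, S, G, lam=0.3, mu=0.3, alpha=0.4))        # close to 0
print(opnmtf_objective(X + 1.0, F, S, G, lam=0.3, mu=0.3, alpha=0.4))  # positive
```

Each penalty term vanishes when the corresponding factor has orthonormal columns, so λ and μ control how strongly non-orthogonal row and column clusters are penalized.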

Requirements

numpy==1.18.3
pandas==1.0.3
scipy==1.4.1
matplotlib==3.0.3
scikit-learn==0.22.2.post1
coclust==0.2.1

| Dataset | #Documents | #Words | Sparsity (%) | Number of clusters |
|---|---|---|---|---|
| CSTR | 475 | 1000 | 96 | 4 |
| WebACE | 2340 | 1000 | 91.83 | 20 |
| Classic3 | 3891 | 4303 | 98 | 3 |
| Sports | 8580 | 14870 | 99.99 | 7 |
| Reviews | 4069 | 18483 | 99.99 | 5 |
| RCV1_4Class | 9625 | 29992 | 99.75 | 4 |
| NG20 | 19949 | 43586 | 99.99 | 20 |
| 20Newsgroups | 18846 | 26214 | 96.96 | 20 |
| TDT2 | 9394 | 36771 | 99.64 | 30 |
| RCV1_ori | 9625 | 29992 | 96.62 | 4 |

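The sparsity figures in the dataset table are presumably the percentage of zero entries in each document-word matrix. Under that assumption, a value of this kind could be computed as follows; `sparsity_percent` is a hypothetical helper and the toy matrix is made up.

```python
import numpy as np
from scipy import sparse

def sparsity_percent(X):
    """Percentage of entries in X that are exactly zero (dense or SciPy sparse input)."""
    n_total = X.shape[0] * X.shape[1]
    n_nonzero = X.count_nonzero() if sparse.issparse(X) else np.count_nonzero(X)
    return 100.0 * (1.0 - n_nonzero / n_total)

# Toy 3 x 4 document-word matrix with 3 non-zero counts out of 12 entries
X_toy = sparse.csr_matrix(np.array([[0, 2, 0, 0],
                                    [1, 0, 0, 0],
                                    [0, 0, 0, 3]]))
print(sparsity_percent(X_toy))   # 75.0
```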
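import pandas as pd
import numpy as np
from scipy.io import loadmat
from sklearn.metrics import confusion_matrix

# Load the Classic3 document-word matrix (forward slashes work on all platforms)
file_name = "NMTFcoclust/Dataset/Classic3/classic3.mat"
mydata = loadmat(file_name)

# 'A' is stored as a sparse matrix; convert it to dense and normalize so all entries sum to 1
X_Classic3 = mydata['A'].toarray()
X_Classic3_sum_1 = X_Classic3 / X_Classic3.sum()

# Shift the labels from 0-based to 1-based
true_labels = mydata['labels'].flatten().tolist()
true_labels = [x + 1 for x in true_labels]

# Comparing the labels with themselves puts the class sizes on the diagonal
print(confusion_matrix(true_labels, true_labels))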



 Medical:               [[1033    0     0]
 Information Retrieval: [   0  1460     0]
 Aeronautical Systems:  [   0    0   1398]]

Model

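from NMTFcoclust.Models.NMTFcoclust_OPNMTF_alpha_2 import OPNMTF
from NMTFcoclust.Evaluation.EV import Process_EV

# 3 row (document) clusters and 3 column (word) clusters;
# landa and mu weight the two orthogonality penalties, alpha selects the divergence
OPNMTF_alpha = OPNMTF(n_row_clusters=3, n_col_clusters=3, landa=0.3, mu=0.3, alpha=0.4)
OPNMTF_alpha.fit(X_Classic3_sum_1)

# Evaluate the fitted model against the true labels
Process_Ev = Process_EV(true_labels, X_Classic3_sum_1, OPNMTF_alpha)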



Accuracy (Acc):0.9100488306347982
Normalized Mutual Info (NMI):0.7703948803438703
Adjusted Rand Index (ARI):0.7641161476685447

Confusion Matrix (CM):
				[[1033    0    0]
				 [ 276 1184    0]
				 [   0   74 1324]]
Total Time:  26.558243700000276
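
The reported accuracy can be checked directly from the confusion matrix above: it is the share of documents on the diagonal. This sketch assumes that rows are true classes and columns are predicted clusters already matched to classes, which is consistent with the row sums 1033, 1460, and 1398 equalling the true class sizes.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[1033,    0,    0],
               [ 276, 1184,    0],
               [   0,   74, 1324]])

# Accuracy = correctly assigned documents / all documents = 3541 / 3891
accuracy = np.trace(cm) / cm.sum()
print(round(accuracy, 5))            # 0.91005

# Per-class recall: diagonal divided by the true class sizes (row sums)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)
```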

Supplementary material

OPNMTF has also been applied to synthetic datasets generated from Bernoulli, Poisson, and truncated Gaussian distributions.

Contributions

  • We propose a co-clustering algorithm, Orthogonal Parametric Non-negative Matrix Tri-Factorization (OPNMTF), which adds two penalty terms based on the 𝛼-divergence to control the orthogonality of row and column clusters.
  • We use the 𝛼-divergence as a measure of divergence between the observation matrix and the approximation matrix. This unification permits more flexibility in determining divergence measures by changing the value of 𝛼.
  • Experiments on six real text datasets demonstrate the effectiveness of the proposed model compared to the state-of-the-art co-clustering methods.
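
The flexibility gained by varying 𝛼 can be seen from a few special cases. The definition below is the standard 𝛼-divergence of Cichocki et al. for non-negative matrices $\mathbf{A}$ and $\mathbf{B}$; it is stated here as background and is assumed, not confirmed, to be the exact form used in the code:

$$D_{\alpha}(\mathbf{A}||\mathbf{B}) = \frac{1}{\alpha(1-\alpha)}\sum_{i,j}\left(\alpha\, a_{ij} + (1-\alpha)\, b_{ij} - a_{ij}^{\alpha}\, b_{ij}^{1-\alpha}\right)$$

$$\alpha \to 1: \quad D_{1}(\mathbf{A}||\mathbf{B}) = \sum_{i,j}\left(a_{ij}\log\frac{a_{ij}}{b_{ij}} - a_{ij} + b_{ij}\right) \quad \text{(generalized Kullback-Leibler)}$$

$$\alpha = \tfrac{1}{2}: \quad D_{1/2}(\mathbf{A}||\mathbf{B}) = 2\sum_{i,j}\left(\sqrt{a_{ij}} - \sqrt{b_{ij}}\right)^{2} \quad \text{(Hellinger-type)}$$

$$\alpha = 2: \quad D_{2}(\mathbf{A}||\mathbf{B}) = \frac{1}{2}\sum_{i,j}\frac{(a_{ij} - b_{ij})^{2}}{b_{ij}} \quad \text{(Pearson } \chi^{2}\text{)}$$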

Highlights

  • Our algorithm uses multiplicative update rules and is convergent.
  • It adds two penalties that control the orthogonality of row and column clusters.
  • It unifies a class of co-clustering algorithms based on the $\alpha$-divergence.
  • All datasets and algorithm code are available in the NMTFcoclust repository on GitHub.
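
To illustrate what multiplicative update rules look like, here is a minimal sketch for the plain squared-error objective $||\mathbf{X}-\mathbf{FSG}^{\top}||^{2}$. These are the well-known unconstrained NMTF updates, not the 𝛼-divergence updates of OPNMTF; the function name `nmtf_frobenius`, the random initialization, and the `eps` guard are assumptions of this sketch.

```python
import numpy as np

def nmtf_frobenius(X, g, s, n_iter=100, eps=1e-10, seed=0):
    """Multiplicative updates for min ||X - F S G^T||^2 with F, S, G >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, g)) + eps
    S = rng.random((g, s)) + eps
    G = rng.random((m, s)) + eps
    losses = []
    for _ in range(n_iter):
        # Each factor is rescaled element-wise by a non-negative ratio,
        # so non-negativity is preserved without any projection step
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        losses.append(np.linalg.norm(X - F @ S @ G.T) ** 2)
    return F, S, G, losses

# Toy matrix with a 2 x 2 block structure plus a little non-negative noise
rng = np.random.default_rng(1)
blocks = np.kron(np.array([[5.0, 1.0], [1.0, 4.0]]), np.ones((10, 8)))
X = blocks + 0.1 * rng.random(blocks.shape)

F, S, G, losses = nmtf_frobenius(X, g=2, s=2)
print(losses[0], losses[-1])   # the loss after the last iteration is lower than after the first
```

The same pattern (numerator and denominator built from the gradient's negative and positive parts) underlies multiplicative schemes in general; the OPNMTF rules additionally account for the 𝛼-divergence and the two orthogonality penalties.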

Cite

Please cite the following paper in your publication if you are using NMTFcoclust in your research:

 @article{Saeid_OPNMTF_2023, 
    title=            {Orthogonal Parametric Non-negative Matrix Tri-Factorization with 𝛼-Divergence for Co-clustering}, 
    DOI=              {10.1016/j.eswa.2023.120680},
    volume=           {231}, 
    number=           {120680},
    journal=          {Expert Systems with Applications}, 
    author=           {Saeid Hoseinipour and Mina Aminghafari and Adel Mohammadpour}, 
    year=             {2023}
} 

References

[1] Wang et al, Penalized nonnegative matrix tri-factorization for co-clustering (2017), Expert Systems with Applications.

[2] Yoo et al, Orthogonal nonnegative matrix tri-factorization for co-clustering: Multiplicative updates on Stiefel manifolds (2010), Information Processing and Management.

[3] Ding et al, Orthogonal nonnegative matrix tri-factorizations for clustering (2008), Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

[4] Long et al, Co-clustering by block value decomposition (2005), Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining.

[5] Labiod et al, Co-clustering under nonnegative matrix tri-factorization (2011), International Conference on Neural Information Processing.

[6] Li et al, Nonnegative Matrix Factorization on Orthogonal Subspace (2010), Pattern Recognition Letters.

[7] Li et al, Nonnegative Matrix Factorizations for Clustering: A Survey (2019), Data Clustering.

[8] Cichocki et al, Non-negative matrix factorization with $\alpha$-divergence (2008), Pattern Recognition Letters.

[9] Hoseinipour et al, Orthogonal parametric non-negative matrix tri-factorization with 𝛼-divergence for co-clustering (2023), Expert Systems with Applications.