Skip to content

A similarity-assisted variational autoencoder (saVAE) is a new method that adopts similarity information in the framework of the VAE.

Notifications You must be signed in to change notification settings

GwangWooKim/saVAE

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Similarity-assisted Variational Autoencoder (saVAE)

A similarity-assisted variational autoencoder (saVAE) is a new method that adopts similarity information in the framework of the VAE. If you want to know summarized details, please refer to saVAE_overvew.pdf in this repository that contains backgrounds, preliminaries, methods, results, and conclusions. The published version is here.

Dependencies

  • python==3.9.13
  • torch==1.13.1
  • scanpy==1.9.3
  • loompy==3.0.7
  • scvi-tools==0.20.2
  • umap-learn==0.5.3

Other dependencies can be found at requirement.txt.

How to use

All arguments of the implementation are set to reproduce the results of the paper. It is enough to specify the data name. The available datasets are Two_moons, MNIST, cortex, pbmc, retina, and heart_cell_atlas.

Example

$ python main.py -d Two_moons
  • --data (or -d): The data name. Default = Two_moons.
  • --epochs (or -e): The number of iterations for training. Default = 200.
  • --batch_size (or -b): The number of mini-batch. Default = 128.
  • --weight (or -w): The weight between VAE and UMAP in saVAE. Default = 1e-3.
  • --updata_ratio (or -r): The number of UMAP iterations per VAE iteration. Default = 5.
  • --covariate: Whether to use covariate information or not.
  • --correction: Whether to do covariate correction or not (to improve the similarity table of the dataset).
  • --evaluation: Whether to evaluate or not (on the inffered latent space of saVAE).

The last three default values depend on the dataset.

Description of the outputs

After training saVAE on the specified dataset, you will obtain the resulting files (dir: /output/data_name/). You can check them via torch.load.

  • df.pt: The used training dataset.
  • df_.pt: When the dataset contains some covariate information (retina or heart_cell_atlas), df_.pt is composed of df.pt and its covariate. Otherwise, it is the same as df.pt.
  • labels.pt: The used original (string) labels.
  • labels_.pt: The transformed labels via sklearn.preprocessing.LabelEncoder.
  • saVAE_latent.pt: The encoded datapoints of df.pt from the ambient space to the learned latent one.
  • dict_.pt: The evaluation results by using saVAE_latent.pt and labels_.pt.
  • saVAE_rec.pt: In case of Two_moons, the reconstruected data is also saved.

Visualization

One example to visualize the resulting output is as follows:

import torch
import umap.umap_ as umap
import matplotlib.pyplot as plt 
from sklearn.decomposition import PCA

pca = PCA(n_components=2)

# MNIST example
saVAE_latent = torch.load('saVAE_latent.pt')
saVAE_latent_2d = umap.UMAP(random_state=42, 
                            init=pca.fit_transform(saVAE_latent), 
                            n_epochs=1000
                            ).fit_transform(saVAE_latent)
labels_ = torch.load('labels_.pt')

fig, ax = plt.subplots(1,1)
temp = ax.scatter(saVAE_latent_2d[:, 0], 
                  saVAE_latent_2d[:, 1], 
                  s = 1.5,  
                  cmap='Spectral', 
                  c = labels_
                  )
plt.tick_params(top=False,
               bottom=False,
               left=False,
               right=False,
               labelleft=False,
               labelbottom=False)
plt.xlabel('UMAP_1')
plt.ylabel('UMAP_2')
plt.tight_layout()

Note that we don't have to do dimension reduction in case of Two_moons, because its latent dimension is 2.

About

A similarity-assisted variational autoencoder (saVAE) is a new method that adopts similarity information in the framework of the VAE.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages