Stochastic Neighbor Embedding Experiments in R
Note: This package is unlikely to see further major updates, but much of it lives on in smallvis.
An R package for experimenting with dimensionality reduction techniques, including the popular t-Distributed Stochastic Neighbor Embedding (t-SNE).
# install.packages("devtools")
devtools::install_github("jlmelville/sneer")
package?sneer
# the sneer function can run many kinds of embedding
?sneer
Also see the (currently under-construction) documentation web pages for a more detailed explanation.
# t-SNE on the iris dataset:
res <- sneer(iris)
# then do what you want with the embedded coordinates in res$coords
# by default, sneer runs t-SNE on the numeric columns and automatically uses
# a factor column to color the points, but you can be specific:
res <- sneer(iris[, 1:4], labels = iris$Species, method = "tsne",
             scale_type = "tsne", opt = "tsne", init = "r",
             exaggerate = 4, exaggerate_off_iter = 100,
             perplexity = 25)
There is a section of the documentation that has (many) more examples.
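The perplexity argument in the call above sets, roughly, the effective number of neighbors each point considers. As an illustration of the idea only, here is a minimal base-R sketch of the standard perplexity calibration used in SNE-type methods. It is not sneer's own code: the helper name calibrate_row is hypothetical, and standardizing the columns with scale() is just a choice made for this example. Each row of the input probability matrix is found by bisecting on a Gaussian precision until the row's perplexity matches the target:

```r
# Hypothetical helper (not part of sneer): find the Gaussian conditional
# probabilities for one point whose perplexity matches a target value.
# d2 holds the squared distances from that point to all OTHER points.
calibrate_row <- function(d2, perplexity, tol = 1e-5, max_iter = 100) {
  target <- log(perplexity)  # Shannon entropy (in nats) we want to reach
  beta <- 1                  # precision, i.e. 1 / (2 * sigma^2)
  lo <- 0
  hi <- Inf
  d2 <- d2 - min(d2)         # shift for numerical stability; p is unchanged
  for (iter in seq_len(max_iter)) {
    w <- exp(-beta * d2)
    p <- w / sum(w)
    h <- -sum(p[p > 0] * log(p[p > 0]))
    if (abs(h - target) < tol) break
    if (h > target) {
      # distribution too flat: raise the precision
      lo <- beta
      beta <- if (is.finite(hi)) (lo + hi) / 2 else beta * 2
    } else {
      # distribution too peaked: lower the precision
      hi <- beta
      beta <- (lo + hi) / 2
    }
  }
  p
}

X <- scale(as.matrix(iris[, 1:4]))  # assumption: standardize each column
D2 <- as.matrix(dist(X))^2          # dense squared Euclidean distances
perplexity <- 25

# Build the row-stochastic probability matrix, leaving the diagonal at zero
P <- t(sapply(seq_len(nrow(D2)), function(i) {
  p <- numeric(nrow(D2))
  p[-i] <- calibrate_row(D2[i, -i], perplexity)
  p
}))

# Perplexity actually achieved by each row: exp of its Shannon entropy
row_perp <- exp(-rowSums(ifelse(P > 0, P * log(P), 0)))
range(row_perp)
```

Larger perplexity values flatten each row of P, so more neighbors receive non-negligible probability. Smaller values concentrate the probability on the closest few points.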
There are a lot of dimensionality reduction techniques out there, and many that take inspiration from t-SNE, but understanding what makes them work (or not) is complicated by the differences in dataset preparation, preprocessing, output initialization, optimization, and other heuristics.
Sneer is my attempt to write a package that not only provides a way to run multiple embedding algorithms with complete control over all the various twiddly bits, but also exposes lots of twiddly bits to twiddle with, if that is what you want to do (and I do).
Its basic code was based heavily on Justin Donaldson's tsne R package, but is now mangled so far beyond its original form that I've made it a separate project rather than a fork. It does, however, inherit its license (GPL-2 or later).
Currently sneer offers:
- Embedding with t-SNE and its variants ASNE and SSNE.
- Sammon mapping and metric Multidimensional Scaling.
- Heavy-Tailed Symmetric SNE (HSSNE).
- Neighbor Retrieval Visualizer (NeRV).
- Jensen-Shannon Embedding (JSE).
- Multiscale SNE (msSNE).
- Weighted SNE using degree centrality (ws-SSNE).
- Inhomogeneous t-SNE.
- Custom embedding methods (see the embedder function man page).
- A variety of optimizations using the mize package.
- The Spectral Directions optimization method of Vladymyrov and Carreira-Perpiñán, although in a non-sparse form.
- Output initialization options, including using the PCA scores matrix for easier reproducibility.
- Various simple preprocessing options.
- Numerical scores for quantitatively evaluating the embedding.
- s1k, a small (1000 points) 9-dimensional synthetic dataset that exemplifies the "crowding problem".
It also has some limitations:
- It's in pure R, so it's slow.
- It doesn't implement any of the Barnes-Hut or multipole or related approaches to speed up the distance calculations from O(N^2), so it's slow.
- It doesn't work with sparse matrices... so it's slow and it can't work with large datasets.
Consider this package designed for experimenting on smaller datasets, not production-readiness.
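To get a feel for why quadratic scaling rules out large datasets, the following base-R sketch counts pairwise distances and estimates the storage for a dense distance matrix. The helper names n_pairs and dense_mb are made up for this illustration, and the estimate assumes 8-byte doubles and ignores any other working memory an embedding method would need:

```r
# Number of distinct pairwise distances among n points
n_pairs <- function(n) n * (n - 1) / 2

X <- as.matrix(iris[, 1:4])
d <- dist(X)  # stats::dist stores only the lower triangle
length(d) == n_pairs(nrow(X))  # 150 * 149 / 2 = 11175 distances

# Approximate size in MB of a dense n x n matrix of doubles (8 bytes each)
dense_mb <- function(n) n^2 * 8 / 1024^2

sizes <- c(1e3, 1e4, 1e5)
data.frame(n = sizes,
           pairs = n_pairs(sizes),
           dense_mb = round(dense_mb(sizes), 1))
```

Each tenfold increase in the number of points multiplies the storage by one hundred, and the time for every gradient evaluation grows at the same rate.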
Also, fitting everything I wanted to do into one package has involved splitting everything up into lots of little functions, so good luck finding where anything actually gets done. Thus, its pedagogical value is negligible, unless you were looking for an insight into my questionable design, naming, and decision-making skills. But this is a hobby project, so I get to make it as over-engineered as I want.
I have some other packages that create or download datasets often used in SNE-related research, plus the optimization package that sneer relies on:
- Simulation, MNIST Digit, Olivetti and Frey Faces
- COIL-20
- mize, the optimization package
I reverse-engineered some specifics of the Spectral Directions gradient by translating the relevant part of the Matlab implementation provided on the Carreira-Perpiñán group's software page. Professor Carreira-Perpiñán kindly agreed to allow the resulting R code to be under the GPL license of this package. Obviously, assume that any mistakes, errors, or resulting destruction of your computer are bugs in sneer.
GPLv2 or later. The optimization part of sneer is provided by the mize package, which is available under the BSD 2-Clause license.