High-dimensional data can lie on low-dimensional manifolds. This project implements and compares three dimensionality reduction algorithms: PCA, MDS, and Isomap. The algorithms are tested in two experiments. The first experiment demonstrates how each embedder captures the manifold of a swiss roll. The second experiment applies Isomap to a real-life scenario using a dataset of animals.
The first experiment uses a swiss roll to investigate how well each dimensionality reduction algorithm captures a manifold. The swiss roll is shown in Figure 1 and stored in "swiss_roll.npy".
(Figure 1. A swiss roll visualized in 3D. This data is used when comparing the dimensionality reduction algorithms.)

Principal Component Analysis (PCA) is a powerful dimensionality reduction algorithm. However, it is a linear projection, so it cannot unroll a curved manifold. Figure 2a shows the result of applying PCA to the swiss roll.
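As a rough illustration of the linear projection PCA performs, here is a minimal NumPy sketch. The function name `pca` and the synthetic swiss roll are assumptions made for this example; the repository's own implementation and the contents of "swiss_roll.npy" may differ.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto the top principal directions (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


# Synthetic stand-in for the swiss roll (assumption: not the project's data file).
rng = np.random.default_rng(0)
t = 1.5 * np.pi * (1 + 2 * rng.random(500))
X = np.column_stack([t * np.cos(t), 21 * rng.random(500), t * np.sin(t)])

Y = pca(X)
print(Y.shape)  # (500, 2)
```

Because the projection is a single linear map, points on different layers of the roll can land on top of each other in the 2D result.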
Multidimensional Scaling (MDS) takes us one step closer to our goal. It places points in the low-dimensional space so that their pairwise distances are preserved, and it captures the pattern of the manifold, as seen in Figure 2b.
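The source does not say which MDS variant is used. The sketch below assumes classical (Torgerson) MDS, which double-centers the squared distance matrix and embeds the points with its leading eigenvectors. The name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Embed points from a pairwise distance matrix D (classical MDS, assumed variant)."""
    n = D.shape[0]
    # Double-centering turns squared distances into an inner-product (Gram) matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    # eigh returns eigenvalues in ascending order; pick the largest ones.
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by round-off before taking the root.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Usage: distances between 2D points are reproduced by the 2D embedding.
rng = np.random.default_rng(0)
P = rng.normal(size=(30, 2))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
Y = classical_mds(D)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_hat))
```

The embedding is only determined up to rotation and reflection, so the check compares distances rather than coordinates.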
Isomap is the best suited of the three algorithms for capturing manifolds. Figure 2c shows the result of applying Isomap to the swiss roll.
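Isomap can be thought of as MDS applied to geodesic distances, which are approximated by shortest paths in a nearest-neighbour graph. The following is a minimal sketch under stated assumptions: the function name `isomap`, the choice of `n_neighbors=8`, and the regular-grid swiss roll are all made up for this example and need not match the code in the repository.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist


def isomap(X, n_neighbors=8, n_components=2):
    """Isomap sketch: kNN graph -> shortest paths -> classical MDS (hypothetical helper)."""
    n = X.shape[0]
    D = cdist(X, X)

    # Build the k-nearest-neighbour graph. Column 0 of the argsort is the point
    # itself (assuming no duplicate points), so it is skipped.
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    graph = np.zeros((n, n))  # a zero entry means "no edge" for dense csgraph input
    graph[rows, cols] = D[rows, cols]
    graph = np.maximum(graph, graph.T)  # make the graph symmetric

    # Approximate geodesic distances with Dijkstra shortest paths.
    geodesic = shortest_path(graph, method="D", directed=False)
    if not np.isfinite(geodesic).all():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (geodesic ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Deterministic swiss roll on a regular grid (assumption: not the project's data file).
t = np.linspace(1.5 * np.pi, 4.5 * np.pi, 100)
h = np.linspace(0.0, 21.0, 12)
T, H = np.meshgrid(t, h, indexing="ij")
X = np.column_stack([(T * np.cos(T)).ravel(), H.ravel(), (T * np.sin(T)).ravel()])

Y = isomap(X)
print(Y.shape)  # (1200, 2)
```

The number of neighbours matters: if it is too large, edges can jump between adjacent layers of the roll and the geodesic estimate breaks down, and if it is too small, the graph can become disconnected.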
(Figure 2. Swiss roll used with (A) PCA, (B) MDS, and (C) Isomap.)

The animal data are retrieved from the UCI Machine Learning Repository [1]. The dataset consists of 101 instances and 17 attributes.
The 14th attribute indicates the number of legs. It takes integer values from the set [0, 2, 4, 5, 6, 8], unlike the other attributes, which store booleans. Attribute 14 is therefore converted from numerical to boolean values by storing True for 2 and 4, and False for all other values. The intuition is that most land animals have either 2 or 4 legs, so it is convenient to split on these values.
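The conversion described above can be sketched with NumPy as follows. The array `legs` holds made-up values for illustration and is not read from the actual dataset.

```python
import numpy as np

# Hypothetical leg counts for a handful of animals (illustrative values only).
legs = np.array([0, 2, 4, 5, 6, 8, 4, 2])

# True where the animal has 2 or 4 legs, False otherwise.
legs_bool = np.isin(legs, [2, 4])
print(legs_bool)
```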
Isomap is applied to the Zoo dataset, and the result is shown in Figure 3.
(Figure 3. Isomap is used with the Zoo dataset. Images of animals are added to make the plot more interpretable.)

Land animals such as gorillas and lions were placed to the left in the 2D plane. Animals that live in or close to the water were placed in the center. For example, we see frogs at the origin. Moving upwards from the origin, we start to see penguins and flamingos. Moving downwards, we start to see aquatic animals such as tuna and dolphins. Furthermore, insects were placed to the right in the 2D plane.
Isomap is a powerful dimensionality reduction algorithm that is good at capturing manifolds, and both experiments support this.
This project requires the following packages: NumPy, SciPy, and Matplotlib.
To test the model, install the required packages, navigate to the repository in your terminal, and type:
python experiment.py