Advanced Topics in Machine Learning -- Fall 2023, UniTS
During this first challenge of the Advanced Machine Learning course, you will experiment with the development of a data analysis pipeline based on various techniques seen during the lectures so far: some unsupervised (e.g. PCA, kernel-PCA) and others supervised (e.g. kernel SVM and Artificial Neural Networks for classification).
The dataset of interest for the challenge will be FashionMNIST, an MNIST-like dataset of grayscale images of fashion items. Originally developed by Zalando Research in 2017 as a harder (yet drop-in compatible) replacement for the original MNIST dataset, it has been used in several papers and competitions since then.
Throughout the challenge, treat the dataset mentioned below as the training set provided by FashionMNIST. Use the test set only at the end, to evaluate the overall accuracy of the pipeline!
Load the dataset in the most suitable form for the tasks that follow. Then, perform the following steps, with the goal of developing a geometric understanding of the dataset (a minimal loading-and-projection sketch follows the list):
- Perform a (linear) PCA on the dataset, and plot the first two (or three!) principal components along with the true label. Comment on the data separation.
- Perform a kernel-PCA on the dataset with a Gaussian kernel, and plot the first two (or three!) principal components along with the true label. Try to tune the dispersion parameter of the kernel to obtain a good separation of the data. Comment.
- Perform another kernel-PCA on the dataset with a kernel of your own choice (e.g. a polynomial kernel), and plot the first two (or three!) principal components along with the true label. If you opt for the polynomial kernel, try to tune its degree to obtain a good separation of the data. Comment.
Whenever suitable, try to complement your analysis with some graphs!
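As a starting point, here is a minimal loading-and-projection sketch, assuming torchvision, scikit-learn and matplotlib are available; the subsample size `N_SUB` and the kernel parameter `GAMMA` are illustrative assumptions, not prescribed values:

```python
# Minimal sketch: load FashionMNIST, then compare linear PCA and RBF kernel-PCA.
import numpy as np
import matplotlib.pyplot as plt
from torchvision.datasets import FashionMNIST
from sklearn.decomposition import PCA, KernelPCA

train = FashionMNIST(root="data", train=True, download=True)
X = train.data.numpy().reshape(len(train), -1) / 255.0  # (60000, 784), in [0, 1]
y = train.targets.numpy()

# Subsample to keep kernel-PCA tractable (see the notice below).
N_SUB = 5000
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))[:N_SUB]
X_sub, y_sub = X[idx], y[idx]

# Linear PCA and Gaussian (RBF) kernel-PCA, first two components each.
Z_pca = PCA(n_components=2).fit_transform(X_sub)
GAMMA = 1 / 784  # dispersion parameter: a starting point to tune, not a tuned value
Z_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=GAMMA).fit_transform(X_sub)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, Z, title in zip(axes, (Z_pca, Z_kpca), ("Linear PCA", "RBF kernel-PCA")):
    sc = ax.scatter(Z[:, 0], Z[:, 1], c=y_sub, cmap="tab10", s=4)
    ax.set(title=title, xlabel="PC 1", ylabel="PC 2")
fig.colorbar(sc, ax=axes, label="true label")
plt.show()
```

A similar snippet with `kernel="poly"` and a `degree` argument covers the third bullet.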
IMPORTANT NOTICE: As some of you have reported, performing kernel-PCA on the entire FashionMNIST dataset can be memory- and time-demanding (as it scales with the square of the number of datapoints!). If you want to reduce such requirements, you can do either of the following (both sketched right after this list):
- Reduce the number of datapoints on which to perform kPCA, e.g. by slicing the randomly-shuffled dataset (most effective!);
- Reduce the size of the images in the dataset (e.g. by dropping even/odd rows/columns, or by performing local pooling) (somewhat effective: you may want to try it if you do not have access to powerful compute, but still enjoy challenges!).
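For concreteness, both reduction strategies could be implemented along these lines (a sketch under the assumption that `X` and `y` are the arrays loaded above):

```python
# Illustrative size-reduction sketches; X and y are the arrays loaded above.
import numpy as np

rng = np.random.default_rng(0)

# Option 1: random subsample of datapoints (most effective).
idx = rng.permutation(len(X))[:5000]
X_small, y_small = X[idx], y[idx]

# Option 2a: drop every other row and column (28x28 -> 14x14).
imgs = X.reshape(-1, 28, 28)
X_half = imgs[:, ::2, ::2].reshape(len(X), -1)

# Option 2b: 2x2 average pooling (also 28x28 -> 14x14).
X_pool = imgs.reshape(-1, 14, 2, 14, 2).mean(axis=(2, 4)).reshape(len(X), -1)
```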
Choose one of the results obtained in the previous section (you should choose the one that best explains the data geometry), and ignore the true labels. Then, perform the following steps:
- Consider only the first $10$ components of the (kernel-)PCA and try to assign $10$ labels to the resulting datapoints. Choose the approach you deem most suitable (see the clustering sketch below). Comment on the results, by considering:
  a. How well does the label assignment just performed reflect the true labels?
  b. Does the number of components used ($10$) reflect the actual knee or gap point of the spectrum associated with the principal components?
Whenever suitable, try to complement your analysis with some graphs!
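A possible (not mandated) label-assignment approach is k-means with $10$ clusters on the first $10$ components, paired with a plot of the kernel-PCA eigenvalue spectrum to judge the knee/gap point. A sketch, reusing `X_sub` and `GAMMA` from the loading snippet above:

```python
# k-means on the first 10 kernel-PCA components, plus the eigenvalue spectrum.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans

kpca = KernelPCA(n_components=30, kernel="rbf", gamma=GAMMA)
Z = kpca.fit_transform(X_sub)

# Unsupervised label assignment on the first 10 components.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z[:, :10])

# Spectrum of the centred kernel matrix: look for a knee or a gap around 10.
plt.plot(np.arange(1, 31), kpca.eigenvalues_, marker="o")
plt.axvline(10, linestyle="--", color="gray")
plt.xlabel("component")
plt.ylabel("eigenvalue")
plt.show()
```

Comparing `labels` with `y_sub` (e.g. via a confusion matrix or the adjusted Rand index) addresses point a.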
Consider the dataset composed of the original images, with the labels assigned in the previous section (regardless of their actual match with the true labels!). Then, define and learn a classifier that can predict the label of a new image (sketches for the three classifiers follow the list). Specifically:
- Learn a kernel-SVM on the data/label pairs. The choice of the kernel and its hyperparameters is up to your experimentation and time availability. Comment on your choices and results.
- Learn a fully-connected NN on the data/label pairs. The choice of the architecture and its hyperparameters is up to your experimentation and time availability: show at least two different hyperparameter configurations, and comment on the results.
- Learn a CNN on the data/label pairs. The choice of the architecture and its hyperparameters is up to your experimentation and time availability. Comment on the results, paying particular attention to how they compare with those of the fully-connected architecture.
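For the kernel-SVM, a minimal scikit-learn sketch could look as follows; the RBF kernel and the values of `C` and `gamma` are assumptions to experiment with, not tuned choices. Here `X_sub` holds the images and `labels` the (kernel-)PCA-derived labels from the previous section:

```python
# Kernel-SVM on the image / cluster-label pairs.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_tr, X_val, y_tr, y_val = train_test_split(
    X_sub, labels, test_size=0.2, random_state=0
)
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_tr, y_tr)
print("validation accuracy:", svm.score(X_val, y_val))
```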
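For the fully-connected NN, a PyTorch sketch with one illustrative configuration (remember to show at least a second one):

```python
# Fully-connected NN on the image / cluster-label pairs (one configuration).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"
ds = TensorDataset(torch.tensor(X_sub, dtype=torch.float32),
                   torch.tensor(labels, dtype=torch.long))
loader = DataLoader(ds, batch_size=128, shuffle=True)

fcn = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
).to(device)

opt = torch.optim.Adam(fcn.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(10):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        opt.zero_grad()
        loss_fn(fcn(xb), yb).backward()
        opt.step()
```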
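For the CNN, one illustrative architecture; the training loop is the same as for the FCN, after reshaping the inputs to `(N, 1, 28, 28)`:

```python
# A small CNN for 28x28 grayscale inputs; reuse the FCN training loop above
# with inputs reshaped via X_sub.reshape(-1, 1, 28, 28).
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
```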
Evaluate the overall accuracy of the pipeline on the test set of FashionMNIST, i.e. compare the labels predicted by the three classifiers built in Section 3 with the true labels.
In order to assign a true label name (e.g. trousers, sandal, ...) to the labels determined from the (kernel-)PCA alone (which obviously carry no direct information about the subject of the picture), you can either:
i. Cheat, and assign to each group of (kernel-)PCA-labelled datapoints its most abundant true label (a majority-vote sketch follows this section).
ii. Sample a subset of datapoints from each (kernel-)PCA-labelled class, and assign one label by direct visual inspection. If you choose this route, it may also serve as a reminder that expert labelling is not always a trivial (and almost never a fast) task!
Comment on the results obtained.
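For option i., a majority-vote sketch, under the assumption that `labels` are the cluster labels, `y_sub` the corresponding true labels, and `svm` one of the classifiers trained in Section 3:

```python
# Map each cluster label to its most abundant true label, then evaluate on the
# FashionMNIST test set.
import numpy as np
from torchvision.datasets import FashionMNIST

cluster_to_true = {
    c: np.bincount(y_sub[labels == c], minlength=10).argmax() for c in range(10)
}

test = FashionMNIST(root="data", train=False, download=True)
X_te = test.data.numpy().reshape(len(test), -1) / 255.0
y_te = test.targets.numpy()

pred = np.vectorize(cluster_to_true.get)(svm.predict(X_te))
print("overall pipeline accuracy:", (pred == y_te).mean())
```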
Repeat the steps of Section 3 using the true labels of the dataset. Comment on the results, and draw a comparison between these results and those obtained from the previous, hybrid pipeline.