The goal of this project is to develop a recommender system that accepts a few different user-supplied images of clothing as input, scores them against the user's 'style vector' (generated from the preferences collected when the app is initialized), and ranks the resulting outfits to help the user decide what to wear. All image files used to train the model for this project are from the DeepFashion dataset.
Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, and Xiaoou Tang. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
This project requires the following tools:
- Python - The programming language used by Flask.
- PostgreSQL - A relational database system.
- Virtualenv - A tool for creating isolated Python environments.
To get started, install Python and Postgres on your local computer if you don't have them already. A simple way for Mac OS X users to install Postgres is using Postgres.app. You can optionally use another database system instead of Postgres, like SQLite.
The notebook in this repository is intended to be executed using Amazon's SageMaker platform and the following is a brief set of instructions on setting up a managed notebook instance using SageMaker.
Log in to the AWS console and go to the SageMaker dashboard. Click on 'Notebook instances' and create a new notebook instance. It is recommended to choose a GPU-enabled instance type for this particular project.
Once the instance has been started and is accessible, click on 'Open Jupyter' to get to the Jupyter notebook main page. To start, clone this repository into the notebook instance.
Click on the 'new' dropdown menu and select 'terminal'. By default, the working directory of the terminal instance is the home directory. Enter the appropriate directory and clone the repository as follows.
cd SageMaker
git clone https://github.com/Supearnesh/ml-waywt-rec.git
exit
This was the general outline followed for this project:
- Importing the datasets
- Pre-processing data
- Training the CNN (using transfer learning)
- Creation of user style vectors
- Recommendation testing
The DeepFashion dataset used in this project is open-source and freely available:
- Download the DeepFashion dataset. Unzip the folder and place it in this project's home directory, at the location img/.
In the code cell below, we will write the file paths for the DeepFashion dataset into the numpy array img_files and check the size of the dataset.
import numpy as np
from glob import glob
# !unzip img
# load filenames for clothing images
img_files = np.array(glob("img/*/*"))
# print number of images in each dataset
print('There are %d total clothing images.' % len(img_files))
The data has already been randomly partitioned into training, testing, and validation sets, so all we need to do is load the split information into a dataframe and verify that the data is divided in the correct proportions.
The images are resized so that their shorter side is 150 pixels and then center-cropped to 150 x 150, creating an image tensor of size 150 x 150 x 3. The originals are 300 pixels in height and the aspect ratio is not altered by the resize. In the interest of time, this dataset will not be augmented by adding flipped/rotated images to the training set, although that is an effective method to increase the size of the training set (a brief sketch follows the data loaders below).
import pandas as pd
df_full = pd.read_csv("data_attributes.csv")
df_train = df_full.loc[df_full['evaluation_status'] == 'train'][['img_path', 'category_values', 'attribute_values']]
df_test = df_full.loc[df_full['evaluation_status'] == 'test'][['img_path', 'category_values', 'attribute_values']]
df_val = df_full.loc[df_full['evaluation_status'] == 'val'][['img_path', 'category_values', 'attribute_values']]
print('The training set has %d records.' % len(df_train))
print('The testing set has %d records.' % len(df_test))
print('The validation set has %d records.' % len(df_val))
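As a quick sanity check on the split proportions mentioned above, each partition can also be expressed as a fraction of the full dataset. The short cell below is illustrative only and was not part of the original notebook.
# sanity check: express each split as a fraction of the full dataset
n_total = len(df_full)
for name, df_split in [('train', df_train), ('test', df_test), ('val', df_val)]:
    print('The %s split contains %.1f%% of all records.' % (name, 100.0 * len(df_split) / n_total))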
import os
from PIL import Image
from torchvision import datasets
from torchvision import transforms as T
from torch.utils.data import DataLoader
# Set PIL to be tolerant of image files that are truncated.
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
### DONE: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes
transform = T.Compose([T.Resize(150), T.CenterCrop(150), T.ToTensor()])
dataset_train = datasets.ImageFolder('img/train', transform=transform)
dataset_valid = datasets.ImageFolder('img/valid', transform=transform)
dataset_test = datasets.ImageFolder('img/test', transform=transform)
loader_train = DataLoader(dataset_train, batch_size=1, shuffle=False)
loader_valid = DataLoader(dataset_valid, batch_size=1, shuffle=False)
loader_test = DataLoader(dataset_test, batch_size=1, shuffle=False)
loaders_transfer = {'train': loader_train, 'valid': loader_valid, 'test': loader_test}
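Although augmentation is skipped here in the interest of time, a minimal sketch of how flipped/rotated images could be folded into the training transform is shown below; the validation and test transforms would stay as defined above so that evaluation remains deterministic.
# sketch only: an augmented transform for the training set (not used in this project)
transform_augmented = T.Compose([
    T.Resize(150),
    T.CenterCrop(150),
    T.RandomHorizontalFlip(p=0.5),   # random left-right flips
    T.RandomRotation(degrees=10),    # small random rotations
    T.ToTensor()
])
# dataset_train = datasets.ImageFolder('img/train', transform=transform_augmented)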
The FashionNet model is nearly identical to the VGG-16 architecture, with the exception of the last convolutional layer. However, instead of introducing the additional complexities of the FashionNet model, this model can be simplified by retaining the attribute embedding from the dataset. The data will be filtered into 1,000 potentially relevant buckets across 5 attributes of clothing, namely its pattern, material, fit, cut, and style. All layers use Rectified Linear Units (ReLUs) for the reduction in training times documented by Nair and Hinton. It will be interesting to test the trained model to see how the training and validation losses behave.
Vinod Nair and Geoffrey Hinton. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of ICML, 2010.
An alternative could have been to use a pretrained VGG-19 model, which would yield an architecture similar to that described by Simonyan and Zisserman. The results attained by their model showed great promise for a similar image classification problem, so it could have made sense to reuse the same architecture, modifying only the final fully connected layer as is done for the VGG-16 model in the cells below.
Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of ICLR, 2015.
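For reference, the VGG-19 alternative described above would differ only in the backbone constructor; a minimal sketch (not used in this project) is shown below.
# sketch only: the VGG-19 alternative, modifying only the final fully connected layer
import torchvision.models as models
import torch.nn as nn

model_alternative = models.vgg19(pretrained=True)
for param in model_alternative.parameters():
    param.requires_grad = False
# same 1,000-bucket output layer as the VGG-16 model below
model_alternative.classifier[6] = nn.Linear(4096, 1000)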
import torchvision.models as models
import torch.nn as nn
import torch
# the underlying network structure of FashionNet is nearly identical to VGG-16
model_transfer = models.vgg16(pretrained=True)

# freeze the parameters of the pretrained feature extractor
for param in model_transfer.parameters():
    param.requires_grad = False

# replace the final fully connected layer so it outputs the 1,000 attribute buckets
model_transfer.classifier[6] = nn.Linear(4096, 1000)

# check if CUDA is available
use_cuda = torch.cuda.is_available()

# move the model to GPU if CUDA is available
if use_cuda:
    model_transfer = model_transfer.cuda()

print(model_transfer)
Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_transfer and the optimizer as optimizer_transfer below.
import torch.optim as optim
## select loss function
criterion_transfer = nn.CrossEntropyLoss()
# check if CUDA is available
use_cuda = torch.cuda.is_available()
# move loss function to GPU if CUDA is available
if use_cuda:
    criterion_transfer = criterion_transfer.cuda()
## select optimizer
optimizer_transfer = optim.SGD(model_transfer.parameters(), lr=0.001)
The model is to be trained and validated below, with the final model parameters saved at the filepath 'model_transfer.pt'.
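The train helper called in the next cell is not defined in this section; a minimal sketch, assuming it follows the conventional pattern of tracking validation loss and checkpointing the best weights, is given below.
# sketch of the assumed train helper: a standard training/validation loop with checkpointing
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    valid_loss_min = np.inf
    for epoch in range(1, n_epochs + 1):
        train_loss, valid_loss = 0.0, 0.0
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()
            # running average of the training loss
            train_loss += (1 / (batch_idx + 1)) * (loss.item() - train_loss)
        model.eval()
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                loss = criterion(model(data), target)
                # running average of the validation loss
                valid_loss += (1 / (batch_idx + 1)) * (loss.item() - valid_loss)
        print('Epoch %d - Training Loss: %.6f, Validation Loss: %.6f' % (epoch, train_loss, valid_loss))
        # save the model whenever the validation loss improves
        if valid_loss < valid_loss_min:
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
    return model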
n_epochs = 25
# train the model
model_transfer = train(n_epochs, loaders_transfer, model_transfer, optimizer_transfer, criterion_transfer, use_cuda, 'model_transfer.pt')
# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
The model can be validated against test data to calculate and print the test loss and accuracy. We should ensure that the test accuracy is greater than 80%, as the implementation in the FashionNet paper yielded an accuracy of 85%.
def test(loaders, model, criterion, use_cuda):
    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.
    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        # convert output probabilities to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
    print('Test Loss: {:.6f}\n'.format(test_loss))
    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
This capability is the crux of a recommendation engine: it generates a feature vector for a particular user, based on images they have previously selected or liked, and then compares future images to gauge their similarity to, or distance from, those previous selections in order to recommend items that would be a good fit.
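A minimal sketch of how such a user style vector could be built is shown below; liked_img_tensors is a hypothetical list of preprocessed image tensors for items the user selected during initialization, and the vector is simply the average of the model's attribute scores over those images.
# sketch only (not part of the original notebook): build a user style vector
# by averaging the model's attribute scores for images the user has liked
def build_style_vector(model, liked_img_tensors, use_cuda):
    model.eval()
    scores = []
    with torch.no_grad():
        for img_tensor in liked_img_tensors:
            batch = img_tensor.unsqueeze(0)
            if use_cuda:
                batch = batch.cuda()
            # 1,000-dimensional attribute scores for one liked image
            scores.append(torch.sigmoid(model(batch)).squeeze(0).cpu())
    # the style vector is the element-wise mean across all liked images
    return torch.stack(scores).mean(dim=0)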
## load attribute labels and their mappings
df_attributes = pd.read_csv('labels_attributes.csv')
# list of attribute names and their corresponding indices
attr_pattern = []
attr_material = []
attr_fit = []
attr_cut = []
attr_style = []
for i in range(len(df_attributes)):
    if df_attributes['attribute_type_id'].iloc[i] == 1:
        attr_pattern.append(df_attributes['attribute_id'].iloc[i])
    elif df_attributes['attribute_type_id'].iloc[i] == 2:
        attr_material.append(df_attributes['attribute_id'].iloc[i])
    elif df_attributes['attribute_type_id'].iloc[i] == 3:
        attr_fit.append(df_attributes['attribute_id'].iloc[i])
    elif df_attributes['attribute_type_id'].iloc[i] == 4:
        attr_cut.append(df_attributes['attribute_id'].iloc[i])
    elif df_attributes['attribute_type_id'].iloc[i] == 5:
        attr_style.append(df_attributes['attribute_id'].iloc[i])
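With these index lists, a 1,000-dimensional attribute prediction from the model can be split into the five attribute groups. The sketch below is illustrative only and assumes the attribute_id values are zero-based indices into the model's output vector.
# sketch only: pick the highest-scoring attribute within each of the five groups
# assumes attribute_id values are zero-based indices into the 1,000-dim prediction
attribute_groups = {'pattern': attr_pattern, 'material': attr_material,
                    'fit': attr_fit, 'cut': attr_cut, 'style': attr_style}

def top_attribute_per_group(prediction, groups=attribute_groups):
    top = {}
    for group_name, indices in groups.items():
        group_scores = prediction[torch.as_tensor(indices)]
        # keep the attribute id with the highest score in this group
        top[group_name] = indices[int(torch.argmax(group_scores))]
    return top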
Test the recommender system on sample images. It would be good to understand the output and gauge its performance; regardless of the result, the system can tangibly be improved by:
- augmenting the training dataset with flipped/rotated images, which would yield a much larger training set and ultimately give better results
- experimenting further with CNN architectures, which could potentially lead to a more effective architecture with less overfitting
- increasing the number of training epochs, which, given more time, would both grant the training algorithm more time to converge at a local minimum and help reveal patterns during training that could aid in identifying points of improvement
import urllib
import matplotlib.pyplot as plt
img = Image.open(urllib.request.urlopen('https://images.footballfanatics.com/FFImage/thumb.aspx?i=/productimages/_2510000/altimages/ff_2510691alt1_full.jpg'))
plt.imshow(img)
plt.show()
transform = T.Compose([T.Resize(150), T.CenterCrop(150), T.ToTensor()])
transformed_img = transform(img)
# the images have to be loaded in to a range of [0, 1]
# then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]
normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
normalized_img = normalize(transformed_img)
# add a batch dimension to the normalized image tensor
tensor_img = normalized_img.unsqueeze(0)
# check if CUDA is available
use_cuda = torch.cuda.is_available()
# move image tensor to GPU if CUDA is available
if use_cuda:
    tensor_img = tensor_img.cuda()
# make prediction by passing image tensor to model
prediction = model_transfer(tensor_img)
# convert predicted probabilities to class index
tensor_prediction = torch.argmax(prediction)
# move prediction tensor to CPU if CUDA is available
if use_cuda:
    tensor_prediction = tensor_prediction.cpu()
predicted_class_index = int(np.squeeze(tensor_prediction.numpy()))
class_out = class_names[predicted_class_index]  # predicted class label (class_names is assumed to map class indices to attribute labels)
# the output would then be compared against the user's style vector to rank against other potential outfits
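Continuing from the comment above, ranking several candidate outfits against the user's style vector could look roughly like the sketch below; candidate_scores is a hypothetical list of (image_path, attribute_score_tensor) pairs produced by the same prediction steps shown above, and cosine similarity stands in for the distance measure.
# sketch only: rank candidate outfits by cosine similarity to the user's style vector
import torch.nn.functional as F

def rank_outfits(style_vector, candidate_scores):
    ranked = []
    for img_path, scores in candidate_scores:
        # similarity between the candidate's attribute scores and the style vector
        similarity = F.cosine_similarity(scores.unsqueeze(0), style_vector.unsqueeze(0)).item()
        ranked.append((img_path, similarity))
    # highest similarity (best match) first
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)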
Always remember to shut down the notebook instance if it is no longer being used. AWS charges for the duration that a notebook instance is left running, so if it is left on there could be an unexpectedly large AWS bill (especially if using a GPU-enabled instance). If considerable space is allocated for the notebook (15-25 GB), there might be some monthly charges associated with storage as well.