Facenet: Triplet loss for image embedding using the Cifar100 dataset
Image embedding is a computer vision task that uses a convolutional neural network to convert an image into an array of size (1, n), where n is the size of the embedding. The network can be trained with a triplet loss by selecting 3 images: the anchor, which is the reference image; the positive, which is similar to the anchor (same class); and the negative, which has a different class from the anchor.
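The selection of an (anchor, positive, negative) triplet can be sketched as below. This is a minimal illustration that picks triplets at random from a list of class labels; the function name sample_triplet and the random strategy are assumptions made for this example, and the training script may select triplets differently.

```python
import random
from collections import defaultdict


def sample_triplet(labels, rng=random):
    """Return indices (anchor, positive, negative) drawn from a list of class labels.

    Hypothetical helper for illustration: it samples uniformly at random.
    """
    # Group image indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # The anchor class needs at least two images so a distinct positive exists.
    anchor_classes = [c for c, idx in by_class.items() if len(idx) >= 2]
    if not anchor_classes or len(by_class) < 2:
        raise ValueError("need a class with two images and at least two classes")

    anchor_class = rng.choice(anchor_classes)
    anchor, positive = rng.sample(by_class[anchor_class], 2)

    # The negative comes from any other class.
    negative_class = rng.choice([c for c in by_class if c != anchor_class])
    negative = rng.choice(by_class[negative_class])
    return anchor, positive, negative


labels = ["cat", "cat", "frog", "truck", "frog"]
a, p, n = sample_triplet(labels, random.Random(0))
print(labels[a], labels[p], labels[n])
```

The anchor and positive always share a label and are different images, while the negative always has a different label.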
The loss function is defined as:
L(a, p, n) = max(d(a, p) - d(a, n) + margin, 0)
where d(a, p) and d(a, n) represent the Euclidean distances between the Anchor and the Positive and between the Anchor and the Negative, respectively. margin is a parameter that helps the network learn to keep negative samples at least that much farther from the anchor than positive samples.
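As a minimal sketch of this loss for a single triplet, assuming the standard hinge form max(d(a, p) - d(a, n) + margin, 0) with plain Euclidean distances: the function name triplet_loss, the toy 2-dimensional embeddings, and the margin value of 0.2 are assumptions chosen for illustration, not values taken from the training script.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    margin=0.2 is an assumed value for this example.
    """
    d_ap = math.dist(anchor, positive)  # Euclidean distance anchor-positive
    d_an = math.dist(anchor, negative)  # Euclidean distance anchor-negative
    return max(d_ap - d_an + margin, 0.0)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 4.0]  # far from the anchor
hard_negative = [1.0, 0.0]  # as close to the anchor as the positive

print(triplet_loss(anchor, positive, easy_negative))  # 0.0
print(triplet_loss(anchor, positive, hard_negative))  # 0.2
```

The easy negative is already more than margin farther away than the positive, so it contributes no loss. The hard negative is at the same distance as the positive, so the loss equals the margin.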
By using this formula, the network learns to produce a small distance between the Anchor and the Positive and a large distance between the Anchor and the Negative, as illustrated in the following figure.
First, download the Cifar100 dataset for training and the Cifar10 dataset for validation from here. You can also use the ImageNet dataset for better model performance.
Facenet requires a dataset directory structured as follows:
dataset
|-- airplane
| |-- airplane_0001.png
| |-- airplane_0002.png
| '-- airplane_0003.png
|-- cat
| |-- cat_0001.png
| |-- cat_0002.png
| '-- cat_0003.png
|-- frog
| |-- frog_0001.png
| |-- frog_0002.png
| '-- frog_0003.png
'-- truck
  |-- truck_0001.png
  |-- truck_0002.png
  '-- truck_0003.png
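A quick way to check that a directory follows this one-folder-per-class layout is to list each class with its images. The helper below is a hypothetical sketch (list_classes is not part of the project); it builds a tiny throwaway directory so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def list_classes(data_dir):
    """Map each class sub-directory name to its sorted list of PNG file names."""
    classes = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if class_dir.is_dir():
            classes[class_dir.name] = sorted(f.name for f in class_dir.glob("*.png"))
    return classes


# Build a small temporary copy of the layout (empty files stand in for images).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cat", "frog"):
        class_dir = Path(tmp) / name
        class_dir.mkdir()
        for i in (1, 2):
            (class_dir / f"{name}_{i:04d}.png").touch()
    found = list_classes(tmp)

print(found)
```

On a real dataset, pass the dataset directory instead of the temporary one; a class with an empty list means its folder contains no PNG files.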
Generate a pairs.txt file for validation:
python src/generate_pairs.py --data_dir cifar10
Before starting the training, add the src directory to the PYTHONPATH by running the following command:
export PYTHONPATH=src
Start the training by running the following command:
python src/train_tripletloss.py --models_base_dir models --data_dir dataset/cifar100 --image_size 32 --model_def models.squeezenet --optimizer ADAGRAD --max_nrof_epochs 100 --lfw_pairs dataset/cifar10_pairs.txt --lfw_dir /content/facenet/dataset/cifar10
After training, freeze the model into a single graph file by running the following command:
python src/freeze_graph.py checkpoint/ frozen_graph.pb
where checkpoint/ is a directory containing the metagraph (.meta) file and the checkpoint (.ckpt) file with the model parameters, and frozen_graph.pb is the filename for the exported GraphDef protobuf (.pb).
Now that we have a model for image embedding, we can use it to calculate the similarity of two images. Run the following command to compare two images:
python src/compare.py --frozen_graph frozen_graph.pb --image1 image1.jpg --image2 image2.jpg
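Conceptually, comparing two images comes down to measuring the distance between their embeddings. The sketch below assumes the embeddings have already been computed and uses hand-written 2-dimensional vectors; the function names and the threshold of 1.0 are hypothetical, and compare.py may compute and report the result differently.

```python
import math


def embedding_distance(emb1, emb2):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.dist(emb1, emb2)


def is_same_class(emb1, emb2, threshold=1.0):
    """Treat two images as similar when their embeddings are closer than threshold.

    threshold=1.0 is an assumed value; in practice it is tuned on validation pairs.
    """
    return embedding_distance(emb1, emb2) < threshold


# Toy embeddings standing in for the model output of three images.
emb_a = [0.6, 0.8]
emb_b = [0.8, 0.6]    # close to emb_a
emb_c = [-0.6, -0.8]  # opposite direction from emb_a

print(is_same_class(emb_a, emb_b))  # True
print(is_same_class(emb_a, emb_c))  # False
```

A smaller distance means the two images are more similar according to the model.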
Download pretrained image embedding model here
The project demo can be found here
[1] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A Unified Embedding for Face Recognition and Clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823. arXiv