Object localisation using deep reinforcement learning

The aim of this project is to learn a policy that localizes objects in images by directing visual attention to their salient parts. To achieve this, the popular RL algorithm Q-learning is combined with a CNN as the function approximator; the resulting method is deep Q-learning (DQL). Using this method for object localization is not new: it was tried before in Active Object Localization with Deep Reinforcement Learning. In this project, besides implementing the algorithm in the Tensorflow deep learning framework, a new set of experiments was conducted with a new neural network architecture to show that representation learning can happen through Q-learning. More specifically, the original paper uses a pre-trained CNN as a feature extractor, whereas in this project the model was trained without a pre-trained network for feature extraction. This MSc project was conducted in the Computer Vision & Autonomous Systems Group at the University of Glasgow under the supervision of Dr Jan Paul Siebert. Below are examples of a trained model on the VOC 2012 dataset. The following sections provide a user manual for researchers who intend to use this implementation. In addition, all the files are commented to make modifying the implementation easy.

Getting started

You can clone this project using this link and install the requirements with pip install -r requirements.txt, which installs everything you need. Before doing so, however, it is suggested to create a virtual environment with Python 2.7.12. This code was developed and tested on Ubuntu 16.04 using Python 2.7.12 and Tensorflow 1.8. The code works fine with Tensorflow 1.8; however, running it on the University of Glasgow cluster requires some changes. The parts of the code that need to be modified to run on the GPU cluster are marked as "Old API", so that the code can run with older Tensorflow APIs. Otherwise, it is recommended to follow this tutorial to create a virtual environment and then install Tensorflow and all requirements.


In this project, the Pascal VOC 2012 dataset was used to train the model. The code downloads and prepares the dataset for training on the first run. However, if you need to use another dataset, the input pipeline has to be modified, which means changing the dataset-related source files. In the Pascal VOC dataset, the ground truth is provided separately in XML files. For this reason, images and their corresponding ground truth are written to a single .npz file so that image batches can be created for efficient learning. The .npz files are later used for training. Since Pascal VOC 2012 consists of 19386 images, loading all of them into memory at once is problematic. To avoid this, the input pipeline loads images into memory efficiently: it reads the .npz files and supplies the data points for training.
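As a rough illustration of the packing step described above, the following sketch reads bounding boxes from a Pascal VOC annotation and writes images together with their boxes to a single .npz file. The function names (parse_voc_boxes, save_batch, load_batch), the array keys, and the layout of fixed-size images with one box per image are assumptions made for this example, not the repository's actual code.

```python
import xml.etree.ElementTree as ET

import numpy as np


def parse_voc_boxes(xml_text, category):
    """Return [xmin, ymin, xmax, ymax] for every object of `category`
    found in a Pascal VOC annotation given as an XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        if obj.findtext("name") != category:
            continue
        bndbox = obj.find("bndbox")
        boxes.append([int(float(bndbox.findtext(tag)))
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
    return boxes


def save_batch(path, images, boxes):
    """Write a batch of equally sized images and one box per image
    to a single compressed .npz file (hypothetical layout)."""
    np.savez_compressed(path,
                        images=np.asarray(images, dtype=np.uint8),
                        boxes=np.asarray(boxes, dtype=np.int32))


def load_batch(path):
    """Read the arrays back from a file written by save_batch."""
    with np.load(path) as data:
        return data["images"], data["boxes"]
```

Storing batches this way means only one file has to be held in memory at a time, which is the motivation given above for not loading the whole dataset at once.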

Command Line Options and Configuration

Having set up the environment, you can begin training. The command-line options of the training script are as follows:

      usage: [-h] [-n NUM_EPISODES] [-rms REPLAY_MEMORY_SIZE]
                   [-rmis REPLAY_MEMORY_INIT_SIZE]
                   [-es EPSILON_START] [-ee EPSILON_END]
                   [-ed EPSILON_DECAY_STEPS] [-c CATEGORY [CATEGORY ...]]
                   [-m MODEL_NAME]

      Train an object localizer

      optional arguments:
      -h, --help            show this help message and exit
      -n NUM_EPISODES, --num_episodes NUM_EPISODES
                    Number of episodes that the agent can interact with an
                    image. Default: 5
      -rms REPLAY_MEMORY_SIZE, --replay_memory_size REPLAY_MEMORY_SIZE
                    Number of the most recent experiences that would be
                    stored. Default: 500000
      -rmis REPLAY_MEMORY_INIT_SIZE, --replay_memory_init_size REPLAY_MEMORY_INIT_SIZE
                    Number of experiences to initialize replay memory.
                    Default: 500
                    Number of steps after which estimator parameters are
                    copied to target network. Default: 10000
      -d DISCOUNT_FACTOR, --discount_factor DISCOUNT_FACTOR
                    Discount factor. Default: 0.99
      -es EPSILON_START, --epsilon_start EPSILON_START
                    Epsilon decay schedule start point. Default: 1.0
      -ee EPSILON_END, --epsilon_end EPSILON_END
                    Epsilon decay schedule end point. Default: 0.2
      -ed EPSILON_DECAY_STEPS, --epsilon_decay_steps EPSILON_DECAY_STEPS
                    Epsilon decay step rate. This number indicates epsilon
                    would be decreased from start to end point after how
                    many steps. Default: 500
      -c CATEGORY [CATEGORY ...], --category CATEGORY [CATEGORY ...]
                    Indicating the categories are going to be used for
                    training. You can list name of the classes you want to
                    use in training. If you wish to use all classes then
                    you can use *. For instance <-c cat dog>. Default: cat
      -m MODEL_NAME, --model_name MODEL_NAME
                    The trained model would be saved with this name under
                    the path ../experiments/model_name. Default:

Note: To train a model on multiple categories, list them after -c, for example -c cat dog. In addition, if you want to train a new model on top of a previously trained model, copy the contents of the bestModel folder of the previously trained model into its checkpoints folder. In this way, the best model will be loaded for training.
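To make the epsilon options more concrete, here is a minimal sketch of an epsilon-greedy policy with a decay schedule. It assumes the decay is linear between --epsilon_start and --epsilon_end over --epsilon_decay_steps steps, which the help text does not state explicitly; the function names epsilon_at and choose_action are hypothetical.

```python
import random


def epsilon_at(step, start=1.0, end=0.2, decay_steps=500):
    """Linearly interpolate epsilon from `start` to `end` over
    `decay_steps` steps, then hold it at `end` (assumed schedule)."""
    clipped = min(max(step, 0), decay_steps)
    fraction = clipped / float(decay_steps)
    return (1.0 - fraction) * start + fraction * end


def choose_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon,
    otherwise the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With the defaults listed above, the agent starts out acting fully at random and, after the decay period, still explores with probability 0.2.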

A separate script evaluates a trained model on the test set. Testing conditions can be set as below:

usage: [-h] [-n NUM_EPISODES] [-c CATEGORY [CATEGORY ...]]
		      [-m MODEL_NAME]

Evaluate a model on test set

optional arguments:
  -h, --help            show this help message and exit
  -n NUM_EPISODES, --num_episodes NUM_EPISODES
		        Number of episodes that the agent can interact with an
		        image. Default: 15
  -c CATEGORY [CATEGORY ...], --category CATEGORY [CATEGORY ...]
		        Indicating the categories are going to be used for
		        testing. You can list name of the classes you want to
		        use in testing, for instance <-c cat dog>. If you wish
		        to use all classes then you can use *. Default: cat
  -m MODEL_NAME, --model_name MODEL_NAME
		        The model name that will be loaded for evaluation. Do
		        not forget to put the model under the path
		        ../experiments/model_name. Default: default_model
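
The options listed above map naturally onto Python's argparse module. The sketch below rebuilds an equivalent parser from the help text; the function name build_eval_parser is hypothetical, and the actual script may define its arguments differently.

```python
import argparse


def build_eval_parser():
    """Build a parser mirroring the evaluation options shown above."""
    parser = argparse.ArgumentParser(description="Evaluate a model on test set")
    parser.add_argument("-n", "--num_episodes", type=int, default=15,
                        help="Number of episodes that the agent can "
                             "interact with an image.")
    # nargs="+" collects one or more class names into a list.
    parser.add_argument("-c", "--category", nargs="+", default=["cat"],
                        help="Categories used for testing, or * for all.")
    parser.add_argument("-m", "--model_name", default="default_model",
                        help="Model to load from ../experiments/model_name.")
    return parser


if __name__ == "__main__":
    args = build_eval_parser().parse_args()
    print(args.num_episodes, args.category, args.model_name)
```

When passing * for all classes from a shell, quoting it (-c '*') keeps the shell from expanding it into file names.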

There are two other Python files that are useful for visualization purposes. The first can be used to visualize a sequence of actions:

usage: [-h] [-m MODEL_NAME] [-i IMAGE_PATH]
	                         [-g GROUND_TRUTH [GROUND_TRUTH ...]]
	                         [-n NAME]

Visualizing sequence of actions

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_NAME, --model_name MODEL_NAME
	                The model parameters that will be loaded for testing.
	                Do not forget to put the model under the path
	                ../experiments/model_name. Default: default_model
  -i IMAGE_PATH, --image_path IMAGE_PATH
	                Path to an image.
  -g GROUND_TRUTH [GROUND_TRUTH ...]
	                Target coordinates. The order of coordinates should be
	                like: xmin ymin xmax ymax. Default: 0 0 1 1
  -n NAME, --name NAME  Name of the output file. It will be stored in

In addition, the neural network layers can be visualized using the second file:

usage: [-h] [-m MODEL_NAME] [-i IMAGE_PATH]
	                        [-ln LAYER_NUM]

Visualizing CNN layers

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL_NAME, --model_name MODEL_NAME
	                The model parameters that will be loaded for testing.
	                Do not forget to put the model under the path
	                ../experiments/model_name. Default: default_model
  -i IMAGE_PATH, --image_path IMAGE_PATH
	                Path to an image.
  -ln LAYER_NUM, --layer_num LAYER_NUM
	                Layer number you wish to visualize.

Note: All visualization and evaluation files use the best model saved in the directory of the given model.


The output of the training process is stored in ../experiments/ModelName, in four folders. The first, summaries_q_estimator, contains a TF event record; the graphs related to training can be visualised with tensorboard by running tensorboard --logdir=../experiments/ModelName/summaries_q_estimator. The second folder, report, includes log.txt, the log shown in the terminal during training. The third, checkpoints, contains three files that correspond to the final model saved at the end of training. The last folder, bestModel, holds the best model according to validation accuracy.

The output of the evaluation process is stored in ../experiments/ModelName/report/evaluate_[categories].txt. That file contains the results for each category separately and the mean average precision (MAP) over all categories.
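
Localization results of this kind are commonly scored by the intersection over union (IoU) between a predicted box and the ground truth; under the Pascal VOC convention a detection counts as correct when the IoU exceeds 0.5. Whether this implementation uses exactly that criterion is an assumption. A minimal sketch, using the xmin ymin xmax ymax ordering from the options above, with hypothetical function names:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as
    (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / float(union) if union > 0 else 0.0


def is_correct(predicted, ground_truth, threshold=0.5):
    """Assumed success criterion: IoU above the threshold."""
    return iou(predicted, ground_truth) > threshold
```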

The output of the action-sequence visualization is a short video showing the agent's interactions with the given image. The result is stored in ../experiments/ModelName/anim.

The output of the layer visualization is a set of images, each of which corresponds to a filter in the given layer. The result is stored in ../experiments/ModelName/visu.


This code is implemented by getting help from the following sources:

