Target-driven Visual Navigation Model using Deep Reinforcement Learning

By: Aristana Scourtas and Zane Denmon

Problem Statement

Indoor visual navigation has many challenges. Where do you get the training data? How can you guarantee that your agent will generalize well? How can you ensure your agent will adapt to dynamic environments? Our project builds on the deep siamese actor-critic model introduced in the following paper:

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, and Ali Farhadi
ICRA 2017, Singapore

The agent aims to generalize across environments, across goals/targets, and to the real world. The original agent used the first version of AI2Thor; we ported the agent to the latest version of AI2Thor.


Image retrieved from Zhu et al., 2017.

Taking both observations and goals as input, the deep siamese actor-critic network captures an understanding of relative spatial positions as well as general scene layout. The asynchronous aspect of the model enables it to learn new scenes quickly: the more scenes the model trains on, the better it performs in unseen scenes, since all layers other than the scene-specific ones share the same parameters across scenes (see paper).
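For orientation, the dataflow of this architecture can be sketched with standard Keras layers. This is a simplified illustration of the network described in Zhu et al. (single-frame inputs, one scene-specific head, illustrative layer names), not the exact layers used in this repo:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_ACTIONS = 4     # Move Ahead, Rotate Right, Rotate Left, Move Back
FEATURE_DIM = 2048  # ResNet-50 feature size for a single frame

obs_in = layers.Input(shape=(FEATURE_DIM,), name="observation_features")
goal_in = layers.Input(shape=(FEATURE_DIM,), name="goal_features")

# Siamese branch: the SAME dense layer embeds observation and goal,
# mapping both into a shared space.
shared_embed = layers.Dense(512, activation="relu", name="siamese_embedding")
obs_embed = shared_embed(obs_in)
goal_embed = shared_embed(goal_in)

# Fusion layer combines the two embeddings into a joint representation.
fused = layers.Concatenate()([obs_embed, goal_embed])
fused = layers.Dense(512, activation="relu", name="fusion")(fused)

# Scene-specific head: the paper uses one such head per training scene;
# only a single head is shown here. It outputs both the policy and the value.
scene = layers.Dense(512, activation="relu", name="scene_specific")(fused)
policy = layers.Dense(NUM_ACTIONS, activation="softmax", name="policy")(scene)
value = layers.Dense(1, name="value")(scene)

model = Model(inputs=[obs_in, goal_in], outputs=[policy, value])
model.summary()
```

In the full model, the siamese embedding and fusion layers are shared across all scenes, which is what lets training on more scenes improve performance in unseen ones.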

Input and Output

The model takes as input an observation image and a goal image, both of size (224x224x3); the output is one of the following four actions for the agent to take at each time step: "Move Ahead", "Rotate Right", "Rotate Left", "Move Back". We extract feature maps of the input images from the second-to-last layer of ResNet-50 (via Keras). These features are dimensionally reduced representations of the original images and are used by the deep learning component of our model.
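This feature extraction step can be reproduced with off-the-shelf Keras utilities. A minimal sketch (the exact preprocessing and layer choice in this repo may differ slightly):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# ResNet-50 without its classification head; global average pooling over the
# final convolutional block yields one 2048-d vector per image, i.e. the
# activations just before the last (softmax) layer.
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(image):
    """image: uint8 RGB array of shape (224, 224, 3)."""
    x = preprocess_input(image.astype("float32"))
    return extractor.predict(x[np.newaxis, ...])[0]  # shape: (2048,)
```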


Image retrieved from Zhu et al., 2017.

Setup

The code uses the TensorFlow 2.0 API while remaining backwards compatible with the TensorFlow 1.x methods used in the original repository. The project was built and tested on macOS Mojave (10.14.6) using Python 3.7.6, but running the project via Docker should eliminate any OS-dependent issues. For the full list of dependencies, see the requirements.txt file at the root of this repo.
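For context, TensorFlow 1.x graph-style code is typically run under the 2.x API via the compatibility module. The snippet below is a generic sketch of that pattern, not necessarily the exact mechanism this repository uses:

```python
import numpy as np
import tensorflow as tf

# Generic pattern for running TF 1.x graph-style code on TensorFlow 2.x.
tf.compat.v1.disable_eager_execution()

# TF1-style placeholders and layers, accessed through tf.compat.v1.
state = tf.compat.v1.placeholder(tf.float32, shape=[None, 2048], name="state")
logits = tf.compat.v1.layers.dense(state, 4, name="policy_logits")

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    out = sess.run(logits,
                   feed_dict={state: np.zeros((1, 2048), dtype=np.float32)})
    print(out.shape)  # (1, 4)
```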

Downloading Pretrained Model Objects

This repo uses a pretrained A3C model, with all important model object information encoded in HDF5 dumps. To download the h5 dumps, first ensure that you have the wget utility installed -- here are installation instructions for Linux distros and macOS (Homebrew install recommended).

Then, run the following from the root of the repository:

bash ./data/download_scene_dumps.sh

The h5 files should be located in the /data folder.

If wget does not install properly, you can always download the files by clicking here, unzipping the download, and moving the files to the /data folder. You should then rerun the above bash script to rename the files appropriately.
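The dumps are ordinary HDF5 files, so you can sanity-check a download by listing its datasets with h5py. A small illustrative helper (not part of the repo):

```python
import sys
import h5py

# List every dataset in a downloaded dump, along with its shape.
# Pass the path to one of the .h5 files in ./data on the command line.
path = sys.argv[1]
with h5py.File(path, "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```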

Using Docker

  1. Ensure you have Docker installed. See the Docker installation instructions for more details; you would use the Desktop version for macOS or Windows, and the Server version for Linux distros. Check that Docker installed properly by running the following in your terminal:

     docker --version

  2. From the root of the repository, run the following to build the Docker image (note this may take a while):

     docker build -t ai2thor:1.0.0 .

Running the Model

We use a pre-trained model from the original repository, so training is unnecessary. The following instructions are for running the evaluation of that model.

Run the Docker container (note that it may take >45 min before evaluation stats are printed):

docker run ai2thor:1.0.0

Stats for each episode will be printed to the terminal.

Other Utilities

We've provided a modified keyboard_agent.py script that allows you to load an AI2Thor scene dump and navigate the scene with the keyboard. The script also lets you take screenshots to use as target images for your own evaluations, or capture depth images if you wish to experiment with those.

The commands are:

  • Use the WASD keys to move the agent.
  • Use the QE keys to move the camera.
  • Press I to switch between RGB and Depth views.
  • Press P to save an image of the current view.
  • Press R to reset the agent's location.
  • Press F to quit.

We have not created a Docker environment for this utility, but feel free to explore it by installing the dependencies on your local machine. To run the script, install Python 3.7.6 and pip, then install the dependencies listed in requirements.txt by running pip install -r requirements.txt. Finally, install Unity via Unity Hub.

Start the utility via:

python keyboard_agent.py 

This should open an AI2-THOR window for you to navigate within.
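Under the hood, keyboard_agent.py drives AI2-THOR through its Python controller. A minimal interaction loop looks roughly like the sketch below; argument names and defaults vary between AI2-THOR releases, and the scene name is just an example:

```python
from ai2thor.controller import Controller

# Start an AI2-THOR scene with depth rendering enabled (2.x-style API).
controller = Controller(scene="FloorPlan28", gridSize=0.25,
                        renderDepthImage=True, width=224, height=224)

for action in ["MoveAhead", "RotateRight", "MoveAhead", "MoveBack"]:
    event = controller.step(action=action)
    rgb = event.frame          # (H, W, 3) uint8 RGB observation
    depth = event.depth_frame  # (H, W) float depth map
    print(action, "success:", event.metadata["lastActionSuccess"])

controller.stop()
```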

Our Amendments

  • Upgraded the agent from its original AI2Thor environment to the latest AI2Thor release.
  • Implemented online feature extraction for agent evaluation.
  • Updated keyboard_agent.py to navigate the new environment and capture RGB and depth images.

Future Work

  • Cache ResNet50 features for quicker training and evaluation.
  • Replace ResNet with RedNet and use depth as an input to the model.

License

MIT
