MSc thesis project developed by Jorge Gomez Robles at the Eindhoven University of Technology (TU/e), under the supervision of dr. ir. Joaquin Vanschoren.
You can check the paper here.
- Jorge Gomez Robles (j.gomezrb.dev@gmail.com)
- Joaquin Vanschoren (j.vanschoren@gmail.com)
The ultimate goal of Neural Architecture Search (NAS) is to come up with an algorithm that can design well-performing architectures for any dataset of interest. A promising approach to NAS is reinforcement learning (RL).
One of the main limitations of RL on the NAS problem is the need to run the procedure from scratch for every dataset of interest. So far, most of the relevant results show how to apply standard RL algorithms to NAS for CIFAR, but little attention is paid to other datasets. Moreover, RL tends to be an expensive procedure for NAS, making it infeasible to re-run it on every new dataset.
An alternative is to explore meta-RL, which can learn a policy that can be transferred to previously unseen environments (i.e., datasets). In this work, we explore, for the first time, meta-RL for NAS. We study whether or not the transfer provides an advantage during training and evaluation (i.e., when the policy is fixed).
The meta-RL algorithm that we use is inspired by the work of Wang et al. and Duan et al. Our NAS search space and performance estimation strategy are based on the BlockQNN methodology. The environments are associated with 5 datasets from the meta-dataset: omniglot, vgg_flower, and dtd for training; aircraft and cu_birds for evaluation.
[Plots: per-environment training results for omniglot, vgg_flower, and dtd (best reward, episode length, acc. reward), together with the policy entropy and the distribution of actions.]
[Plots: evaluation results for aircraft and cu_birds (best reward, episode length, acc. reward), together with the distribution of actions.]
Dataset | Deep meta-RL (1st) | Deep meta-RL (2nd) | Shortened VGG19 |
---|---|---|---|
aircraft | 49.18 ± 1.2 | 50.11 ± 1.02 | 30.85 ± 10.82 |
cu_birds | 23.97 ± 1.28 | 24.24 ± 0.90 | 6.66 ± 1.98 |
[Plots: results for omniglot (best reward, episode length, acc. reward), together with the distribution of actions.]
This repository (nas-dmrl) contains the scripts to run the experiments; all the NAS and RL logic lives in independent repositories. We summarize the requirements and main assumptions next.
Follow the next steps to avoid unnecessary changes in the scripts (a consolidated sketch is given after the list):
- Create the workspace: mkdir ${HOME}/workspace and set the environment variable WORKSPACE=${HOME}/workspace.
- Install miniconda into ${WORKSPACE}, so that the miniconda path is ${WORKSPACE}/miniconda3.
- Create the virtual environment nasdmrl with Python 3.6.8: conda create -n nasdmrl python=3.6.8.
- Run scripts/setup/install_pypkgs.sh or install all packages listed there.
- Install the nasgym package.
- Make sure that all meta-dataset files (TFRecords) are in ${WORKSPACE}/metadataset_storage/records. If that is not the case, follow the instructions in scripts/meta-dataset/README.md first.
- Modify the paths in all files under configs/ to match your $HOME directory and any other path you want to customize.
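A minimal sketch consolidating the steps above (the miniconda installer URL, the nasgym clone location, and its install command are assumptions; it also assumes nas-dmrl is already cloned as in the directory structure below):

# Workspace and environment variable
export WORKSPACE=${HOME}/workspace
mkdir -p ${WORKSPACE}

# Miniconda under ${WORKSPACE}/miniconda3 (installer URL is an assumption)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh
bash /tmp/miniconda.sh -b -p ${WORKSPACE}/miniconda3
source ${WORKSPACE}/miniconda3/etc/profile.d/conda.sh

# Python 3.6.8 environment and required packages
conda create -y -n nasdmrl python=3.6.8
conda activate nasdmrl
${WORKSPACE}/git_storage/nas-dmrl/scripts/setup/install_pypkgs.sh

# nasgym (clone location and install command are assumptions; see its own README)
pip install -e ${WORKSPACE}/git_storage/nasgym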
We assume the next directory structure:
${HOME}
└── workspace
├── git_storage
│ ├── openai-baselines
│ ├── meta-dataset
│ └── nas-dmrl
├── logs
├── metadataset_storage
│ └── records
└── results
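To create this layout from scratch, a sketch like the following can be used (the clone URLs are placeholders; substitute the repositories or forks this project expects):

# Create the expected directories under the workspace
mkdir -p ${WORKSPACE}/git_storage \
         ${WORKSPACE}/logs \
         ${WORKSPACE}/metadataset_storage/records \
         ${WORKSPACE}/results

# Clone the three repositories into git_storage (URLs are placeholders)
cd ${WORKSPACE}/git_storage
git clone <url-of-openai-baselines-fork> openai-baselines
git clone <url-of-meta-dataset> meta-dataset
git clone <url-of-nas-dmrl> nas-dmrl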
The first experiment requires one training run per environment. Follow the next code snippets sequentially.
# Training on the omniglot environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-training/run_nas_dmrl.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-omniglot.ini \
-r > omniglot.log
# The log contains the directory where all the OpenAI logs are stored; the trained policy is saved there.
# You can find it with the next command:
cat omniglot.log | grep "Saving trained model"
# Expected path is something like:
# /home/jgomes/workspace/results/experiment-20190816164503/openai-20190816164503/models/meta_a2c_final.model
# Training on the vgg_flower environment
# Assuming that we rename the policy from omniglot as
# ${HOME}/workspace/results/policy-omniglot.model
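# For example (hypothetical: replace <timestamp> with the run directory reported
# by the "Saving trained model" line of the omniglot run):
cp ${HOME}/workspace/results/experiment-<timestamp>/openai-<timestamp>/models/meta_a2c_final.model \
   ${HOME}/workspace/results/policy-omniglot.model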
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-training/run_nas_dmrl.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-vgg_flower.ini \
-m ${HOME}/workspace/results/policy-omniglot.model \
-r > vgg_flower.log
# Training on the dtd environment
# Assuming that we rename the policy from vgg_flower as
# ${HOME}/workspace/results/policy-vgg_flower.model
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-training/run_nas_dmrl.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-dtd.ini \
-m ${HOME}/workspace/results/policy-vgg_flower.model \
-r > dtd.log
# Training on the omniglot environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_nas_dqn.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/dqn/config-omniglot.ini \
-r > omniglot-dqn.log
# Training on the vgg_flower environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_nas_dqn.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/dqn/config-vgg_flower.ini \
-r > vgg_flower-dqn.log
# Training on the dtd environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_nas_dqn.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/dqn/config-dtd.ini \
-r > dtd-dqn.log
# Running on the omniglot environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_nas_random.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/random-search/config-omniglot.ini \
-r > omniglot-rs.log
# Running on the vgg_flower environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_nas_random.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/random-search/config-vgg_flower.ini \
-r > vgg_flower-rs.log
# Running on the dtd environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_nas_random.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/random-search/config-dtd.ini \
-r > dtd-rs.log
# Evaluating on aircraft
# Assuming the final policy from the Experiment 1 is renamed as
# ${HOME}/workspace/results/policy-dtd.model
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-evaluation/evaluate_nas_dmrl.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-aircraft.ini \
-m ${HOME}/workspace/results/policy-dtd.model \
-r > aircraft.log
# Evaluating on cu_birds
# Assuming the final policy from the Experiment 1 is renamed as
# ${HOME}/workspace/results/policy-dtd.model
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-evaluation/evaluate_nas_dmrl.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-cu_birds.ini \
-m ${HOME}/workspace/results/policy-dtd.model \
-r > cu_birds.log
# Running on the aircraft environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_nas_random.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/random-search/config-aircraft.ini \
-r > aircraft-rs.log
# Running on the cu_birds environment
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_nas_random.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/random-search/config-cu_birds.ini \
-r > cu_birds-rs.log
To obtain the best two architectures per dataset, you can query the ${WORKSPACE}/logs/dmrl/db_experiments.csv file, whose path is set in ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-cu_birds.ini and ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-aircraft.ini. For more information, check the nasgym documentation. An example query is sketched below.
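This is a minimal sketch, assuming the reward is stored in a numeric column of the CSV (the field position used below, the 4th, is an assumption; check the nasgym documentation for the actual schema):

DB=${WORKSPACE}/logs/dmrl/db_experiments.csv
# Print the header, then the two rows with the highest value in the assumed
# reward column (4th comma-separated field); adjust -k4 to the real position.
head -n 1 ${DB}
tail -n +2 ${DB} | sort -t, -k4 -gr | head -n 2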
Once the best architectures have been identified, open ${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-evaluation/network_evaluation.py and manually enter each architecture as a list of NSCs. Now, you can run the next command per network:
# Change the dataset's config file accordingly.
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-evaluation/run_network_evaluation.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-aircraft.ini \
-r > aircraft-network.log
# Change the dataset's config file accordingly.
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-benchmarks/run_network_benchmarking.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-aircraft.ini \
-r > aircraft-benchmark.log
The third experiment is similar to experiment 1, but with only one environment and in a multi-branch setting. Run the next commands to obtain the results:
# Training on the omniglot environment with sigma=0.0
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-training/run_nas_dmrl.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-omniglot-mb-00.ini \
-r > omniglot-mb-00.log
# Training on the omniglot environment with sigma=0.1
${WORKSPACE}/git_storage/nas-dmrl/scripts/experiments-training/run_nas_dmrl.sh \
-c ${WORKSPACE}/git_storage/nas-dmrl/configs/meta-rl/config-omniglot-mb-10.ini \
-r > omniglot-mb-10.log
To visualize the plots summarizing the results, use the notebooks in notebooks/.
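For example, assuming Jupyter is available in the nasdmrl environment (it is not necessarily installed by install_pypkgs.sh):

conda activate nasdmrl
jupyter notebook ${WORKSPACE}/git_storage/nas-dmrl/notebooks/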