Neural Combinatorial Optimization with RL

TensorFlow implementation and extension of Neural Combinatorial Optimization with Reinforcement Learning (Bello et al., 2016) for the Traveling Salesman Problem (TSP) and the TSP with Time Windows (TSP-TW).

The neural network is an RNN or self-attentive encoder-decoder, with an attention module (a "pointer") connecting the decoder to the encoder. The model is trained by policy gradient (REINFORCE; Williams, 1992).
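To make the pointer and the training signal concrete, here is a minimal sketch in plain Python (3.6+ for `random.choices`), not the repository's TensorFlow code. It assumes the decoder produces one score per city at each step; `score_fn` is a hypothetical stand-in for that decoder. The baseline in `reinforce_surrogate` is also an assumption, since the text above only says the model is trained with REINFORCE.

```python
import math
import random


def pointer_probs(scores, visited):
    """Softmax over city scores; already-visited cities get probability 0."""
    masked = [-math.inf if v else s for s, v in zip(scores, visited)]
    top = max(masked)
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tour(score_fn, n_cities, rng):
    """Sample a permutation city by city; return it with its log-probability."""
    visited = [False] * n_cities
    tour, log_prob = [], 0.0
    for _ in range(n_cities):
        probs = pointer_probs(score_fn(tour), visited)
        # Draw only among unvisited cities so a masked city is never picked.
        candidates = [c for c in range(n_cities) if not visited[c]]
        weights = [probs[c] for c in candidates]
        city = rng.choices(candidates, weights=weights)[0]
        log_prob += math.log(probs[city])
        visited[city] = True
        tour.append(city)
    return tour, log_prob


def reinforce_surrogate(tour_length, baseline, log_prob):
    """Surrogate loss whose gradient w.r.t. the policy is the REINFORCE estimate."""
    return (tour_length - baseline) * log_prob
```

With uniform scores, for example `sample_tour(lambda partial: [0.0] * 5, 5, random.Random(0))`, every permutation of the 5 cities is equally likely.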

Requirements

Python 2.7 or 3.5
TensorFlow 1.0.1
tqdm
Google OR-Tools (optional reference solver, used in main.py and dataset.py)

Architecture

(work in progress)

Usage

TSP

To train a (2D TSP20) model from scratch (data is generated on the fly):

> python main.py --max_length=20 --inference_mode=False --restore_model=False --save_to=20/model --log_dir=summary/20/repo

NB: Make sure the folder ./save/20/model exists (create it otherwise).
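If you prefer to create the folder from Python rather than by hand, a small helper along these lines would do. `ensure_dir` is a hypothetical name, not a function from this repository, and it is written to run on both Python 2.7 and 3.5.

```python
import os


def ensure_dir(path):
    """Create path (and any missing parent folders) if it does not exist yet."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


if __name__ == "__main__":
    # Folder expected by --save_to=20/model in the training command above.
    ensure_dir(os.path.join("save", "20", "model"))
```

The same helper works for the save_to folders used by the TSP-TW commands.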

To visualize training on tensorboard:

> tensorboard --logdir=summary/20/repo

To test a trained model:

> python main.py --max_length=20 --inference_mode=True --restore_model=True --restore_from=20/model

TSP-TW

To pretrain a (2D TSPTW20) model from scratch with an effectively infinite travel speed (--speed=1000.):

> python main.py --inference_mode=False --pretrain=True --restore_model=False --speed=1000. --beta=3 --save_to=speed1000/n20w100 --log_dir=summary/speed1000/n20w100

To fine-tune a (2D TSPTW20) model with finite travel speed:

> python main.py --inference_mode=False --pretrain=False --kNN=5 --restore_model=True --restore_from=speed1000/n20w100 --speed=10.0 --beta=3 --save_to=speed10/s10_k5_n20w100 --log_dir=summary/speed10/s10_k5_n20w100

NB: Make sure the save_to folders exist (create them otherwise).
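The sketch below illustrates one way travel speed can interact with time windows. It rests on assumptions that the README does not confirm: travel time is Euclidean distance divided by speed, each city has an (opening, closing) window, and the salesman waits when arriving early. `tour_delay` is a hypothetical helper, not the repository's reward function.

```python
import math


def tour_delay(coords, windows, tour, speed):
    """Total lateness along an open tour, waiting whenever a window is not open yet."""
    t, delay = 0.0, 0.0
    for prev, nxt in zip(tour, tour[1:]):
        (x1, y1), (x2, y2) = coords[prev], coords[nxt]
        t += math.hypot(x2 - x1, y2 - y1) / speed  # assumed: time = distance / speed
        opening, closing = windows[nxt]
        t = max(t, opening)             # wait until the window opens
        delay += max(0.0, t - closing)  # lateness past the closing time
    return delay
```

Under these assumptions, a very large speed makes travel time negligible, so lateness then depends only on the visiting order relative to the windows. That is one plausible reading of why pretraining uses --speed=1000. before fine-tuning at --speed=10.0.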

To visualize training on tensorboard:

> tensorboard --logdir=summary/speed1000/n20w100

> tensorboard --logdir=summary/speed10/s10_k5_n20w100

To test a trained model with finite travel speed on Dumas instances (in the benchmark folder):

> python main.py --inference_mode=True --restore_model=True --restore_from=speed10/s10_k5_n20w100 --speed=10.0

Results

TSP

Sampling 128 permutations with the Self-Attentive Encoder + Pointer Decoder:

Compared to Google OR-Tools on 1000 TSP20 instances: (predicted tour length) = 0.9983 * (target tour length)
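For readers who want to reproduce this kind of comparison, here is a minimal sketch of computing tour lengths and a predicted-to-reference ratio. The aggregation (ratio of summed lengths) is an assumption, since the README does not say how the 0.9983 coefficient was obtained. Both function names are hypothetical.

```python
import math


def tour_length(coords, tour):
    """Euclidean length of the closed tour that visits coords in the given order."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = coords[tour[i]]
        x2, y2 = coords[tour[(i + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def length_ratio(predicted_lengths, target_lengths):
    """Ratio of summed predicted lengths to summed reference-solver lengths."""
    return sum(predicted_lengths) / sum(target_lengths)
```

A ratio below 1 means the sampled tours are, in aggregate, shorter than those of the reference solver.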

TSP-TW

Sampling 256 permutations with the RNN Encoder + Pointer Decoder, followed by 2-opt post-processing of the best tour:

Dumas instance n20w100.001
Dumas instance n20w100.003
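As an illustration of the 2-opt post-processing step, here is a textbook 2-opt sketch for the plain Euclidean TSP. It is not the repository's implementation. It ignores time windows, so a TSP-TW variant would also need to check each candidate tour against the windows, and it keeps the first city fixed as the start.

```python
import math


def tour_length(coords, tour):
    """Euclidean length of the closed tour that visits coords in the given order."""
    return sum(
        math.hypot(coords[b][0] - coords[a][0], coords[b][1] - coords[a][1])
        for a, b in zip(tour, tour[1:] + tour[:1])
    )


def two_opt(coords, tour):
    """Reverse segments of the tour for as long as doing so shortens it."""
    best = list(tour)
    best_len = tour_length(coords, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reversing best[i..j] swaps two edges of the tour.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(coords, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len
```

On the unit square, `two_opt(coords, [0, 2, 1, 3])` uncrosses the tour and returns the perimeter tour of length 4.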

Authors

Michel Deudon / @mdeudon

Pierre Cournut / @pcournut

References

Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2016). Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940.
