@@ -74,50 +74,19 @@ For example (DDPG):

## Algorithms

- - [x] [DQN](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)
-
- ![DQN](DQN/DQNAgent_200.gif)
-
- - [x] [DDQN](https://arxiv.org/pdf/1509.06461.pdf)
-
- ![DDQN](DDQN/DDQNAgent_100.gif)
-
- - [x] [DDPG](https://arxiv.org/pdf/1509.02971.pdf)
-
- ![DDPG](DDPG/DDPGAgent_200.gif)
-
- - [x] [PPO](https://arxiv.org/pdf/1707.06347.pdf)
-
- ![PPO](PPO/PPOAgent_200.gif)
-
- - [x] [Distributed Q learning (C51)](https://arxiv.org/pdf/1707.06887.pdf)
-
- ![C51](C51/C51Agent_100.gif)
-
- - [x] [AWR](https://openreview.net/attachment?id=H1gdF34FvS&name=original_pdf)
-
- ![AWR](AWR/AWRAgent_200.gif)
-
- - [x] [AC](https://proceedings.neurips.cc/paper/1999/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf)
-
- ![AC](AC/A2CAgent_600.gif)
-
- - [x] [TD3](https://arxiv.org/pdf/1802.09477.pdf)
-
- ![TD3](TD3/TD3Agent_100.gif)
-
- - improve `AWR`, `DDPG` `TD3` with Gumbel Distribution Regression from [`XQL`](https://div99.github.io/XQL):
- - XAWR
-
- ![XAWR](XAWR/XAWRAgent_100.gif)
-
- - XDDPG
-
- ![XDDPG](XDDPG/XDDPGAgent_200.gif)
-
- - XTD3
-
- ![XTD3](XTD3/XTD3Agent_100.gif)
+ | model | paper link | After Training |
+ | :---: | :----------------------------------------------------------------------------------: | :--------------------------------: |
+ | DQN | https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf | ![DQN](DQN/DQNAgent_200.gif) |
+ | DDQN | https://arxiv.org/pdf/1509.06461.pdf | ![DDQN](DDQN/DDQNAgent_100.gif) |
+ | DDPG | https://arxiv.org/pdf/1509.02971.pdf | ![DDPG](DDPG/DDPGAgent_200.gif) |
+ | PPO | https://arxiv.org/pdf/1707.06347.pdf | ![PPO](PPO/PPOAgent_200.gif) |
+ | C51 | https://arxiv.org/pdf/1707.06887.pdf | ![C51](C51/C51Agent_100.gif) |
+ | AWR | https://openreview.net/attachment?id=H1gdF34FvS | ![AWR](AWR/AWRAgent_200.gif) |
+ | AC | https://proceedings.neurips.cc/paper/1999/file | ![AC](AC/A2CAgent_600.gif) |
+ | TD3 | https://arxiv.org/pdf/1802.09477.pdf | ![TD3](TD3/TD3Agent_100.gif) |
+ | XAWR | Improved with Gumbel Distribution Regression from [XQL](https://div99.github.io/XQL) | ![XAWR](XAWR/XAWRAgent_100.gif) |
+ | XDDPG | Improved with Gumbel Distribution Regression from [XQL](https://div99.github.io/XQL) | ![XDDPG](XDDPG/XDDPGAgent_200.gif) |
+ | XTD3 | Improved with Gumbel Distribution Regression from [XQL](https://div99.github.io/XQL) | ![XTD3](XTD3/XTD3Agent_100.gif) |

## Reference
