Train your own ML-agents on limited resources, condemning them to be ✨ dummy ✨ (but very much loved), and compare them with the best agents in the field.
On the left, the custom-trained agents; on the right, the best agents.
Download Unity Hub and, from there, install the editor 2021.3.11f1 (under the official releases, the LTS version). Also download the most recent stable release of ML-Agents.
Open Unity Hub and import the Project folder from the previously downloaded ML-Agents package.
Since we'll train our own agents, we also have to set up Python properly.
Open a terminal in the previously downloaded ML-Agents folder and execute:
pip install -e ml-agents
pip install -e ml-agents-envs
Note: it is recommended to do this inside a virtual environment.
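A minimal sketch with Python's built-in venv module (commands assume a Unix-like shell; on Windows the activation script is venv\Scripts\activate):

# create and activate a virtual environment inside the ML-Agents folder
python -m venv venv
source venv/bin/activate
# then run the two editable installs above
pip install -e ml-agents
pip install -e ml-agents-envs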
Switch to the open Unity project and, in the Project panel at the bottom left, navigate to Project > Assets > ML-Agents > Examples > 3DBall > Scenes. Double-click the Scenes folder and then double-click the 3DBall scene file.
About the 3D balance ball game:
The 3D Balance Ball environment contains a number of agent cubes and balls (which are all copies of each other).
Each agent cube tries to keep its ball from falling by rotating either horizontally or vertically.
In this environment, an agent cube is an Agent that receives a reward for every step that it balances the ball.
An agent is also penalized with a negative reward for dropping the ball.
The goal of the training process is to have the agents learn to balance the ball on their head.
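For reference, in the official example each agent cube receives a reward of +0.1 for every step the ball stays balanced and -1.0 when it drops it, so the cumulative reward of an episode grows with how long the ball is kept up; the benchmark mean reward documented for this environment is 100, which is roughly where the training log below levels off.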
By pressing the play button in the top center we can see how the pre-trained agents perform.
To train our own agents, open a terminal in the ML-Agents folder and execute the following command:
mlagents-learn config/ppo/3DBall.yaml --run-id=first_run_3dball
where mlagents-learn is the general command for interacting with the ML-Agents framework, config/ppo/3DBall.yaml specifies the training settings, namely:
behaviors:
  3DBall:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 0.0003
      beta: 0.001
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 500000
    time_horizon: 1000
    summary_freq: 12000
and --run-id=first_run_3dball creates a new folder called first_run_3dball containing all the information and outputs of this run.
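Run IDs are meant to be unique: if a folder for first_run_3dball already exists, mlagents-learn will stop instead of silently overwriting it. Assuming the standard CLI options, we can pass --force to discard the previous results and start over:

# restart the run from scratch, overwriting previous results for this run-id
mlagents-learn config/ppo/3DBall.yaml --run-id=first_run_3dball --force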
At this point, the response from mlagents-learn
will be:
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
When this information appears, we need to move to Unity and press play.
Now the training has started. We can see how the agents train within Unity:
And we can also see some numbers in the terminal:
[INFO] 3DBall. Step: 12000. Time Elapsed: 27.143 s. Mean Reward: 1.136. Std of Reward: 0.710. Training
[INFO] 3DBall. Step: 24000. Time Elapsed: 44.679 s. Mean Reward: 1.424. Std of Reward: 0.889. Training
[INFO] 3DBall. Step: 36000. Time Elapsed: 60.184 s. Mean Reward: 2.095. Std of Reward: 1.211. Training
[INFO] 3DBall. Step: 48000. Time Elapsed: 74.971 s. Mean Reward: 3.351. Std of Reward: 2.606. Training
[INFO] 3DBall. Step: 60000. Time Elapsed: 90.024 s. Mean Reward: 8.022. Std of Reward: 6.551. Training
[INFO] 3DBall. Step: 72000. Time Elapsed: 104.211 s. Mean Reward: 20.102. Std of Reward: 23.587. Training
[INFO] 3DBall. Step: 84000. Time Elapsed: 119.592 s. Mean Reward: 53.427. Std of Reward: 37.335. Training
[INFO] 3DBall. Step: 96000. Time Elapsed: 133.754 s. Mean Reward: 80.107. Std of Reward: 29.116. Training
[INFO] 3DBall. Step: 108000. Time Elapsed: 147.332 s. Mean Reward: 95.433. Std of Reward: 13.151. Training
[INFO] 3DBall. Step: 120000. Time Elapsed: 161.466 s. Mean Reward: 90.929. Std of Reward: 23.583. Training
...
During training, we keep track of the mean reward of the agents and its standard deviation.
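The same curves (and several other statistics) can also be inspected graphically with TensorBoard, since ML-Agents writes its training summaries to the results folder; assuming TensorBoard is installed in the same environment, run:

# launch TensorBoard on the results written by mlagents-learn, then open http://localhost:6006
tensorboard --logdir results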
To interrupt the training, press CTRL + C.
[INFO] Learning was interrupted. Please wait while the graph is generated.
[INFO] Exported results\first_run_3dball\3DBall\3DBall-130071.onnx
[INFO] Copied results\first_run_3dball\3DBall\3DBall-130071.onnx to results\first_run_3dball\3DBall.onnx.
The current state (the last one before the interruption) is saved in the folder results/first_run_3dball.
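If we later want to continue from this saved state rather than start from scratch, the run can be re-launched with the --resume flag:

# continue the interrupted run from its last checkpoint
mlagents-learn config/ppo/3DBall.yaml --run-id=first_run_3dball --resume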
Having trained a behavioral model is not enough: we now need to tell the agents to use the new model instead of the default one.
First of all, go to the folder results/first_run_3dball and rename the file 3DBall.onnx to 3DBall_beginner.onnx (or any other name, as long as it differs from the original).
Then, navigate to Project > Assets > ML-Agents > Examples > 3DBall > TFModels:
Drag the model 3DBall_beginner.onnx
and drop it here:
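If you prefer the terminal over drag-and-drop, the same result can be obtained by copying the renamed model straight into the project's TFModels folder (the destination path below is illustrative; adjust it to wherever your Project folder lives, and on Windows use copy with backslashes):

# copy the trained model into the Unity project; Unity will import it automatically
cp results/first_run_3dball/3DBall_beginner.onnx <path-to>/Project/Assets/ML-Agents/Examples/3DBall/TFModels/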
Now, navigate to Project > Assets > ML-Agents > Examples > 3DBall > Prefabs and double-click the 3DBall element.
Now look at the panel in the upper left and double-click on Agent.
What we need to modify is the behavioral model the agents use (window on the right):
In the window on the right, in the Model field, click the small circle and select the model we previously imported:
Now, by pressing play, we can see the agents acting according to the newly trained behavioral model: