
Commit 3420a68

Update README.md
1 parent 83a0522 commit 3420a68

1 file changed, 5 additions, 1 deletion


README.md

@@ -10,7 +10,11 @@ Safe exploration of reinforcement learning (RL) agents during training is a crit
<p align="center">Fig 1. A high-level overview of ADVICE including training, inference, and the adaptive extension.</p>

### ADVICE
-Add short methodology here...
+ADVICE starts by collecting a dataset of state-action pairs, each labelled safe or unsafe according to the outcome it leads to in the training environment. This dataset is then used to train a contrastive autoencoder. Training uses a contrastive loss that compares similar and dissimilar pairs, pulling embeddings with the same safety label together and pushing safe and unsafe embeddings apart, so that new observations can be identified and categorised quickly. To classify unseen data, a nearest neighbours model is fit on the final embedding space.
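The training pipeline in the paragraph above can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the ADVICE implementation: it assumes the standard margin-based pairwise contrastive loss and a brute-force nearest neighbours store over embeddings the encoder has already produced (the autoencoder itself is omitted). `contrastive_loss`, `NearestNeighbours` and the `margin` parameter are hypothetical names.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(z1: Sequence[float], z2: Sequence[float],
                     same_label: bool, margin: float = 1.0) -> float:
    """Margin-based pairwise contrastive loss (assumed form, not ADVICE's exact loss).

    Similar pairs (same safety label) are penalised by their squared distance;
    dissimilar pairs are penalised only while they are closer than `margin`.
    """
    d = euclidean(z1, z2)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


class NearestNeighbours:
    """Brute-force nearest neighbours store over labelled embeddings (hypothetical helper)."""

    def __init__(self) -> None:
        self.points: List[List[float]] = []
        self.labels: List[bool] = []  # True = safe, False = unsafe

    def fit(self, embeddings, labels) -> "NearestNeighbours":
        self.points = [list(e) for e in embeddings]
        self.labels = list(labels)
        return self

    def neighbour_labels(self, query: Sequence[float], n: int) -> List[bool]:
        """Labels of the n stored embeddings closest to `query`, nearest first."""
        order = sorted(range(len(self.points)),
                       key=lambda i: euclidean(query, self.points[i]))
        return [self.labels[i] for i in order[:n]]


# Usage with toy 2-D embeddings standing in for encoder outputs.
emb = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]]
lab = [True, True, False, False]
knn = NearestNeighbours().fit(emb, lab)
print(knn.neighbour_labels([0.05, 0.0], n=2))  # both nearest neighbours are safe
```

A real implementation would train the encoder by minimising this loss over many labelled pairs and would likely use an indexed nearest neighbours search rather than a full sort per query.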
+
+Once trained, ADVICE passes the current state and the desired action through the encoder and classifies the resulting embedding using the nearest neighbours model and a safety threshold k. If the action is deemed safe, the RL agent takes it; if it is deemed unsafe, ADVICE selects the next best safe action instead. The parameter k sets the conservativeness of the shield.
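The inference step described above can be sketched as a small shield function. How ADVICE combines the nearest neighbours with k is an assumption here: an action is accepted when at least k of its n nearest neighbours in embedding space are labelled safe, so raising k makes the shield more conservative. `encode` stands in for the trained encoder, `ranked_actions` for the agent's own preference order (for example by Q-value), and the fallback used when no action passes is also hypothetical.

```python
import math
from typing import Callable, Hashable, List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_safe_neighbours(query: Sequence[float], points: List[List[float]],
                          labels: List[bool], n: int) -> int:
    """Number of safe labels among the n stored embeddings nearest to `query`."""
    order = sorted(range(len(points)), key=lambda i: euclidean(query, points[i]))
    return sum(1 for i in order[:n] if labels[i])


def shield(state, ranked_actions: Sequence[Hashable],
           encode: Callable[[object, Hashable], List[float]],
           points: List[List[float]], labels: List[bool],
           n: int = 5, k: int = 4) -> Optional[Hashable]:
    """Return the highest-ranked action with at least k safe neighbours out of n.

    If no action passes (assumed fallback), return the one with the most safe neighbours.
    """
    best_action, best_score = None, -1
    for action in ranked_actions:  # agent's preference order, best first
        safe = count_safe_neighbours(encode(state, action), points, labels, n)
        if safe >= k:
            return action  # deemed safe: the agent may proceed with it
        if safe > best_score:
            best_action, best_score = action, safe
    return best_action


def encode(state, action) -> List[float]:
    """Hypothetical encoder stand-in: embeds an action as a 1-D point."""
    return [float(action)]


# Toy 1-D embedding space: actions near 0 are safe, actions near 1 are unsafe.
points = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
labels = [True, True, True, False, False, False]
print(shield(None, [1, 0], encode, points, labels, n=3, k=3))  # agent prefers 1, shield picks 0
```

With n = 3 and k = 3 the shield is maximally strict: every one of the three nearest neighbours must be safe before an action is allowed.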
+
+The adaptive nearest neighbours module adjusts the safety threshold k dynamically, based on the agent's recent performance. If the agent has been acting safely, the shield becomes less conservative and allows more exploratory actions; conversely, if safety violations increase, it becomes more conservative.
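One way to realise the adaptive behaviour described above is a controller that moves k up or down according to the recent violation rate. The sliding window, the `low`/`high` rate bounds and the one-step adjustment are assumptions chosen for illustration; the README does not specify the update rule ADVICE uses. As in the shield sketch, a larger k is assumed to mean a more conservative shield.

```python
from collections import deque


class AdaptiveThreshold:
    """Hypothetical controller for the safety threshold k.

    Tracks whether each of the last `window` episodes had a safety violation,
    raises k when violations are frequent and lowers it when they are rare.
    """

    def __init__(self, k: int, k_min: int, k_max: int,
                 window: int = 10, low: float = 0.05, high: float = 0.2) -> None:
        self.k, self.k_min, self.k_max = k, k_min, k_max
        self.low, self.high = low, high
        self.history = deque(maxlen=window)  # oldest outcomes drop off automatically

    def update(self, had_violation: bool) -> int:
        """Record one episode outcome and return the (possibly adjusted) k."""
        self.history.append(had_violation)
        rate = sum(self.history) / len(self.history)
        if rate > self.high:
            self.k = min(self.k + 1, self.k_max)  # more violations: more conservative
        elif rate < self.low:
            self.k = max(self.k - 1, self.k_min)  # recently safe: allow more exploration
        return self.k


ctrl = AdaptiveThreshold(k=3, k_min=1, k_max=5, window=4, low=0.1, high=0.3)
for outcome in [False, False, False, True, True, True]:
    print(ctrl.update(outcome))  # k goes 2, 1, 1, 1, 2, 3
```

Keeping a dead band between `low` and `high` stops k from oscillating on every episode when the violation rate is moderate.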

### Getting Started
To get started, run the `main.py` file from our source code. To run ADVICE in the [Safety-Gymnasium](https://github.com/PKU-Alignment/safety-gymnasium) test suite, you need to run the following pip command:
