
Commit

update AI safety
matomatical committed Feb 17, 2024
1 parent fb2d36e commit 0db8a2d
Showing 2 changed files with 37 additions and 31 deletions.
56 changes: 31 additions & 25 deletions ai-safety/index.md
@@ -68,33 +68,10 @@ Completing weekly readings is recommended. We sometimes briefly summarise the
paper. Usually we dive into discussing particular credits, concerns, or
confusions.

Upcoming readings and discussions:

* **2024.02.01:** break (ICML deadline)

* **2024.02.08:**
John Wentworth, 2021, three posts on selection theorems:

* "Selection theorems: A program for understanding agents",
[LessWrong](https://www.lesswrong.com/posts/G2Lne2Fi7Qra5Lbuf).
* "Some existing selection theorems",
[LessWrong](https://www.lesswrong.com/posts/N2NebPD78ioyWHhNm).
* "What selection theorems do we expect/want",
[LessWrong](https://www.lesswrong.com/posts/RuDD3aQWLDSb4eTXP).

* **2024.02.15:**
Evan Hubinger *et al.,*
2024,
"Sleeper agents: Training deceptive LLMs that persist through safety
training".
[arXiv](https://arxiv.org/abs/2401.05566).

<!--
If you already read it, look at wider commentary from, e.g.,
[Scott Alexander](https://www.astralcodexten.com/p/ai-sleeper-agents),
[Zvi Mowshowitz](https://www.lesswrong.com/posts/Sf5CBSo44kmgFdyGM/).
Upcoming readings and discussions:
-->


<!--
Cut:
@@ -128,6 +105,35 @@ Cut:

Past readings and discussions (most recent first):

* **2024.02.15:**
Evan Hubinger *et al.,*
2024,
"Sleeper agents: Training deceptive LLMs that persist through safety
training".
[arXiv](https://arxiv.org/abs/2401.05566).
<!--
If you already read it, look at wider commentary from, e.g.,
[Scott Alexander](https://www.astralcodexten.com/p/ai-sleeper-agents),
[Zvi Mowshowitz](https://www.lesswrong.com/posts/Sf5CBSo44kmgFdyGM/).
-->

* **2024.02.08:**
John Wentworth, 2021, three posts on selection theorems:

* "Selection theorems: A program for understanding agents",
[LessWrong](https://www.lesswrong.com/posts/G2Lne2Fi7Qra5Lbuf).

* "Some existing selection theorems",
[LessWrong](https://www.lesswrong.com/posts/N2NebPD78ioyWHhNm).

* "What selection theorems do we expect/want",
[LessWrong](https://www.lesswrong.com/posts/RuDD3aQWLDSb4eTXP).


<!--
* **2024.02.01:** break (ICML deadline)
-->

* **2024.01.25:**
Peter Vamplew, Richard Dazeley, *et al.*,
2018,
12 changes: 6 additions & 6 deletions schedule/schedule.yml
@@ -14,12 +14,6 @@ whats on:
desc: "Towards a science of technological disruption. An open discussion of technological disruption, how it works, what it means and how to play a positive part in it."
website: https://metauni.org/disruption
location: https://www.roblox.com/games/start?placeId=8165217582&launchData=/
- AI Safety:
time: 15:00-16:00
organizer: Matthew Farrugia-Roberts
desc: "reading group on technical and philosophical topics in AI safety."
website: https://metauni.org/ai-safety
location: https://www.roblox.com/games/start?placeId=8165217582&launchData=/
- Singular Learning Theory:
time: 16:00-17:30
organizer: Dan Murfet, Edmund Lau
@@ -39,6 +33,12 @@ whats off:
desc: "Neuroscience for AI interpretability."
website: https://metauni.org/neuro
location: https://www.roblox.com/games/start?placeId=8165217582&launchData=/
- AI Safety:
time: 15:00-16:00
organizer: Matthew Farrugia-Roberts
desc: "reading group on technical and philosophical topics in AI safety."
website: https://metauni.org/ai-safety
location: https://www.roblox.com/games/start?placeId=8165217582&launchData=/
- Code:
time: 16:00-17:30
organizer: Ethan Curtiss, Billy Price
