
Commit

update AI safety
matomatical committed Feb 17, 2024
1 parent fb2d36e commit 0db8a2d
Showing 2 changed files with 37 additions and 31 deletions.
56 changes: 31 additions & 25 deletions ai-safety/index.md
@@ -68,33 +68,10 @@ Completing weekly readings is recommended. We sometimes briefly summarise the
paper. Usually we dive into discussing particular credits, concerns, or
confusions.

Upcoming readings and discussions:

* **2024.02.01:** break (ICML deadline)

* **2024.02.08:**
John Wentworth, 2021, three posts on selection theorems:

* "Selection theorems: A program for understanding agents",
[LessWrong](https://www.lesswrong.com/posts/G2Lne2Fi7Qra5Lbuf).
* "Some existing selection theorems",
[LessWrong](https://www.lesswrong.com/posts/N2NebPD78ioyWHhNm).
* "What selection theorems do we expect/want",
[LessWrong](https://www.lesswrong.com/posts/RuDD3aQWLDSb4eTXP).

* **2024.02.15:**
Evan Hubinger *et al.,*
2024,
"Sleeper agents: Training deceptive LLMs that persist through safety
training".
[arXiv](https://arxiv.org/abs/2401.05566).

<!--
If you already read it, look at wider commentary from, e.g.,
[Scott Alexander](https://www.astralcodexten.com/p/ai-sleeper-agents),
[Zvi Mowshowitz](https://www.lesswrong.com/posts/Sf5CBSo44kmgFdyGM/).
Upcoming readings and discussions:
-->


<!--
Cut:
@@ -128,6 +105,35 @@ Cut:

Past readings and discussions (most recent first):

* **2024.02.15:**
Evan Hubinger *et al.,*
2024,
"Sleeper agents: Training deceptive LLMs that persist through safety
training".
[arXiv](https://arxiv.org/abs/2401.05566).
<!--
If you already read it, look at wider commentary from, e.g.,
[Scott Alexander](https://www.astralcodexten.com/p/ai-sleeper-agents),
[Zvi Mowshowitz](https://www.lesswrong.com/posts/Sf5CBSo44kmgFdyGM/).
-->

* **2024.02.08:**
John Wentworth, 2021, three posts on selection theorems:

* "Selection theorems: A program for understanding agents",
[LessWrong](https://www.lesswrong.com/posts/G2Lne2Fi7Qra5Lbuf).

* "Some existing selection theorems",
[LessWrong](https://www.lesswrong.com/posts/N2NebPD78ioyWHhNm).

* "What selection theorems do we expect/want",
[LessWrong](https://www.lesswrong.com/posts/RuDD3aQWLDSb4eTXP).


<!--
* **2024.02.01:** break (ICML deadline)
-->

* **2024.01.25:**
Peter Vamplew, Richard Dazeley, *et al.*,
2018,
12 changes: 6 additions & 6 deletions schedule/schedule.yml
@@ -14,12 +14,6 @@ whats on:
desc: "Towards a science of technological disruption. An open discussion of technological disruption, how it works, what it means and how to play a positive part in it."
website: https://metauni.org/disruption
location: https://www.roblox.com/games/start?placeId=8165217582&launchData=/
- AI Safety:
time: 15:00-16:00
organizer: Matthew Farrugia-Roberts
desc: "reading group on technical and philosophical topics in AI safety."
website: https://metauni.org/ai-safety
location: https://www.roblox.com/games/start?placeId=8165217582&launchData=/
- Singular Learning Theory:
time: 16:00-17:30
organizer: Dan Murfet, Edmund Lau
@@ -39,6 +33,12 @@ whats off:
desc: "Neuroscience for AI interpretability."
website: https://metauni.org/neuro
location: https://www.roblox.com/games/start?placeId=8165217582&launchData=/
- AI Safety:
time: 15:00-16:00
organizer: Matthew Farrugia-Roberts
desc: "reading group on technical and philosophical topics in AI safety."
website: https://metauni.org/ai-safety
location: https://www.roblox.com/games/start?placeId=8165217582&launchData=/
- Code:
time: 16:00-17:30
organizer: Ethan Curtiss, Billy Price
