add 2023-10-27

logikon-ai · Nov 24, 2023 · 16105b5
1 parent 2a9b683
commit 16105b5
Showing 1 changed file with 1 addition and 0 deletions.
diff --git a/README.md b/README.md
@@ -112,6 +112,7 @@ _Methods for analysing LLM deliberation and assessing reasoning quality._
 
 _Things that don't work, or are poorly understood._
 
+- 🎓 LLMs may produce "encoded reasoning" that's unintelligible to humans, which may nullify any XAI gains from deliberative prompting. "Preventing Language Models From Hiding Their Reasoning." 2023-10-27. [[>paper](https://arxiv.org/abs/2310.18512)]
 - 🎓 LLMs judge and decide as a function of the available arguments (reason-responsiveness), but are more strongly influenced by fallacious and deceptive reasons than by sound ones. "How susceptible are LLMs to Logical Fallacies?" 2023-08-18. [[>paper](https://arxiv.org/abs/2308.09853)]
 - 🎓 Incorrect reasoning improves answer accuracy (nearly) as much as correct reasoning does. "Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting." 2023-07-20. [[>paper](https://arxiv.org/abs/2307.10573)]
 - 🎓 Zero-shot CoT reasoning in sensitive domains increases an LLM’s likelihood of producing harmful or undesirable output. "On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning." 2023-06-23. [[>paper](https://arxiv.org/abs/2212.08061)]