chore(docs): add cookbook/blog link to docs #2410

Merged: 1 commit, Mar 17, 2025

docs/lora_optims.qmd (4 additions, 0 deletions)

@@ -66,6 +66,10 @@ logic to be compatible with more of them.
 
 </details>
 
+::: {.callout-tip}
+Check out our [LoRA optimizations blog](https://axolotlai.substack.com/p/accelerating-lora-fine-tuning-with).
+:::
+
 ## Usage
 
 These optimizations can be enabled in your Axolotl config YAML file. The
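
For context on what this hunk points readers toward: enabling the optimizations in an Axolotl config might look like the sketch below. The `lora_mlp_kernel`, `lora_qkv_kernel`, and `lora_o_kernel` flags come from the docs page this diff touches; the surrounding values are illustrative assumptions, not a verified recipe.

```yaml
# Hypothetical config fragment enabling the LoRA kernel optimizations
# discussed in docs/lora_optims.qmd. Values below the adapter settings
# are assumptions for illustration.
base_model: meta-llama/Llama-3.1-8B
adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_linear: true

# Custom-kernel switches covered by the linked blog post:
lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true
```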
docs/reward_modelling.qmd (4 additions, 0 deletions)

@@ -41,6 +41,10 @@ Bradley-Terry chat templates expect single-turn conversations in the following f
 
 ### Process Reward Models (PRM)
 
+::: {.callout-tip}
+Check out our [PRM blog](https://axolotlai.substack.com/p/process-reward-models).
+:::
+
 Process reward models are trained using data which contains preference annotations for each step in a series of interactions. Typically, PRMs are trained to provide reward signals over each step of a reasoning trace and are used for downstream reinforcement learning.
 ```yaml
 base_model: Qwen/Qwen2.5-3B
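
The hunk cuts off at the top of the PRM config example. As a rough idea of where it goes, a minimal PRM config might look like the following sketch; the `process_reward_model` flag and the `stepwise_supervised` dataset type are assumptions about Axolotl's TRL-backed PRM support, not content shown in this diff.

```yaml
# Hypothetical sketch of a complete PRM config. The flag and dataset
# type below are assumptions, not taken from this diff.
base_model: Qwen/Qwen2.5-3B

process_reward_model: true  # assumed flag enabling TRL's PRM trainer
num_labels: 2               # per-step correct/incorrect labels

datasets:
  - path: trl-lib/math_shepherd  # stepwise-annotated math reasoning data
    type: stepwise_supervised    # assumed type for per-step labels
    split: train
```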
docs/rlhf.qmd (4 additions, 0 deletions)

@@ -497,6 +497,10 @@ The input format is a simple JSON input with customizable fields based on the ab
 
 ### GRPO
 
+::: {.callout-tip}
+Check out our [GRPO cookbook](https://github.com/axolotl-ai-cloud/axolotl-cookbook/tree/main/grpo#training-an-r1-style-large-language-model-using-grpo).
+:::
+
 GRPO uses custom reward functions and transformations. Please have them ready locally.
 
 For example, to load OpenAI's GSM8K and use a random reward for completions:
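
The hunk ends right where that example begins. For a sense of what such a reward function looks like, here is a minimal sketch following TRL's GRPO reward-function interface (a callable that receives the generated completions and returns one float per completion); the module name `rewards.py` and the random scoring are illustrative assumptions.

```python
# rewards.py -- hypothetical module holding custom GRPO reward functions.
import random


def rand_reward_func(completions, **kwargs) -> list[float]:
    """Return a random reward in [0, 1] for each completion.

    A placeholder reward for smoke-testing a GRPO run; the trainer
    passes the batch of completions and expects one score apiece.
    """
    return [random.uniform(0.0, 1.0) for _ in completions]
```

In the training config, such a function would then be referenced by dotted path (something like `reward_funcs: ["rewards.rand_reward_func"]` under the GRPO settings, if the linked cookbook's wiring applies).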