From d86526d83351c84f1b33e145ac233af9ea1276d2 Mon Sep 17 00:00:00 2001
From: NanoCode012
Date: Thu, 13 Mar 2025 19:58:24 +0700
Subject: [PATCH] chore(docs): add cookbook/blog link to docs

---
 docs/lora_optims.qmd      | 4 ++++
 docs/reward_modelling.qmd | 4 ++++
 docs/rlhf.qmd             | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/docs/lora_optims.qmd b/docs/lora_optims.qmd
index 8bee20402e..a7555a0a33 100644
--- a/docs/lora_optims.qmd
+++ b/docs/lora_optims.qmd
@@ -66,6 +66,10 @@ logic to be compatible with more of them.
 
+::: {.callout-tip}
+Check out our [LoRA optimizations blog](https://axolotlai.substack.com/p/accelerating-lora-fine-tuning-with).
+:::
+
 ## Usage
 
 These optimizations can be enabled in your Axolotl config YAML file. The
diff --git a/docs/reward_modelling.qmd b/docs/reward_modelling.qmd
index c9ac5f8019..386dc1f572 100644
--- a/docs/reward_modelling.qmd
+++ b/docs/reward_modelling.qmd
@@ -41,6 +41,10 @@ Bradley-Terry chat templates expect single-turn conversations in the following f
 
 ### Process Reward Models (PRM)
 
+::: {.callout-tip}
+Check out our [PRM blog](https://axolotlai.substack.com/p/process-reward-models).
+:::
+
 Process reward models are trained using data which contains preference annotations for each step in a series of interactions. Typically, PRMs are trained to provide reward signals over each step of a reasoning trace and are used for downstream reinforcement learning.
 ```yaml
 base_model: Qwen/Qwen2.5-3B
diff --git a/docs/rlhf.qmd b/docs/rlhf.qmd
index 773b159e84..ac1cf03938 100644
--- a/docs/rlhf.qmd
+++ b/docs/rlhf.qmd
@@ -497,6 +497,10 @@ The input format is a simple JSON input with customizable fields based on the ab
 
 ### GRPO
 
+::: {.callout-tip}
+Check out our [GRPO cookbook](https://github.com/axolotl-ai-cloud/axolotl-cookbook/tree/main/grpo#training-an-r1-style-large-language-model-using-grpo).
+:::
+
 GRPO uses custom reward functions and transformations. Please have them
 ready locally. For ex, to load OpenAI's GSM8K and use a random reward for
 completions:
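
The final hunk ends mid-sentence; the upstream doc continues with the reward-function example it refers to. Purely as an illustration of the "random reward for completions" idea mentioned there (not the actual content of `docs/rlhf.qmd`), a minimal sketch might look like the following, assuming a TRL-style reward callable that receives a batch of completions and returns one score per completion; the file name `rewards.py` and the function name are hypothetical:

```python
# rewards.py -- hypothetical module name; keep it somewhere importable from your config.
import random


def random_reward_func(completions, **kwargs):
    """Return a random reward in [0, 1] for each completion.

    This provides no useful learning signal; it only serves as a smoke test
    that the GRPO pipeline can call a locally defined reward function.
    """
    return [random.random() for _ in completions]
```

A real reward function for GSM8K would instead parse each completion and score it against the dataset's reference answer.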