chore(docs): add cookbook/blog link to docs #2410

Merged: 1 commit, Mar 17, 2025

docs/lora_optims.qmd (4 additions, 0 deletions)

@@ -66,6 +66,10 @@ logic to be compatible with more of them.
 
 </details>
 
+::: {.callout-tip}
+Check out our [LoRA optimizations blog](https://axolotlai.substack.com/p/accelerating-lora-fine-tuning-with).
+:::
+
 ## Usage
 
 These optimizations can be enabled in your Axolotl config YAML file. The
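
For context on what this hunk points readers toward: enabling the optimizations in an Axolotl config might look like the sketch below. The `lora_mlp_kernel`, `lora_qkv_kernel`, and `lora_o_kernel` flags come from the docs page this diff touches; the surrounding values are illustrative assumptions, not a verified recipe.

```yaml
# Hypothetical config fragment enabling the LoRA kernel optimizations
# discussed in docs/lora_optims.qmd. Values below the adapter settings
# are assumptions for illustration.
base_model: meta-llama/Llama-3.1-8B
adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_linear: true

# Custom-kernel switches covered by the linked blog post:
lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true
```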
docs/reward_modelling.qmd (4 additions, 0 deletions)

@@ -41,6 +41,10 @@ Bradley-Terry chat templates expect single-turn conversations in the following f
 
 ### Process Reward Models (PRM)
 
+::: {.callout-tip}
+Check out our [PRM blog](https://axolotlai.substack.com/p/process-reward-models).
+:::
+
 Process reward models are trained using data which contains preference annotations for each step in a series of interactions. Typically, PRMs are trained to provide reward signals over each step of a reasoning trace and are used for downstream reinforcement learning.
 ```yaml
 base_model: Qwen/Qwen2.5-3B
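
The hunk cuts off at the top of the PRM config example. As a rough idea of where it goes, a minimal PRM config might look like the following sketch; the `process_reward_model` flag and the `stepwise_supervised` dataset type are assumptions about Axolotl's TRL-backed PRM support, not content shown in this diff.

```yaml
# Hypothetical sketch of a complete PRM config. The flag and dataset
# type below are assumptions, not taken from this diff.
base_model: Qwen/Qwen2.5-3B

process_reward_model: true  # assumed flag enabling TRL's PRM trainer
num_labels: 2               # per-step correct/incorrect labels

datasets:
  - path: trl-lib/math_shepherd  # stepwise-annotated math reasoning data
    type: stepwise_supervised    # assumed type for per-step labels
    split: train
```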
docs/rlhf.qmd (4 additions, 0 deletions)

@@ -497,6 +497,10 @@ The input format is a simple JSON input with customizable fields based on the ab
 
 ### GRPO
 
+::: {.callout-tip}
+Check out our [GRPO cookbook](https://github.com/axolotl-ai-cloud/axolotl-cookbook/tree/main/grpo#training-an-r1-style-large-language-model-using-grpo).
+:::
+
 GRPO uses custom reward functions and transformations. Please have them ready locally.
 
 For example, to load OpenAI's GSM8K and use a random reward for completions:
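
The hunk ends right where that example begins. For a sense of what such a reward function looks like, here is a minimal sketch following TRL's GRPO reward-function interface (a callable that receives the generated completions and returns one float per completion); the module name `rewards.py` and the random scoring are illustrative assumptions.

```python
# rewards.py -- hypothetical module holding custom GRPO reward functions.
import random


def rand_reward_func(completions, **kwargs) -> list[float]:
    """Return a random reward in [0, 1] for each completion.

    A placeholder reward for smoke-testing a GRPO run; the trainer
    passes the batch of completions and expects one score apiece.
    """
    return [random.uniform(0.0, 1.0) for _ in completions]
```

In the training config, such a function would then be referenced by dotted path (something like `reward_funcs: ["rewards.rand_reward_func"]` under the GRPO settings, if the linked cookbook's wiring applies).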