
Add custom evaluators tutorial and extend evaluation docs #584

Merged
menakaj merged 5 commits into wso2:main from nadheesh:main
Mar 19, 2026
Conversation

@nadheesh (Contributor) commented Mar 18, 2026

Closes #583

Summary

  • Add a new Custom Evaluators tutorial with step-by-step walkthrough for creating code and LLM-judge evaluators in the AMP Console
  • Extend Evaluation concepts page with a Custom Evaluators section, tabbed evaluator type/level/built-in evaluator reference, and expanded Viewing Results (monitor dashboard + trace view)
  • Extend Evaluation Monitors tutorial with score breakdown tables (by agent, by model) and trace view score visibility sections
  • Add new screenshots for custom evaluator UI and evaluation trace view
  • Update sidebar to include the new tutorial
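
The Code evaluator the new tutorial walks through can be pictured as a plain scoring function. The sketch below is an illustrative assumption only: the `trace` payload shape, the `output` field, and the score convention are hypothetical, not the AMP Console contract.

```python
# Hypothetical sketch of a custom Code evaluator: a function that takes a
# trace payload and returns a score in [0.0, 1.0]. The "output" field and
# the 2000-character soft cap are illustrative assumptions only.
def evaluate(trace: dict) -> float:
    """Score 1.0 for a non-empty response within a length cap."""
    output = (trace.get("output") or "").strip()
    if not output:
        return 0.0  # empty responses fail outright
    cap = 2000
    if len(output) <= cap:
        return 1.0
    # Penalize responses proportionally to how far they exceed the cap.
    return max(0.0, 1.0 - (len(output) - cap) / cap)


if __name__ == "__main__":
    print(evaluate({"output": "All good."}))  # → 1.0
    print(evaluate({"output": ""}))           # → 0.0
```

An LLM-Judge evaluator would follow the same shape but delegate the scoring: send the payload to a judge model with a rubric prompt and parse the returned score instead of computing it in code.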

Summary by CodeRabbit

  • Documentation
    • Added new tutorial for creating and managing custom evaluators with code and LLM-based evaluation options.
    • Reorganized evaluation concepts documentation with tabbed interfaces for improved readability.
    • Enhanced evaluation monitors guide with detailed instructions for viewing and interpreting scores across trace, agent, and LLM levels.
    • Updated documentation version references to v0.9.x.

coderabbitai bot (Contributor) commented Mar 18, 2026

📝 Walkthrough

This PR adds comprehensive documentation for custom evaluators in AMP Console, extends evaluation concepts with tabbed interfaces and reorganized evaluator categorization, enhances the evaluation monitors tutorial with score breakdown visibility, and updates sidebar navigation and version constants to reflect the v0.9.x release.

Changes

• Custom Evaluators Tutorial (website/docs/tutorials/custom-evaluators.mdx, website/versioned_docs/version-v0.9.x/tutorials/custom-evaluators.mdx)
  New comprehensive tutorial documenting the creation and usage of Code and LLM-Judge custom evaluators in the AMP Console, including step-by-step UI workflows, configuration parameters, and editing/deletion constraints.
• Evaluation Concepts Documentation (website/docs/concepts/evaluation.mdx, website/versioned_docs/version-v0.9.x/concepts/evaluation.mdx)
  Enhanced with Tabs/TabItem UI blocks separating Rule-Based and LLM-as-Judge evaluators by type and evaluation level (Trace, Agent, LLM). Added a new Custom Evaluators section with creation guidance, data models, and configuration parameters. Reorganized the built-in evaluators table with updated naming and categorizations.
• Evaluation Monitors Tutorial (website/docs/tutorials/evaluation-monitors.mdx, website/versioned_docs/version-v0.9.x/tutorials/evaluation-monitors.mdx)
  Extended with a new Score Breakdowns section (by agent and by model), a View Scores in Trace View section detailing score column rendering and span details display, and an updated Evaluation Summary with per-level statistics and skip rates.
• Navigation and Configuration (website/sidebars.ts, website/versioned_sidebars/version-v0.9.x-sidebars.json, website/versioned_docs/version-v0.9.x/_constants.md)
  Updated sidebar entries to include the new tutorials/custom-evaluators path. Updated version constants from v0.8.x/v0.8.0 to v0.9.x/v0.9.0.
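
The Score Breakdowns feature described above (average scores by agent and by model) amounts to a group-by-and-average over score records. A minimal sketch, assuming records with `agent`, `model`, and `score` fields — these names are illustrative, not the dashboard's actual schema:

```python
# Hypothetical sketch of the score-breakdown aggregation: average evaluator
# scores grouped by a record field such as "agent" or "model".
# The record schema here is an illustrative assumption.
from collections import defaultdict

def breakdown(records, key):
    """Return {group: mean score} for records grouped by the given key."""
    sums = defaultdict(lambda: [0.0, 0])  # group -> [score total, count]
    for r in records:
        entry = sums[r[key]]
        entry[0] += r["score"]
        entry[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}

records = [
    {"agent": "planner", "model": "gpt-4o", "score": 0.9},
    {"agent": "planner", "model": "gpt-4o", "score": 0.7},
    {"agent": "writer",  "model": "gpt-4o-mini", "score": 1.0},
]
print(breakdown(records, "agent"))  # planner averages ~0.8, writer 1.0
print(breakdown(records, "model"))
```

The same helper serves both breakdown tables; only the grouping key changes, which mirrors the tutorial's side-by-side "by agent" and "by model" views.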

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 With custom eyes, we now can see,
Evaluators branching wild and free—
Code and prompts in harmony dance,
Documentation gives each change a chance!
✨ Traces, agents, LLMs too,
Breaking scores in every view! 📊

🚥 Pre-merge checks — ✅ 5 passed

• Title check — Passed: The PR title accurately and concisely summarizes the main changes: adding a custom evaluators tutorial and extending evaluation documentation.
• Description check — Passed: The PR description clearly outlines all major changes aligned with issue #583, covering the custom evaluators tutorial, concepts documentation extension, monitor tutorial updates, and sidebar changes.
• Linked Issues check — Passed: All code changes comprehensively address issue #583 objectives: new custom evaluators tutorial added, evaluation concepts extended with tabbed sections, monitor tutorial enhanced with score breakdowns, and sidebar updated.
• Out of Scope Changes check — Passed: All changes are documentation-only and directly related to issue #583 requirements. The version constant updates align with the v0.9.x documentation versioning and are not out of scope.
• Docstring Coverage — Passed: No functions found in the changed files to evaluate docstring coverage; skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@nadheesh force-pushed the main branch 2 times, most recently from 333aab2 to 4fbdbd0 on March 19, 2026 at 07:19
coderabbitai bot (Contributor) left a comment


🧹 Nitpick comments (2)
website/versioned_docs/version-v0.9.x/concepts/evaluation.mdx (1)

140-148: Consider varying sentence structure for readability.

Three consecutive bullet points begin with "Was" (lines 142-144). While the parallel structure is intentional for a list, varying the phrasing slightly could improve flow.

📝 Suggested rewording
-- *Was this LLM response safe and free of harmful content?*
-- *Was the tone appropriate for the context?*
-- *Was the response coherent and well-structured?*
+- *Is this LLM response safe and free of harmful content?*
+- *Does the tone fit the context?*
+- *Is the response coherent and well-structured?*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/versioned_docs/version-v0.9.x/concepts/evaluation.mdx` around lines
140 - 148, The three consecutive bullets beginning "Was this LLM response...",
"Was the tone...", and "Was the response..." (in the "Evaluates **each
individual LLM call** within the trace." section) should vary phrasing to
improve readability; edit the three bullet lines to maintain the same evaluation
meaning but change sentence starts (for example: "Is the LLM response safe and
free of harmful content?", "Does the tone fit the context?", "Is the response
coherent and well-structured?") while preserving parallelism and the final
bullet about cost efficiency.
website/versioned_docs/version-v0.9.x/tutorials/custom-evaluators.mdx (1)

23-27: Tighten repeated imperative phrasing in the navigation steps.

On Line 24–Line 26, three consecutive steps start with “Click,” which reads a bit repetitive. Consider varying one or two verbs (e.g., “Open”, “Select”) for smoother flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/versioned_docs/version-v0.9.x/tutorials/custom-evaluators.mdx` around
lines 23 - 27, Change the repetitive "Click" verbs in the navigation steps to
improve flow: replace "Click the **Evaluation** tab" with something like "Open
the **Evaluation** tab", change "Click the **Evaluators** sub-tab" to "Select
the **Evaluators** sub-tab", and keep or rephrase "Click **Create Evaluator**"
to "Create **Evaluator**" or "Click **Create Evaluator**" as preferred so the
three consecutive steps no longer all start with "Click"; update the three lines
containing those exact phrases in version-v0.9.x/tutorials/custom-evaluators.mdx
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b1e93f3b-fa93-4487-a0d4-885fa7b93ba1

📥 Commits

Reviewing files that changed from the base of the PR, between 333aab2 and b44f9e5.

⛔ Files ignored due to path filters (9)
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-basic-details.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-code-details.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-code-editor.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-list.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-llm-judge-editor.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/monitor-dashboard.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/run-logs.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/span-scores-tab.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/traces-table-scores.png is excluded by !**/*.png
📒 Files selected for processing (5)
  • website/versioned_docs/version-v0.9.x/_constants.md
  • website/versioned_docs/version-v0.9.x/concepts/evaluation.mdx
  • website/versioned_docs/version-v0.9.x/tutorials/custom-evaluators.mdx
  • website/versioned_docs/version-v0.9.x/tutorials/evaluation-monitors.mdx
  • website/versioned_sidebars/version-v0.9.x-sidebars.json
✅ Files skipped from review due to trivial changes (3)
  • website/versioned_sidebars/version-v0.9.x-sidebars.json
  • website/versioned_docs/version-v0.9.x/_constants.md
  • website/versioned_docs/version-v0.9.x/tutorials/evaluation-monitors.mdx

@menakaj menakaj merged commit 2b52eaf into wso2:main Mar 19, 2026
5 checks passed