
Add custom evaluators tutorial and extend evaluation docs #584

Merged
menakaj merged 5 commits into wso2:main from nadheesh:main
Mar 19, 2026
Conversation

@nadheesh (Contributor) commented Mar 18, 2026

Closes #583

Summary

  • Add a new Custom Evaluators tutorial with step-by-step walkthrough for creating code and LLM-judge evaluators in the AMP Console
  • Extend Evaluation concepts page with a Custom Evaluators section, tabbed evaluator type/level/built-in evaluator reference, and expanded Viewing Results (monitor dashboard + trace view)
  • Extend Evaluation Monitors tutorial with score breakdown tables (by agent, by model) and trace view score visibility sections
  • Add new screenshots for custom evaluator UI and evaluation trace view
  • Update sidebar to include the new tutorial
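
The Code evaluator the new tutorial walks through can be pictured as a plain scoring function. The sketch below is an illustrative assumption only: the `trace` payload shape, the `output` field, and the score convention are hypothetical, not the AMP Console contract.

```python
# Hypothetical sketch of a custom Code evaluator: a function that takes a
# trace payload and returns a score in [0.0, 1.0]. The "output" field and
# the 2000-character soft cap are illustrative assumptions only.
def evaluate(trace: dict) -> float:
    """Score 1.0 for a non-empty response within a length cap."""
    output = (trace.get("output") or "").strip()
    if not output:
        return 0.0  # empty responses fail outright
    cap = 2000
    if len(output) <= cap:
        return 1.0
    # Penalize responses proportionally to how far they exceed the cap.
    return max(0.0, 1.0 - (len(output) - cap) / cap)


if __name__ == "__main__":
    print(evaluate({"output": "All good."}))  # → 1.0
    print(evaluate({"output": ""}))           # → 0.0
```

An LLM-Judge evaluator would follow the same shape but delegate the scoring: send the payload to a judge model with a rubric prompt and parse the returned score instead of computing it in code.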

Summary by CodeRabbit

  • Documentation
    • Added new tutorial for creating and managing custom evaluators with code and LLM-based evaluation options.
    • Reorganized evaluation concepts documentation with tabbed interfaces for improved readability.
    • Enhanced evaluation monitors guide with detailed instructions for viewing and interpreting scores across trace, agent, and LLM levels.
    • Updated documentation version references to v0.9.x.

coderabbitai bot (Contributor) commented Mar 18, 2026

📝 Walkthrough

This PR adds comprehensive documentation for custom evaluators in AMP Console, extends evaluation concepts with tabbed interfaces and reorganized evaluator categorization, enhances the evaluation monitors tutorial with score breakdown visibility, and updates sidebar navigation and version constants to reflect the v0.9.x release.

Changes

• Custom Evaluators Tutorial (website/docs/tutorials/custom-evaluators.mdx, website/versioned_docs/version-v0.9.x/tutorials/custom-evaluators.mdx)
  New comprehensive tutorial documenting the creation and usage of Code and LLM-Judge custom evaluators in the AMP Console, including step-by-step UI workflows, configuration parameters, and editing/deletion constraints.
• Evaluation Concepts Documentation (website/docs/concepts/evaluation.mdx, website/versioned_docs/version-v0.9.x/concepts/evaluation.mdx)
  Enhanced with Tabs/TabItem UI blocks separating Rule-Based and LLM-as-Judge evaluators by type and evaluation level (Trace, Agent, LLM). Added a new Custom Evaluators section with creation guidance, data models, and configuration parameters. Reorganized the built-in evaluators table with updated naming and categorizations.
• Evaluation Monitors Tutorial (website/docs/tutorials/evaluation-monitors.mdx, website/versioned_docs/version-v0.9.x/tutorials/evaluation-monitors.mdx)
  Extended with a new Score Breakdowns section (by agent and by model), a View Scores in Trace View section detailing score column rendering and span details display, and an updated Evaluation Summary with per-level statistics and skip rates.
• Navigation and Configuration (website/sidebars.ts, website/versioned_sidebars/version-v0.9.x-sidebars.json, website/versioned_docs/version-v0.9.x/_constants.md)
  Updated sidebar entries to include the new tutorials/custom-evaluators path. Updated version constants from v0.8.x/v0.8.0 to v0.9.x/v0.9.0.
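
The Score Breakdowns feature described above (average scores by agent and by model) amounts to a group-by-and-average over score records. A minimal sketch, assuming records with `agent`, `model`, and `score` fields — these names are illustrative, not the dashboard's actual schema:

```python
# Hypothetical sketch of the score-breakdown aggregation: average evaluator
# scores grouped by a record field such as "agent" or "model".
# The record schema here is an illustrative assumption.
from collections import defaultdict

def breakdown(records, key):
    """Return {group: mean score} for records grouped by the given key."""
    sums = defaultdict(lambda: [0.0, 0])  # group -> [score total, count]
    for r in records:
        entry = sums[r[key]]
        entry[0] += r["score"]
        entry[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}

records = [
    {"agent": "planner", "model": "gpt-4o", "score": 0.9},
    {"agent": "planner", "model": "gpt-4o", "score": 0.7},
    {"agent": "writer",  "model": "gpt-4o-mini", "score": 1.0},
]
print(breakdown(records, "agent"))  # planner averages ~0.8, writer 1.0
print(breakdown(records, "model"))
```

The same helper serves both breakdown tables; only the grouping key changes, which mirrors the tutorial's side-by-side "by agent" and "by model" views.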

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 With custom eyes, we now can see,
Evaluators branching wild and free—
Code and prompts in harmony dance,
Documentation gives each change a chance!
✨ Traces, agents, LLMs too,
Breaking scores in every view! 📊

🚥 Pre-merge checks — ✅ 5 passed

• Title check — Passed: The PR title accurately and concisely summarizes the main changes: adding a custom evaluators tutorial and extending evaluation documentation.
• Description check — Passed: The PR description clearly outlines all major changes aligned with issue #583, covering the custom evaluators tutorial, concepts documentation extension, monitor tutorial updates, and sidebar changes.
• Linked Issues check — Passed: All code changes comprehensively address issue #583 objectives: new custom evaluators tutorial added, evaluation concepts extended with tabbed sections, monitor tutorial enhanced with score breakdowns, and sidebar updated.
• Out of Scope Changes check — Passed: All changes are documentation-only and directly related to issue #583 requirements. The version constant updates align with the v0.9.x documentation versioning and are not out of scope.
• Docstring Coverage — Passed: No functions found in the changed files to evaluate docstring coverage; skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@nadheesh force-pushed the main branch 2 times, most recently from 333aab2 to 4fbdbd0 on March 19, 2026 at 07:19
coderabbitai bot (Contributor) left a comment


🧹 Nitpick comments (2)
website/versioned_docs/version-v0.9.x/concepts/evaluation.mdx (1)

140-148: Consider varying sentence structure for readability.

Three consecutive bullet points begin with "Was" (lines 142-144). While the parallel structure is intentional for a list, varying the phrasing slightly could improve flow.

📝 Suggested rewording
-- *Was this LLM response safe and free of harmful content?*
-- *Was the tone appropriate for the context?*
-- *Was the response coherent and well-structured?*
+- *Is this LLM response safe and free of harmful content?*
+- *Does the tone fit the context?*
+- *Is the response coherent and well-structured?*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/versioned_docs/version-v0.9.x/concepts/evaluation.mdx` around lines
140 - 148, The three consecutive bullets beginning "Was this LLM response...",
"Was the tone...", and "Was the response..." (in the "Evaluates **each
individual LLM call** within the trace." section) should vary phrasing to
improve readability; edit the three bullet lines to maintain the same evaluation
meaning but change sentence starts (for example: "Is the LLM response safe and
free of harmful content?", "Does the tone fit the context?", "Is the response
coherent and well-structured?") while preserving parallelism and the final
bullet about cost efficiency.
website/versioned_docs/version-v0.9.x/tutorials/custom-evaluators.mdx (1)

23-27: Tighten repeated imperative phrasing in the navigation steps.

On Line 24–Line 26, three consecutive steps start with “Click,” which reads a bit repetitive. Consider varying one or two verbs (e.g., “Open”, “Select”) for smoother flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@website/versioned_docs/version-v0.9.x/tutorials/custom-evaluators.mdx` around
lines 23 - 27, Change the repetitive "Click" verbs in the navigation steps to
improve flow: replace "Click the **Evaluation** tab" with something like "Open
the **Evaluation** tab", change "Click the **Evaluators** sub-tab" to "Select
the **Evaluators** sub-tab", and keep or rephrase "Click **Create Evaluator**"
to "Create **Evaluator**" or "Click **Create Evaluator**" as preferred so the
three consecutive steps no longer all start with "Click"; update the three lines
containing those exact phrases in version-v0.9.x/tutorials/custom-evaluators.mdx
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b1e93f3b-fa93-4487-a0d4-885fa7b93ba1

📥 Commits

Reviewing files that changed from the base of the PR, between 333aab2 and b44f9e5.

⛔ Files ignored due to path filters (9)
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-basic-details.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-code-details.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-code-editor.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-list.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/custom-eval-llm-judge-editor.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/monitor-dashboard.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/run-logs.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/span-scores-tab.png is excluded by !**/*.png
  • website/versioned_docs/version-v0.9.x/img/evaluation/traces-table-scores.png is excluded by !**/*.png
📒 Files selected for processing (5)
  • website/versioned_docs/version-v0.9.x/_constants.md
  • website/versioned_docs/version-v0.9.x/concepts/evaluation.mdx
  • website/versioned_docs/version-v0.9.x/tutorials/custom-evaluators.mdx
  • website/versioned_docs/version-v0.9.x/tutorials/evaluation-monitors.mdx
  • website/versioned_sidebars/version-v0.9.x-sidebars.json
✅ Files skipped from review due to trivial changes (3)
  • website/versioned_sidebars/version-v0.9.x-sidebars.json
  • website/versioned_docs/version-v0.9.x/_constants.md
  • website/versioned_docs/version-v0.9.x/tutorials/evaluation-monitors.mdx

@menakaj menakaj merged commit 2b52eaf into wso2:main Mar 19, 2026
5 checks passed