68 changes: 68 additions & 0 deletions docs/content/changelog.mdx
@@ -8,6 +8,74 @@ This is the aggregated changelog for the entire Rhesis repository. For detailed
- [Frontend Changelog](https://github.com/rhesis-ai/rhesis/blob/main/apps/frontend/CHANGELOG.md)
- [Polyphemus Changelog](https://github.com/rhesis-ai/rhesis/blob/main/apps/polyphemus/CHANGELOG.md)

## [0.6.10] - 2026-03-23

### Platform Release

This release includes the following component versions:
- **Backend 0.6.9**
- **Frontend 0.6.10**
- **SDK 0.6.10**
- **Polyphemus 0.2.8**

### Summary

This release introduces **adaptive testing overwrite controls and suggestion/evaluation workflows**, **typed SDK
statistics collection methods**, and **NIST-aligned password hardening** across authentication flows.

### Featured Capabilities

**Adaptive Testing Iteration Loop**

Adaptive testing now supports a tighter edit-run-evaluate workflow:

- **Delete Adaptive Test Sets**: Adaptive testing sets can now be deleted directly from the adaptive testing flow
- **Overwrite Controls**: Output generation and evaluation support `overwrite` behavior so teams can explicitly
re-run existing tests instead of only processing missing outputs/results
- **Suggestion Pipeline**: Added suggestion generation, suggestion output generation, and suggestion evaluation
endpoints to iterate on test quality before persisting new tests
- **Bulk Test Deletion**: Adaptive testing UI supports bulk deletion for faster curation

**SDK Statistics API Enhancements**

Stats access in the SDK now has clearer, typed entry points:

- **Collection Methods**: `TestRuns.stats()` and `TestResults.stats()` expose typed response models
- **Per-Run Shortcut**: `TestRun.stats()` delegates to run-scoped stats for convenience
- **DataFrame Conversion**: Stats responses support `to_dataframe()` for optional pandas workflows
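To illustrate what `to_dataframe()`-style conversion enables, here is a hedged sketch using pandas directly; the payload fields (`total`, `by_status`) are assumptions for illustration, not the SDK's actual response schema:

```python
import pandas as pd

# Hypothetical stats payload; field names are illustrative, not the SDK schema.
stats_payload = {
    "total": 42,
    "by_status": {"passed": 30, "failed": 8, "skipped": 4},
}

# A DataFrame view turns per-status counts into rows for filtering/plotting.
df = pd.DataFrame(
    [{"status": k, "count": v} for k, v in stats_payload["by_status"].items()]
)

print(df["count"].sum())  # 42
```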

**Authentication Hardening**

Password validation now aligns with NIST-oriented best practices:

- **Minimum Length**: Default minimum password length is now 12 characters
- **Strength Scoring**: zxcvbn-based strength checks are enforced with configurable minimum score
- **Context Blocking**: Passwords are rejected if they contain user/service context words
- **Breach Screening**: Optional HaveIBeenPwned k-anonymity checks reject known compromised passwords

### Backend Highlights
- Added adaptive testing evaluate and suggestion endpoints, including overwrite support
- Added deletion endpoint for adaptive testing test sets
- Exposed password policy settings through auth provider metadata
- Hardened password validation with zxcvbn and breach checks
- Continued metric evaluation refactor toward strategy-based execution

### Frontend Highlights
- Added adaptive testing delete test set action and bulk test deletion support
- Added overwrite toggles for adaptive testing output generation and evaluation
- Added suggestion generation/evaluation UI flow in adaptive testing detail
- Updated auth screens to enforce the new password policy and improve rate-limit and registration error messages
- Added attachments count column in tests grid

### SDK Highlights
- Added typed stats collection methods for test runs and test results
- Added stable transcript formatting support via `ConversationHistory.format_conversation()`
- Kept async-first metric/model improvements and strategy-based evaluation integration

### Polyphemus Highlights
- No component version change in this platform release (`0.2.8`)
- Existing `POST /generate_batch` capability remains available for batched generation workloads

## [0.6.9] - 2026-03-12

### Platform Release
36 changes: 36 additions & 0 deletions docs/content/contribute/backend/authentication.mdx
@@ -107,6 +107,42 @@ AUTH_REGISTRATION_ENABLED=true
FRONTEND_URL=http://localhost:3000`}
</CodeBlock>

## Password Policy (NIST-aligned)

Password validation is enforced server-side during:

- `POST /auth/register`
- `POST /auth/reset-password`

Both flows call `validate_password(...)` with user context (email and name), so context words are checked in
addition to length and strength.
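The combined checks can be pictured with a simplified, hypothetical re-implementation. The real `validate_password` lives in the backend; the function name, return shape, and error labels below are illustrative only, and the thresholds mirror the documented defaults:

```python
MIN_LENGTH = 12   # mirrors the PASSWORD_MIN_LENGTH default
MAX_LENGTH = 128  # mirrors the PASSWORD_MAX_LENGTH default

def validate_password_sketch(password: str, context_words: list[str]) -> list[str]:
    """Simplified illustration of the server-side checks; not the backend code."""
    errors = []
    if not password.strip():
        errors.append("whitespace-only")
    if not (MIN_LENGTH <= len(password) <= MAX_LENGTH):
        errors.append("length")
    lowered = password.lower()
    # Reject passwords containing user/service context words (e.g. email local part).
    if any(word.lower() in lowered for word in context_words if word):
        errors.append("context")
    return errors

print(validate_password_sketch("alice2024!pass", ["alice", "rhesis"]))  # ['context']
```

The backend additionally applies the zxcvbn strength score and the optional breach check, which a client-side sketch like this cannot replicate.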

### Policy values and defaults

| Environment variable | Default | Description |
| --- | --- | --- |
| `PASSWORD_MIN_LENGTH` | `12` | Minimum password length |
| `PASSWORD_MAX_LENGTH` | `128` | Maximum password length |
| `PASSWORD_MIN_STRENGTH_SCORE` | `2` | Minimum zxcvbn score (0-4) |
| `PASSWORD_CHECK_BREACHED` | `true` | Enables HaveIBeenPwned k-anonymity breach check |

### Frontend policy discovery

The frontend reads password policy from:

- `GET /auth/providers` -> `password_policy.min_length`
- `GET /auth/providers` -> `password_policy.max_length`
- `GET /auth/providers` -> `password_policy.min_strength_score`

All three fields are returned by the same `GET /auth/providers` response under `password_policy`.

This lets clients validate early while preserving backend enforcement as the source of truth.

### Security behavior

- Passwords must not be whitespace-only.
- Passwords must not include context words from user identity and service context.
- Breach checks use HaveIBeenPwned k-anonymity (`/range` API), sending only the first 5 SHA-1 hash chars.
- If the breach API is unavailable, registration and reset do not hard-fail; a transient API error alone never blocks the request.
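The k-anonymity mechanics described above can be sketched as follows. The function names are illustrative, but the hashing and matching steps follow the `/range` API contract: only the first 5 hex characters of the SHA-1 digest are sent, and the suffix is matched locally against the returned `SUFFIX:COUNT` lines:

```python
import hashlib

def hibp_prefix_suffix(password: str) -> tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into the 5-char prefix sent to
    the HaveIBeenPwned /range API and the suffix kept for local matching."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def is_breached(suffix: str, range_response: str) -> bool:
    """Match the local suffix against 'SUFFIX:COUNT' response lines;
    the full password hash never leaves the machine."""
    for line in range_response.splitlines():
        candidate, _, _count = line.partition(":")
        if candidate.strip() == suffix:
            return True
    return False

prefix, suffix = hibp_prefix_suffix("password")
print(prefix)  # 5BAA6
```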

## Token System

The authentication system uses multiple token types:
5 changes: 5 additions & 0 deletions docs/content/contribute/environment-variables.mdx
@@ -24,11 +24,16 @@ Create a `.env` file in `apps/backend/` with the following variables:
['`DB_ENCRYPTION_KEY`', '**Required**', '32-byte URL-safe base64-encoded encryption key for database field encryption. Generate with: `python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"`. **Never commit to version control**'],
['`LOG_LEVEL`', 'Default: `DEBUG`', 'Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR`'],
['`JWT_SECRET_KEY`', '**Required**', 'Secret key for JWT token signing. Generate with: `openssl rand -hex 64`'],
['`SESSION_SECRET_KEY`', '**Required**', 'Secret key used by backend `SessionMiddleware` for session cookie signing. Must be distinct from `JWT_SECRET_KEY` (key separation)'],
['`JWT_ALGORITHM`', 'Default: `HS256`', 'JWT signing algorithm'],
['`JWT_ACCESS_TOKEN_EXPIRE_MINUTES`', 'Default: `15`', 'JWT access token expiration time in minutes'],
['`JWT_REFRESH_TOKEN_EXPIRE_DAYS`', 'Default: `7`', 'Refresh token expiration time in days'],
['`AUTH_EMAIL_PASSWORD_ENABLED`', 'Default: `true`', 'Enable email/password authentication'],
['`AUTH_REGISTRATION_ENABLED`', 'Default: `true`', 'Enable new user registration via email'],
['`PASSWORD_MIN_LENGTH`', 'Default: `12`', 'Minimum password length for registration and password reset'],
['`PASSWORD_MAX_LENGTH`', 'Default: `128`', 'Maximum password length for registration and password reset'],
['`PASSWORD_MIN_STRENGTH_SCORE`', 'Default: `2`', 'Minimum zxcvbn password strength score (0-4)'],
['`PASSWORD_CHECK_BREACHED`', 'Default: `true`', 'Enable HaveIBeenPwned k-anonymity breach check during password validation'],
['`GOOGLE_CLIENT_ID`', 'Optional', 'Google OAuth client ID. Enables Google sign-in when configured'],
['`GOOGLE_CLIENT_SECRET`', 'Optional', 'Google OAuth client secret'],
['`GH_CLIENT_ID`', 'Optional', 'GitHub OAuth client ID. Enables GitHub sign-in when configured'],
1 change: 1 addition & 0 deletions docs/content/docs/test-sets/_meta.tsx
@@ -2,6 +2,7 @@ import type { MetaRecord } from "nextra";

const meta: MetaRecord = {
index: "Overview",
"adaptive-testing": "Adaptive Testing",
"import-from-file": "Import from File",
"import-from-garak": "Import from Garak",
};
113 changes: 113 additions & 0 deletions docs/content/docs/test-sets/adaptive-testing.mdx
@@ -0,0 +1,113 @@
import { CodeBlock } from '@/components/CodeBlock'

# Adaptive Testing

Adaptive Testing is a topic-based workflow for expanding and maintaining single-turn test sets over time.
Instead of treating a test set as static, you can organize tests into topic trees, generate outputs in bulk,
evaluate results with a selected metric, and iterate with AI-generated suggestions.

## When to Use Adaptive Testing

Use Adaptive Testing when you need to:

- Grow coverage across specific risk areas or product domains
- Re-evaluate existing tests without regenerating all outputs
- Curate suggestions before saving them to your test set
- Keep one evolving test set instead of creating many one-off sets

## End-to-End Workflow

Adaptive Testing in the UI follows this loop:

1. Create an adaptive test set
2. Organize tests in topics
3. Generate outputs from a selected endpoint
4. Evaluate tests with a selected metric
5. Generate and review suggestions
6. Accept selected suggestions into the test set

<Callout type="info">
Topic operations are hierarchical. Renaming a topic cascades to child topics and tests; removing a topic
removes subtopics and moves tests to the parent topic.
</Callout>

## Generate Outputs and Evaluate with Overwrite Control

Two actions drive most iteration cycles:

- **Generate outputs**: invoke an endpoint for each test input and store the output in test metadata
- **Evaluate**: run a selected metric against test input/output pairs and store the label and score in test metadata

Both actions support an `overwrite` option.

| Parameter | Type | Default | Behavior |
| --- | --- | --- | --- |
| `topic` | `string \| null` | `null` | Limits processing to a topic |
| `include_subtopics` | `boolean` | `true` | Includes descendant topics when `topic` is set |
| `overwrite` | `boolean` | `false` | Replaces existing outputs/results instead of skipping |
| `test_ids` | `string[] \| null` | `null` | Optional explicit subset of tests |

<Callout type="info">
With `overwrite=false`, tests that already have outputs or evaluation labels are skipped. The response includes
`generated` or `evaluated`, plus `skipped` and `failed` counts.
</Callout>
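The skip-versus-overwrite accounting can be sketched in a few lines. The data shape and helper name are illustrative, not the API's internals, but the counting rule matches the behavior described above:

```python
def plan_generation(tests: list[dict], overwrite: bool) -> dict:
    """Illustrative accounting: with overwrite=False, tests that already
    have an output are skipped; with overwrite=True they are reprocessed."""
    generated = skipped = 0
    for test in tests:
        has_output = test.get("output") is not None
        if has_output and not overwrite:
            skipped += 1
        else:
            generated += 1
    return {"generated": generated, "skipped": skipped}

tests = [{"output": "cached"}, {"output": None}, {"output": "cached"}]
print(plan_generation(tests, overwrite=False))  # {'generated': 1, 'skipped': 2}
print(plan_generation(tests, overwrite=True))   # {'generated': 3, 'skipped': 0}
```

The same rule applies to evaluation, with `evaluated` in place of `generated`.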

### Generate Outputs API Example

<CodeBlock filename="generate_outputs.sh" language="bash">
{`curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/generate_outputs" \\
-H "Authorization: Bearer $RHESIS_API_TOKEN" \\
-H "Content-Type: application/json" \\
-d '{
"endpoint_id": "00000000-0000-0000-0000-000000000000",
"topic": "Safety/Jailbreak",
"include_subtopics": true,
"overwrite": false
}'`}
</CodeBlock>

### Evaluate API Example

<CodeBlock filename="evaluate_tests.sh" language="bash">
{`curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/evaluate" \\
-H "Authorization: Bearer $RHESIS_API_TOKEN" \\
-H "Content-Type: application/json" \\
-d '{
"metric_names": ["answer_relevancy"],
"topic": "Safety/Jailbreak",
"include_subtopics": true,
"overwrite": true
}'`}
</CodeBlock>

## Suggestions Workflow

Results from the suggestion endpoints are not persisted until you accept them in the UI.

1. `POST /adaptive_testing/\{id\}/generate_suggestions`
2. `POST /adaptive_testing/\{id\}/generate_suggestion_outputs`
3. `POST /adaptive_testing/\{id\}/evaluate_suggestions`
4. Accept selected suggestions to create real tests in the set

| Suggestion parameter | Type | Default | Notes |
| --- | --- | --- | --- |
| `num_examples` | `int` | `10` | Existing tests sampled as examples |
| `num_suggestions` | `int` | `20` | Requested number of suggestions |
| `topic` | `string \| null` | `null` | Optional topic focus |

<CodeBlock filename="generate_suggestions.sh" language="bash">
{`curl -X POST "$RHESIS_BASE_URL/adaptive_testing/$TEST_SET_ID/generate_suggestions" \\
-H "Authorization: Bearer $RHESIS_API_TOKEN" \\
-H "Content-Type: application/json" \\
-d '{
"topic": "Safety/Jailbreak",
"num_examples": 10,
"num_suggestions": 20
}'`}
</CodeBlock>

## Related Pages

- [Test Sets Overview](/docs/test-sets)
- [Tests](/docs/tests)
- [Metrics](/docs/metrics)
58 changes: 58 additions & 0 deletions docs/content/sdk/metrics/conversational.mdx
@@ -105,6 +105,64 @@ print(conversation.get_assistant_tool_calls())`}
These fields are optional. If they are omitted for a turn, helper methods return `None` for that position.
</Callout>

### Formatting a conversation transcript (v0.6.10+)

Use `ConversationHistory.format_conversation()` when you want a structured, numbered transcript that keeps
assistant `context`, `metadata`, and `tool_calls` attached to the correct turn.

This is especially useful for custom conversational judges and prompt templates.

<CodeBlock filename="format_conversation.py" language="python">
{`from rhesis.sdk.metrics import ConversationHistory

conversation = ConversationHistory.from_messages([
{"role": "user", "content": "Find policy details for claim #A-123."},
{
"role": "assistant",
"content": None,
"tool_calls": [{"id": "call_1", "function": {"name": "lookup_claim"}}],
"metadata": {"latency_ms": 420, "model": "rhesis-default"},
"context": [{"source": "claims_db", "id": "A-123"}],
},
{"role": "assistant", "content": "I found the claim and policy summary."},
])

print(conversation.format_conversation())`}
</CodeBlock>

Expected output shape:

<CodeBlock filename="formatted_output.txt" language="text">
{`Turn 1:
User: Find policy details for claim #A-123.
Context: [
{
"source": "claims_db",
"id": "A-123"
}
]
Metadata: {
"latency_ms": 420,
"model": "rhesis-default"
}
Tool Calls: [
{
"id": "call_1",
"function": {
"name": "lookup_claim"
}
}
]

Turn 2:
Assistant: I found the claim and policy summary.`}
</CodeBlock>

<Callout type="info">
`to_text()` returns a simpler role-prefixed transcript and excludes `metadata`, `context`, and `tool_calls`.
Use `format_conversation()` when those fields must be visible to the evaluating model.
</Callout>

## Quick Start

### Turn Relevancy