diff --git a/docs/src/pages/docs/reference/evals/context-position.mdx b/docs/src/pages/docs/reference/evals/context-position.mdx
index 1545911a5..ac721f7cd 100644
--- a/docs/src/pages/docs/reference/evals/context-position.mdx
+++ b/docs/src/pages/docs/reference/evals/context-position.mdx
@@ -125,9 +125,8 @@ console.log(result.info.reason); // Explanation of the score
 
 The metric evaluates context positioning through:
 
 - Individual assessment of each context piece's relevance
-- Position-based weighting (1/position)
+- Position-based weighting (earlier positions weighted more heavily)
 - Binary relevance verdicts (yes/no) with detailed reasoning
-- Normalization against optimal ordering
 
 ### Scoring Process
diff --git a/docs/src/pages/docs/reference/evals/context-precision.mdx b/docs/src/pages/docs/reference/evals/context-precision.mdx
index 617220537..2f454742a 100644
--- a/docs/src/pages/docs/reference/evals/context-precision.mdx
+++ b/docs/src/pages/docs/reference/evals/context-precision.mdx
@@ -131,9 +131,7 @@ The metric evaluates context precision through:
 
 ### Scoring Process
 
-1. Converts verdicts to binary scores:
-   - Relevant context: 1
-   - Irrelevant context: 0
+1. Converts verdicts to binary scores (1 for relevant, 0 for not)
 
 2. Calculates Mean Average Precision:
    - Computes precision at each position
diff --git a/docs/src/pages/docs/reference/evals/faithfulness.mdx b/docs/src/pages/docs/reference/evals/faithfulness.mdx
index 2a8f2bc1d..f88a89d0e 100644
--- a/docs/src/pages/docs/reference/evals/faithfulness.mdx
+++ b/docs/src/pages/docs/reference/evals/faithfulness.mdx
@@ -126,10 +126,9 @@ console.log(result.info.reason); // "All claims are supported by the context."
 ## Scoring Details
 
 The metric evaluates faithfulness through:
 
-- Claim extraction and verification
-- Context-based validation
-- Verdict assignment
-- Support ratio calculation
+- Extracting all claims from the output (both factual and speculative)
+- Verifying each claim against the provided context
+- Calculating a score based on the proportion of supported claims
 
 ### Scoring Process
@@ -155,10 +154,6 @@ Final score: `supported_claims / total_claims * scale`
 
 - 0.4-0.6: Mixed support with some contradictions
 - 0.1-0.3: Limited support, many contradictions
 - 0.0: No supported claims
-- 0.67: Two-thirds of claims are supported
-- 0.5: Half of the claims are supported
-- 0.33: One-third of claims are supported
-- 0: No claims are supported or the output is empty
 
 ## Advanced Example
diff --git a/docs/src/pages/docs/reference/evals/hallucination.mdx b/docs/src/pages/docs/reference/evals/hallucination.mdx
index 6d84f9d12..b1398003f 100644
--- a/docs/src/pages/docs/reference/evals/hallucination.mdx
+++ b/docs/src/pages/docs/reference/evals/hallucination.mdx
@@ -176,11 +176,11 @@ Final score: `contradicted_statements / total_statements`
 
 ### Score interpretation (0 to scale, default 0-1)
 
-- 0.0: No hallucination - output aligns with all context statements
-- 0.25: Low hallucination - contradicts 25% of context statements
-- 0.5: Moderate hallucination - contradicts half of context statements
-- 0.75: High hallucination - contradicts 75% of context statements
 - 1.0: Complete hallucination - contradicts all context statements
+- 0.75: High hallucination - contradicts 75% of context statements
+- 0.5: Moderate hallucination - contradicts half of context statements
+- 0.25: Low hallucination - contradicts 25% of context statements
+- 0.0: No hallucination - output aligns with all context statements
 
 **Note:** The score represents the degree of hallucination - lower scores indicate better factual alignment with the provided context
 
diff --git a/docs/src/pages/docs/reference/evals/keyword-coverage.mdx b/docs/src/pages/docs/reference/evals/keyword-coverage.mdx
index 2e2bf3aac..112415a1b 100644
--- a/docs/src/pages/docs/reference/evals/keyword-coverage.mdx
+++ b/docs/src/pages/docs/reference/evals/keyword-coverage.mdx
@@ -87,8 +87,7 @@ The metric evaluates keyword coverage by matching keywords with the following fe
 
 - Common word and stop word filtering (e.g., "the", "a", "and")
 - Case-insensitive matching
 - Word form variation handling
-- Special handling of technical terms
-- Compound word processing
+- Special handling of technical terms and compound words
 
 ### Scoring Process
diff --git a/docs/src/pages/docs/reference/evals/summarization.mdx b/docs/src/pages/docs/reference/evals/summarization.mdx
index 9623c17b1..37ecb3b58 100644
--- a/docs/src/pages/docs/reference/evals/summarization.mdx
+++ b/docs/src/pages/docs/reference/evals/summarization.mdx
@@ -133,18 +133,15 @@ console.log(result.info); // Object containing detailed metrics about the summar
 ## Scoring Details
 
 The metric evaluates summaries through two essential components:
-
 1. **Alignment Score**: Measures factual correctness
-   - Extracts and verifies claims
-   - Validates against source text
-   - Uses yes/no/unsure verdicts
-   - Ensures factual accuracy
+   - Extracts claims from the summary
+   - Verifies each claim against the original text
+   - Assigns "yes", "no", or "unsure" verdicts
 
 2. **Coverage Score**: Measures inclusion of key information
    - Generates key questions from the original text
    - Check if the summary answers these questions
-   - Checks information inclusion
-   - Assesses comprehensiveness
+   - Checks information inclusion and assesses comprehensiveness
 
 ### Scoring Process
diff --git a/docs/src/pages/docs/reference/evals/textual-difference.mdx b/docs/src/pages/docs/reference/evals/textual-difference.mdx
index 3c6298b0a..5e3739ea9 100644
--- a/docs/src/pages/docs/reference/evals/textual-difference.mdx
+++ b/docs/src/pages/docs/reference/evals/textual-difference.mdx
@@ -111,8 +111,8 @@ The metric calculates several measures:
 
 ### Scoring Process
 
 1. Analyzes textual differences:
-   - Performs sequence matching
-   - Counts change operations
+   - Performs sequence matching between input and output
+   - Counts the number of change operations required
    - Measures length differences
 
 2. Calculates final metrics:
diff --git a/docs/src/pages/docs/reference/evals/tone-consistency.mdx b/docs/src/pages/docs/reference/evals/tone-consistency.mdx
index d4d9ec5de..5700c6de8 100644
--- a/docs/src/pages/docs/reference/evals/tone-consistency.mdx
+++ b/docs/src/pages/docs/reference/evals/tone-consistency.mdx
@@ -111,16 +111,15 @@ console.log(result2.score); // Tone stability score from 0-1
 
 The metric operates in two modes:
 
 1. **Tone Consistency** (with reference):
+   When both input and output are provided:
   - Compares sentiment between texts
   - Calculates sentiment difference
-  - Evaluates tone matching
-  - Higher score = more consistent tone
+  - Evaluates tone matching: a higher score indicates more consistent tone
 
 2. **Tone Stability** (single input):
-  - Analyzes cross-sentence sentiment
-  - Measures sentiment variance
-  - Assesses tone stability
-  - Higher score = more stable tone
+  - Analyzes sentiment stability across sentences
+  - Calculates variance in sentiment
+  - Assesses tone stability: a higher score indicates more stable tone
 
 ### Scoring Process
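
---

Reviewer note: the scoring rules touched by these hunks (position weighting, Mean Average Precision, and the faithfulness and hallucination ratios) can be sketched as below. This is an illustrative reconstruction from the prose in the diff, not the library's implementation; the function names and signatures are hypothetical.

```typescript
// Context position: relevance weighted by 1/position, normalized
// against the optimal ordering (all relevant pieces first).
function contextPositionScore(verdicts: boolean[]): number {
  const weight = (i: number) => 1 / (i + 1);
  const achieved = verdicts.reduce(
    (sum, relevant, i) => sum + (relevant ? weight(i) : 0),
    0,
  );
  const relevantCount = verdicts.filter(Boolean).length;
  let optimal = 0;
  for (let i = 0; i < relevantCount; i++) optimal += weight(i);
  return optimal === 0 ? 0 : achieved / optimal;
}

// Context precision: Mean Average Precision over binary verdicts
// (precision is computed at each relevant position, then averaged).
function meanAveragePrecision(verdicts: boolean[]): number {
  let relevantSeen = 0;
  let precisionSum = 0;
  verdicts.forEach((relevant, i) => {
    if (relevant) {
      relevantSeen += 1;
      precisionSum += relevantSeen / (i + 1); // precision at this position
    }
  });
  return relevantSeen === 0 ? 0 : precisionSum / relevantSeen;
}

// Faithfulness: supported_claims / total_claims * scale.
function faithfulnessScore(
  verdicts: Array<"yes" | "no" | "unsure">,
  scale = 1,
): number {
  if (verdicts.length === 0) return 0;
  const supported = verdicts.filter((v) => v === "yes").length;
  return (supported / verdicts.length) * scale;
}

// Hallucination: contradicted_statements / total_statements (lower is better).
function hallucinationScore(contradicted: number, total: number): number {
  return total === 0 ? 0 : contradicted / total;
}
```

Note that `contextPositionScore` keeps the normalization-against-optimal-ordering step even though the context-position hunk drops that bullet from the prose; without it, the raw weighted sum would not land on a 0-1 scale.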