Commit

more doc updates
NikAiyer committed Feb 15, 2025
1 parent 7e4cc66 commit 1eaaa44
Showing 8 changed files with 21 additions and 34 deletions.
3 changes: 1 addition & 2 deletions docs/src/pages/docs/reference/evals/context-position.mdx
@@ -125,9 +125,8 @@ console.log(result.info.reason); // Explanation of the score

The metric evaluates context positioning through:
- Individual assessment of each context piece's relevance
- Position-based weighting (1/position)
- Applies position weights (earlier positions weighted more heavily)
- Binary relevance verdicts (yes/no) with detailed reasoning
- Normalization against optimal ordering
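The position weighting in this hunk can be illustrated with a minimal TypeScript sketch. It assumes relevance verdicts have already been produced (one boolean per context piece, in retrieval order), applies the 1/position weights listed above, and normalizes against the ideal ordering in which every relevant piece comes first. The function name `contextPositionScore`, its `scale` parameter, and the zero score for no relevant context are hypothetical, not the library's API.

```typescript
// Hypothetical helper: verdicts[i] is true when the context piece at position i + 1 is relevant.
function contextPositionScore(verdicts: boolean[], scale: number = 1): number {
  const relevantCount = verdicts.filter((v) => v).length;
  if (relevantCount === 0) return 0; // assumption: no relevant context scores 0

  // Actual ordering: each relevant piece contributes a weight of 1/position.
  let weighted = 0;
  verdicts.forEach((isRelevant, i) => {
    if (isRelevant) weighted += 1 / (i + 1);
  });

  // Optimal ordering: all relevant pieces occupy positions 1..relevantCount.
  let optimal = 0;
  for (let position = 1; position <= relevantCount; position++) {
    optimal += 1 / position;
  }

  // Normalize so that the optimal ordering scores exactly `scale`.
  return (weighted / optimal) * scale;
}

console.log(contextPositionScore([true, false, true])); // (1 + 1/3) / (1 + 1/2) = 8/9 ≈ 0.889
```

Moving the relevant piece from position 3 to position 2 would raise the score to 1, which is the behavior "earlier positions weighted more heavily" describes.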

### Scoring Process

4 changes: 1 addition & 3 deletions docs/src/pages/docs/reference/evals/context-precision.mdx
@@ -131,9 +131,7 @@ The metric evaluates context precision through:

### Scoring Process

1. Converts verdicts to binary scores:
- Relevant context: 1
- Irrelevant context: 0
1. Converts verdicts to binary scores (1 for relevant, 0 for not)

2. Calculates Mean Average Precision:
- Computes precision at each position
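Only the first steps of the scoring process are visible in this hunk. Assuming the standard average-precision formulation (precision is computed at each relevant position and averaged over the relevant pieces; for a single retrieval, the mean in Mean Average Precision reduces to this average), a minimal TypeScript sketch could look like the following. `contextPrecisionScore`, its `scale` parameter, and the zero score for no relevant context are hypothetical.

```typescript
// Hypothetical helper: verdicts[i] is true when the context piece at position i + 1 is relevant.
function contextPrecisionScore(verdicts: boolean[], scale: number = 1): number {
  // Step 1: convert verdicts to binary scores (1 for relevant, 0 for not).
  const binary: number[] = verdicts.map((v) => (v ? 1 : 0));

  let relevantSoFar = 0;
  let precisionSum = 0;
  binary.forEach((score, i) => {
    relevantSoFar += score;
    // Step 2: precision at this position, accumulated only where the piece is relevant.
    if (score === 1) precisionSum += relevantSoFar / (i + 1);
  });

  // Average over the relevant positions; no relevant context yields 0 (an assumption).
  return relevantSoFar === 0 ? 0 : (precisionSum / relevantSoFar) * scale;
}

console.log(contextPrecisionScore([true, false, true])); // (1/1 + 2/3) / 2 ≈ 0.833
```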
11 changes: 3 additions & 8 deletions docs/src/pages/docs/reference/evals/faithfulness.mdx
@@ -126,10 +126,9 @@ console.log(result.info.reason); // "All claims are supported by the context."
## Scoring Details

The metric evaluates faithfulness through:
- Claim extraction and verification
- Context-based validation
- Verdict assignment
- Support ratio calculation
- Extracting all claims from the output (both factual and speculative)
- Verifying each claim against the provided context
- Calculating a score based on the proportion of supported claims

### Scoring Process

@@ -155,10 +154,6 @@ Final score: `supported_claims / total_claims * scale`
- 0.4-0.6: Mixed support with some contradictions
- 0.1-0.3: Limited support, many contradictions
- 0.0: No supported claims
- 0.67: Two-thirds of claims are supported
- 0.5: Half of the claims are supported
- 0.33: One-third of claims are supported
- 0: No claims are supported or the output is empty
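The final-score formula in the hunk header, `supported_claims / total_claims * scale`, can be sketched in TypeScript as follows. The sketch assumes claim extraction and verification have already produced one boolean per claim, and returns 0 when there are no claims, matching the empty-output case listed above. The function name `faithfulnessScore` is hypothetical.

```typescript
// Hypothetical helper: supported[i] is true when claim i is backed by the provided context.
function faithfulnessScore(supported: boolean[], scale: number = 1): number {
  const totalClaims = supported.length;
  if (totalClaims === 0) return 0; // empty output: no claims to support
  const supportedClaims = supported.filter((s) => s).length;
  return (supportedClaims / totalClaims) * scale;
}

console.log(faithfulnessScore([true, true, false])); // 2/3 ≈ 0.667
```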

## Advanced Example

8 changes: 4 additions & 4 deletions docs/src/pages/docs/reference/evals/hallucination.mdx
@@ -176,11 +176,11 @@ Final score: `contradicted_statements / total_statements`

### Score interpretation
(0 to scale, default 0-1)
- 0.0: No hallucination - output aligns with all context statements
- 0.25: Low hallucination - contradicts 25% of context statements
- 0.5: Moderate hallucination - contradicts half of context statements
- 0.75: High hallucination - contradicts 75% of context statements
- 1.0: Complete hallucination - contradicts all context statements
- 0.75: High hallucination - contradicts 75% of context statements
- 0.5: Moderate hallucination - contradicts half of context statements
- 0.25: Low hallucination - contradicts 25% of context statements
- 0.0: No hallucination - output aligns with all context statements

**Note:** The score represents the degree of hallucination - lower scores indicate better factual alignment with the provided context
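The formula `contradicted_statements / total_statements` from the hunk header can be sketched in TypeScript as below. The sketch assumes one boolean per context statement (true when the output contradicts it). Multiplying by a `scale` parameter reflects the "0 to scale" range noted above; the function name `hallucinationScore` and the zero score when there are no statements are assumptions.

```typescript
// Hypothetical helper: contradicted[i] is true when the output contradicts context statement i.
function hallucinationScore(contradicted: boolean[], scale: number = 1): number {
  const totalStatements = contradicted.length;
  if (totalStatements === 0) return 0; // assumption: nothing to contradict
  const contradictedStatements = contradicted.filter((c) => c).length;
  // Lower is better: 0 means no context statement is contradicted.
  return (contradictedStatements / totalStatements) * scale;
}

console.log(hallucinationScore([true, false, false, false])); // 0.25: low hallucination
```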

3 changes: 1 addition & 2 deletions docs/src/pages/docs/reference/evals/keyword-coverage.mdx
@@ -87,8 +87,7 @@ The metric evaluates keyword coverage by matching keywords with the following fe
- Common word and stop word filtering (e.g., "the", "a", "and")
- Case-insensitive matching
- Word form variation handling
- Special handling of technical terms
- Compound word processing
- Special handling of technical terms and compound words
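A simplified version of the matching described above, covering only stop-word filtering and case-insensitive comparison, might look like the TypeScript sketch below. It assumes coverage is the fraction of reference keywords that also appear in the output. The stop-word list, the function names, and the score of 1 for a reference with no keywords are hypothetical; word-form variation, technical terms, and compound words are not handled here.

```typescript
// Hypothetical, deliberately small stop-word list for illustration.
const STOP_WORDS: string[] = ["the", "a", "an", "and", "or", "of", "to", "in", "is"];

// Lowercase the text, split it into words, and drop stop words and duplicates.
function extractKeywords(text: string): string[] {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return words.filter((w, i) => STOP_WORDS.indexOf(w) === -1 && words.indexOf(w) === i);
}

// Assumed definition: fraction of reference keywords that also appear in the output.
function keywordCoverage(reference: string, output: string): number {
  const referenceKeywords = extractKeywords(reference);
  if (referenceKeywords.length === 0) return 1; // assumption: nothing to cover
  const outputKeywords = extractKeywords(output);
  const matched = referenceKeywords.filter((kw) => outputKeywords.indexOf(kw) !== -1).length;
  return matched / referenceKeywords.length;
}

console.log(keywordCoverage("The quick brown fox", "A QUICK fox jumped")); // 2/3 ≈ 0.667
```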

### Scoring Process

11 changes: 4 additions & 7 deletions docs/src/pages/docs/reference/evals/summarization.mdx
@@ -133,18 +133,15 @@ console.log(result.info); // Object containing detailed metrics about the summar
## Scoring Details

The metric evaluates summaries through two essential components:

1. **Alignment Score**: Measures factual correctness
- Extracts and verifies claims
- Validates against source text
- Uses yes/no/unsure verdicts
- Ensures factual accuracy
- Extracts claims from the summary
- Verifies each claim against the original text
- Assigns "yes", "no", or "unsure" verdicts

2. **Coverage Score**: Measures inclusion of key information
- Generates key questions from the original text
- Check if the summary answers these questions
- Checks information inclusion
- Assesses comprehensiveness
- Checks information inclusion and assesses comprehensiveness
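The two component scores can be sketched as simple fractions. This TypeScript sketch assumes that only "yes" verdicts count toward alignment (with "unsure" treated as unsupported) and that coverage is the share of key questions the summary answers. The function names and empty-input behavior are hypothetical, and how the two scores are combined into a final result is not shown in this hunk.

```typescript
type Verdict = "yes" | "no" | "unsure";

// Alignment: share of summary claims verified against the original text.
function alignmentScore(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0; // assumption
  return verdicts.filter((v) => v === "yes").length / verdicts.length;
}

// Coverage: share of key questions (generated from the original text) that the summary answers.
function coverageScore(answered: boolean[]): number {
  if (answered.length === 0) return 0; // assumption
  return answered.filter((a) => a).length / answered.length;
}

console.log(alignmentScore(["yes", "yes", "unsure", "no"])); // 0.5
console.log(coverageScore([true, true, true, false])); // 0.75
```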

### Scoring Process

4 changes: 2 additions & 2 deletions docs/src/pages/docs/reference/evals/textual-difference.mdx
@@ -111,8 +111,8 @@ The metric calculates several measures:
### Scoring Process

1. Analyzes textual differences:
- Performs sequence matching
- Counts change operations
- Performs sequence matching between input and output
- Counts the number of change operations required
- Measures length differences

2. Calculates final metrics:
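Step 1 of the scoring process can be illustrated with a short TypeScript sketch. It uses Levenshtein edit distance as one way to count the change operations (insertions, deletions, substitutions) between input and output, and measures the length difference normalized by the longer string. Both choices and the function names are assumptions, since this hunk does not show which sequence-matching algorithm or normalization the metric uses.

```typescript
// Count the minimum insertions, deletions, and substitutions turning `a` into `b` (Levenshtein).
function editOperations(a: string, b: string): number {
  // prev[j] holds the edit distance between the previous prefix of `a` and b.slice(0, j).
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      // Deletion, insertion, or substitution/match, whichever is cheapest.
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: absolute length difference divided by the longer length (0 to 1).
function lengthDifference(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 0 : Math.abs(a.length - b.length) / longest;
}

console.log(editOperations("kitten", "sitting")); // 3
console.log(lengthDifference("abc", "abcdef")); // 0.5
```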
11 changes: 5 additions & 6 deletions docs/src/pages/docs/reference/evals/tone-consistency.mdx
@@ -111,16 +111,15 @@ console.log(result2.score); // Tone stability score from 0-1
The metric operates in two modes:

1. **Tone Consistency** (with reference):
When both input and output are provided:
- Compares sentiment between texts
- Calculates sentiment difference
- Evaluates tone matching
- Higher score = more consistent tone
- Evaluates tone matching: a higher score indicates more consistent tone

2. **Tone Stability** (single input):
- Analyzes cross-sentence sentiment
- Measures sentiment variance
- Assesses tone stability
- Higher score = more stable tone
- Analyzes sentiment stability across sentences
- Calculates variance in sentiment
- Assesses tone stability: a higher score indicates more stable tone
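Both modes can be sketched in TypeScript, assuming sentiment scores in the range [-1, 1] have already been computed by some sentiment analyzer (not shown here). Mapping the sentiment difference and the variance onto a 0-1 score as `1 - difference` and `1 - variance`, clamped at 0, is an assumption for illustration, as are the function names.

```typescript
// Tone consistency: compare the sentiment of the input with that of the output.
function toneConsistency(inputSentiment: number, outputSentiment: number): number {
  const difference = Math.abs(inputSentiment - outputSentiment);
  return Math.max(0, 1 - difference); // higher = more consistent tone
}

// Tone stability: measure how much sentiment varies across the sentences of one text.
function toneStability(sentenceSentiments: number[]): number {
  const n = sentenceSentiments.length;
  if (n === 0) return 1; // assumption: no sentences, no variation
  const mean = sentenceSentiments.reduce((sum, s) => sum + s, 0) / n;
  const variance = sentenceSentiments.reduce((sum, s) => sum + (s - mean) ** 2, 0) / n;
  return Math.max(0, 1 - variance); // higher = more stable tone
}

console.log(toneConsistency(0.6, 0.4)); // ≈ 0.8
console.log(toneStability([0.5, 0.5, 0.5])); // 1
```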

### Scoring Process
