
Implement factual correctness evaluation for RAG workflows #1

@Enniwhere

Description:
We want to be able to evaluate the factual correctness of lex.llm workflows with lex.eval.

Acceptance criteria:

  • It is possible to run a factual correctness evaluation for a particular workflow based on data from lex.db

Technical details:
Details on the factual correctness metric can be found here: https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/factual_correctness/#factual-correctness
We will likely not be able to use RAGAS directly, but we would probably want to use Pydantic Evals: https://ai.pydantic.dev/evals/#pydantic-evals-package
It is likely easiest to set the system up as a CLI to begin with.
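For context, the RAGAS factual-correctness metric decomposes the response and the reference into atomic claims, verifies each claim with an LLM/NLI step, and reports claim-level precision, recall, or F1. Below is a minimal, dependency-free sketch of just the scoring step. The `naive_supported` verifier is a placeholder for the LLM/NLI check, and all function names are our own, not part of RAGAS or Pydantic Evals:

```python
from typing import Callable, Sequence


def factual_correctness(
    response_claims: Sequence[str],
    reference_claims: Sequence[str],
    supported: Callable[[str, Sequence[str]], bool],
    mode: str = "f1",
) -> float:
    """Claim-level factual correctness in the spirit of the RAGAS metric.

    TP: response claims supported by the reference claims.
    FP: response claims not supported by the reference claims.
    FN: reference claims not supported by the response claims.
    """
    tp = sum(supported(c, reference_claims) for c in response_claims)
    fp = len(response_claims) - tp
    fn = sum(not supported(c, response_claims) for c in reference_claims)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if mode == "precision":
        return precision
    if mode == "recall":
        return recall
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def naive_supported(claim: str, claims: Sequence[str]) -> bool:
    # Naive stand-in for the LLM/NLI verification step RAGAS uses:
    # exact case-insensitive match against the claim list.
    return any(claim.lower() == c.lower() for c in claims)
```

For example, a response with two claims of which one matches the single reference claim scores precision 0.5, recall 1.0, F1 ≈ 0.67. The real verifier would be an LLM call wired into a Pydantic Evals `Evaluator`.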

Design:
Optional details on design for context.
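As a starting point for the CLI, a stdlib-only `argparse` sketch; the flag names (`--workflow`, `--db-url`, `--mode`) are placeholder assumptions for illustration, not an agreed interface:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names are hypothetical; the real lex.db access layer is TBD.
    parser = argparse.ArgumentParser(
        prog="lex-eval",
        description="Evaluate factual correctness of a lex.llm workflow.",
    )
    parser.add_argument("--workflow", required=True,
                        help="Name of the workflow to evaluate")
    parser.add_argument("--db-url", default=None,
                        help="lex.db connection string (placeholder)")
    parser.add_argument("--mode", choices=["precision", "recall", "f1"],
                        default="f1", help="Score to report")
    return parser
```

A run would then look like `lex-eval --workflow my-rag-workflow --mode f1`, with the parsed arguments handed to the evaluation entry point.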
