5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
## Changelog

### v0.5.0 (January 29, 2026)
- Added Example and Vocab Injectors for Deep Research Rubric (PR #13)
- Updated documentation (PR #13)
- Minor refactoring (#11)

### v0.4.0 (January 13, 2026)
- Add a GPT custom judge (PR #5)
- Update documentation
11 changes: 5 additions & 6 deletions README.md
@@ -94,12 +94,11 @@ Judges within YESciEval are defined as follows:
A total of **23** evaluation rubrics were defined as part of the YESciEval test framework and can be used via ``yescieval``. The following simple example shows how to import rubrics in your code:

```python
from yescieval import Informativeness, Correctness, Completeness, Coherence, Relevancy, \
Integration, Cohesion, Readability, Conciseness, GeographicCoverage, \
InterventionDiversity, BiodiversityDimensions, EcosystemServices, SpatialScale, \
MechanisticUnderstanding, CausalReasoning, TemporalPrecision, GapIdentification, \
StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment, \
SpeculativeStatements, NoveltyIndicators
from yescieval import Informativeness, Correctness, Completeness, Coherence, Relevancy,\
Integration, Cohesion, Readability, Conciseness,\
MechanisticUnderstanding, CausalReasoning, TemporalPrecision, GapIdentification,\
StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment,\
SpeculativeStatements, NoveltyIndicators
```

A complete list of rubrics is available on the YESciEval [📚 Rubrics](https://yescieval.readthedocs.io/rubrics.html) page.
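
For instance, a rubric can be instantiated with the papers, question, and answer to evaluate, and then turned into an instruction prompt. A minimal sketch with placeholder data (the exact structure expected for `papers` is an assumption here, not prescribed by the library):

```python
from yescieval import Informativeness

# Placeholder inputs; the dict structure of `papers` is illustrative only
papers = {"Paper A": "abstract of paper A ...", "Paper B": "abstract of paper B ..."}
question = "What are the main drivers of biodiversity loss?"
answer = "A synthesized answer to be evaluated ..."

# Build the rubric and generate its instruction prompt
rubric = Informativeness(papers=papers, question=question, answer=answer)
instruction_prompt = rubric.instruct()
print(instruction_prompt)
```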
53 changes: 18 additions & 35 deletions docs/source/quickstart.rst
@@ -1,10 +1,11 @@
Quickstart
=================

YESciEval is a library designed to evaluate the quality of synthesized scientific answers using predefined rubrics and advanced LLM-based judgment models. This guide walks you through how to evaluate answers based on **informativeness** and **gap identification** using a pretrained & a custom judge and parse LLM output into structured JSON.
YESciEval is a library designed to evaluate the quality of synthesized scientific answers using predefined rubrics and advanced LLM-based judgment models. This guide walks you through how to evaluate answers on given rubrics (e.g., **informativeness**) using a pretrained or custom judge, and how to parse LLM output into structured JSON.


**Example: Evaluating an Answer Using Informativeness + AskAutoJudge**
The following example shows how to run an ``AskAutoJudge`` on the ``Informativeness`` rubric:


.. code-block:: python

@@ -46,29 +47,7 @@ YESciEval is a library designed to evaluate the quality of synthesized scientifi
- Use ``device="cuda"`` if running on a GPU for better performance.
- Add more rubrics such as ``Informativeness``, ``Relevancy``, etc. for multi-criteria evaluation.
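
A minimal end-to-end sketch of this flow, assuming ``AskAutoJudge`` exposes the same ``from_pretrained`` and ``judge`` interface as ``CustomAutoJudge``; the model id, device, and token below are placeholders:

.. code-block:: python

    from yescieval import Informativeness, AskAutoJudge

    # Step 1: Create a rubric (papers, question, and answer are your own data)
    rubric = Informativeness(papers=papers, question=question, answer=answer)

    # Step 2: Load the evaluation model (judge); model id and token are placeholders
    judge = AskAutoJudge()
    judge.from_pretrained(model_id="Qwen/Qwen3-8B", device="cpu", token="your_huggingface_token")

    # Step 3: Evaluate the answer against the rubric
    result = judge.judge(rubric=rubric)
    print("Raw Evaluation Output:")
    print(result)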


**Example: Evaluating an Answer Using GapIdentification + CustomAutoJudge**

.. code-block:: python

from yescieval import GapIdentification, CustomAutoJudge

# Step 1: Create a rubric
rubric = GapIdentification(papers=papers, question=question, answer=answer)
instruction_prompt = rubric.instruct()

# Step 2: Load the evaluation model (judge)
judge = CustomAutoJudge()
judge.from_pretrained(model_id="Qwen/Qwen3-8B", device="cpu", token="your_huggingface_token")

# Step 3: Evaluate the answer
result = judge.judge(rubric=rubric)
print("Raw Evaluation Output:")
print(result)

**Parsing Raw Output with GPTParser**

If the model outputs unstructured or loosely structured text, you can use GPTParser to parse it into valid JSON.
**Output Parser**: If the model outputs unstructured or loosely structured text, you can use GPTParser to parse it into valid JSON.

.. code-block:: python

@@ -83,7 +62,7 @@ If the model outputs unstructured or loosely structured text, you can use GPTPar
print("Parsed Output:")
print(parsed.model_dump())

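A minimal sketch of the parsing step; the ``GPTParser`` constructor argument and ``parse`` method name below are assumptions, only ``model_dump()`` on the parsed result appears in the example above:

.. code-block:: python

    from yescieval import GPTParser

    # Assumed constructor argument and parse() signature; adjust to the actual GPTParser API
    parser = GPTParser(api_key="your_openai_api_key")
    parsed = parser.parse(text=result)

    print("Parsed Output:")
    print(parsed.model_dump())
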
**Expected Output Format**
The expected output format is:

.. code-block:: json

@@ -116,16 +95,20 @@ The output schema is as a following (if you do not prefer to use ``.model_dump()
'type': 'object'
}


.. hint:: Key Components

+------------------+-------------------------------------------------------+
| Component | Purpose |
+==================+=======================================================+
| Informativeness | Defines rubric to evaluate relevance to source papers |
+------------------+-------------------------------------------------------+
| AskAutoJudge | Loads and uses a judgment model to evaluate answers |
+------------------+-------------------------------------------------------+
| GPTParser | Parses loosely formatted text from LLMs into JSON |
+------------------+-------------------------------------------------------+

.. list-table::
:header-rows: 1
:widths: 30 60

* - Component
- Purpose
* - **Informativeness**
- Defines rubric to evaluate relevance to source papers
* - **AskAutoJudge**
- Loads and uses a judgment model to evaluate answers
* - **GPTParser**
- Parses loosely formatted text from LLMs into JSON
