diff --git a/CHANGELOG.md b/CHANGELOG.md
index 262c769..1bb7366 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,11 @@
 ## Changelog
 
+### v0.4.0 (January 13, 2026)
+- Add a GPT custom judge (PR #5)
+- Update documentation
+- Minor bug fixes in deep research rubrics and judges
+- Update README
+
 ### v0.3.0 (December 20, 2025)
 - Add more rubrics (PR #3)
 - Update documentation for new rubrics
diff --git a/README.md b/README.md
index faba342..f00a219 100644
--- a/README.md
+++ b/README.md
@@ -75,7 +75,7 @@ judge.from_pretrained(
 )
 
 # Step 3: Evaluate the answer
-result = judge.evaluate(rubric=rubric)
+result = judge.judge(rubric=rubric)
 
 print("Raw Evaluation Output:")
 print(result)
 ```
@@ -87,7 +87,9 @@ Judges within YESciEval are defined as follows:
 | `AutoJudge`      | Base class for loading and running evaluation models with PEFT adapters. |
 | `AskAutoJudge`   | Multidisciplinary judge tuned on the ORKGSyn dataset from the Open Research Knowledge Graph. |
 | `BioASQAutoJudge` | Biomedical domain judge tuned on the BioASQ dataset from the BioASQ challenge.   |
-| `CustomAutoJudge`| Custom LLM (open-source LLMs) that can be used as a judge within YESciEval rubrics |
+| `CustomAutoJudge`| Custom LLM that can be used as a judge within YESciEval rubrics |
+| `GPTCustomAutoJudge`| Custom GPT-based LLM that can be used as a judge within YESciEval |
+
 
 A total of **23** evaluation rubrics were defined as part of the YESciEval test framework and can be used via ``yescieval``. Following simple example shows how to import rubrics in your code:
 
@@ -96,9 +98,8 @@ from yescieval import Informativeness, Correctness, Completeness, Coherence, Rel
                        Integration, Cohesion, Readability, Conciseness, GeographicCoverage, \
                        InterventionDiversity, BiodiversityDimensions, EcosystemServices, SpatialScale, \
                        MechanisticUnderstanding, CausalReasoning, TemporalPrecision, GapIdentification, \
-                       StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment, \
-                       SpeculativeStatements, NoveltyIndicators
-
+                       StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment, \
+                       SpeculativeStatements, NoveltyIndicators
 ```
 
 A complete list of rubrics are available at YESciEval [📚 Rubrics](https://yescieval.readthedocs.io/rubrics.html) page.
diff --git a/docs/source/judges.rst b/docs/source/judges.rst
index 0eb37af..14621a3 100644
--- a/docs/source/judges.rst
+++ b/docs/source/judges.rst
@@ -48,7 +48,7 @@ The following example demonstrates how to create an evaluation rubric, load a ju
                          device="cpu")
 
     # Step 3: Evaluate the answer
-    result = judge.evaluate(rubric=rubric)
+    result = judge.judge(rubric=rubric)
 
     print("Raw Evaluation Output:")
     print(result)
@@ -84,8 +84,37 @@ For example, you can load a model and evaluate a rubric like this:
     judge.from_pretrained(model_id="Qwen/Qwen3-8B", device="cpu", token="your_huggingface_token")
 
     # Evaluate the rubric using the loaded model
-    result = judge.evaluate(rubric=rubric)
+    result = judge.judge(rubric=rubric)
 
     print(result)
 
 This approach allows full control over which model is used for evaluation, supporting any LLM..
+
+GPT Custom Judge
+--------------------
+
+The `GPTCustomAutoJudge` class provides a generic, flexible interface to evaluate scientific syntheses using OpenAI GPT models.
+
+You can use it to evaluate a rubric by providing your OpenAI API key and specifying the model ID:
+
+.. 
code-block:: python
+
+    # Initialize the GPT judge and load an OpenAI model by specifying its model ID
+    judge = GPTCustomAutoJudge()
+    judge.from_pretrained(model_id="gpt-5.2", token=OPEN_AI_API_KEY)
+
+    # Evaluate the rubric using the loaded model
+    result = judge.judge(rubric=rubric)
+
+    print(result.model_dump())
+
+As a result, the output will be in the following format:
+
+.. code-block:: json
+
+    {
+      "rating": rating-value,
+      "rationale": "rationale-text"
+    }
+
+This allows you to leverage the capabilities of OpenAI's GPT models for scientific text evaluation.
\ No newline at end of file
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
index fdbb3e3..8f7fbdc 100644
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -35,7 +35,7 @@ YESciEval is a library designed to evaluate the quality of synthesized scientifi
     judge.from_pretrained(token="your_huggingface_token", device="cpu")
 
     # Step 3: Evaluate the answer
-    result = judge.evaluate(rubric=rubric)
+    result = judge.judge(rubric=rubric)
 
     print("Raw Evaluation Output:")
     print(result)
@@ -62,7 +62,7 @@ YESciEval is a library designed to evaluate the quality of synthesized scientifi
     judge.from_pretrained(model_id="Qwen/Qwen3-8B", device="cpu", token="your_huggingface_token")
 
     # Step 3: Evaluate the answer
-    result = judge.evaluate(rubric=rubric)
+    result = judge.judge(rubric=rubric)
 
     print("Raw Evaluation Output:")
     print(result)
@@ -81,7 +81,7 @@ If the model outputs unstructured or loosely structured text, you can use GPTPar
     parsed = parser.parse(raw_output=raw_output)
 
     print("Parsed Output:")
-    print(parsed)
+    print(parsed.model_dump())
 
 **Expected Output Format**
 
@@ -92,6 +92,30 @@ If the model outputs unstructured or loosely structured text, you can use GPTPar
       "rationale": "The answer covers key aspects of how AI is applied in healthcare, such as diagnostics and personalized medicine."
     }
 
+The output schema is shown below. If you prefer not to use ``.model_dump()``, you can access the fields directly, e.g. ``result.rating`` for the rating value or ``result.rationale`` for the textual explanation of the rating.
+
+.. code-block::
+
+    {
+        'properties': {
+            'rating': {
+                'description': 'Rating from 1 to 5',
+                'maximum': 5,
+                'minimum': 1,
+                'title': 'Rating',
+                'type': 'integer'
+            },
+            'rationale': {
+                'description': 'Textual explanation for the rating',
+                'title': 'Rationale',
+                'type': 'string'
+            }
+        },
+        'required': ['rating', 'rationale'],
+        'title': 'RubricLikertScale',
+        'type': 'object'
+    }
+
 .. 
hint:: Key Components +------------------+-------------------------------------------------------+ diff --git a/docs/source/rubrics.rst b/docs/source/rubrics.rst index e264f21..b38498c 100644 --- a/docs/source/rubrics.rst +++ b/docs/source/rubrics.rst @@ -188,3 +188,4 @@ And to use rubrics: instruction = rubric.instruct() print(instruction) + print(rubric.name) diff --git a/yescieval/VERSION b/yescieval/VERSION index 9325c3c..60a2d3e 100644 --- a/yescieval/VERSION +++ b/yescieval/VERSION @@ -1 +1 @@ -0.3.0 \ No newline at end of file +0.4.0 \ No newline at end of file diff --git a/yescieval/__init__.py b/yescieval/__init__.py index f8a37bb..28c7095 100644 --- a/yescieval/__init__.py +++ b/yescieval/__init__.py @@ -9,6 +9,6 @@ MechanisticUnderstanding, CausalReasoning, TemporalPrecision, GapIdentification, StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment, SpeculativeStatements, NoveltyIndicators) -from .judge import AutoJudge, AskAutoJudge, BioASQAutoJudge, CustomAutoJudge +from .judge import AutoJudge, AskAutoJudge, BioASQAutoJudge, CustomAutoJudge, GPTCustomAutoJudge from .parser import GPTParser diff --git a/yescieval/base/judge.py b/yescieval/base/judge.py index 5ef75ed..3178d2b 100644 --- a/yescieval/base/judge.py +++ b/yescieval/base/judge.py @@ -1,6 +1,6 @@ from abc import ABC from typing import Dict, Any -from . import Parser, Rubric +from . import Rubric, RubricLikertScale class Judge(ABC): @@ -8,7 +8,7 @@ class Judge(ABC): def from_pretrained(self, model_id:str, device: str="auto", token:str =""): self.model, self.tokenizer = self._from_pretrained(model_id=model_id, device=device, token=token) - def judge(self, rubric: Rubric, max_new_tokens: int=150) -> Dict[str, Dict[str, str]]: + def judge(self, rubric: Rubric, max_new_tokens: int=150) -> Dict[str, Dict[str, str]] | str | RubricLikertScale: pass def _from_pretrained(self, model_id: str, device: str = "auto", token: str = "") -> [Any, Any]: diff --git a/yescieval/base/rubric.py b/yescieval/base/rubric.py index 64c37e7..6ca06fe 100644 --- a/yescieval/base/rubric.py +++ b/yescieval/base/rubric.py @@ -10,6 +10,7 @@ class Rubric(BaseModel, ABC): Subclasses must implement `verbalize`. 
""" system_prompt_template: str + name: str = "Rubric" papers: Dict[str, str] question: str answer: str diff --git a/yescieval/judge/__init__.py b/yescieval/judge/__init__.py index a3fe787..d0d69e3 100644 --- a/yescieval/judge/__init__.py +++ b/yescieval/judge/__init__.py @@ -1,8 +1,10 @@ -from .judges import AutoJudge, AskAutoJudge, BioASQAutoJudge, CustomAutoJudge +from .judges import AutoJudge, AskAutoJudge, BioASQAutoJudge +from .custom import CustomAutoJudge, GPTCustomAutoJudge __all__ = [ "AutoJudge", "AskAutoJudge", "BioASQAutoJudge", - "CustomAutoJudge" + "CustomAutoJudge", + "GPTCustomAutoJudge" ] \ No newline at end of file diff --git a/yescieval/judge/custom.py b/yescieval/judge/custom.py new file mode 100644 index 0000000..c44d1cf --- /dev/null +++ b/yescieval/judge/custom.py @@ -0,0 +1,97 @@ +from ..base import Judge, Rubric, RubricLikertScale +from .judges import AutoJudge + +import time +from typing import Dict, List +from openai import OpenAI +from transformers import AutoTokenizer, AutoModelForCausalLM +import torch +import logging + +logger = logging.getLogger(__name__) + +class CustomAutoJudge(AutoJudge): + + def _from_pretrained(self, model_id:str, device:str="auto", token:str =""): + tokenizer = AutoTokenizer.from_pretrained(model_id, + padding_side="left", + token=token) + tokenizer.pad_token = tokenizer.eos_token + model = AutoModelForCausalLM.from_pretrained( + model_id, + torch_dtype=torch.float32, + device_map=device, + token=token + ) + return model, tokenizer + + +class GPTCustomAutoJudge(Judge): + + def from_pretrained(self, model_id: str, device: str = "auto", token: str = ""): + if not token: + raise ValueError("OpenAI API token must be provided.") + self.model_name = model_id + self.client = OpenAI(api_key=token) + + def _supports_function_calling(self) -> bool: + gpt_4_prefixes = ( + "gpt-4", # gpt4 family including gpt-4o, gpt-4o-mini, gpt-4.1, ... + "GPT-3.5", # gpt-3.5 family + ) + return any(self.model_name.startswith(prefix) for prefix in gpt_4_prefixes) + + def _output_schema(self) -> List[Dict]: + return [ + { + "name": "response_format", + "description": f"Return the `rating` and `rationale` only as a response.", + "parameters": { + "type": "object", + "properties": { + 'rating': { + "type": "number", + "description": "A numerical rating assigned to the characteristic.", + "minimum": 1, + "maximum": 5 + }, + "rationale": { + "type": "string", + "description": "The explanation for the assigned rating." 
+                        },
+                    },
+                    "required": ["rating", "rationale"]
+                }
+            }
+        ]
+
+    def judge(self, rubric: Rubric, max_new_tokens: int = 150) -> RubricLikertScale:
+        if getattr(self, "client", None) is None:
+            raise ValueError("Model not initialized.")
+        messages = rubric.instruct()
+        params = {
+            "model": self.model_name,
+            "messages": messages
+        }
+        if self._supports_function_calling():
+            params["functions"] = self._output_schema()
+
+        try_counter = 0
+        while True:
+            try:
+                try_counter += 1
+                response = self.client.chat.completions.create(**params)
+                message = response.choices[0].message
+                if self._supports_function_calling():
+                    # Function-call arguments arrive as a JSON string.
+                    parsed_output = json.loads(message.function_call.arguments)
+                else:
+                    # Plain completions are expected to return a JSON object keyed by the rubric name.
+                    parsed_output = json.loads(message.content)[rubric.name]
+                evaluation = RubricLikertScale(rating=parsed_output['rating'], rationale=parsed_output['rationale'])
+                return evaluation
+
+            except Exception as e:
+                logger.error(f"Attempt {try_counter} failed!")
+                logger.warning(f"API call failed, retrying in 5 seconds: {e}")
+                time.sleep(5)
+
+
diff --git a/yescieval/judge/judges.py b/yescieval/judge/judges.py
index 00c736d..c700b28 100644
--- a/yescieval/judge/judges.py
+++ b/yescieval/judge/judges.py
@@ -4,7 +4,9 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 from peft import PeftModel, PeftConfig
 import torch
+import logging
 
+logger = logging.getLogger(__name__)
 
 class AutoJudge(Judge):
@@ -25,7 +27,7 @@ def _from_pretrained(self, model_id:str, device:str="auto", token:str =""):
         model = PeftModel.from_pretrained(base_model, model_id)
         return model, tokenizer
 
-    def evaluate(self, rubric: Rubric, max_new_tokens: int=150) -> Dict[str, Dict[str, str]]:
+    def judge(self, rubric: Rubric, max_new_tokens: int=150) -> str:
         inputs = self.tokenizer.apply_chat_template(rubric.instruct(),
                                                     add_generation_prompt=True,
                                                     return_dict=True,
@@ -49,20 +51,3 @@ def from_pretrained(self, model_id: str = "SciKnowOrg/YESciEval-BioASQ-Llama-3.1
                         device: str = "auto", token: str = ""):
         self.model, self.tokenizer = super()._from_pretrained(model_id=model_id, device=device, token=token)
-
-
-
-class CustomAutoJudge(AutoJudge):
-
-    def _from_pretrained(self, model_id:str, device:str="auto", token:str =""):
-        tokenizer = AutoTokenizer.from_pretrained(model_id,
-                                                  padding_side="left",
-                                                  token=token)
-        tokenizer.pad_token = tokenizer.eos_token
-        model = AutoModelForCausalLM.from_pretrained(
-            model_id,
-            torch_dtype=torch.float32,
-            device_map=device,
-            token=token
-        )
-        return model, tokenizer
diff --git a/yescieval/rubric/breadth.py b/yescieval/rubric/breadth.py
index efaf0bc..dfcc99e 100644
--- a/yescieval/rubric/breadth.py
+++ b/yescieval/rubric/breadth.py
@@ -22,7 +22,7 @@
 
-1. geographic_coverage: is the information in the answer a correct representation of the spatial scope of the provided abstracts?
+1. Geographic Coverage: is the information in the answer a correct representation of the spatial scope of the provided abstracts?
 
@@ -42,7 +42,7 @@
 
 {
- "geographic_coverage": {"rating": "4", "rationale": "The synthesis accurately represents multiple regions and scales from the provided abstracts, with only minor omissions or irrelevant details."}
+ "Geographic Coverage": {"rating": "4", "rationale": "The synthesis accurately represents multiple regions and scales from the provided abstracts, with only minor omissions or irrelevant details."}
 }
 
@@ -51,6 +51,7 @@
 Your evaluation should be based solely on the content of the provided synthesis and abstracts.
 Ensure your rationale is objective and backed by specific examples from the provided material.
""" class GeographicCoverage(Rubric): + name: str = "Geographic Coverage" system_prompt_template: str = geographic_coverage_prompt intervention_diversity_prompt = """ @@ -75,7 +76,7 @@ class GeographicCoverage(Rubric): -1. intervention_diversity: is the answer a comprehensive encapsulation of the relevant information in the provided abstracts, measured by the number of unique management practices? +1. Intervention Diversity: is the answer a comprehensive encapsulation of the relevant information in the provided abstracts, measured by the number of unique management practices? @@ -95,7 +96,7 @@ class GeographicCoverage(Rubric): { - "intervention_diversity": {"rating": "4", "rationale": "The answer includes almost all relevant interventions from the provided abstracts, with only minor details missing."} + "Intervention Diversity": {"rating": "4", "rationale": "The answer includes almost all relevant interventions from the provided abstracts, with only minor details missing."} } @@ -104,6 +105,7 @@ class GeographicCoverage(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class InterventionDiversity(Rubric): + name: str = "Intervention Diversity" system_prompt_template: str = intervention_diversity_prompt biodiversity_dimensions_prompt = """ @@ -128,7 +130,7 @@ class InterventionDiversity(Rubric): -1. biodiversity_dimensions: is the answer a comprehensive representation of the relevant biodiversity information in the provided abstracts, measured by the presence of terms related to taxonomic, functional, phylogenetic, and spatial diversity? +1. Biodiversity Dimensions: is the answer a comprehensive representation of the relevant biodiversity information in the provided abstracts, measured by the presence of terms related to taxonomic, functional, phylogenetic, and spatial diversity? @@ -148,7 +150,7 @@ class InterventionDiversity(Rubric): { - "biodiversity_dimensions": {"rating": "4", "rationale": "Most information is informative for the research question, capturing the key biodiversity dimensions with minor omissions."} + "Biodiversity Dimensions": {"rating": "4", "rationale": "Most information is informative for the research question, capturing the key biodiversity dimensions with minor omissions."} } @@ -157,6 +159,7 @@ class InterventionDiversity(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class BiodiversityDimensions(Rubric): + name: str = "Biodiversity Dimensions" system_prompt_template: str = biodiversity_dimensions_prompt ecosystem_services_prompt = """ @@ -181,7 +184,7 @@ class BiodiversityDimensions(Rubric): -1. ecosystem_services: is the answer a useful and informative reply to the question, measured by the presence of terms matched against a vocabulary aligned with the Millennium Ecosystem Assessment? +1. Ecosystem Services: is the answer a useful and informative reply to the question, measured by the presence of terms matched against a vocabulary aligned with the Millennium Ecosystem Assessment? 
@@ -201,7 +204,7 @@ class BiodiversityDimensions(Rubric): { - "ecosystem_services": {"rating": "4", "rationale": "The synthesis includes nearly all relevant ecosystem services from the provided abstracts, with only minor omissions."} + "Ecosystem Services": {"rating": "4", "rationale": "The synthesis includes nearly all relevant ecosystem services from the provided abstracts, with only minor omissions."} } @@ -210,6 +213,7 @@ class BiodiversityDimensions(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class EcosystemServices(Rubric): + name: str = "Ecosystem Services" system_prompt_template: str = ecosystem_services_prompt spatial_scale_prompt = """ @@ -234,7 +238,7 @@ class EcosystemServices(Rubric): -1. spatial_scale: is the answer a useful and informative reply to the question, measured by the presence of explicit scale terms (e.g., “local,” “regional,” “continental”) and area measures? +1. Spatial Scale: is the answer a useful and informative reply to the question, measured by the presence of explicit scale terms (e.g., “local,” “regional,” “continental”) and area measures? @@ -254,7 +258,7 @@ class EcosystemServices(Rubric): { - "spatial_scale": {"rating": "4", "rationale": "The synthesis includes nearly all relevant spatial scale information from the provided abstracts, with only minor omissions."} + "Spatial Scale": {"rating": "4", "rationale": "The synthesis includes nearly all relevant spatial scale information from the provided abstracts, with only minor omissions."} } @@ -263,6 +267,7 @@ class EcosystemServices(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class SpatialScale(Rubric): + name: str = "Spatial Scale" system_prompt_template: str = spatial_scale_prompt diff --git a/yescieval/rubric/depth.py b/yescieval/rubric/depth.py index 04aeb00..3e12dc3 100644 --- a/yescieval/rubric/depth.py +++ b/yescieval/rubric/depth.py @@ -22,7 +22,7 @@ -1. mechanistic_understanding: does the answer reflect understanding of ecological processes by explicitly mentioning recognized mechanisms such as feedbacks, nutrient cycling, or trophic cascades? +1. Mechanistic Understanding: does the answer reflect understanding of ecological processes by explicitly mentioning recognized mechanisms such as feedbacks, nutrient cycling, or trophic cascades? @@ -41,7 +41,7 @@ { - "mechanistic_understanding": {"rating": "4", "rationale": "The answer explains a clear multi-step ecological mechanism using causal language, but some temporal or boundary details are only briefly addressed."} + "Mechanistic Understanding": {"rating": "4", "rationale": "The answer explains a clear multi-step ecological mechanism using causal language, but some temporal or boundary details are only briefly addressed."} } @@ -50,6 +50,7 @@ Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class MechanisticUnderstanding(Rubric): + name: str = "Mechanistic Understanding" system_prompt_template: str = mechanistic_understanding_prompt causal_reasoning_prompt = """ @@ -74,7 +75,7 @@ class MechanisticUnderstanding(Rubric): -1. 
causal_reasoning: does the answer explicitly express cause–effect relationships using causal connectives (e.g., “because,” “due to”), result indicators (e.g., “results in,” “induces”), or mechanistic verbs (e.g., “drives,” “regulates”) when describing ecological processes? +1. Causal Reasoning: does the answer explicitly express cause–effect relationships using causal connectives (e.g., “because,” “due to”), result indicators (e.g., “results in,” “induces”), or mechanistic verbs (e.g., “drives,” “regulates”) when describing ecological processes? @@ -94,7 +95,7 @@ class MechanisticUnderstanding(Rubric): { - "causal_reasoning": {"rating": "4", "rationale": "The answer uses clear causal connectors and describes a multi-step cause–effect relationship."} + "Causal Reasoning": {"rating": "4", "rationale": "The answer uses clear causal connectors and describes a multi-step cause–effect relationship."} } @@ -103,6 +104,7 @@ class MechanisticUnderstanding(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class CausalReasoning(Rubric): + name: str = "Causal Reasoning" system_prompt_template: str = causal_reasoning_prompt temporal_precision_prompt = """ @@ -127,7 +129,7 @@ class CausalReasoning(Rubric): -1. temporal_precision: does the answer include specific and explicit temporal references, such as quantified time intervals or dated events, rather than vague or unspecific timing? +1. Temporal Precision: does the answer include specific and explicit temporal references, such as quantified time intervals or dated events, rather than vague or unspecific timing? @@ -147,7 +149,7 @@ class CausalReasoning(Rubric): { - "temporal_precision": {"rating": "4", "rationale": "The answer includes several specific timeframes or durations that are clearly linked to the described processes, though some timing details could be more precise."} + "Temporal Precision": {"rating": "4", "rationale": "The answer includes several specific timeframes or durations that are clearly linked to the described processes, though some timing details could be more precise."} } @@ -156,5 +158,6 @@ class CausalReasoning(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class TemporalPrecision(Rubric): + name: str = "Temporal Precision" system_prompt_template: str = temporal_precision_prompt diff --git a/yescieval/rubric/gap.py b/yescieval/rubric/gap.py index facdcbd..8c6fa2a 100644 --- a/yescieval/rubric/gap.py +++ b/yescieval/rubric/gap.py @@ -22,7 +22,7 @@ -1. gap_identification: To what extent does the answer explicitly identify research gaps or unanswered questions indicated by the provided abstracts? +1. Gap Identification: To what extent does the answer explicitly identify research gaps or unanswered questions indicated by the provided abstracts? @@ -42,7 +42,7 @@ { - "gap_identification": {"rating": "4", "rationale": "Identifies a relevant gap supported by the abstracts, with limited elaboration."} + "Gap Identification": {"rating": "4", "rationale": "Identifies a relevant gap supported by the abstracts, with limited elaboration."} } @@ -51,4 +51,5 @@ Your evaluation should be based solely on the content of the provided synthesis and abstracts. 
Ensure your rationale is objective and backed by specific examples from the provided material. """ class GapIdentification(Rubric): + name: str = "Gap Identification" system_prompt_template: str = gap_identification_prompt diff --git a/yescieval/rubric/informativeness.py b/yescieval/rubric/informativeness.py index 9fd6788..bdfb448 100644 --- a/yescieval/rubric/informativeness.py +++ b/yescieval/rubric/informativeness.py @@ -51,6 +51,7 @@ Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class Correctness(Rubric): + name: str = "Correctness" system_prompt_template: str = correctness_prompt completeness_prompt = """ @@ -104,6 +105,7 @@ class Correctness(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class Completeness(Rubric): + name: str = "Completeness" system_prompt_template: str = completeness_prompt informativeness_prompt = """ @@ -157,5 +159,6 @@ class Completeness(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class Informativeness(Rubric): + name: str = "Informativeness" system_prompt_template: str = informativeness_prompt diff --git a/yescieval/rubric/innovation.py b/yescieval/rubric/innovation.py index 290405a..628fa5d 100644 --- a/yescieval/rubric/innovation.py +++ b/yescieval/rubric/innovation.py @@ -22,7 +22,7 @@ -1. speculative_statement: Does the answer clearly distinguish speculation (e.g., “might,” “could”) from established findings in the provided abstracts? +1. Speculative Statements: Does the answer clearly distinguish speculation (e.g., “might,” “could”) from established findings in the provided abstracts? @@ -42,7 +42,7 @@ { - "speculative_statement": {"rating": "4", "rationale": "Uses hedging appropriately and clearly distinguishes speculation from established findings."} + "Speculative Statements": {"rating": "4", "rationale": "Uses hedging appropriately and clearly distinguishes speculation from established findings."} } @@ -51,6 +51,7 @@ Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class SpeculativeStatements(Rubric): + name: str = "Speculative Statements" system_prompt_template: str = speculative_statements_prompt novelty_indicators_prompt = """ @@ -75,7 +76,7 @@ class SpeculativeStatements(Rubric): -1. novelty_indicators: Does the answer appropriately use self-declared innovation terms (e.g., “novel,” “pioneering,” “emerging”) and clearly indicate whether such claims are supported by the provided abstracts? +1. Novelty Indicators: Does the answer appropriately use self-declared innovation terms (e.g., “novel,” “pioneering,” “emerging”) and clearly indicate whether such claims are supported by the provided abstracts? 
@@ -95,7 +96,7 @@ class SpeculativeStatements(Rubric): { - "novelty_indicators": {"rating": "4", "rationale": "Shows a clear novel angle, but lacks full detail."} + "Novelty Indicators": {"rating": "4", "rationale": "Shows a clear novel angle, but lacks full detail."} } @@ -104,6 +105,7 @@ class SpeculativeStatements(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class NoveltyIndicators(Rubric): + name: str = "Novelty Indicators" system_prompt_template: str = novelty_indicators_prompt diff --git a/yescieval/rubric/rigor.py b/yescieval/rubric/rigor.py index 62c4aaf..cd428a4 100644 --- a/yescieval/rubric/rigor.py +++ b/yescieval/rubric/rigor.py @@ -22,7 +22,7 @@ -1. statistical_sophistication: Does the answer reflect quantitative depth through the use of inferential statistics or analysis methods described in the abstracts? +1. Statistical Sophistication: Does the answer reflect quantitative depth through the use of inferential statistics or analysis methods described in the abstracts? @@ -42,7 +42,7 @@ { - "statistical_sophistication": {"rating": "3", "rationale": "The synthesis provides some methodological details and basic statistics, but does not fully discuss limitations or reproducibility.""} + "Statistical Sophistication": {"rating": "3", "rationale": "The synthesis provides some methodological details and basic statistics, but does not fully discuss limitations or reproducibility.""} } @@ -51,6 +51,7 @@ Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class StatisticalSophistication(Rubric): + name: str = "Statistical Sophistication" system_prompt_template: str = statistical_sophistication_prompt citation_practices_prompt = """ @@ -75,7 +76,7 @@ class StatisticalSophistication(Rubric): -1. citation_practices: is the answer supported by appropriate references, using parenthetical or narrative citations, for the relevant information in the provided abstracts? +1. Citation Practices: is the answer supported by appropriate references, using parenthetical or narrative citations, for the relevant information in the provided abstracts? @@ -95,7 +96,7 @@ class StatisticalSophistication(Rubric): { - "citation_practices": {"rating": "3", "rationale": "Some claims are supported with citations, but several important points lack references or use inconsistent citation style."} + "Citation Practices": {"rating": "3", "rationale": "Some claims are supported with citations, but several important points lack references or use inconsistent citation style."} } @@ -104,6 +105,7 @@ class StatisticalSophistication(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class CitationPractices(Rubric): + name: str = "Citation Practices" system_prompt_template: str = citation_practices_prompt uncertainty_acknowledgement_prompt = """ @@ -128,7 +130,7 @@ class CitationPractices(Rubric): -1. uncertainty_acknowledgement: does the answer explicitly discuss limitations, uncertainty, or gaps in evidence (e.g., using terms like “unknown,” “limited evidence,” or “unclear”)? +1. 
Uncertainty Acknowledgement: does the answer explicitly discuss limitations, uncertainty, or gaps in evidence (e.g., using terms like “unknown,” “limited evidence,” or “unclear”)? @@ -148,7 +150,7 @@ class CitationPractices(Rubric): { - "uncertainty_acknowledgement": {"rating": "4", "rationale": "The answer clearly acknowledges key uncertainties and limitations in the study."} + "Uncertainty Acknowledgement": {"rating": "4", "rationale": "The answer clearly acknowledges key uncertainties and limitations in the study."} } @@ -157,5 +159,6 @@ class CitationPractices(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class UncertaintyAcknowledgment(Rubric): + name: str = "Uncertainty Acknowledgement" system_prompt_template: str = uncertainty_acknowledgement_prompt diff --git a/yescieval/rubric/structural.py b/yescieval/rubric/structural.py index a968642..6b83550 100644 --- a/yescieval/rubric/structural.py +++ b/yescieval/rubric/structural.py @@ -51,6 +51,7 @@ Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class Coherence(Rubric): + name: str = "Coherence" system_prompt_template: str = coherence_prompt integration_prompt = """ @@ -104,6 +105,7 @@ class Coherence(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class Integration(Rubric): + name: str = "Integration" system_prompt_template: str = integration_prompt relevancy_prompt = """ @@ -157,4 +159,5 @@ class Integration(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class Relevancy(Rubric): + name: str = "Relevancy" system_prompt_template: str = relevancy_prompt diff --git a/yescieval/rubric/stylistic.py b/yescieval/rubric/stylistic.py index b369fdf..0e92757 100644 --- a/yescieval/rubric/stylistic.py +++ b/yescieval/rubric/stylistic.py @@ -52,6 +52,7 @@ """ class Cohesion(Rubric): + name: str = "Cohesion" system_prompt_template: str = cohesion_prompt @@ -106,6 +107,7 @@ class Cohesion(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class Conciseness(Rubric): + name: str = "Conciseness" system_prompt_template: str = conciseness_prompt readability_prompt = """ @@ -159,5 +161,6 @@ class Conciseness(Rubric): Your evaluation should be based solely on the content of the provided synthesis and abstracts. Ensure your rationale is objective and backed by specific examples from the provided material. """ class Readability(Rubric): + name: str = "Readability" system_prompt_template: str = readability_prompt
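Taken together, the changes above rename the judges' `evaluate()` method to `judge()`, give every rubric a human-readable `name` attribute, and introduce the OpenAI-backed `GPTCustomAutoJudge`. The snippet below is a minimal usage sketch of these additions, not part of the patch itself: the rubric fields (`papers`, `question`, `answer`) come from `yescieval/base/rubric.py`, while the model id `gpt-4o`, the example strings, and the `OPENAI_API_KEY` environment variable are illustrative assumptions.

```python
import os

from yescieval import Correctness, GPTCustomAutoJudge

# Build a rubric from the fields declared on the Rubric base class.
rubric = Correctness(
    papers={"paper_1": "Abstract of the first source paper ..."},
    question="How is AI used in healthcare?",
    answer="AI supports diagnostics and personalized medicine ...",
)
print(rubric.name)  # "Correctness", via the name attribute added in v0.4.0

# GPTCustomAutoJudge calls the OpenAI API instead of loading local weights.
judge = GPTCustomAutoJudge()
judge.from_pretrained(model_id="gpt-4o", token=os.environ["OPENAI_API_KEY"])

# judge() replaces the former evaluate() and returns a RubricLikertScale.
result = judge.judge(rubric=rubric)
print(result.rating, result.rationale)
```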