Merged
5 changes: 3 additions & 2 deletions README.md
@@ -91,14 +91,15 @@ Judges within YESciEval are defined as follows:
| `GPTCustomAutoJudge`| Custom GPT-based LLM that can be used as a judge within YESciEval |


A total of **23** evaluation rubrics were defined as part of the YESciEval test framework and can be used via ``yescieval``. Following simple example shows how to import rubrics in your code:
A total of **22** evaluation rubrics were defined as part of the YESciEval test framework and can be used via ``yescieval``. The following simple example shows how to import rubrics in your code:

```python
from yescieval import Informativeness, Correctness, Completeness, Coherence, Relevancy,\
Integration, Cohesion, Readability, Conciseness,\
MechanisticUnderstanding, CausalReasoning, TemporalPrecision, GapIdentification,\
ContextCoverage, MethodCoverage, DimensionCoverage, ScaleCoverage, ScopeCoverage,\
StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment,\
SpeculativeStatements, NoveltyIndicators
StateOfTheArtAndNovelty
```

A complete list of rubrics is available on the YESciEval [📚 Rubrics](https://yescieval.readthedocs.io/rubrics.html) page.
40 changes: 21 additions & 19 deletions docs/source/rubrics.rst
@@ -2,22 +2,23 @@
Rubrics
===================

A total of **21** evaluation rubrics were defined as part of the YESciEval test framework within two categories presented as following:
A total of **22** evaluation rubrics were defined as part of the YESciEval test framework, organized into two categories as follows:

.. hint::


Here is a simple example of how to import rubrics in your code:
Here is a simple example of how to import rubrics in your code:

.. code-block:: python
.. code-block:: python

from yescieval import Informativeness, Correctness, Completeness, Coherence, Relevancy,
Integration, Cohesion, Readability, Conciseness,
MechanisticUnderstanding, CausalReasoning, TemporalPrecision, GapIdentification,
StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment,
SpeculativeStatements, NoveltyIndicators
from yescieval import Informativeness, Correctness, Completeness, Coherence, Relevancy, \
    Integration, Cohesion, Readability, Conciseness, \
    MechanisticUnderstanding, CausalReasoning, TemporalPrecision, GapIdentification, \
    ContextCoverage, MethodCoverage, DimensionCoverage, ScopeCoverage, ScaleCoverage, \
    StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment, \
    StateOfTheArtAndNovelty

The rubrics are presented as following:
The rubrics are presented as follows:


Question Answering
@@ -150,7 +151,7 @@ Following ``Research Breadth Assessment`` evaluates the diversity of evidence ac
- Does the answer distribute attention across multiple distinct scales relevant to the research question?

Scientific Rigor Assessment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Following ``Scientific Rigor Assessment`` assesses the evidentiary and methodological integrity of the synthesis.

@@ -161,13 +162,15 @@ Following ``Scientific Rigor Assessment`` assesses the evidentiary and methodolo

* - Evaluation Rubric
- Description
* - **18. Quantitative Evidence And Uncertainty:**
- Does the answer appropriately handle quantitative evidence and uncertainty relevant to the research question?
* - **19. Epistemic Calibration:**
- Does the answer clearly align claim strength with evidential support by marking uncertainty, assumptions, and limitations where relevant?
* - **18. Statistical Sophistication:**
- Does the answer use statistical methods or analyses, showing quantitative rigor and depth?
* - **19. Citation Practices:**
- Does the answer properly cite sources, using parenthetical or narrative citations (e.g., “(Smith et al., 2021)”)?
* - **20. Uncertainty Acknowledgment:**
- Does the answer explicitly mention limitations or uncertainty, using terms like “unknown,” “limited evidence,” or “unclear”?

Innovation Capacity Assessment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Following ``Innovation Capacity Assessment`` evaluates the novelty of the synthesis.

@@ -178,9 +181,8 @@

* - Evaluation Rubric
- Description
* - **20. State-Of-The-Art And Novelty :**
- Does the response identify and contextualize relevant state-of-the-art or novel contributions relative to prior work?

* - **21. State Of The Art And Novelty:**
- Does the response identify specific state-of-the-art and/or novel contributions relevant to the research question, using terms like “novel” or “state-of-the-art”?

Research Gap Assessment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -194,7 +196,7 @@ Following ``Research Gap Assessment`` detects explicit acknowledgment of unanswe

* - Evaluation Rubric
- Description
* - **21. Gap Identification:**
* - **22. Gap Identification:**
- Does the answer point out unanswered questions or understudied areas, using terms like “research gap” or “understudied”?


3 changes: 2 additions & 1 deletion yescieval/__init__.py
@@ -5,9 +5,10 @@
from .base import Rubric, Parser
from .rubric import (Informativeness, Correctness, Completeness, Coherence, Relevancy,
Integration, Cohesion, Readability, Conciseness,
ScaleCoverage, ContextCoverage, ScopeCoverage, MethodCoverage, DimensionCoverage,
MechanisticUnderstanding, CausalReasoning, TemporalPrecision, GapIdentification,
StatisticalSophistication, CitationPractices, UncertaintyAcknowledgment,
SpeculativeStatements, NoveltyIndicators)
StateOfTheArtAndNovelty)
from .injector import ExampleInjector, VocabularyInjector
from .judge import AutoJudge, AskAutoJudge, BioASQAutoJudge, CustomAutoJudge, GPTCustomAutoJudge
from .parser import GPTParser
5 changes: 3 additions & 2 deletions yescieval/base/domain.py
@@ -1,9 +1,10 @@
from abc import ABC
from pydantic import BaseModel
from typing import Dict
from typing import Dict, Optional

class Domain(BaseModel, ABC):
examples: Dict[str, Dict] = None
vocab: Dict[str, Dict] = None
ID: str = None
verbalized: str = None
verbalized: str = None
vocab_block_specs: Optional[Dict[str, Dict[str, object]]] = None
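The `Domain` model above can be exercised with a small sketch. To keep it dependency-free, this uses a dataclass stand-in rather than pydantic's `BaseModel`; the field names mirror the diff, while `DomainSketch` and its usage are hypothetical, not part of yescieval.

```python
# Dependency-free sketch of the Domain fields above, using a dataclass
# stand-in for the real pydantic BaseModel subclass; DomainSketch is a
# hypothetical name, not part of yescieval.
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class DomainSketch:
    examples: Optional[Dict[str, Dict]] = None
    vocab: Optional[Dict[str, Dict]] = None
    ID: Optional[str] = None
    verbalized: Optional[str] = None
    vocab_block_specs: Optional[Dict[str, Dict[str, object]]] = None

# Mirrors how the Ecology class later in this diff fills the fields.
eco = DomainSketch(ID="ecology", verbalized="Ecology")
print(eco.ID, eco.verbalized)
```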
4 changes: 3 additions & 1 deletion yescieval/injector/domains/__init__.py
@@ -8,4 +8,6 @@

example_responses: Dict[str, Dict] = {domain.ID: domain.examples for domain in domains}

verbalized_domains: Dict[str, str] = {domain.ID: domain.verbalized for domain in domains}
verbalized_domains: Dict[str, str] = {domain.ID: domain.verbalized for domain in domains}

vocab_block_specs: Dict[str, Dict[str, object]] = {domain.ID: domain.vocab_block_specs for domain in domains}
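The dict comprehensions above collect one attribute per domain into module-level maps keyed by domain ID. A minimal, self-contained sketch of the same pattern; `SimpleNamespace` stands in for the real `Domain` subclasses, and the "marine" domain is purely illustrative.

```python
# Sketch of the aggregation pattern above: one module-level dict per domain
# attribute, keyed by each domain's ID. SimpleNamespace stands in for the
# real Domain subclasses, and the "marine" domain is illustrative only.
from types import SimpleNamespace

domains = [
    SimpleNamespace(ID="ecology", verbalized="Ecology",
                    vocab_block_specs={"temporal_vocab_block": {"keys": ["temporal_terms"]}}),
    SimpleNamespace(ID="marine", verbalized="Marine Biology",
                    vocab_block_specs=None),
]

verbalized_domains = {d.ID: d.verbalized for d in domains}
vocab_block_specs = {d.ID: d.vocab_block_specs for d in domains}

print(verbalized_domains["ecology"])  # Ecology
```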
109 changes: 106 additions & 3 deletions yescieval/injector/domains/ecology.py
@@ -27,7 +27,7 @@
"genetic diversity", "structural diversity", "shannon", "simpson", "hill numbers"
],
"temporal_terms" :[
"within 25 years", "lag of ~6 months", "after 3 months", "before 12 weeks", "19982004",
"within 2-5 years", "lag of ~6 months", "after 3 months", "before 12 weeks", "1998-2004",
"June 2012", "every 2 weeks"
],
"ecosystem_services": [
@@ -62,7 +62,20 @@
"climate change", "global warming", "drought", "heatwave", "extreme weather", "phenology", "range shift",
"sea level rise", "ocean acidification", "greenhouse gas", "carbon dioxide", "thermal stress", "precipitation"
],
"complexity_terms": ["nonlinear", "emergent", "synergistic", "interconnected", "complex", "multifaceted"]
"complexity_terms": ["nonlinear", "emergent", "synergistic", "interconnected", "complex", "multifaceted"],
"gap_identification": [
"remains unclear", "unknown", "not well understood", "limited evidence", "mixed findings", "inconsistent results", "lack of consensus",
"understudied", "data scarce", "few studies", "limited sample size", "short time horizon", "lack of longitudinal data",
"geographic bias", "taxonomic bias", "context dependence", "limited external validity", "missing comparison", "unresolved"
],
"novelty_indicators": [
"first to", "novel", "new approach", "new method", "recent advances", "state of the art", "cutting-edge",
"proof-of-concept", "pilot study", "new dataset", "long-term dataset", "high-resolution data",
"remote sensing", "satellite", "LiDAR", "eDNA", "metabarcoding", "new sampling protocol", "new monitoring approach",
"hierarchical model", "Bayesian", "causal inference", "counterfactual", "difference-in-differences", "instrumental variable",
"meta-analysis", "systematic review", "scenario analysis", "climate projection", "compared to previous studies",
"unlike prior work", "addresses a limitation"
]
}

example_responses = {
@@ -97,11 +110,101 @@
"rationale": "The response uses specific and bounded temporal expressions, for example describing changes occurring within 2-5 years, after 3 months, or every 2 weeks, and referencing defined time periods such as 1998-2004 or June 2012."
}
]
},
"Breadth": {
"ContextCoverage": [
{
"rating": "1",
"rationale": "The response discusses only a single ecological setting and does not reference any alternative regions or biomes relevant to the research question."
},
{
"rating": "4",
"rationale": "The response covers multiple distinct ecological contexts, such as different regions and ecosystem types, and distributes attention across them rather than focusing on a single setting."
}
],
"MethodCoverage": [
{
"rating": "1",
"rationale": "The response focuses exclusively on a single management or intervention approach (e.g., controlled burning) and does not reference any alternative methods relevant to the research question."
},
{
"rating": "4",
"rationale": "The response discusses multiple distinct interventions or management approaches, such as controlled burning, grazing management, habitat restoration, and protected areas, distributing attention across them rather than focusing on a single method."
}
],
"DimensionCoverage": [
{
"rating": "1",
"rationale": "The response focuses on a single ecological dimension and does not meaningfully address other relevant dimensions."
},
{
"rating": "4",
"rationale": "The response covers multiple ecological dimensions, including taxonomic, functional, and phylogenetic diversity, as well as measures such as species richness, evenness, abundance, and genetic and structural diversity, rather than focusing on a single dimension."
}
],
"ScopeCoverage": [
{
"rating": "1",
"rationale": "The response addresses only a very narrow aspect of ecological impact and remains vague, providing little indication that the findings apply beyond a single, limited scope."
},
{
"rating": "4",
"rationale": "The response discusses several types of ecosystem services, from provisioning and regulating services to supporting and cultural services, rather than focusing on a single, limited scope."
}
],
"ScaleCoverage": [
{
"rating": "1",
"rationale": "The response focuses on a single ecological scale and does not meaningfully consider how the findings apply at other relevant scales."
},
{
"rating": "4",
"rationale": "The response addresses multiple ecological scales, ranging from the individual and population level to community and ecosystem scales, and also considers broader spatial scales such as local, regional, and global contexts, rather than focusing on just one scale."
}
]
},
"Gap": {
"GapIdentification": [
{
"rating": "1",
"rationale": "The response is purely descriptive, summarizing existing ecological findings, observations, or reported patterns (e.g., species distributions, biodiversity metrics, or observed correlations) without identifying any missing, unknown, inconsistent, or unresolved aspects relevant to the research question."
},
{
"rating": "4",
"rationale": "The response clearly identifies specific gaps or limitations in the ecological evidence base that are relevant to the research question (e.g., missing data for certain regions, taxa, or time periods; limited experimental studies; or conflicting empirical findings) and provides some explanation of why these gaps matter; minor ambiguity or imprecision may remain."
}
]
},
"Innovation": {
"StateOfTheArtAndNovelty": [
{
"rating": "1",
"rationale": "The response gives a generic overview of known ecological findings or methods without identifying any specific state-of-the-art approaches or novel contributions; or it uses buzzwords like state of the art or cutting-edge without explaining what is new."
},
{
"rating": "4",
"rationale": "The response identifies concrete state-of-the-art or novel ecological contributions (e.g., new datasets, long-term or high-resolution data, remote sensing such as satellite or LiDAR, eDNA/metabarcoding, or new modeling or monitoring approaches) and briefly explains what improvement or new capability they provide, with minor gaps in comparison or detail."
}
]
}
}

vocab_block_specs = {
"mechanistic_vocab_block": {"label": "Mechanistic terms", "keys": ["mechanistic_terms"]},
"causal_vocab_block": {"label": "Causal connectives / triggers", "keys": ["causal_terms"]},
"temporal_vocab_block": {"label": "Temporal expressions", "keys": ["temporal_terms"]},
"context_coverage_vocab_block": {"label": "Context Coverage", "keys": ["regions"]},
"method_coverage_vocab_block": {"label": "Method Coverage", "keys": ["interventions"]},
"dimension_coverage_vocab_block": {"label": "Dimension Coverage", "keys": ["diversity_dimensions"]},
"scope_coverage_vocab_block": {"label": "Scope Coverage", "keys": ["ecosystem_services"]},
"scale_coverage_vocab_block": {"label": "Scale Coverage", "keys": ["scale_terms"]},
"gap_identification_vocab_block": {"label": "Gap Identification", "keys": ["gap_identification"]},
"novelty_indicators_vocab_block": {"label": "State of the Art and Novelty Indicators", "keys": ["novelty_indicators"]}
}

class Ecology(Domain):
examples: Dict[str, Dict] = example_responses
vocab: Dict[str, Dict] = vocabulary
ID: str = "ecology"
verbalized: str = "Ecology"
verbalized: str = "Ecology"
vocab_block_specs: Dict[str, Dict[str, object]] = vocab_block_specs
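Each entry in `vocab_block_specs` points at one or more keys of the `vocabulary` dict via its `"keys"` list. A hedged sketch of how a spec could be resolved into a labeled term list; `render_block` is a hypothetical helper, not part of yescieval, and the injector may consume these specs differently.

```python
# Hypothetical resolver for the vocab block specs above: look up each key
# listed in a spec within the vocabulary dict and join the matching terms
# under the spec's label. render_block is not part of yescieval.
vocabulary = {
    "temporal_terms": ["within 2-5 years", "lag of ~6 months", "after 3 months"],
}
spec = {"label": "Temporal expressions", "keys": ["temporal_terms"]}

def render_block(spec, vocab):
    terms = [term for key in spec["keys"] for term in vocab.get(key, [])]
    return f"{spec['label']}: " + ", ".join(terms)

print(render_block(spec, vocabulary))
```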