TextualVerifier


Python 3.8+ · License: MIT · TextGrad Compatible

TextualVerifier is a self-verification framework that leverages Large Language Models (LLMs) to provide step-by-step verification for textual optimization processes. Built as a modular verification system, it addresses a critical gap in textual gradient optimization frameworks: the absence of built-in self-verification mechanisms.

🎯 Use Cases

TextualVerifier is designed for robust verification across multiple optimization scenarios:

1. Solution Optimization

Verify mathematical reasoning, problem-solving steps, and computational solutions to ensure correctness in multi-step problem solving.

2. Code Optimization

Validate code refinements, algorithm improvements, and implementation correctness during iterative code optimization processes.

3. Prompt Optimization

Verify prompt engineering improvements and ensure optimization steps maintain semantic coherence and effectiveness.

4. Loss Function Verification

Systematically verify loss value calculations and optimization results to prevent error propagation in gradient-based optimization.

5. Academic Reasoning

Enhance accuracy in academic domains including mathematics, machine learning, and physics through process-supervised verification.

📋 Formal Definition

TextualVerifier implements a systematic four-stage verification methodology that transforms raw reasoning chains into verified, high-quality outputs through process supervision principles.

Mathematical Framework

The verification process follows the general formula:

instance + instruction ⇒ calculation
instance + calculation + verification_prompt ⇒ verified_calculation

Where:

  • instance: Input data or problem context
  • instruction: Task-specific optimization directive
  • calculation: Generated solution or reasoning chain
  • verified_calculation: Validated and potentially corrected output
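
Schematically, these map one-to-one onto the verifier's verify() call (a sketch; the argument names follow the Quick Start example below):

verified_calculation = verifier.verify(
    instance=instance,        # input data or problem context
    instruction=instruction,  # task-specific optimization directive
    calculation=calculation   # generated solution or reasoning chain
)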

Verification Modes

TextualVerifier supports four distinct verification configurations:

  1. Basic Verification (cot=False, breakdown=False): Single-unit verification for prompt optimization
  2. Step-by-Step Verification (cot=False, breakdown=True): Granular verification of pre-structured reasoning
  3. CoT Generation (cot=True, breakdown=False): Structured reasoning generation without granular verification
  4. Full Verification (cot=True, breakdown=True): Maximum verification capability for complex solutions
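
The cot/breakdown shorthand above corresponds to the use_cot_generation and use_step_breakdown constructor flags used throughout this README; a minimal sketch of all four configurations:

from textualverifier import TextualVerifier
from textgrad.engine import get_engine

engine = get_engine("gpt-4o")

# The two boolean flags select one of the four verification modes.
basic        = TextualVerifier(engine, use_cot_generation=False, use_step_breakdown=False)
step_by_step = TextualVerifier(engine, use_cot_generation=False, use_step_breakdown=True)
cot_only     = TextualVerifier(engine, use_cot_generation=True,  use_step_breakdown=False)
full         = TextualVerifier(engine, use_cot_generation=True,  use_step_breakdown=True)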

🔄 Verification Stages

Stage 1: Step Extraction

Decomposes complex reasoning chains into individual logical steps using chain-of-thought prompting techniques. Extracts steps using regex patterns for <STEP>...</STEP> tags with fallback mechanisms for unstructured text.
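
A minimal sketch of this extraction logic, assuming the tag format described above (the helper is illustrative, not the library's internal API):

import re

def extract_steps(text: str) -> list[str]:
    # Prefer explicit <STEP>...</STEP> tags produced by CoT prompting.
    tagged = re.findall(r"<STEP>(.*?)</STEP>", text, flags=re.DOTALL)
    if tagged:
        return [step.strip() for step in tagged]
    # Fallback for unstructured text: treat each non-empty line as one step.
    return [line.strip() for line in text.splitlines() if line.strip()]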

Stage 2: Variant Generation

Creates multiple alternative formulations of each reasoning step using different verification perspectives:

  • Rule-Based Verification: Mathematical correctness and procedural adherence
  • Pedagogical Verification: Clarity and educational value assessment
  • Domain-Specific Verification: Specialized knowledge application
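
A hedged sketch of what variant generation under these perspectives might look like (the prompt wording and the engine's generate() call are assumptions, not the package's actual internals):

def generate_variants(step: str, perspectives: list[str], engine) -> list[str]:
    # Produce one candidate reformulation of the step per perspective.
    variants = []
    for perspective in perspectives:
        prompt = (
            f"{perspective}\n\n"
            f"Review the following reasoning step and rewrite it, "
            f"correcting it if necessary:\n{step}"
        )
        variants.append(engine.generate(prompt))  # engine API is illustrative
    return variants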

Stage 3: Majority Voting

Evaluates and selects the best variant through consensus-based decision making, ensuring robust verification through multiple perspectives.
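
Because the variants are free-form text, "majority" here means consensus after normalization (or selection by an LLM judge); a purely illustrative normalization-based sketch:

from collections import Counter

def majority_vote(variants: list[str]) -> str:
    # Normalize whitespace and case so near-identical variants count together.
    normalized = [" ".join(v.split()).lower() for v in variants]
    winner, _count = Counter(normalized).most_common(1)[0]
    # Return the original (un-normalized) variant that won the vote.
    return variants[normalized.index(winner)]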

Stage 4: Step Merging

Consolidates verified individual steps into a coherent, validated reasoning chain with explicit <VERIFIED>...</VERIFIED> tags for downstream processing.
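
A sketch of the merge step, assuming a single tag pair wraps the whole chain (whether the tags wrap each step or the full chain is an assumption here):

def merge_steps(verified_steps: list[str]) -> str:
    # Consolidate verified steps and mark the result for downstream processing.
    chain = "\n".join(verified_steps)
    return f"<VERIFIED>\n{chain}\n</VERIFIED>"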

🚀 Quick Start

Installation

pip install textualverifier

Basic Usage

from textualverifier import TextualVerifier
from textgrad import Variable
from textgrad.engine import get_engine

# Initialize the verifier
engine = get_engine("gpt-4o")
verifier = TextualVerifier(
    verifier_engine=engine,
    use_cot_generation=True,
    use_step_breakdown=True
)

# Verify a calculation
verified_result = verifier.verify(
    instance=Variable("Solve x² - 7x + 2 = 0"),
    instruction=Variable("Solve the quadratic equation step by step"),
    calculation=Variable("Using quadratic formula: x = (7 ± √(49-8))/2")
)

print(verified_result.value)

Advanced Configuration

# Full verification with custom prompts
full_verifier = TextualVerifier(
    verifier_engine=engine,
    use_cot_generation=True,
    use_step_breakdown=True,
    verification_task_prompts=[
        "Evaluate mathematical correctness and procedural rules",
        "Review from teaching assistant perspective",
        "Check for domain-specific accuracy"
    ]
)

🔗 TextGrad Integration

TextualVerifier is designed to integrate seamlessly with TextGrad optimization workflows:

import textgrad as tg
from textgrad.engine import get_engine
from textualverifier import TextualVerifier

# Setup (engine and variables; the values are illustrative)
engine = get_engine("gpt-4o")
tg.set_backward_engine(engine)

problem = tg.Variable("Solve x² - 7x + 2 = 0",
                      requires_grad=False, role_description="problem statement")
optimization_instruction = tg.Variable("Solve the quadratic equation step by step",
                                       requires_grad=False, role_description="optimization directive")
solution = tg.Variable("Using quadratic formula: x = (7 ± √(49-8))/2",
                       requires_grad=True, role_description="candidate solution")

# Standard TextGrad optimization
optimizer = tg.TGD(parameters=[solution])
loss_fn = tg.TextLoss("Evaluate the accuracy of this solution.")

# With TextualVerifier integration
verifier = TextualVerifier(engine, use_cot_generation=True, use_step_breakdown=True)

# Verify during the optimization process
optimization_steps = 3
for step in range(optimization_steps):
    optimizer.zero_grad()
    loss = loss_fn(solution)  # recompute the textual loss each iteration
    loss.backward()

    # Verify before applying the optimization step
    verified_solution = verifier.verify(
        instance=problem,
        instruction=optimization_instruction,
        calculation=solution
    )
    # Optionally feed the verified result back, e.g.
    # solution.set_value(verified_solution.value)

    optimizer.step()

Integration Points

  • Loss Value Verification: Verify loss calculations before backpropagation
  • Optimizer Result Verification: Validate optimization results before acceptance
  • Combined Verification: Comprehensive verification of both loss values and optimization outcomes
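
For loss value verification specifically, the same verify() call can wrap the loss before backpropagation (a sketch; eval_instruction is a hypothetical directive variable, and writing the verified value back is only one possible design):

verified_loss = verifier.verify(
    instance=problem,
    instruction=eval_instruction,  # hypothetical: describes how the loss was evaluated
    calculation=loss
)
loss.set_value(verified_loss.value)  # optional write-back before backpropagation
loss.backward()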

πŸ—οΈ Architecture

TextualVerifier features a modular architecture with four core components:

Input → TextualVerifier()
         ├── Step Extractor (CoT decomposition)
         ├── Variant Generator (Multi-perspective analysis)
         ├── Voting Mechanism (Consensus selection)
         └── Step Merger (Verification consolidation)
       → Verified Output

📊 Performance

Experimental results on academic benchmarks:

Dataset        Base Accuracy   With TextualVerifier   Improvement
GPQA-Diamond        -                   -              +5.56 pp
MMLU-ML             -                   -              +7.14 pp
MMLU-CP             -                   -              +2.94 pp

Results show percentage-point (pp) improvements over baseline TextGrad optimization.

🛠️ Configuration Options

Verification Strategies

# Computational efficiency vs accuracy trade-off
verifier_configs = {
    "fast": TextualVerifier(engine, use_cot_generation=False, use_step_breakdown=False),
    "balanced": TextualVerifier(engine, use_cot_generation=True, use_step_breakdown=False), 
    "comprehensive": TextualVerifier(engine, use_cot_generation=True, use_step_breakdown=True)
}

Multi-Perspective Verification

Configure multiple verification perspectives for robust consensus:

verification_prompts = [
    "Evaluate mathematical correctness and procedural rules",
    "Review from teaching assistant perspective", 
    "Check for logical consistency and completeness",
    "Assess clarity and explanatory quality"
]
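
These prompts plug into the verification_task_prompts argument shown in the Advanced Configuration example:

verifier = TextualVerifier(
    verifier_engine=engine,
    use_cot_generation=True,
    use_step_breakdown=True,
    verification_task_prompts=verification_prompts
)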

📚 Documentation

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🎓 Citation

If you use TextualVerifier in your research, please cite:

@misc{textualverifier2024,
  author       = {Eugenius Mario Situmorang},
  title        = {TextualVerifier: Self-Verification Framework for Textual Optimization},
  year         = {2024},
  howpublished = {IR-NLP Lab, Department of Computer Science, Universitas Indonesia},
  note         = {Manuscript in preparation}
}

🔗 Related Work


Built with ❤️ by the IR-NLP Lab, Universitas Indonesia
