This project implements an Agentic AI Reasoning System designed for structured, multi-step reasoning tasks such as logic-based question answering.
It autonomously decomposes problems, selects appropriate tools, executes subtasks, and generates transparent reasoning traces.
Large Language Models (LLMs) often hallucinate intermediate steps or skip verification during logical reasoning.
To address this, the system implements an agentic reasoning framework that:
- Decomposes logic problems into smaller subtasks.
- Selects tools (symbolic solver, calculator, or code execution).
- Executes and verifies sub-results to ensure reliability.
- Generates step-by-step reasoning traces along with the final answer.
The system is designed to run on smaller LLMs or base models and avoid heavy proprietary reasoning models such as GPT-4, GPT-5, Claude 3, or Gemini Ultra.
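The sketch below illustrates the intended control flow under simple assumptions: a naive sentence-level `decompose`, a SymPy-backed solver, and a placeholder `verify` check. All function names here are illustrative, not the project's final API.

```python
# Minimal illustration of the decompose -> solve -> verify loop (names are placeholders).
import sympy

def decompose(problem: str) -> list:
    # Naive decomposition: treat each sentence/clause as one subtask.
    return [part.strip() for part in problem.split(".") if part.strip()]

def solve_subtask(subtask: str):
    # Try the symbolic solver first; fall back to returning the text unchanged.
    try:
        return sympy.sympify(subtask)
    except (sympy.SympifyError, TypeError):
        return subtask

def verify(result) -> bool:
    # Placeholder check: a numeric result counts as verified.
    return isinstance(result, sympy.Basic) and bool(result.is_number)

def answer(problem: str) -> dict:
    trace, final = [], None
    for subtask in decompose(problem):
        result = solve_subtask(subtask)
        trace.append(f"{subtask} -> {result} (verified={verify(result)})")
        final = result
    return {"solution": " | ".join(trace), "answer": final}

print(answer("2 + 2*3"))
```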
```
Agentic-Reasoner/
│
├── data/
│   ├── train.csv
│   └── test.csv
│
├── src/
│   ├── __init__.py
│   ├── main.py
│   ├── data_loader.py
│   ├── reasoning_agent.py
│   ├── tool_selector.py
│   ├── solver.py
│   ├── verifier.py
│   └── utils.py
│
├── outputs/
│   └── output.csv
│
├── eval_runner.py
├── README.md
└── requirements.txt
```
`reasoning_agent.py` implements the agentic controller that:
- Decomposes the main problem into subtasks.
- Chooses appropriate tools for each subtask.
- Integrates all results into a coherent reasoning chain.
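A minimal sketch of what such a controller could look like; the class and method names (`ReasoningAgent`, `run`, `select`, `check`) are illustrative assumptions, not the exact interface of `reasoning_agent.py`.

```python
# Illustrative controller skeleton; the actual reasoning_agent.py interface may differ.
from dataclasses import dataclass, field

@dataclass
class ReasoningAgent:
    tool_selector: object            # exposes select(subtask) -> callable tool
    verifier: object                 # exposes check(subtask, result) -> bool
    trace: list = field(default_factory=list)

    def decompose(self, problem: str) -> list:
        # Placeholder: one subtask per non-empty line of the problem statement.
        return [line.strip() for line in problem.splitlines() if line.strip()]

    def run(self, problem: str):
        final = None
        for step, subtask in enumerate(self.decompose(problem), start=1):
            tool = self.tool_selector.select(subtask)      # pick a tool for this subtask
            result = tool(subtask)                         # execute it
            ok = self.verifier.check(subtask, result)      # verify before trusting it
            self.trace.append(f"Step {step}: {subtask} -> {result} (verified={ok})")
            final = result
        return final, self.trace
```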
`tool_selector.py` chooses among tools such as:
- Symbolic Solver (for algebra and logic)
- Arithmetic Calculator
- Code Execution Module (for programmable subtasks)
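A keyword-based routing sketch, assuming three callable tools; the real `tool_selector.py` may use richer heuristics or an LLM prompt for routing.

```python
# Keyword-based routing sketch; the real tool_selector.py may use richer heuristics.
import sympy

def calculator(expr: str):
    # Exact arithmetic that respects operator precedence.
    return sympy.sympify(expr)

def symbolic_solver(equation: str):
    # Solve a single-variable equation written as "lhs = rhs".
    lhs, rhs = equation.split("=", 1)
    x = sympy.symbols("x")
    return sympy.solve(sympy.Eq(sympy.sympify(lhs), sympy.sympify(rhs)), x)

def code_executor(snippet: str):
    # Run a small Python snippet that stores its answer in `result`
    # (sandboxing is assumed to be handled elsewhere).
    scope = {}
    exec(snippet, scope)
    return scope.get("result")

def select_tool(subtask: str):
    if "=" in subtask and "x" in subtask:
        return symbolic_solver
    if any(op in subtask for op in "+-*/"):
        return calculator
    return code_executor
```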
`solver.py` handles execution of mathematical or logical subtasks.
`verifier.py` checks subtask outputs for consistency and correctness.
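A minimal sketch of how execution and verification could fit together, using SymPy for exact arithmetic; the names `solve_arithmetic` and `verify_arithmetic` are assumptions for illustration.

```python
# Illustrative solver/verifier pair; solver.py and verifier.py may differ in detail.
import sympy

def solve_arithmetic(expr: str):
    # Exact evaluation that respects operator precedence, e.g. "2 + 2*3" -> 8.
    return sympy.sympify(expr)

def verify_arithmetic(expr: str, result) -> bool:
    # Independent re-check: re-parse the expression and compare symbolically.
    return sympy.simplify(sympy.sympify(expr) - result) == 0

value = solve_arithmetic("2 + 2*3")
print(value, verify_arithmetic("2 + 2*3", value))   # 8 True
```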
`utils.py` provides helper functions for logging, formatting reasoning traces, and CSV export.
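An illustrative pair of helpers, assuming pandas for CSV export; the actual `utils.py` may expose a different interface.

```python
# Illustrative helpers; the actual utils.py may expose a different interface.
import pandas as pd

def format_trace(steps: list) -> str:
    # Join numbered reasoning steps into one human-readable string.
    return " ".join(f"Step {i}: {s}." for i, s in enumerate(steps, start=1))

def export_results(rows: list, path: str = "outputs/output.csv") -> None:
    # Write reasoning traces and predictions in the expected column order.
    columns = ["topic", "problem_statement", "solution", "correct_option"]
    pd.DataFrame(rows, columns=columns).to_csv(path, index=False)
```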
```bash
# Clone the repository
git clone https://github.com/<your-username>/Agentic-Reasoner.git
cd Agentic-Reasoner

# Install dependencies
pip install -r requirements.txt
```

You can use `train.csv` to fine-tune a small model or to validate the reasoning pipeline.
Run inference on the test dataset:
```bash
python src/main.py
```

The system will:

- Read `test.csv`
- Decompose each problem
- Solve it step by step
- Output reasoning traces and predictions to `outputs/output.csv`
Output format:
| topic | problem_statement | solution | correct_option |
|---|---|---|---|
To generate predictions and evaluate them, run:
```bash
python eval_runner.py
```

This script compares predicted answers with the ground truth (if available) and computes metrics such as the Macro F1 Score.
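A minimal sketch of the metric computation with scikit-learn; the file paths and the `correct_option` column name are taken from the output format above, while the presence of ground-truth labels in `data/test.csv` is an assumption.

```python
# Minimal sketch of the metric computation; eval_runner.py may add more checks.
import pandas as pd
from sklearn.metrics import f1_score

pred = pd.read_csv("outputs/output.csv")
truth = pd.read_csv("data/test.csv")   # assumes the test set includes ground-truth labels

macro_f1 = f1_score(truth["correct_option"], pred["correct_option"], average="macro")
print(f"Macro F1: {macro_f1:.4f}")
```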
Create a requirements.txt file with:
```text
pandas
numpy
scikit-learn
sympy
```
Example row in outputs/output.csv:
| topic | problem_statement | solution | correct_option |
|---|---|---|---|
| Arithmetic | What is 2 + 2 × 3? | Step 1: Multiply 2 × 3 = 6. Step 2: Add 2 + 6 = 8. Final Answer: 8. | 2 |
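The arithmetic in this example can be reproduced directly with SymPy, which respects operator precedence:

```python
import sympy
print(sympy.sympify("2 + 2*3"))   # 8, because multiplication is applied before addition
```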
Evaluation criteria:

- Macro F1 Score (50%)
- Approach Creativity & Originality (35%)
- Report Quality (10%)
- Code Quality (5%)
- ✅ Transparent reasoning with trace logs
- ✅ Modular, reusable pipeline
- ✅ Verification for correctness
- ✅ Interpretable output for human validation
Developed by J. Adarsh and contributors for the Agentic Reasoning Challenge.