Add optional Chain-of-Thought prompting and configurable decoding parameters (temperature, top_p, max_new_tokens) to inference.py. This would make it easier to experiment with reasoning quality for math problems without modifying core code. I’d be happy to work on a PR if this aligns with the project direction.
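To make the idea concrete, here is a minimal sketch of how the options might be exposed. I don't know the internals of `inference.py`, so every name here (`DecodingConfig`, `build_prompt`, the CLI flags, the default values, and the CoT trigger phrase) is illustrative, not a proposed final API:

```python
import argparse
from dataclasses import dataclass


@dataclass
class DecodingConfig:
    """Hypothetical container for decoding parameters; defaults are placeholders."""
    temperature: float = 0.7
    top_p: float = 0.95
    max_new_tokens: int = 512


# A common zero-shot CoT trigger phrase; the exact wording would be
# up to the maintainers.
COT_SUFFIX = "\nLet's think step by step."


def build_prompt(question: str, use_cot: bool = False) -> str:
    """Optionally append a chain-of-thought trigger to the question."""
    return question + COT_SUFFIX if use_cot else question


def parse_args(argv=None) -> argparse.Namespace:
    """CLI flags so decoding can be tuned without editing core code."""
    p = argparse.ArgumentParser(description="Inference with optional CoT")
    p.add_argument("--cot", action="store_true", help="enable CoT prompting")
    p.add_argument("--temperature", type=float, default=0.7)
    p.add_argument("--top-p", dest="top_p", type=float, default=0.95)
    p.add_argument("--max-new-tokens", dest="max_new_tokens", type=int, default=512)
    return p.parse_args(argv)
```

The parsed values could then be forwarded to whatever generation call `inference.py` already makes (for example `model.generate(**vars(config))` in a `transformers`-based setup), keeping the core code path untouched.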