Add optional Chain-of-Thought prompting and configurable decoding parameters (temperature, top_p, max_new_tokens) to inference.py. This would make it easier to experiment with reasoning quality for math problems without modifying core code. I’d be happy to work on a PR if this aligns with the project direction.
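To make the idea concrete, here is a minimal sketch of how the options might be exposed. I don't know the internals of `inference.py`, so every name here (`DecodingConfig`, `build_prompt`, the CLI flags, the default values, and the CoT trigger phrase) is illustrative, not a proposed final API:

```python
import argparse
from dataclasses import dataclass


@dataclass
class DecodingConfig:
    """Hypothetical container for decoding parameters; defaults are placeholders."""
    temperature: float = 0.7
    top_p: float = 0.95
    max_new_tokens: int = 512


# A common zero-shot CoT trigger phrase; the exact wording would be
# up to the maintainers.
COT_SUFFIX = "\nLet's think step by step."


def build_prompt(question: str, use_cot: bool = False) -> str:
    """Optionally append a chain-of-thought trigger to the question."""
    return question + COT_SUFFIX if use_cot else question


def parse_args(argv=None) -> argparse.Namespace:
    """CLI flags so decoding can be tuned without editing core code."""
    p = argparse.ArgumentParser(description="Inference with optional CoT")
    p.add_argument("--cot", action="store_true", help="enable CoT prompting")
    p.add_argument("--temperature", type=float, default=0.7)
    p.add_argument("--top-p", dest="top_p", type=float, default=0.95)
    p.add_argument("--max-new-tokens", dest="max_new_tokens", type=int, default=512)
    return p.parse_args(argv)
```

The parsed values could then be forwarded to whatever generation call `inference.py` already makes (for example `model.generate(**vars(config))` in a `transformers`-based setup), keeping the core code path untouched.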