A reinforcement learning project that demonstrates dynamic pricing strategy optimization using Q-learning with scikit-learn. The agent learns to set optimal prices in a simulated retail environment, considering factors like inventory levels, time remaining, and competitor pricing.
- Reinforcement Learning Agent: Implements Q-learning using scikit-learn's SGDRegressor for online learning
- Pricing Environment Simulation: Models a retail scenario with inventory management, competitor reactions, and demand elasticity
- Discrete Action Space: Agent selects from predefined price levels (40, 45, 50, 55, 60)
- Performance Tracking: Logs training progress and calculates revenue improvement metrics
- Visualization: Generates learning curve plots showing revenue optimization over training episodes
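The combination of Q-learning with scikit-learn's `SGDRegressor` can be sketched roughly as below. This is a hypothetical illustration, not the project's actual code: it assumes one regressor per discrete price action, each updated online with `partial_fit`, and a made-up 3-feature state (inventory fraction, days remaining, competitor price).

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Hypothetical sketch: one SGDRegressor per discrete price action,
# each trained online with partial_fit to approximate Q(state, action).
PRICE_LEVELS = [40, 45, 50, 55, 60]

class QAgent:
    def __init__(self, n_features, alpha=0.01):
        self.models = []
        for _ in PRICE_LEVELS:
            m = SGDRegressor(learning_rate="constant", eta0=alpha)
            # Prime each model with a dummy sample so predict() works
            # before any real updates have been made.
            m.partial_fit(np.zeros((1, n_features)), [0.0])
            self.models.append(m)

    def q_values(self, state):
        x = np.asarray(state, dtype=float).reshape(1, -1)
        return np.array([m.predict(x)[0] for m in self.models])

    def update(self, state, action_idx, target):
        # One online gradient step toward the TD target for this action.
        x = np.asarray(state, dtype=float).reshape(1, -1)
        self.models[action_idx].partial_fit(x, [target])

agent = QAgent(n_features=3)
q = agent.q_values([0.5, 10, 52.0])  # (inventory frac, days left, competitor price)
best_price = PRICE_LEVELS[int(np.argmax(q))]
```

Using a separate regressor per action keeps each `partial_fit` call a plain regression update, which is a common way to fit value-based RL into scikit-learn's online-learning API.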
- Clone or download this repository
- Install the required dependencies:
```
pip install -r requirement.txt
```

Run the main script to train the agent and generate results:

```
python main.py
```

The script will:
- Train the reinforcement learning agent for 200 episodes
- Display performance metrics (initial vs. optimized revenue)
- Generate and save a visualization plot as `pricing_strategy_results.png`
- Show the learning curve with raw episode data and a moving average
After training, the agent demonstrates significant revenue improvement. The generated plot (pricing_strategy_results.png) shows:
- Raw episode revenue data (gray line)
- 20-episode moving average (blue line) to illustrate learning progress
- Clear upward trend as the agent optimizes pricing strategy
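The 20-episode moving average shown in the plot can be computed with a simple convolution; this is an illustrative sketch (the episode revenues here are placeholders, not real results):

```python
import numpy as np

def moving_average(x, window=20):
    # Trailing moving average; the first window-1 entries are dropped,
    # so the smoothed series is len(x) - window + 1 points long.
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")

revenues = np.linspace(1000.0, 1500.0, 200)  # placeholder episode revenues
smoothed = moving_average(revenues, window=20)
```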
Example output:

```
--- Results for Resume ---
Average Revenue (Early Training): $XXXX.XX
Average Revenue (After Optimization): $XXXX.XX
Performance Lift: XX.X%
```
- numpy>=1.24.0
- pandas>=2.0.0
- scikit-learn>=1.3.0
- matplotlib>=3.7.0
- seaborn>=0.12.0
- joblib>=1.3.0
- scipy>=1.10.0
- tqdm>=4.65.0
- Environment: Simulates 30-day retail period with inventory constraints and competitor price adjustments
- Agent: Uses Q-learning to estimate value of each price action in different states
- Learning: Updates Q-values using temporal difference learning with epsilon-greedy exploration
- Optimization: Balances immediate revenue against inventory management and competitor dynamics
The agent learns to adapt pricing based on remaining inventory, time pressure, and competitor behavior, resulting in optimized revenue generation.
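The temporal-difference update with epsilon-greedy exploration described above can be sketched as follows. This is a minimal tabular illustration under assumed hyperparameters (`alpha=0.1`, `gamma=0.95`, `epsilon=0.1`); the state encoding and state count are hypothetical, not taken from the project:

```python
import numpy as np

PRICE_LEVELS = [40, 45, 50, 55, 60]

def select_action(q_row, epsilon, rng):
    # Epsilon-greedy: explore a random price with probability epsilon,
    # otherwise exploit the current best-valued price.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def td_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.95):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

rng = np.random.default_rng(0)
Q = np.zeros((10, len(PRICE_LEVELS)))  # 10 hypothetical discrete states
a = select_action(Q[0], epsilon=0.1, rng=rng)
td_update(Q, s=0, a=a, reward=50.0, s_next=1)
```

Because rewards are revenue from each day's sales, the discount factor trades off immediate revenue against the value of inventory held for later days.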
📜 License

This project is open-source and available under the MIT License.