I implemented the Upper Confidence Bound (UCB) reinforcement learning algorithm in both Python and R.
To find out which of several ads appeals most to customers, we can use a reinforcement learning approach:
- We have X ads that we can display to a user each time they connect to the web
- Each time a user logs in, we count it as one round
- At each round n, we choose one ad to display to the user
- At each round n, ad i gives a reward Ri(n) ∈ {0, 1}: Ri(n) = 1 if the user clicked on ad i, and Ri(n) = 0 if the user did not click
- Our goal is to maximize the total reward we collect over many rounds
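
The setup above can be sketched as a small simulation. This is only an illustration of the problem, not the UCB implementation itself: the click probabilities, the `simulate_rounds` helper, and the `random_policy` baseline are all hypothetical names and values made up for this example. The ad-selection rule is passed in as a function, so any strategy can be plugged into the same loop.

```python
import random


def simulate_rounds(click_probs, n_rounds, choose_ad, seed=0):
    """Play n_rounds and return (total_reward, selections_per_ad).

    click_probs: assumed click probability of each ad (unknown in practice).
    choose_ad:   function (round_index, rng) -> index of the ad to display.
    """
    rng = random.Random(seed)
    total_reward = 0
    selections = [0] * len(click_probs)
    for n in range(n_rounds):
        i = choose_ad(n, rng)  # ad displayed at round n
        # Reward Ri(n) is 1 if the simulated user clicks, otherwise 0.
        reward = 1 if rng.random() < click_probs[i] else 0
        selections[i] += 1
        total_reward += reward
    return total_reward, selections


# Hypothetical click probabilities for X = 3 ads.
click_probs = [0.05, 0.10, 0.30]


def random_policy(n, rng):
    """Baseline: pick an ad uniformly at random, ignoring past rewards."""
    return rng.randrange(len(click_probs))


total, counts = simulate_rounds(click_probs, 1000, random_policy)
print("total reward:", total)
print("times each ad was shown:", counts)
```

A random baseline like this is useful mainly as a point of comparison: a strategy that learns from past rewards should collect a higher total reward over the same number of rounds.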
Steps: