Curve Predictor #3
Hi @baothienpp, I've implemented a learning curve predictor. See the code in curve_predictor.py. Feel free to implement …
That sounds interesting, because I tried the implementation from the paper, and it took a lot of computational power because of the Monte Carlo calculation. I am trying to understand your method. Could you tell me more about the concept behind it, or is it very similar to the paper?
Yeah, I can imagine.
It's a simple linear regression, implemented by applying the normal equation. The whole math is in …
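For readers who haven't opened curve_predictor.py, here is a minimal sketch of a normal-equation fit over a set of curves. It assumes each burn-in curve is a row of equal length, the features are the first `k` observed points plus a bias term, and the target is the curve's final value. `fit_normal_equation`, `predict_final` and `k` are hypothetical names, not the library's API.

```python
import numpy as np

def fit_normal_equation(curves, k):
    """Fit weights w so that final value ~= [1, y_1, ..., y_k] @ w.

    curves: (n_curves, n_epochs) array, one finished burn-in curve per row
    k: number of observed epochs used as features (hypothetical parameter)
    """
    curves = np.asarray(curves, dtype=float)
    X = np.hstack([np.ones((curves.shape[0], 1)), curves[:, :k]])  # bias column + partial curves
    y = curves[:, -1]                                              # target: last value of each curve
    # Normal equation w = (X^T X)^{-1} X^T y; pinv guards against a singular
    # X^T X when there are fewer burn-in curves than features.
    return np.linalg.pinv(X.T @ X) @ X.T @ y

def predict_final(w, partial_curve):
    """Predict the final value of a curve from its first k points."""
    partial = np.asarray(partial_curve, dtype=float)
    return float(np.concatenate([[1.0], partial]) @ w)
```

With only a handful of burn-in curves and many epochs, X^T X is rank-deficient and the pseudo-inverse returns the minimum-norm least-squares solution, which is one reason a regularized model such as ridge regression can help.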
May I ask why you don't fit a polynomial instead of a linear model? Do you think we could use a Gaussian process with a squared exponential kernel to model the learning curve?
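To make the Gaussian-process question concrete, here is a minimal NumPy sketch of GP regression with a squared exponential kernel fitted to a single curve (epoch → accuracy). The kernel hyperparameters are fixed by hand rather than optimized, and the prior mean is the mean of the observed points; both are assumptions, and `gp_predict` is a hypothetical helper.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=5.0, variance=1.0):
    # k(x, x') = variance * exp(-(x - x')^2 / (2 * length_scale^2))
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-3, **kernel_args):
    """Posterior mean and standard deviation of a GP at x_test."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)
    prior_mean = y_train.mean()  # constant prior mean (assumption)
    K = sq_exp_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_train, x_test, **kernel_args)
    K_ss = sq_exp_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - prior_mean))  # K^{-1}(y - m)
    mu = prior_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))  # clip tiny negative values from round-off
```

One caveat for this use case: far from the observed epochs, the squared exponential kernel reverts to the prior mean with the prior variance, so on its own it does not extrapolate a saturating trend.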
Hi, I think I figured out why you don't use a polynomial: you fit a linear model on a set of learning curves (multivariable regression). At first, I understood that you fit a line to every single curve and make predictions based on that. So the burn-in period is the set of learning curves you have to collect first. Did I understand you correctly?
Hi @baothienpp, correct: the features are the whole curve. So the predictor doesn't try to learn trends or anything like that; it compares the given curve to the set of previous ones and checks the probability that it will be better. The burn-in period is basically the training data for the predictor. I'm sure there are more sophisticated models, and I'd love to have more implementations in the library. If you're interested in contributing, I'd be happy to merge it.
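The burn-in logic can be sketched as a small gate: until `burn_in` finished curves have been collected, no run is stopped; afterwards, a probability model decides. `BurnInGate`, `prob_fn` and the 0.05 threshold are hypothetical; `prob_fn` stands in for whatever predictor estimates the probability that a partial curve ends above the best value so far.

```python
from typing import Callable, List, Sequence

# prob_fn(finished_curves, partial_curve, best) -> probability in [0, 1]
ProbFn = Callable[[List[List[float]], Sequence[float], float], float]

class BurnInGate:
    """Early-stopping gate that only trusts the predictor after a burn-in period."""

    def __init__(self, burn_in: int, prob_fn: ProbFn, threshold: float = 0.05):
        self.burn_in = burn_in
        self.prob_fn = prob_fn
        self.threshold = threshold
        self.curves: List[List[float]] = []  # finished curves = training data

    def add_finished_curve(self, curve: Sequence[float]) -> None:
        self.curves.append(list(curve))

    def should_stop(self, partial_curve: Sequence[float], best: float) -> bool:
        if len(self.curves) < self.burn_in:
            return False  # still in burn-in: let every run finish
        p = self.prob_fn(self.curves, partial_curve, best)
        return p < self.threshold  # stop runs unlikely to beat the best
```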
By the way, I've added a bunch of examples lately. Please take a look; I'm looking forward to your feedback.
Thanks for those examples, they really help. I am thinking about using Bayesian linear regression (BLR) instead of simple linear regression. The BLR output is a normal distribution, so we could use simple math to calculate the probability that a learning curve will be good or bad. I will try it first and report back later. Generally, I like the idea of using a simple regression over the model in the paper; the paper's model just has too much computational overhead.
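As a sketch of why BLR makes the probability easy: with a Gaussian prior on the weights (precision `alpha`) and Gaussian noise (precision `beta`), the posterior and predictive distributions are Gaussian in closed form. Here both precisions are fixed by hand, which is an assumption (scikit-learn's BayesianRidge estimates them from the data instead); the function names are hypothetical.

```python
import math
import numpy as np

def blr_posterior(X, y, alpha=1.0, beta=25.0):
    """Posterior N(m, S) over weights for prior N(0, alpha^-1 I) and noise precision beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    S = np.linalg.inv(alpha * np.eye(X.shape[1]) + beta * X.T @ X)
    m = beta * S @ X.T @ y
    return m, S

def blr_predict(x, m, S, beta=25.0):
    """Predictive mean and std for one feature vector x (use the same beta as in the fit)."""
    x = np.asarray(x, dtype=float)
    mean = float(x @ m)
    std = math.sqrt(1.0 / beta + float(x @ S @ x))  # noise variance + weight uncertainty
    return mean, std

def prob_better(mean, std, best):
    """P(y > best) for y ~ N(mean, std^2), i.e. 1 - Phi((best - mean) / std)."""
    z = (best - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```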
@baothienpp Sounds great. Looking forward to seeing your model in action. When you test it, take a look at the tests.
Hi Maxim, a short unrelated question: if I want to use your idea in some of my work, how can I cite you?
Hi @baothienpp, that would be great. Please use this code:
Of course, I'll be curious to read the paper once it's out, so don't forget to post the link here ;)
Thanks! Unfortunately, it is something for work, so I can't publish it :( but don't worry, I cited you. It seems like your framework can only handle a single GPU. Any chance of multi-GPU support?
So I built a new model using your idea, with Bayesian ridge regression. Basically, in linear regression you minimize the MSE, and in ridge regression you minimize the MSE plus an L2 regularization term; for more detail you can read here. Bayesian ridge regression is then the probabilistic version of ridge regression, whose output is a mean and a variance. I then calculate the probability that the current curve reaches a higher value than the previous best; the formula is exactly the one used in Probability of Improvement. I tested it with your cifar10 learning curve set. Here is the result (the dashed lines are the curves used in burn-in).
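In formulas, the two objectives and the improvement probability look roughly as follows. The notation is assumed: $\mu(x)$ and $\sigma(x)$ are the Bayesian ridge predictive mean and standard deviation for a partial curve $x$, $y^{+}$ is the best final value seen so far, and $\Phi$ is the standard normal CDF. The exploration margin $\xi$ that PI often subtracts in the numerator is set to 0 here.

$$
\hat{w}_{\text{OLS}} = \arg\min_{w} \frac{1}{n}\lVert y - Xw \rVert_2^2,
\qquad
\hat{w}_{\text{ridge}} = \arg\min_{w} \frac{1}{n}\lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2
$$

$$
\mathrm{PI}(x) = P\big(y(x) > y^{+}\big) = \Phi\!\left(\frac{\mu(x) - y^{+}}{\sigma(x)}\right)
$$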
This looks really impressive: a burn-in period of 5 is very low! Thanks for the update.
Sorry, I forgot about your question: right now, the model itself can go multi-GPU, and that's it. I'd implement distributed training at the library level, but I think a trivial Bayesian optimization would assign the same hyper-parameters to all GPUs, so it doesn't make sense. It should be a bit smarter and run different optimizations in parallel, e.g., UCB on GPU 0 and the PI method on GPU 1.
I am currently a bit busy, but I will soon upload a short piece of code describing how I did it, because I implemented it differently from your interface. Another question: is the portfolio strategy you used basically randomly choosing a utility function every iteration?
OK. No problem.
Yes, see …
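A minimal sketch of such a portfolio, assuming one utility is drawn uniformly at random per iteration and the objective is maximized. The three utilities (PI, EI, UCB), the `(params, mu, sigma)` candidate format, and the `kappa`/`xi` defaults are illustrative assumptions, not the library's implementation.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Utility (acquisition) functions for maximization; sigma is assumed > 0.
def pi(mu, sigma, best, xi=0.0):
    return norm_cdf((mu - best - xi) / sigma)

def ei(mu, sigma, best, xi=0.0):
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm_cdf(z) + sigma * norm_pdf(z)

def ucb(mu, sigma, best, kappa=2.0):
    return mu + kappa * sigma  # best is unused but kept for a uniform signature

UTILITIES = {"pi": pi, "ei": ei, "ucb": ucb}

def choose_next(candidates, best, rng):
    """Pick one utility uniformly at random, then the candidate that maximizes it.

    candidates: list of (params, mu, sigma) tuples from the surrogate model.
    """
    name = rng.choice(sorted(UTILITIES))
    utility = UTILITIES[name]
    params, _, _ = max(candidates, key=lambda c: utility(c[1], c[2], best))
    return name, params
```

A uniform draw is the simplest portfolio; published variants such as GP-Hedge instead weight each utility by its past gains.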
So I'm going to briefly describe my method. I used scikit-learn to implement BRR (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html#sklearn.linear_model.BayesianRidge). It has two methods, fit() and predict(); it is important to set the parameter return_std in predict() to True. Now you have the prediction and its standard deviation. To calculate the probability, I used the scipy package to compute the CDF.
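A minimal sketch of the approach described here, not the commenter's own code. It assumes the burn-in curves are equal-length rows, the features are the first `len(partial_curve)` points, and the target is each curve's final value. `prob_of_improvement` is a hypothetical name; `BayesianRidge.fit`, `BayesianRidge.predict(..., return_std=True)` and `scipy.stats.norm.cdf` are the real library calls.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import BayesianRidge

def prob_of_improvement(burn_in_curves, partial_curve, best_so_far):
    """P(final value > best_so_far) under a Bayesian ridge model of the final value."""
    curves = np.asarray(burn_in_curves, dtype=float)
    k = len(partial_curve)
    X = curves[:, :k]   # features: first k points of every burn-in curve
    y = curves[:, -1]   # target: final value of every burn-in curve
    model = BayesianRidge()
    model.fit(X, y)
    x_new = np.asarray(partial_curve, dtype=float).reshape(1, -1)
    mean, std = model.predict(x_new, return_std=True)
    # Probability of Improvement: Phi((mu - best) / sigma)
    return float(norm.cdf((mean[0] - best_so_far) / std[0]))
```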
Got it. Do you use the same data as I did, i.e. the set of learning curves?
Yes, I used the curves in your JSON file.
Hi, I am from Stack Overflow. I am trying to understand your implementation of the paper "Extrapolating of Learning Curve ..". As far as I understand, they use 11 different mathematical models to fit the learning curve and then predict with a Monte Carlo estimator. But I can't find where in your code you build these models and where the Monte Carlo calculation is. Can you please clarify? Thanks
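The paper-style approach can be illustrated in miniature: fit one parametric family (here a three-parameter power law, y = c − a·x^(−α), assumed to be representative of the paper's families) and estimate by Monte Carlo the probability of beating a threshold at a later epoch. Unlike the paper, which combines its 11 models and samples with MCMC, this sketch draws parameters from a Gaussian around the least-squares fit using curve_fit's covariance, a much cheaper stand-in. `prob_exceeds` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import curve_fit

def pow3(x, c, a, alpha):
    """Power-law learning curve y = c - a * x**(-alpha); x must be > 0."""
    return c - a * np.power(x, -alpha)

def prob_exceeds(x_obs, y_obs, x_final, best, n_samples=2000, seed=0):
    """Monte Carlo estimate of P(y(x_final) > best) plus the fitted parameters."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    # Rough starting point: asymptote near the last value, amplitude from the observed rise.
    p0 = (y_obs[-1], y_obs[-1] - y_obs[0], 0.5)
    popt, pcov = curve_fit(pow3, x_obs, y_obs, p0=p0, maxfev=10000)
    # Sample parameters from N(popt, pcov) and extrapolate each sample to x_final.
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(popt, pcov, size=n_samples)
    preds = pow3(x_final, samples[:, 0], samples[:, 1], samples[:, 2])
    return float(np.mean(preds > best)), popt
```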