
Separate "test" and "validation" sets when dealing with large data #41

Open
earlbellinger opened this issue Sep 22, 2014 · 0 comments
@earlbellinger
Contributor

The user should be able to specify what fraction of the data should be left out for the final test set. For small datasets it can be 0, but for data like Kepler it should be around half. In the absence of user input, determine the fraction in some reasonable way from the number of points available.
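One way the default could work (a hypothetical heuristic, not anything decided here): hold out nothing for tiny datasets, ramp up to a 50% hold-out at Kepler-like scale, and interpolate in between. The thresholds `small` and `large` below are illustrative placeholders.

```python
def default_test_fraction(n_points, small=100, large=10000):
    """Hypothetical heuristic for the hold-out fraction.

    No hold-out for tiny datasets, a 50% hold-out for large ones,
    and a linear ramp in between. Thresholds are placeholders.
    """
    if n_points <= small:
        return 0.0
    if n_points >= large:
        return 0.5
    # linear ramp between the two regimes
    return 0.5 * (n_points - small) / (large - small)
```

For example, 50 points would give a fraction of 0, while 20,000 points would give 0.5.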

The process will go like this:

  1. Parse the data
  2. Split the data in half (or whatever fraction)
  3. Determine the model by separating one of the halves into k training and validation sets and doing cross validation
  4. Finally, calculate the R^2 and MSE on the other half
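The four steps above could be sketched roughly as follows. This is a minimal stdlib-only illustration, not a proposed implementation: the "model" is just a training-set mean, and `pipeline`, `fit_mean`, and the data layout (a list of `(x, y)` pairs) are all stand-ins for whatever the tool actually uses.

```python
import random

def mse(y_true, y_pred):
    # mean squared error
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    # coefficient of determination, R^2
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def fit_mean(ys):
    # Stand-in "model": always predicts the training mean.
    # A real run would fit whatever regression model the tool supports.
    m = sum(ys) / len(ys)
    return lambda x: m

def pipeline(data, test_fraction=0.5, k=5, seed=0):
    """Sketch of the proposed flow on parsed (x, y) pairs (step 1 assumed done):
    split off a test set, select a model by k-fold cross-validation
    on the remainder, then score R^2 and MSE on the held-out half."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)

    # step 2: split off the final test set
    n_test = int(len(data) * test_fraction)
    test, dev = data[:n_test], data[n_test:]

    # step 3: k-fold cross-validation on the development set
    fold_size = len(dev) // k
    cv_scores = []
    for i in range(k):
        val = dev[i * fold_size:(i + 1) * fold_size]
        train = dev[:i * fold_size] + dev[(i + 1) * fold_size:]
        model = fit_mean([y for _, y in train])
        cv_scores.append(mse([y for _, y in val],
                             [model(x) for x, _ in val]))

    # step 4: final evaluation on the untouched test set
    model = fit_mean([y for _, y in dev])
    y_true = [y for _, y in test]
    y_pred = [model(x) for x, _ in test]
    return r_squared(y_true, y_pred), mse(y_true, y_pred)
```

The key property the issue asks for is that the test half is never touched during model selection; only the development half is seen by cross-validation.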