
Separate "test" and "validation" sets when dealing with large data #41

Open
earlbellinger opened this issue Sep 22, 2014 · 0 comments
@earlbellinger
Contributor

The user should be able to specify what fraction of the data should be left out for the final test set. For small datasets it can be 0, but for data like Kepler it should be around half. In the absence of user input, determine the fraction in some reasonable way from the number of points available.
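One way the default could work (a hypothetical heuristic, not anything decided here): hold out nothing for tiny datasets, ramp up to a 50% hold-out at Kepler-like scale, and interpolate in between. The thresholds `small` and `large` below are illustrative placeholders.

```python
def default_test_fraction(n_points, small=100, large=10000):
    """Hypothetical heuristic for the hold-out fraction.

    No hold-out for tiny datasets, a 50% hold-out for large ones,
    and a linear ramp in between. Thresholds are placeholders.
    """
    if n_points <= small:
        return 0.0
    if n_points >= large:
        return 0.5
    # linear ramp between the two regimes
    return 0.5 * (n_points - small) / (large - small)
```

For example, 50 points would give a fraction of 0, while 20,000 points would give 0.5.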

The process will go like this:

  1. Parse the data
  2. Split the data in half (or whatever fraction)
  3. Determine the model by separating one of the halves into k training and validation sets and doing cross validation
  4. Finally, calculate the R^2 and MSE on the other half
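The four steps above could be sketched roughly as follows. This is a minimal stdlib-only illustration, not a proposed implementation: the "model" is just a training-set mean, and `pipeline`, `fit_mean`, and the data layout (a list of `(x, y)` pairs) are all stand-ins for whatever the tool actually uses.

```python
import random

def mse(y_true, y_pred):
    # mean squared error
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    # coefficient of determination, R^2
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def fit_mean(ys):
    # Stand-in "model": always predicts the training mean.
    # A real run would fit whatever regression model the tool supports.
    m = sum(ys) / len(ys)
    return lambda x: m

def pipeline(data, test_fraction=0.5, k=5, seed=0):
    """Sketch of the proposed flow on parsed (x, y) pairs (step 1 assumed done):
    split off a test set, select a model by k-fold cross-validation
    on the remainder, then score R^2 and MSE on the held-out half."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)

    # step 2: split off the final test set
    n_test = int(len(data) * test_fraction)
    test, dev = data[:n_test], data[n_test:]

    # step 3: k-fold cross-validation on the development set
    fold_size = len(dev) // k
    cv_scores = []
    for i in range(k):
        val = dev[i * fold_size:(i + 1) * fold_size]
        train = dev[:i * fold_size] + dev[(i + 1) * fold_size:]
        model = fit_mean([y for _, y in train])
        cv_scores.append(mse([y for _, y in val],
                             [model(x) for x, _ in val]))

    # step 4: final evaluation on the untouched test set
    model = fit_mean([y for _, y in dev])
    y_true = [y for _, y in test]
    y_pred = [model(x) for x, _ in test]
    return r_squared(y_true, y_pred), mse(y_true, y_pred)
```

The key property the issue asks for is that the test half is never touched during model selection; only the development half is seen by cross-validation.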