Splitting data into training and test datasets
The InstancesTrainTestSplit function divides a given FixedDataGrid into training and test portions. The first return value contains approximately the specified proportion of the rows, and the second contains the remainder. This is useful for quick evaluations of a given algorithm.
Internally, the function generates a random number between 0 and 1 for each row. If that number is less than the specified proportion, the row number is added to the training set; otherwise it is added to the test set. Because each row is assigned independently, the resulting split sizes are only approximate. The training and test FixedDataGrid return values are provided by InstancesView, which reorganises the underlying data in a memory-efficient way.
Code excerpt: loading a dataset and splitting it into training and test sets
// Load in the iris dataset (the second argument indicates the file has a header row)
iris, err := base.ParseCSVToInstances("../datasets/iris_headers.csv", true)
if err != nil {
	panic(err)
}
// Create a 60-40 training-test split
trainData, testData := base.InstancesTrainTestSplit(iris, 0.60)
This code snippet asks for approximately 60% of the data to be returned as training data, leaving 40% for testing.