Currently, if the testing and training ({train} U {validate}) sets are drawn from the same source shot list, then the ratio conf['model']['train_frac'] is used to randomly divide the source shots without regard to the shot classes. The same applies to the splitting of the train and validate sets with conf['model']['validation_frac'].
So, while the division of the overall shot counts will match the desired fractions to within 1/N (where N is the total number of shots), the division of the disruptive and non-disruptive shots among the sets may not be nearly so close to that fraction. This is only a problem when the number of disruptive (or non-disruptive) samples is low and/or the training and testing sets are drawn from different raw lists. As the number of samples -> infinity, the fraction of disruptive shots assigned to the validation set, N_{validate, disrupt}/(N_{train, disrupt} + N_{validate, disrupt}), of course converges to conf['model']['validation_frac'].
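To make the problem concrete, here is a small simulation of an unstratified split. The shot counts and train_frac value are illustrative, not taken from any actual shot list; disruptive_in_train is a hypothetical helper, not code from the repo. With only 10 disruptive shots out of 1000, the number of disruptive shots landing in the training set fluctuates noticeably from seed to seed, even though the overall split is always exactly 80/20:

```python
import random

def disruptive_in_train(seed, n_shots=1000, n_disrupt=10, train_frac=0.8):
    """Randomly split n_shots without stratification and count how many
    of the n_disrupt disruptive shots (IDs 0..n_disrupt-1) land in train."""
    rng = random.Random(seed)
    shots = list(range(n_shots))
    rng.shuffle(shots)
    train = shots[:int(train_frac * n_shots)]
    return sum(1 for s in train if s < n_disrupt)

# Target would be 8 disruptive shots in train; the actual count varies.
counts = [disruptive_in_train(seed) for seed in range(200)]
print(min(counts), max(counts))
```

The spread between min and max shows the per-split class imbalance that stratified splitting would eliminate.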
There is no real reason not to explicitly divide the disruptive and non-disruptive classes when splitting the shot sets, so I think we should at least add it as an option, if not make it the default behavior.
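A minimal sketch of what the stratified option could look like. The function name, the shots list, and the is_disruptive predicate are all hypothetical placeholders for whatever the shot-list code actually uses; the point is just that each class is shuffled and divided by the fraction independently:

```python
import random

def stratified_split(shots, is_disruptive, frac, seed=0):
    """Split `shots` into two lists so that each class (disruptive and
    non-disruptive) is independently divided by `frac`.

    Returns (first, second) where `first` holds ~frac of each class.
    """
    rng = random.Random(seed)
    disruptive = [s for s in shots if is_disruptive(s)]
    nondisruptive = [s for s in shots if not is_disruptive(s)]
    first, second = [], []
    for group in (disruptive, nondisruptive):
        rng.shuffle(group)
        k = int(round(frac * len(group)))
        first.extend(group[:k])
        second.extend(group[k:])
    return first, second

# e.g. train, test = stratified_split(all_shots, is_disruptive,
#                                     conf['model']['train_frac'])
```

The same function would cover the second split with conf['model']['validation_frac'], so both divisions respect the class fractions to within one shot per class.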
Consider renaming train_frac to test_frac (value = 1.0 - train_frac) or another name to make it clear that the "training fraction" is further divided between the training and hold-out validation sets.