Data Leakage #249
Comments
I don't understand what the issue here is? |
I think the worry is that normalize is applied to the whole dataset, which
could potentially overfit the model because the validation data is
normalized together with the training data.
Best,
Chris
…On Sun, Jun 4, 2023 at 8:48 AM Kyle Skompinski ***@***.***> wrote:
I don't understand what the issue here is?
|
I'll take a look when I revisit this next season. |
Hi, I'm looking for more info on what the potential fix for this would be. Thank you. |
You will need to split the data into train and test sets before using tf.keras.utils.normalize. Normally, though, you should use a scaler from scikit-learn: split the data, fit the scaler on the train data only, then transform both the train and test data with it. |
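A minimal sketch of the fit-on-train, transform-both pattern described above. It uses plain NumPy rather than scikit-learn so the arithmetic is visible; scikit-learn's StandardScaler offers the same idea through its fit and transform methods. The feature values, the chronological split, and the helper names fit_standardizer and apply_standardizer are hypothetical and not taken from the repository.

```python
import numpy as np


def fit_standardizer(train):
    """Compute per-column mean and standard deviation from training rows only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std


def apply_standardizer(data, mean, std):
    """Scale any split with statistics that were fitted on the training split."""
    return (data - mean) / std


# Hypothetical feature matrix: rows are games, columns are features.
features = np.array([
    [100.0, 0.45],
    [110.0, 0.50],
    [95.0, 0.40],
    [120.0, 0.55],
    [105.0, 0.48],
    [130.0, 0.60],
])

# Split first (a chronological split is assumed), then fit on training rows only.
train, test = features[:4], features[4:]
mean, std = fit_standardizer(train)

train_scaled = apply_standardizer(train, mean, std)
test_scaled = apply_standardizer(test, mean, std)  # test rows never influence mean/std
```

The point of the ordering is that the test rows are scaled with numbers they had no part in producing, so the evaluation sees them the way the model would see new games.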
I am not sure I agree with @nova-land. The idea of normalize is to scale the entire dataset consistently. Imagine you have a dataset with values of [3, 1, 0.50] and you normalize it by its maximum. It would change to [1, 0.33, 0.167]. If your next dataset has a higher value, the scaling would adjust based on the highest value in the column. There are Keras layers that normalize the data inside the model itself, which would not require this function to be called. Or you can normalize the data when it comes in by setting max values. For example, if a player scores 56 points and your goal is to predict how many points a player is going to score from 0 to 50 (you are forcing the normalization here), then the max he can score is 50. Just an example. I am not an expert here, but you have to make sure this code is applied to your training set. Then when you're ready to predict, you load these values and send the data you are predicting on through the normalize function as well.
|
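A rough sketch of the max-value idea from the comment above, in plain Python. It assumes non-negative values, and the helper names fit_max and scale_by_max, the cap parameter, and the sample point totals are hypothetical illustrations rather than code from the project.

```python
def fit_max(values, cap=None):
    """Return the scaling maximum from training values, optionally capped."""
    max_value = max(values)
    if cap is not None:
        max_value = min(max_value, cap)
    return max_value


def scale_by_max(values, max_value):
    """Clip each value to max_value, then divide so results fall in [0, 1]."""
    return [min(v, max_value) / max_value for v in values]


# The example from the comment: the maximum is 3, so each value is divided by 3.
train_values = [3, 1, 0.50]
max_value = fit_max(train_values)
scaled = scale_by_max(train_values, max_value)  # approximately [1.0, 0.333, 0.167]

# Forced ceiling: a 56-point game is treated as 50 before scaling.
points_cap = fit_max([12, 31, 56], cap=50)
capped = scale_by_max([12, 31, 56], points_cap)

# At prediction time, reuse the stored maximum instead of recomputing it.
new_scaled = scale_by_max([2, 4], max_value)  # 4 is clipped to the stored max of 3
```

Storing max_value alongside the model and reusing it at prediction time is what keeps new inputs on the same scale as the training data.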
Here is information on the normalization layer. You would add this before your first dense layer. It will normalize the incoming data and store its weights inside the model file itself, so you would not have to make any changes to the data or even call MinMax normalize within the file itself. https://keras.io/api/layers/normalization_layers/batch_normalization/ Also worth noting, this is for the NN model, not XGBoost. |
But then which would be better, the XGBoost or the NN model? |
"Better" is not a good word to use for models. There are a million factors. That question cannot be answered. |
Okay, put another way: which model's probabilities would be closest? I made $2,000 in two weeks via XGBoost with just a $10 stake at the end of the season in May, and I didn't pay attention to the NN model... |
So I always relied on over/under. |
How's this working out for you now? |
This year wasn't so good. |
So you're not seeing 55% win rates with this strategy? |
The use of tf.keras.utils.normalize will provide invalid test results by normalising the whole dataset. An evaluation script is required to verify the accuracy of the model.