SLEP needed: fit_transform does something other than fit(X).transform(X) #12
This is required for stacking and leave-one-out target encoders if we want a nice design. We already kind of do this in some places but don't have a coherent contract. I think a SLEP would be nice.
How so? The point is whether we can adjust the contract / user expectations. Either we need to redefine the contract, or we need to provide an interface.
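For context, the contract as users currently understand it: for an ordinary transformer, the two spellings are interchangeable, and `Pipeline` implicitly relies on that. A minimal runnable check with a stock scikit-learn transformer:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.RandomState(0).randn(20, 3)

# For most transformers the two spellings agree (up to randomness).
Xt_a = StandardScaler().fit_transform(X)
Xt_b = StandardScaler().fit(X).transform(X)
assert np.allclose(Xt_a, Xt_b)
```

The estimators discussed below deliberately break this equivalence on training data, which is why the contract needs restating.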
In terms of "we kind of do this in some places", the handling of training points in kneighbors/radius_neighbors is a fair example of something similar.
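Concretely, a small runnable example of that special-casing: `kneighbors` with `X=None` treats the query set as the training set and excludes each point from its own neighbors, while passing the same data explicitly does not.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[0.0], [1.0], [2.0]])
nn = NearestNeighbors(n_neighbors=1).fit(X)

# X=None: queries are the training points, so each point's own index
# is excluded from its neighbors.
print(nn.kneighbors(return_distance=False))
# [[1], [0], [1]] or [[1], [2], [1]] (tie for the middle point)

# The same data passed explicitly is treated as new data, so every
# point is returned as its own nearest neighbor.
print(nn.kneighbors(X, return_distance=False))  # [[0], [1], [2]]
```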
The places we have encountered such transforms include StackingTransformer (scikit-learn/scikit-learn#8960) and NearestNeighborsTransformer (scikit-learn/scikit-learn#10482). Both could be redesigned as meta-estimators, but would lose some conceptual simplicity.
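For the stacking case, a hedged sketch (this is not the StackingTransformer from #8960, just an illustration of the shape): `fit_transform` must return out-of-fold predictions on training data, so it cannot be `fit(X, y).transform(X)` without leaking the target to downstream estimators.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.model_selection import cross_val_predict

class StackingTransformerSketch(BaseEstimator, TransformerMixin):
    """Turn a supervised estimator's predictions into a single feature."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def fit_transform(self, X, y):
        self.fit(X, y)
        # Training data: out-of-fold predictions. Using self.transform(X)
        # here would hand downstream models a nearly perfect, leaked target.
        return cross_val_predict(self.estimator, X, y).reshape(-1, 1)

    def transform(self, X):
        # Test data: predictions of the model fit on the full training set.
        return self.estimator_.predict(X).reshape(-1, 1)
```

Redesigned as a meta-estimator, this asymmetry would move into the wrapper; kept as a transformer, it has to live in the fit_transform/transform contract.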
We could also handle this case more explicitly by implementing fit_resample, where the estimator can use transform at test time.
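A rough sketch of that shape, borrowing the `fit_resample` verb from imbalanced-learn; the class and the exact method pairing here are illustrative assumptions, not an agreed API:

```python
import numpy as np

class OutlierTrimmerSketch:
    """Drops extreme rows at train time but passes test data through."""

    def fit(self, X, y=None):
        X = np.asarray(X)
        self.lo_, self.hi_ = np.percentile(X, [1, 99], axis=0)
        return self

    def fit_resample(self, X, y):
        # Train time: may change the number of samples, so it must
        # return y alongside X to keep them aligned.
        self.fit(X, y)
        X, y = np.asarray(X), np.asarray(y)
        mask = ((X >= self.lo_) & (X <= self.hi_)).all(axis=1)
        return X[mask], y[mask]

    def transform(self, X):
        # Test time: sample-aligned, no rows are dropped.
        return np.asarray(X)
```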
I just realized that this is the place where I should have put scikit-learn/scikit-learn#15553.

Basically what MLR does is that […]. However, method chaining is so idiomatic in sklearn that I don't really see how to get around this. We could also "just" introduce a different verb/method, say […]. That would also solve #15.
Actually, SLEP 001 (#2) basically implements something like this; only SLEP 001 also has a test-time version of it, which I think is a bit weird. Or rather, it conflates two issues: distinguishing training vs. test time, and returning a complex object / tuple.
There's a relevant discussion happening here: […] Maybe it's fair to say that there are three extensions to transform we're considering: […]

We're doing all three in SLEP 5, but with a focus on doing non-sample-aligned transformations, and only allowing somewhat trivial versions of 2 and 3. We need some more elaborate version of 2) for the target encoder (that I've brought up many times now lol): scikit-learn/scikit-learn#17323 (a sketch of why it needs this follows below). SLEP 1 tries to do all of these together, which I think might not be necessary (we couldn't come up with good examples). We could achieve stacking and target transformers and more straightforward handling in the neighbor graphs by: […]
cc @GaelVaroquaux who would like to move this forward :)
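As promised above, a hedged sketch of a leave-one-out target encoder (an illustration, not the implementation discussed in #17323): at train time each row's own target is excluded from its category mean, so `fit_transform(X, y)` deliberately differs from `fit(X, y).transform(X)`.

```python
import numpy as np

class LeaveOneOutTargetEncoderSketch:
    """Encode one categorical column by per-category target means."""

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y, dtype=float)
        cats = np.unique(X)
        self.sum_ = {c: y[X == c].sum() for c in cats}
        self.count_ = {c: int((X == c).sum()) for c in cats}
        return self

    def transform(self, X):
        # Test time: the plain per-category mean of the training target.
        return np.array([self.sum_[c] / self.count_[c] for c in np.asarray(X)])

    def fit_transform(self, X, y):
        self.fit(X, y)
        X, y = np.asarray(X), np.asarray(y, dtype=float)
        # Train time: leave each row's own target out of its category mean
        # (singleton categories would need smoothing in a real encoder).
        return np.array([
            (self.sum_[c] - yi) / (self.count_[c] - 1)
            for c, yi in zip(X, y)
        ])
```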
Good point that @GaelVaroquaux brought up that was probably also in the other thread: […]

Yes, I now think sample-aligned transformation is the key difference from resampling.