forked from scikit-learn/scikit-learn
Open
IndexError thrown with LogisticRegressionCV and refit=False
Description
The following error is thrown when trying to estimate a regularization parameter via cross-validation, without refitting.
Steps/Code to Reproduce
import sys
import sklearn
from sklearn.linear_model import LogisticRegressionCV
import numpy as np
np.random.seed(29)
X = np.random.normal(size=(1000, 3))
beta = np.random.normal(size=3)
intercept = np.random.normal(size=None)
y = np.sign(intercept + X @ beta)
LogisticRegressionCV(
    cv=5,
    solver='saga',  # same error with 'liblinear'
    tol=1e-2,
    refit=False,
).fit(X, y)
Expected Results
No error is thrown.
Actual Results
Traceback (most recent call last):
  File "<stdin>", line 11, in <module>
  File "/workspace/scikit-learn/sklearn/linear_model/logistic.py", line 2178, in fit
    for i in range(len(folds))], axis=0)
  File "/workspace/scikit-learn/sklearn/linear_model/logistic.py", line 2178, in <listcomp>
    for i in range(len(folds))], axis=0)
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
Versions
... as in original issue ...
Researcher Specification
Root cause
- In sklearn/linear_model/logistic.py, LogisticRegressionCV.fit indexes coefs_paths assuming a fold dimension exists when refit=False. Under certain shape constructions, per-class entries can be 2D (missing the fold dimension), leading to IndexError: too many indices for array.
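The failure mode can be reproduced in isolation with plain NumPy. The snippet below is only an illustrative sketch: the shapes and the names n_folds, n_cs and n_features are assumptions chosen to match the traceback (a 3-dimensional array indexed with four indices), not the actual values inside fit.

```python
import numpy as np

# Illustrative sizes only (assumptions, not taken from LogisticRegressionCV).
n_folds, n_cs, n_features = 5, 10, 4

# A per-class path array with three dimensions: (n_folds, n_cs, n_features).
coefs_paths = np.zeros((n_folds, n_cs, n_features))
best_indices = np.zeros(n_folds, dtype=int)

try:
    # Four indices applied to a 3-D array: one more axis than the array has.
    coefs_paths[:, 0, best_indices[0], :]
except IndexError as exc:
    message = str(exc)
    print(message)
```

Any indexing expression that assumes one more axis than the array actually has fails in this way, regardless of the solver used.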
Proposed fix (high-level)
- Explicitly stack coefs_paths with np.stack(..., axis=0) prior to reshape to avoid object arrays and guarantee dimensions.
- Reshape to ensure per-class entries are 3D (n_folds, n_cs * n_l1_ratios, n_features [+1]).
- Use vectorized selection for per-fold best indices and add defensive shape assertions.
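The three steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the patch itself: per_fold_paths and scores are hypothetical stand-ins for the per-fold results computed inside fit, and the final averaging over folds is inferred from the axis=0 reduction visible in the traceback.

```python
import numpy as np

rng = np.random.default_rng(0)
n_folds, n_cs, n_l1_ratios, n_features = 5, 4, 2, 3
n_candidates = n_cs * n_l1_ratios

# Hypothetical per-fold results: one 2-D array per fold, with a trailing
# intercept column (hence n_features + 1).
per_fold_paths = [
    rng.normal(size=(n_candidates, n_features + 1)) for _ in range(n_folds)
]
# Hypothetical per-fold validation scores, one per candidate.
scores = rng.uniform(size=(n_folds, n_candidates))

# Step 1: stack explicitly so the result is a regular 3-D float array.
coefs_paths = np.stack(per_fold_paths, axis=0)

# Step 2: defensive shape check before any indexing.
assert coefs_paths.shape == (n_folds, n_candidates, n_features + 1)

# Step 3: vectorized selection of each fold's best candidate.
best_indices = np.argmax(scores, axis=1)  # shape (n_folds,)
best_per_fold = coefs_paths[np.arange(n_folds), best_indices, :]
assert best_per_fold.shape == (n_folds, n_features + 1)

# Average the selected coefficients across folds.
w = best_per_fold.mean(axis=0)
```

Pairing np.arange(n_folds) with best_indices picks one row per fold in a single indexing operation, so the number of axes is fixed by the asserted shape rather than by a per-fold list comprehension.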
Tests
- Add refit=False tests for binary ovr with saga and liblinear, and multinomial with saga (and elasticnet with l1_ratios) to validate shapes and absence of IndexError.
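A rough sketch of the binary case might look as follows. It covers only the saga and liblinear solvers; the multinomial and elasticnet variants are left out. The helpers make_data and check_refit_false are hypothetical names, and the median threshold used to build y is a simplification (an assumption, differing from the reproduction script) that keeps both classes balanced across folds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV


def make_data(seed=29, n_samples=200, n_features=3):
    """Hypothetical helper: a balanced binary problem with a linear boundary."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(n_samples, n_features))
    beta = rng.normal(size=n_features)
    scores = X @ beta
    # Thresholding at the median keeps both classes equally represented.
    y = (scores > np.median(scores)).astype(int)
    return X, y


def check_refit_false(solver):
    """Hypothetical helper: fit with refit=False and return the coef_ shape."""
    X, y = make_data()
    model = LogisticRegressionCV(cv=5, solver=solver, tol=1e-2, refit=False)
    model.fit(X, y)  # should complete without raising IndexError
    return model.coef_.shape


shapes = {solver: check_refit_false(solver) for solver in ("saga", "liblinear")}
print(shapes)
```

In a real test suite these checks would more naturally be parametrized over the solver, with the shape assertions placed inside the test function.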
We will implement the fix and reference this Issue in a single PR targeting branch scikit-learn__scikit-learn-14087.