
IndexError thrown with LogisticRegressionCV and refit=False #36

@rowan-stein

Description

The following error is thrown when trying to estimate a regularization parameter via cross-validation, without refitting.

Steps/Code to Reproduce

import sys
import sklearn
from sklearn.linear_model import LogisticRegressionCV
import numpy as np

np.random.seed(29)
X = np.random.normal(size=(1000, 3))
beta = np.random.normal(size=3)
intercept = np.random.normal(size=None)
y = np.sign(intercept + X @ beta)

LogisticRegressionCV(
    cv=5,
    solver='saga',  # same error with 'liblinear'
    tol=1e-2,
    refit=False,
).fit(X, y)

Expected Results

No error is thrown.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 11, in <module>
  File "/workspace/scikit-learn/sklearn/linear_model/logistic.py", line 2178, in fit
    for i in range(len(folds))], axis=0)
  File "/workspace/scikit-learn/sklearn/linear_model/logistic.py", line 2178, in <listcomp>
    for i in range(len(folds))], axis=0)
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed

Versions

... as in original issue ...


Researcher Specification

Root cause

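  • In sklearn/linear_model/logistic.py, the refit=False branch of LogisticRegressionCV.fit indexes coefs_paths with one more axis than the array actually has. Under certain shape constructions (here, a binary problem handled one-vs-rest), a per-class entry lacks a dimension that the indexing expression assumes. The traceback shows a 3-dimensional array indexed with four indices, which raises IndexError: too many indices for array.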
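The traceback does not show the full indexing expression. The snippet below therefore uses a hypothetical four-index expression, `coefs_paths[:, i, best_indices[i], :]`, with axes assumed to be (classes, folds, Cs, coefficients), purely to illustrate the failure mode. The same expression works on a 4-D array and raises IndexError on a 3-D per-class entry. The sizes are arbitrary and the helper name is made up for this illustration.

```python
import numpy as np

# Illustrative sizes: 5 folds, 10 values of C, 3 features + 1 intercept column.
n_folds, n_cs, n_coef = 5, 10, 4


def average_best_coefs_4idx(coefs_paths, best_indices):
    """Hypothetical four-index selection (not the actual sklearn source line).

    Assumes coefs_paths has axes (n_classes, n_folds, n_cs, n_coef).
    """
    return np.mean(
        [coefs_paths[:, i, best_indices[i], :] for i in range(len(best_indices))],
        axis=0,
    )


rng = np.random.default_rng(0)
best_indices = rng.integers(0, n_cs, size=n_folds)  # one best grid index per fold

# 4-D input: the index expression matches the array's dimensionality.
paths_4d = rng.normal(size=(1, n_folds, n_cs, n_coef))
print(average_best_coefs_4idx(paths_4d, best_indices).shape)  # (1, 4)

# 3-D input, i.e. a single per-class (n_folds, n_cs, n_coef) entry:
# four indices on three axes raises IndexError.
paths_3d = paths_4d[0]
try:
    average_best_coefs_4idx(paths_3d, best_indices)
except IndexError as exc:
    print("IndexError:", exc)
```

On recent NumPy releases the printed message has the same form as the one in the traceback above.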

Proposed fix (high-level)

  • Explicitly stack coefs_paths with np.stack(..., axis=0) prior to reshape to avoid object arrays and guarantee dimensions.
  • Reshape to ensure per-class entries are 3D (n_folds, n_cs * n_l1_ratios, n_features [+1]).
  • Use vectorized selection for per-fold best indices and add defensive shape assertions.
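The three proposed changes can be sketched in NumPy for a single class. `average_best_coefs` is a hypothetical helper, not scikit-learn code. It assumes the caller already has per-fold coefficient paths and scores laid out over the same (C, l1_ratio) grid. It also does not reproduce how LogisticRegressionCV maps the flattened best index back to `C_` and `l1_ratio_`.

```python
import numpy as np


def average_best_coefs(per_fold_paths, per_fold_scores):
    """Hypothetical helper: average each fold's best coefficients for one class.

    per_fold_paths:  n_folds arrays of shape (n_cs * n_l1_ratios, n_coef) or
                     (n_cs, n_l1_ratios, n_coef), where n_coef = n_features [+1].
    per_fold_scores: n_folds arrays of shape (n_cs * n_l1_ratios,) or
                     (n_cs, n_l1_ratios), in the same grid order as the paths.
    Returns (w, best_indices) with w of shape (n_coef,).
    """
    # np.stack adds an explicit leading fold axis and raises ValueError on
    # mismatched per-fold shapes, so the result is never a ragged object array.
    coefs_paths = np.stack(per_fold_paths, axis=0)
    scores = np.stack(per_fold_scores, axis=0)

    # Defensive check: paths carry exactly one trailing coefficient axis
    # beyond the scores grid.
    assert coefs_paths.shape[:-1] == scores.shape, (coefs_paths.shape, scores.shape)

    # Flatten the grid so the paths are exactly 3-D:
    # (n_folds, n_cs * n_l1_ratios, n_coef). C-order flattening keeps the
    # scores grid and the coefficient grid aligned.
    n_folds, n_coef = coefs_paths.shape[0], coefs_paths.shape[-1]
    coefs_paths = coefs_paths.reshape(n_folds, -1, n_coef)
    scores = scores.reshape(n_folds, -1)
    assert coefs_paths.ndim == 3 and coefs_paths.shape[1] == scores.shape[1]

    # Vectorized selection: one best grid index per fold, gathered in a single
    # fancy-indexing step instead of a Python loop over folds.
    best_indices = np.argmax(scores, axis=1)                    # (n_folds,)
    best_coefs = coefs_paths[np.arange(n_folds), best_indices]  # (n_folds, n_coef)
    return best_coefs.mean(axis=0), best_indices
```

With 5 folds and a 3 × 2 grid, the helper accepts per-fold paths either flattened to (6, n_coef) or shaped (3, 2, n_coef) and returns the same averaged vector. Mismatched path and score shapes fail the assertion instead of surfacing later as an IndexError.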

Tests

  • Add refit=False tests for binary ovr with saga and liblinear, and multinomial with saga (and elasticnet with l1_ratios) to validate shapes and absence of IndexError.

We will implement the fix in a single PR that references this issue and targets the branch scikit-learn__scikit-learn-14087.
