Skip to content

Comments

swev-id: scikit-learn__scikit-learn-13142 align GaussianMixture fit_predict with predict#26

Open
casey-brooks wants to merge 1 commit intoscikit-learn__scikit-learn-13142from
fix/gmm-fit_predict-best-init-13142
Open

swev-id: scikit-learn__scikit-learn-13142 align GaussianMixture fit_predict with predict#26
casey-brooks wants to merge 1 commit intoscikit-learn__scikit-learn-13142from
fix/gmm-fit_predict-best-init-13142

Conversation

@casey-brooks
Copy link

Summary

  • load the best EM parameters before computing fit_predict labels
  • reuse predict(X) to produce labels and add regression coverage

Regression evidence

Before fix (scikit-learn__scikit-learn-13142)

source /workspace/sklearn-py37/bin/activate && python - <<'PY'
import numpy as np
from sklearn.mixture import GaussianMixture
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0)
labels_fit_predict = gmm.fit_predict(X)
labels_predict = gmm.predict(X)
print('mismatch count:', (labels_fit_predict != labels_predict).sum())
PY
mismatch count: 100

After fix (this branch)

source /workspace/sklearn-py37/bin/activate && python - <<'PY'
import numpy as np
from sklearn.mixture import GaussianMixture
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0)
labels_fit_predict = gmm.fit_predict(X)
labels_predict = gmm.predict(X)
print('mismatch count:', (labels_fit_predict != labels_predict).sum())
PY
mismatch count: 0

Testing

  • source /workspace/sklearn-py37/bin/activate && pytest sklearn/mixture/tests/test_gaussian_mixture.py -k fit_predict # 8 passed
  • source /workspace/sklearn-py37/bin/activate && flake8 sklearn/mixture/base.py sklearn/mixture/tests/test_gaussian_mixture.py # no issues

Resolves #25

@casey-brooks casey-brooks requested a review from a team December 26, 2025 01:30
@casey-brooks
Copy link
Author

Test & Lint Summary

  • source /workspace/sklearn-py37/bin/activate && pytest sklearn/mixture/tests/test_gaussian_mixture.py -k fit_predict
    • 8 passed, 0 failed
  • source /workspace/sklearn-py37/bin/activate && flake8 sklearn/mixture/base.py sklearn/mixture/tests/test_gaussian_mixture.py
    • no issues

@noa-lucent
Copy link

[major] The branch diverges from main massively: this PR deletes roughly 500k lines, including the entire sklearn package (e.g. sklearn/init.py) and most infrastructure, then reintroduces files from an old snapshot. We cannot review or merge this as-is. Please rebase on the current main branch and keep the diff limited to the intended SGD example moves.

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch removes most of the project (over 500k lines), including the sklearn package itself, and replaces it with an obsolete snapshot. Please rebase on current main and limit the PR to the intended SGD example renames before we can proceed.

@rowan-stein
Copy link
Collaborator

Hi Noa — this PR focuses solely on the GaussianMixture inconsistency fix:

  • Changed files: sklearn/mixture/base.py and sklearn/mixture/tests/test_gaussian_mixture.py
  • Diff: 46 additions, 6 deletions
  • Purpose: align fit_predict with predict when n_init > 1 and add regression tests

There is no mass deletion of project files in this PR. Could you please re-review PR #26? Thank you!

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tightening this up. Confirmed that now routes through , so it uses the best initialization parameters instead of the last attempted run. The new regression tests cover both the direct reuse of the fitted estimator and comparing against a freshly fitted estimator, so the earlier regression can't come back. LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants