Skip to content

Harden model_analysis_jsons package-data config and add sdist regression test#659

Open
ReinerBRO wants to merge 2 commits intoyzhao062:masterfrom
ReinerBRO:test/package-data-sdist-642
Open

Harden model_analysis_jsons package-data config and add sdist regression test#659
ReinerBRO wants to merge 2 commits intoyzhao062:masterfrom
ReinerBRO:test/package-data-sdist-642

Conversation

@ReinerBRO
Copy link
Copy Markdown

Summary

  • harden package-data declarations so model_analysis_jsons are included via both pyod and pyod.utils package paths in setup.py
  • reinforce source distribution inclusion with graft pyod/utils/model_analysis_jsons in MANIFEST.in
  • add a packaging regression test that builds sdist and asserts JSON assets are present in the archive

Why

Issue #642 reports missing model_analysis_jsons files in installed distributions, which breaks auto model selection at runtime.

Validation

  • python3 -m pytest pyod/test/test_package_data.py -q
  • python3 -m py_compile setup.py

Closes #642

@yzhao062
Copy link
Copy Markdown
Owner

yzhao062 commented Apr 4, 2026

Hi @ReinerBRO, thanks for taking the time to look into this and for the regression test idea — that kind of guard is genuinely useful for packaging issues like this one.

A heads-up on timing: we already landed a fix for #642 in commit ad0d103 (Feb 26), which corrected the package_data key and added the recursive-include to MANIFEST.in. So the core bug is resolved on both development and master.

With that in mind, a few notes on the changes here:

setup.py — the second package_data entry under 'pyod'
The existing 'pyod.utils': ['model_analysis_jsons/*.json'] is correct and sufficient. Adding a parallel entry under 'pyod' means two declarations must be kept in sync going forward, and there is no documented case where a build backend would need one form over the other. I would prefer to keep just the single entry to avoid maintenance confusion.

MANIFEST.ingraft directive
The existing recursive-include pyod/utils/model_analysis_jsons *.json already covers the JSON files and is more precise since it filters by extension. graft would pull in any stray non-JSON files that might end up in that directory. Having both for the same path is also a bit confusing for future contributors.

test_package_data.py — the regression test
This is the part I would love to keep. A packaging regression test is a good safeguard. A few suggestions if you would like to revise:

  • Consider using python -m build --sdist instead of the deprecated setup.py sdist invocation.
  • Testing bdist_wheel as well would be even stronger, since most users install via wheels.
  • Adding if __name__ == '__main__': unittest.main() at the bottom would match the convention in all other test files.
  • Since building an archive is relatively slow, a skip marker or condition would be helpful so it does not run on every pytest invocation in CI.

Would you be open to submitting a revised PR with just the test (with the above tweaks)? That would be a welcome addition. Thanks again for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The package_data configuration in setup.py is incorrect

2 participants