-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEW: Better package splitting for multi-output recipes with negative glob in outputs/files #5216
Conversation
as keys in `files`
We require contributors to sign our Contributor License Agreement and we don't have one on file for @carterbox. In order for us to review and merge your code, please e-sign the Contributor License Agreement PDF. We then need to manually verify your signature, merge the PR (conda/infrastructure#891), and ping the bot to refresh the PR. |
CodSpeed Performance ReportMerging #5216 will not alter performanceComparing Summary
|
I'm not sure that the failure for linux 3.8 23.5.0 serial is related to this PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot. The PR is super clear and the documentation is on-point. I only added a couple comments to clarify my own doubts. The new positional argument could be considered API breaking, so let's see how the rest of the team feels about it. I don't have strong feelings, just applying an abundance of caution Just In Case ✨ .
Once we have discussed that, I'll be super happy to approve. 👍
conda_build/build.py
Outdated
metadata: MetaData, | ||
env, | ||
stats, | ||
new_prefix_files: set[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is adding a new positional argument to a (let's assume) public signature, what about making it a None
kwarg. This way downstream users won't run into number of arguments errors:
new_prefix_files: set[str], | |
new_prefix_files: set[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point; it is API breaking assuming these functions are part of the public API. I need to mark the new argument as typing.Optional
and/or set appropriate defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conda_build/build.py
Outdated
metadata: MetaData, | ||
env, | ||
stats, | ||
new_prefix_files: set[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about the API breakage. Not sure how prominent the usage of wheel outputs is, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1299,9 +1299,10 @@ You can specify files to be included in the package in 1 of | |||
Explicit file lists are relative paths from the root of the | |||
build prefix. Explicit file lists support glob expressions. | |||
Directory names are also supported, and they recursively include | |||
contents. | |||
contents. Files installed to the prefix by host dependencies will | |||
be matched by glob expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe even highlight this further with an admonition or something similar. I wasn't aware of that design issue!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with an admonition or something similar.
I don't know what you mean. You want stronger language? For example: "WARNING! Files installed to the prefix by host dependencies also will be matched by glob expressions."
I wasn't aware of that design issue!
Yeah, I'm not sure whether this is a bug/oversight or a feature. It's definitely unexpected to me. The most active contributor in this section is @msarahan; maybe they know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant an admonition in the RST sense. Triple-backtick + warning
kind of syntax. Equivalent to this:
Warning
Files installed to the prefix by host dependencies also will be matched by glob expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 642906b
files: | ||
- subpackage_file1 | ||
- somedir | ||
- "*.ext" | ||
include: | ||
- subpackage_file1 | ||
- somedir | ||
- "*.ext" | ||
exclude: | ||
- "*3.ext" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace the old test with a new one instead of adding a new case? Maybe I would add a new output in this meta.yaml and then test the different behaviors differently, if possible (e.g. how the list-of-str behaviour does copy anything in PREFIX, but include/exclude doesn't?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the test recipe to include multiple outputs: one uses the old file matching and one uses the new file matching.
Files can be excluded by specifying `files` as a dictionary separating | ||
files to `include` from those to `exclude`. Files installed to the prefix | ||
by host dependencies are automatically excluded when the include/exclude | ||
syntax is used: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to mention the precedence here, i.e., exclude
entries have higher priority as per the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mentions the new behavior, i.e., "Files [...] excluded when the include/exclude syntax is used".
But since this is a deviation from the previous "include" logic (which would be more of a force-include
since it allows packaging pre-existing files.), I think we should make the difference more prevalent/explain the behavior for the old non-dict case more explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to mention the precedence here, i.e.,
exclude
entries have higher priority as per the implementation.
exclude
has a higher priority than which operation? It's unclear to me what ambiguity we would be removing by mentioning priority.
This mentions the new behavior, i.e., "Files [...] excluded when the include/exclude syntax is used". But since this is a deviation from the previous "include" logic (which would be more of a
force-include
since it allows packaging pre-existing files.), I think we should make the difference more prevalent/explain the behavior for the old non-dict case more explicitly.
Files can be excluded by specifying `files` as a dictionary separating | |
files to `include` from those to `exclude`. Files installed to the prefix | |
by host dependencies are automatically excluded when the include/exclude | |
syntax is used: | |
When defining `outputs/files` as a list, any file in the prefix (including those | |
installed by host dependencies) matching one of the glob expressions is | |
included in the output. Greater control over file matching may be | |
achieved by defining `files` as a dictionary separating files to | |
`include` from those to `exclude`. | |
When using include/exclude, only files installed by | |
the current recipe are considered. i.e. files in the prefix installed | |
by host dependencies are not matched. include/exclude may not be used | |
simultaneously with glob expressions listed directly in `outputs/files`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to mention the precedence here, i.e.,
exclude
entries have higher priority as per the implementation.
exclude
has a higher priority than which operation?
The kind of precedence where if you do
files:
include:
- lib/libfoo.so
exclude:
- lib/libfoo.so
the exclusion wins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now explained that when files match both exclude and include, they are excluded. d1b12a1
conda_build/build.py
Outdated
include = files.get("include", []) | ||
exclude = files.get("exclude", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax will be affected by the "key is set to an empty block, which means None in YAML" kind of issue. We should change it to:
include = files.get("include", []) | |
exclude = files.get("exclude", []) | |
include = files.get("include") or [] | |
exclude = files.get("exclude") or [] |
It might also be a good opportunity to sneak @jakirkham's changes added by #4971.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See line 1818:
- files = output.get("files", [])
+ files = output.get("files") or []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that files.get("include")
may return None
, but include
must be a list. So we use the falsiness of None
to replace it with []
. Equivalent to:
include = files['include'] if ('include' in files and files['include'] != None) else []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might have
- output: something
files:
include:
- path
exclude:
- not-this-path # [linux]
and then you'll have files["exclude"] = None
on macOS, for example (linux is false). Not sure at what point we are ensuring that include
is a list.
Under that circumstance files.get("exclude", [])
will be None
instead of the expected []
. Is that what you mean? I think we are saying the same 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We are saying the same thing. I'm just trying to figure out if we should be doing something stronger than converting Falsy values into the empty list. Perhaps convert any non-list into a list? Something like:
include = files.get("include") if isinstance(files.get("include"), list) else []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might silence errors when users populate it with a dict or something accidentally. I think we can just play it simple with falsy promotions.
conda_build/build.py
Outdated
@@ -2817,7 +2848,7 @@ def build( | |||
# This is wrong, files has not been expanded at this time and could contain | |||
# wildcards. Also well, I just do not understand this, because when this | |||
# does contain wildcards, the files in to_remove will slip back in. | |||
if "files" in output_d: | |||
if "files" in output_d and not isinstance(output_d["files"], dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if "files" in output_d and not isinstance(output_d["files"], dict): | |
if not isinstance(output_d.get("files") or (), dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated this condition to:
if (
"files" in output_d
and output_d["files"] is not None
and not isinstance(output_d["files"], dict)
):
|
Co-authored-by: Daniel Ching <carterbox@users.noreply.github.com>
@carterbox I've started to implement this idea in Would love your feedback once it's in a testable state. This is also in preparation of a "top-level" cache build (that can be split up into multiple packages). |
Thanks, @isuruf for fixing my platform detection logic in the tests. @jaimergp, Looks like the failed test on Linux is in a test unrelated to this PR. Maybe someone can restart the tests to see if it is ephemeral or not? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good, just a minor suggestion and a question about deprecating the old behavior
else: | ||
keep_files = { | ||
os.path.normpath(pth) | ||
for pth in utils.expand_globs(files, metadata.config.host_prefix) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the change in behavior above with only allowing new files to be included should the old behavior of including any file in the prefix be deprecated?
else: | |
keep_files = { | |
os.path.normpath(pth) | |
for pth in utils.expand_globs(files, metadata.config.host_prefix) | |
} | |
else: | |
keep_files = { | |
os.path.normpath(pth) | |
for pth in utils.expand_globs(files, metadata.config.host_prefix) | |
} | |
if keep_files - new_prefix_files: | |
deprecated.topic(...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still useful for eg when building gcc_bootstrap
packages. Deprecating the implicit behaviour and adding a new flag would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that the current behavior of including all files from the prefix for multi-output and only installed files from the prefix for single-output recipes is inconsistent and thus non-intuitive. rattler-build is adding a section called "always_include_files" for recipes that want to repackage artifacts from their host dependencies.
I'm favor of deprecating the old behavior, but I'm uncertain about any removal or replacement schedule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rattler-build is adding a section called "always_include_files" for recipes that want to repackage artifacts from their host dependencies.
This exists in conda already, and is in use in conda-forge. Main use-case for that from my POV is to be explicit about overwriting some CMake metadata if an incremental build adds more targets to an already-existing CMake file (example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can deprecate that behaviour in a separate PR too. This PR has been open for too long and I wouldn't want to test your patience further @carterbox <3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 LGTM! Just wanted to understand the thoughts/plans for the old behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open an issue to capture this feedback, though I think with the documentation this PR added we are in a better place already.
Co-authored-by: Bianca Henderson <beeankha@gmail.com>
Co-authored-by: Ken Odegard <kodegard@anaconda.com>
Thanks @carterbox and everyone involved! Three months and 64 comments later, this one is in! |
refs: - conda/conda-build#5216 Signed-off-by: Marcel Bargull <marcel.bargull@udo.edu>
The expand_globs function from conda_build.utils logs an ERROR when a glob expression returns no matches, this is overly alarming because the user may now use negative glob expressions which they don't care if it returns empty or the user may want to use the same set of glob expressions for multiple platforms some of which may return empty on some platforms. conda#5216 conda#5455
Description
Ressurects #4197. Closes #4196.
The purpose of this PR is to make it easier to split packages into multiple outputs using glob expressions. This is accomplished in two ways:
lib/**/libfoo*
, but notlib/**/*.a
include/**/*
without pulling in the headers of other packages. It is already the behavior of a single-output recipe to ignore files added to the prefix by host dependencies, so I'm not sure why this feature didn't make it into multi-output recipes.These new behaviors are only used if the include/exclude keywords are used under the files key. The previous behavior is retained.
Checklist - did you ...
news
directory (using the template) for the next release's release notes?