-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not validate_duplicate_content for APT repositories #633
Conversation
closes pulp#632 Using validate_duplicate_content prevents successfull synchronization when multiple package indices contain references to the same package. However, this is an entirely legitimate situation for separate distributions/releases within a single APT repo. Further more, the so created duplicate package content, does not cause any adverse effects, since it is actually identical and only one of each duplicate will be saved. Note that this issue was masked prior to the introduction of the optimize sync feature, since we were inadvertently using the Repository class from pulpcore instead of the AptRepository class from pulp_deb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense 👍
Does not require backporting, since the issue is masked on all existing release branches. |
@@ -67,7 +67,7 @@ def finalize_new_version(self, new_version): | |||
from pulp_deb.app.tasks.exceptions import DuplicateDistributionException | |||
|
|||
remove_duplicates(new_version) | |||
validate_repo_version(new_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not understand this. But the validate_duplicate_content
should prevent two content units with the same "natural almost key" but different artifacts to enter the same repository version. It is only checked for content that declares repo_key_fields
. So maybe you can revisit those attributes and relax the rule based on the content type.
I would still think it is invalid to have two packages with the same (name, version, architecture)
in one repo-version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we found, is that it also appears to prevent syncs were two package indices both reference the same pool package, because it somehow counts those as duplicate content units at the time of the check. This is odd since it does not actually result in that package being saved twice and colliding on the publish. This is observed behavior, I am not certain I fully understand why this happens. (I feel like what should happen is that the first of these two declarative_content units is saved to the DB, and the second one then sees that this package already exists and does not save anything, resulting in no duplicates in the new repo version, but that is not what we are seeing...)
Not sure if completely disabling the check is a good solution, but it looks like the check was not running correctly anyway until we introduced optimize sync, so this seemed like the least disruptive stopgap measure to avoid breaking syncs that have so far been working without issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Yes, that should not happen. Can you maybe find a (unittest like) reproducer for this that creates a repo-version and adds the same content twice and then we may see, what's actually happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a follow on task, so we do not forget about this: #640
closes #632
Using validate_duplicate_content prevents successfull synchronization
when multiple package indices contain references to the same package.
However, this is an entirely legitimate situation for separate
distributions/releases within a single APT repo. Further more, the so
created duplicate package content, does not cause any adverse effects,
since it is actually identical and only one of each duplicate will be
saved.
Note that this issue was masked prior to the introduction of the
optimize sync feature, since we were inadvertently using the Repository
class from pulpcore instead of the AptRepository class from pulp_deb.