Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optimize mode for sync tasks #570

Merged
merged 1 commit into from
Aug 22, 2022

Conversation

hstct
Copy link
Contributor

@hstct hstct commented Jul 4, 2022

Required PR: pulp/pulpcore#2999

closes #564

Co-Authored-By: Quirin Pamp pamp@atix.de

This will change the default behaviour of the sync to use optimize
unless it is explicitly set to False.

@hstct hstct force-pushed the create-optimize-sync-option branch 3 times, most recently from 6a31aff to 6cf3a84 Compare July 4, 2022 07:52
@quba42
Copy link
Collaborator

quba42 commented Jul 4, 2022

I would adjust the structure of last_sync_details somewhat as follows:

last_sync_details:
  remote_options:
    distributions: "<value>"
    components: "<value>"
    architectures: "<value>"
    download_policy: "<value>"
  <distribution>:
    artifact_set_sha256: "<value>"
    <component>:
      <architecture>:
        artifact_set_sha256: "<value>"

Remote options belong on the top level, since they are the same for all distributions.
We also need to make sure download_policy is one of the remote options we check.
Rather than having a list of (relative_path, package_index_artifact_set_sha256) tuples we could just make the comonents and architectures nested dict keys just like the distribution. This is mostly a preference thing, and should not ultimately make much of a difference. (The relative_path, can be generated from the component and the architecture, so this contains essentially the same information).

@hstct hstct force-pushed the create-optimize-sync-option branch 4 times, most recently from f64a0d6 to 5c8f2bb Compare July 6, 2022 17:22
@mdellweg
Copy link
Member

mdellweg commented Jul 7, 2022

Let me just add another thought here. Can we not add information to the repository-version itself how it was made? This would be immutable and the repository can always look at it's latest version to retrieve the thing you are implementing here.

@quba42
Copy link
Collaborator

quba42 commented Jul 7, 2022

Can we not add information to the repository-version itself how it was made?

It is an interesting thought, which might have other benefits as well (like being able to debug later how really old repo versions came about). If we go this route, I kind of would like some buy in from pulp_rpm or even pulpcore though. The current implementation is analogous to the one in pulp_rpm and I do like keeping plugins in sync on similar things. I will bring this up in open floor on Tuesday.

@quba42
Copy link
Collaborator

quba42 commented Jul 7, 2022

I created a discourse discussion and posted it in #pulp-rpm for discussing the possibility of using repo versions instead of repos: https://discourse.pulpproject.org/t/optimize-sync-and-last-sync-details-in-pulp-rpm/531

Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my suggestions just reflect small style changes.
Overall this looks pretty good to me.
There are still some details where I added a reminder to double check things.

The big question is whether we want to go ahead with the repository.last_sync_details design, or if we want to rework to use something like repo_version.creation_details instead. Unfortunately it does not make sense to punt down the road by accepting the implementation we have and then "maybe adjusting it later", since we should get these model changes right from the get go.

pulp_deb/app/serializers/repository_serializers.py Outdated Show resolved Hide resolved
pulp_deb/tests/functional/api/test_sync_optimize.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/synchronizing.py Show resolved Hide resolved
pulp_deb/app/tasks/synchronizing.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/synchronizing.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/synchronizing.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/synchronizing.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/synchronizing.py Outdated Show resolved Hide resolved
pulp_deb/app/tasks/synchronizing.py Show resolved Hide resolved
@hstct hstct force-pushed the create-optimize-sync-option branch from 5c8f2bb to 6a6bddd Compare July 11, 2022 15:27
@quba42 quba42 added the .feature CHANGES/<issue_number>.feature label Jul 13, 2022
@hstct hstct force-pushed the create-optimize-sync-option branch 3 times, most recently from e2c51f8 to b02d3c6 Compare July 19, 2022 11:28
@hstct
Copy link
Contributor Author

hstct commented Jul 22, 2022

The current version relies on pulp/pulpcore#2999 to be merged before the tests can pass.

@hstct hstct force-pushed the create-optimize-sync-option branch from a52b02e to a25d9dd Compare July 25, 2022 11:27
@quba42 quba42 force-pushed the create-optimize-sync-option branch 2 times, most recently from 53a6c99 to cf4b9c6 Compare August 2, 2022 13:40
@quba42 quba42 marked this pull request as ready for review August 2, 2022 13:41
@quba42 quba42 force-pushed the create-optimize-sync-option branch from cf4b9c6 to 08b64b5 Compare August 2, 2022 13:47
@quba42
Copy link
Collaborator

quba42 commented Aug 2, 2022

The latest version of this PR, exclusively stores remote options on the new RepositoryVersion field from pulpcore.
The needed artifact_set_sha256 comparison values are instead queried from the content in the previous repository version (this information has always been present there).

We have successfully tested skipping entire Release files as well as individual package indices. As such this change is now mostly ready for final review.

Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now approve these changes, but since I have been more involved in the latest version as an author, I will look for another secondary review amongst my colleagues. 😉

@hstct hstct force-pushed the create-optimize-sync-option branch from f4447b8 to 51a0380 Compare August 10, 2022 11:23
@quba42 quba42 force-pushed the create-optimize-sync-option branch 2 times, most recently from 8a7322b to 74b5de1 Compare August 18, 2022 13:41
@quba42 quba42 force-pushed the create-optimize-sync-option branch from 74b5de1 to b93cde7 Compare August 18, 2022 14:18
closes pulp#564

Co-Authored-By: Quirin Pamp <pamp@atix.de>

This will change the default behaviour of the sync to use optimize unless it is explicitly set to False.
@quba42 quba42 force-pushed the create-optimize-sync-option branch from b93cde7 to bb26166 Compare August 22, 2022 09:33
@quba42
Copy link
Collaborator

quba42 commented Aug 22, 2022

After final internal review/debate, we have decided to merge the current state of this new feature (once the tests turn green).

@quba42 quba42 merged commit 14ad46e into pulp:main Aug 22, 2022
@quba42 quba42 deleted the create-optimize-sync-option branch August 22, 2022 09:52
Comment on lines -156 to +157
repository = Repository.objects.get(pk=repository_pk)
repository = AptRepository.objects.get(pk=repository_pk)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ended up revealing a previously masked problem with our sanity checks.
We fixed it again using: #633

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.feature CHANGES/<issue_number>.feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optimize mode to repository sync
3 participants