Replication resync #6261

Open
BiniKP opened this issue Feb 5, 2025 · 7 comments · May be fixed by #6266

Comments


BiniKP commented Feb 5, 2025

Version
{
"component": "core",
"version": "3.69.2",
"package": "pulpcore",
"module": "pulpcore.app",
"domain_compatible": true
}
{
"component": "ansible",
"version": "0.23.1",
"package": "pulp-ansible",
"module": "pulp_ansible.app",
"domain_compatible": false
}
{
"component": "container",
"version": "2.22.1",
"package": "pulp-container",
"module": "pulp_container.app",
"domain_compatible": false
}
{
"component": "deb",
"version": "3.5.0",
"package": "pulp_deb",
"module": "pulp_deb.app",
"domain_compatible": false
}
{
"component": "certguard",
"version": "3.69.2",
"package": "pulpcore",
"module": "pulp_certguard.app",
"domain_compatible": true
}
K8S installation with pulp-operator.

Describe the bug
During a replication, the last task got stuck, and I had to cancel it after several hours. But now, it is not trying to replicate that repository again, even though I deleted the repository tree created for it.

To Reproduce
Create a new Pulp deployment, create an upstream-pulp (pulp client or API), and run a replica (pulp client or API). If anything fails, you will not be able to force a new replica to download the content again.

Expected behavior
To have an option to force the download again from the same source.

Additional context
Even a pulp rpm sync using the repository and the remote that were created didn't work. It seems to download something, but the result is not published at the end like the rest of the content.

Author

BiniKP commented Feb 5, 2025

Update:
If you delete the upstream-pulp through the API, remove all repositories, remotes, and distributions created by the replica, and run "pulp orphan cleanup" and "pulp repository reclaim --all" to clean up any remaining garbage, you will be able to create a new upstream-pulp and download everything successfully. (Additionally, I ran "pulp rpm prune-packages --all-repositories", but I am not sure if it had any effect on the solution.)

The problem, as you can imagine, is that I had to delete everything on the Pulp instance.

Also, there were no repositories from other plugins, but they will probably be affected if you apply this workaround in your environment.

Member

mdellweg commented Feb 6, 2025

Is there any chance you have some logging output of the failed sync and the failed reattempts? Is it even reproducible?

Author

BiniKP commented Feb 6, 2025

Sorry, the logs of the kubernetes worker that ran the task are already rotated. The only thing I have is the record of the task in pulp:

 {
    "pulp_href": "/pulp/api/v3/task-groups/0194d0b3-3206-71af-9eae-1934b450d796/",
    "prn": "prn:core.taskgroup:0194d0b3-3206-71af-9eae-1934b450d796",
    "description": "Replication of pulp",
    "all_tasks_dispatched": true,
    "waiting": 0,
    "skipped": 0,
    "running": 0,
    "completed": 14,
    "canceled": 1,
    "failed": 0,
    "canceling": 0,
    "group_progress_reports": [],
    "tasks": [
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-6170-71cf-96ee-3635e79475ff/",
        "prn": "prn:core.task:0194d0b3-6170-71cf-96ee-3635e79475ff",
        "pulp_created": "2025-02-04T11:23:24.401105Z",
        "pulp_last_updated": "2025-02-04T11:23:24.401121Z",
        "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:23:22.655800Z",
        "started_at": "2025-02-04T11:23:22.743902Z",
        "finished_at": "2025-02-04T11:25:49.629534Z",
        "worker": "/pulp/api/v3/workers/0194d09a-1ade-7799-ad21-661303e355d4/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-611f-7e52-acc9-7ecdf97f18c2/",
        "prn": "prn:core.task:0194d0b3-611f-7e52-acc9-7ecdf97f18c2",
        "pulp_created": "2025-02-04T11:23:24.320744Z",
        "pulp_last_updated": "2025-02-04T11:23:24.320759Z",
        "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:23:22.654156Z",
        "started_at": "2025-02-04T11:23:25.411986Z",
        "finished_at": "2025-02-04T11:25:54.884738Z",
        "worker": "/pulp/api/v3/workers/0194d099-da58-717f-8f3f-c51c3cca96da/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-5e7f-7386-9794-c3c9f3e19a3a/",
        "prn": "prn:core.task:0194d0b3-5e7f-7386-9794-c3c9f3e19a3a",
        "pulp_created": "2025-02-04T11:23:23.648596Z",
        "pulp_last_updated": "2025-02-04T11:23:23.648611Z",
        "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:23:22.652379Z",
        "started_at": "2025-02-04T11:23:23.882758Z",
        "finished_at": "2025-02-04T11:47:49.163193Z",
        "worker": "/pulp/api/v3/workers/0194d09a-6fc3-76aa-9dfd-2ec6dc2b831d/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-61d8-73a9-9a26-997ef7405351/",
        "prn": "prn:core.task:0194d0b3-61d8-73a9-9a26-997ef7405351",
        "pulp_created": "2025-02-04T11:23:24.505119Z",
        "pulp_last_updated": "2025-02-04T11:23:24.505135Z",
        "name": "pulpcore.app.tasks.base.general_create",
        "state": "completed",
        "unblocked_at": "2025-02-04T12:08:59.429739Z",
        "started_at": "2025-02-04T12:08:58.346659Z",
        "finished_at": "2025-02-04T12:08:58.761941Z",
        "worker": "/pulp/api/v3/workers/0194d09a-1ade-7799-ad21-661303e355d4/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-6235-7d4c-8f39-bf65ff5ae50e/",
        "prn": "prn:core.task:0194d0b3-6235-7d4c-8f39-bf65ff5ae50e",
        "pulp_created": "2025-02-04T11:23:24.598421Z",
        "pulp_last_updated": "2025-02-04T11:23:24.598437Z",
        "name": "pulpcore.app.tasks.base.general_create",
        "state": "completed",
        "unblocked_at": "2025-02-04T14:58:46.913330Z",
        "started_at": "2025-02-04T14:58:45.782934Z",
        "finished_at": "2025-02-04T14:58:46.180092Z",
        "worker": "/pulp/api/v3/workers/0194d09a-1ade-7799-ad21-661303e355d4/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-621b-70cf-8039-2641d223b52f/",
        "prn": "prn:core.task:0194d0b3-621b-70cf-8039-2641d223b52f",
        "pulp_created": "2025-02-04T11:23:24.572529Z",
        "pulp_last_updated": "2025-02-04T11:23:24.572552Z",
        "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
        "state": "canceled",
        "unblocked_at": "2025-02-04T11:23:22.659306Z",
        "started_at": "2025-02-04T11:25:55.090467Z",
        "finished_at": "2025-02-04T14:58:52.418323Z",
        "worker": "/pulp/api/v3/workers/0194d099-da58-717f-8f3f-c51c3cca96da/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-60e4-7436-8d7b-d53d70e9e6bf/",
        "prn": "prn:core.task:0194d0b3-60e4-7436-8d7b-d53d70e9e6bf",
        "pulp_created": "2025-02-04T11:23:24.261392Z",
        "pulp_last_updated": "2025-02-04T11:23:24.261414Z",
        "name": "pulpcore.app.tasks.base.general_create",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:47:50.781742Z",
        "started_at": "2025-02-04T11:54:15.199788Z",
        "finished_at": "2025-02-04T11:54:15.712694Z",
        "worker": "/pulp/api/v3/workers/0194d09a-6fc3-76aa-9dfd-2ec6dc2b831d/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-320c-7c93-9998-9e69fe0df80a/",
        "prn": "prn:core.task:0194d0b3-320c-7c93-9998-9e69fe0df80a",
        "pulp_created": "2025-02-04T11:23:12.269459Z",
        "pulp_last_updated": "2025-02-04T11:23:12.269473Z",
        "name": "pulpcore.app.tasks.replica.replicate_distributions",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:23:12.292854Z",
        "started_at": "2025-02-04T11:23:13.898680Z",
        "finished_at": "2025-02-04T11:23:25.291514Z",
        "worker": "/pulp/api/v3/workers/0194d099-da58-717f-8f3f-c51c3cca96da/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-6136-784a-9f47-9d98140d57a5/",
        "prn": "prn:core.task:0194d0b3-6136-784a-9f47-9d98140d57a5",
        "pulp_created": "2025-02-04T11:23:24.343720Z",
        "pulp_last_updated": "2025-02-04T11:23:24.343735Z",
        "name": "pulpcore.app.tasks.base.general_create",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:54:17.258236Z",
        "started_at": "2025-02-04T12:02:19.178708Z",
        "finished_at": "2025-02-04T12:02:19.585775Z",
        "worker": "/pulp/api/v3/workers/0194d09a-6fc3-76aa-9dfd-2ec6dc2b831d/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-6288-7afc-8d65-3f426d53f1af/",
        "prn": "prn:core.task:0194d0b3-6288-7afc-8d65-3f426d53f1af",
        "pulp_created": "2025-02-04T11:23:24.681764Z",
        "pulp_last_updated": "2025-02-04T11:23:24.681778Z",
        "name": "pulpcore.app.tasks.base.general_create",
        "state": "completed",
        "unblocked_at": "2025-02-04T14:58:47.411701Z",
        "started_at": "2025-02-04T14:58:47.497664Z",
        "finished_at": "2025-02-04T14:58:47.889396Z",
        "worker": "/pulp/api/v3/workers/0194d09a-6fc3-76aa-9dfd-2ec6dc2b831d/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-62d4-7025-8cc8-1b81721235df/",
        "prn": "prn:core.task:0194d0b3-62d4-7025-8cc8-1b81721235df",
        "pulp_created": "2025-02-04T11:23:24.757572Z",
        "pulp_last_updated": "2025-02-04T11:23:24.757587Z",
        "name": "pulpcore.app.tasks.base.general_create",
        "state": "completed",
        "unblocked_at": "2025-02-04T14:58:46.702105Z",
        "started_at": "2025-02-04T14:58:47.986408Z",
        "finished_at": "2025-02-04T14:58:48.371997Z",
        "worker": "/pulp/api/v3/workers/0194d09a-6fc3-76aa-9dfd-2ec6dc2b831d/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-6273-7c5b-a931-e94d44979026/",
        "prn": "prn:core.task:0194d0b3-6273-7c5b-a931-e94d44979026",
        "pulp_created": "2025-02-04T11:23:24.660079Z",
        "pulp_last_updated": "2025-02-04T11:23:24.660094Z",
        "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:23:22.660925Z",
        "started_at": "2025-02-04T11:47:49.341087Z",
        "finished_at": "2025-02-04T11:54:15.013271Z",
        "worker": "/pulp/api/v3/workers/0194d09a-6fc3-76aa-9dfd-2ec6dc2b831d/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-62be-7958-a6ca-0729fca0ec64/",
        "prn": "prn:core.task:0194d0b3-62be-7958-a6ca-0729fca0ec64",
        "pulp_created": "2025-02-04T11:23:24.735466Z",
        "pulp_last_updated": "2025-02-04T11:23:24.735481Z",
        "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:23:22.662582Z",
        "started_at": "2025-02-04T11:54:15.834233Z",
        "finished_at": "2025-02-04T12:02:19.040620Z",
        "worker": "/pulp/api/v3/workers/0194d09a-6fc3-76aa-9dfd-2ec6dc2b831d/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-61c0-79e5-a533-ac635332c900/",
        "prn": "prn:core.task:0194d0b3-61c0-79e5-a533-ac635332c900",
        "pulp_created": "2025-02-04T11:23:24.481740Z",
        "pulp_last_updated": "2025-02-04T11:23:24.481755Z",
        "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
        "state": "completed",
        "unblocked_at": "2025-02-04T11:23:22.657394Z",
        "started_at": "2025-02-04T11:25:49.766340Z",
        "finished_at": "2025-02-04T12:08:58.141099Z",
        "worker": "/pulp/api/v3/workers/0194d09a-1ade-7799-ad21-661303e355d4/"
      },
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d0b3-6187-78de-9ede-3a64af197145/",
        "prn": "prn:core.task:0194d0b3-6187-78de-9ede-3a64af197145",
        "pulp_created": "2025-02-04T11:23:24.424153Z",
        "pulp_last_updated": "2025-02-04T11:23:24.424168Z",
        "name": "pulpcore.app.tasks.base.general_create",
        "state": "completed",
        "unblocked_at": "2025-02-04T12:02:18.443609Z",
        "started_at": "2025-02-04T12:02:19.688538Z",
        "finished_at": "2025-02-04T12:02:20.097750Z",
        "worker": "/pulp/api/v3/workers/0194d09a-6fc3-76aa-9dfd-2ec6dc2b831d/"
      }
    ]
  },

The canceled task is this one:

{
    "pulp_href": "/pulp/api/v3/tasks/0194d0b3-621b-70cf-8039-2641d223b52f/",
    "prn": "prn:core.task:0194d0b3-621b-70cf-8039-2641d223b52f",
    "pulp_created": "2025-02-04T11:23:24.572529Z",
    "pulp_last_updated": "2025-02-04T11:23:24.572552Z",
    "state": "canceled",
    "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
    "logging_cid": "2ac4fd85150a40eaa930c5bcc367067f",
    "created_by": "/pulp/api/v3/users/1/",
    "unblocked_at": "2025-02-04T11:23:22.659306Z",
    "started_at": "2025-02-04T11:25:55.090467Z",
    "finished_at": "2025-02-04T14:58:52.418323Z",
    "error": null,
    "worker": "/pulp/api/v3/workers/0194d099-da58-717f-8f3f-c51c3cca96da/",
    "parent_task": "/pulp/api/v3/tasks/0194d0b3-320c-7c93-9998-9e69fe0df80a/",
    "child_tasks": [],
    "task_group": "/pulp/api/v3/task-groups/0194d0b3-3206-71af-9eae-1934b450d796/",
    "progress_reports": [
      {
        "message": "Downloading Metadata Files",
        "code": "sync.downloading.metadata",
        "state": "completed",
        "total": null,
        "done": 6,
        "suffix": null
      },
      {
        "message": "Skipping Packages",
        "code": "sync.skipped.packages",
        "state": "completed",
        "total": 0,
        "done": 0,
        "suffix": null
      },
      {
        "message": "Parsed Packages",
        "code": "sync.parsing.packages",
        "state": "completed",
        "total": 13787,
        "done": 13787,
        "suffix": null
      },
      {
        "message": "Parsed Comps",
        "code": "sync.parsing.comps",
        "state": "completed",
        "total": 41,
        "done": 41,
        "suffix": null
      },
      {
        "message": "Parsed Advisories",
        "code": "sync.parsing.advisories",
        "state": "completed",
        "total": 4808,
        "done": 4808,
        "suffix": null
      },
      {
        "message": "Associating Content",
        "code": "associating.content",
        "state": "running",
        "total": null,
        "done": 17502,
        "suffix": null
      },
      {
        "message": "Downloading Artifacts",
        "code": "sync.downloading.artifacts",
        "state": "running",
        "total": null,
        "done": 13786,
        "suffix": null
      }
    ],
    "created_resources": [],
    "reserved_resources_record": [
      "prn:rpm.rpmrepository:0194d0b3-61ff-7a9d-93ef-1d1d425b61c4",
      "shared:prn:rpm.rpmremote:0194d0b3-61f0-7b9d-a517-95cbe3ddb5c2",
      "shared:prn:core.domain:0194d06f-3c81-7b10-b04f-288c85446af5"
    ]
}

As you can see, the task was still running and had been stuck in this state for nearly 2 hours.
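
For anyone reading the record, a minimal sketch of how the timestamps and progress reports above can be checked in Python. It uses only values copied from the canceled task; the helper names (`parse_ts`, `run_time`, `unfinished_stages`) are made up for illustration and are not part of Pulp.

```python
from datetime import datetime

# Fields copied from the canceled task record above (abbreviated).
task = {
    "state": "canceled",
    "started_at": "2025-02-04T11:25:55.090467Z",
    "finished_at": "2025-02-04T14:58:52.418323Z",
    "progress_reports": [
        {"code": "sync.downloading.metadata", "state": "completed", "total": None, "done": 6},
        {"code": "sync.skipped.packages", "state": "completed", "total": 0, "done": 0},
        {"code": "sync.parsing.packages", "state": "completed", "total": 13787, "done": 13787},
        {"code": "sync.parsing.comps", "state": "completed", "total": 41, "done": 41},
        {"code": "sync.parsing.advisories", "state": "completed", "total": 4808, "done": 4808},
        {"code": "associating.content", "state": "running", "total": None, "done": 17502},
        {"code": "sync.downloading.artifacts", "state": "running", "total": None, "done": 13786},
    ],
}


def parse_ts(value):
    """Parse a timestamp; the trailing 'Z' is rewritten so older Pythons accept it."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def run_time(task):
    """Wall-clock time between started_at and finished_at."""
    return parse_ts(task["finished_at"]) - parse_ts(task["started_at"])


def unfinished_stages(task):
    """Codes of progress reports that never reached the 'completed' state."""
    return [r["code"] for r in task["progress_reports"] if r["state"] != "completed"]


print(run_time(task))           # 3:32:57.327856
print(unfinished_stages(task))  # ['associating.content', 'sync.downloading.artifacts']
```

The two unfinished stages are the ones that were still "running" at cancel time. The artifact count (13786) is one short of the parsed package count (13787), which may hint that a single download never finished, though the record alone cannot confirm that.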
The following replica request was this one:

 {
    "pulp_href": "/pulp/api/v3/task-groups/0194d186-5879-7a79-8990-8fb3953007e0/",
    "prn": "prn:core.taskgroup:0194d186-5879-7a79-8990-8fb3953007e0",
    "description": "Replication of pulp",
    "all_tasks_dispatched": true,
    "waiting": 0,
    "skipped": 0,
    "running": 0,
    "completed": 1,
    "canceled": 0,
    "failed": 0,
    "canceling": 0,
    "group_progress_reports": [],
    "tasks": [
      {
        "pulp_href": "/pulp/api/v3/tasks/0194d186-587f-7d54-8947-b74f642288fd/",
        "prn": "prn:core.task:0194d186-587f-7d54-8947-b74f642288fd",
        "pulp_created": "2025-02-04T15:13:50.208566Z",
        "pulp_last_updated": "2025-02-04T15:13:50.208581Z",
        "name": "pulpcore.app.tasks.replica.replicate_distributions",
        "state": "completed",
        "unblocked_at": "2025-02-04T15:13:50.232040Z",
        "started_at": "2025-02-04T15:13:49.097432Z",
        "finished_at": "2025-02-04T15:13:52.786248Z",
        "worker": "/pulp/api/v3/workers/0194d09a-1ade-7799-ad21-661303e355d4/"
      }
    ]
  },

There is no other task in the task group and no errors on the logs as far as I can see.
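
To make the difference between the two task groups explicit, here is a small sketch that counts tasks by name. The task names are taken from the two records above in the order listed; the `summarize` and `dispatched_sync` helpers are hypothetical and only illustrate the comparison.

```python
from collections import Counter

SYNC = "pulp_rpm.app.tasks.synchronizing.synchronize"
CREATE = "pulpcore.app.tasks.base.general_create"
REPLICATE = "pulpcore.app.tasks.replica.replicate_distributions"


def as_group(names):
    """Build a minimal task-group dict with only the 'name' field per task."""
    return {"tasks": [{"name": n} for n in names]}


# Task names from the first and second task group above, in listed order.
first_group = as_group([SYNC, SYNC, SYNC, CREATE, CREATE, SYNC, CREATE, REPLICATE,
                        CREATE, CREATE, CREATE, SYNC, SYNC, SYNC, CREATE])
second_group = as_group([REPLICATE])


def summarize(group):
    """Count tasks per short name (the last dotted component)."""
    return Counter(t["name"].rsplit(".", 1)[-1] for t in group["tasks"])


def dispatched_sync(group):
    """True if the group contains at least one RPM sync task."""
    return any(t["name"] == SYNC for t in group["tasks"])


print(summarize(first_group))
print(summarize(second_group))
print(dispatched_sync(first_group), dispatched_sync(second_group))
```

The first group contains seven sync tasks, seven general_create tasks, and one replicate_distributions task; the second contains only replicate_distributions, so no sync was dispatched at all.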

I was able to reproduce the issue twice: the first time after creating the k8s cluster, and the second time after destroying and redeploying it. The third time, when it was able to replicate, was after manually deleting everything in the Pulp cluster, as described in my update above.

Sorry for the long post and the lack of detail.

Member

mdellweg commented Feb 6, 2025

No worries. I fear, however, that I cannot deduce more information from it.
Random thoughts: A sync task can take several hours depending on the amount of stuff needed to sync. It seems like the second attempt did not even start a sync task. Maybe there's a hole in the "no changes detected, skip update" logic.

Member

mdellweg commented Feb 6, 2025

OK, looking into this, I can confirm that the replica optimization logic fails here.
Either updating the upstream distribution or recreating the UpstreamPulp object should allow for resyncing (in fact, clearing the last_replication field should suffice).
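
A toy model of how such a skip optimization could misfire, for illustration only. This is not pulpcore's actual code: the `Upstream` class, the `should_sync` check, the timestamps, and the assumption that last_replication is recorded even when a child sync was canceled are all hypothetical. Only the last_replication field name comes from the comment above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Upstream:
    # Stand-in for the UpstreamPulp object; only this field is named in the thread.
    last_replication: Optional[datetime] = None


def should_sync(upstream, distribution_last_updated):
    """Hypothetical skip check: sync only if the upstream changed since the last replication."""
    if upstream.last_replication is None:
        return True
    return distribution_last_updated > upstream.last_replication


# Assumed timeline: the upstream distribution was last changed before the first replication.
upstream_changed = datetime(2025, 2, 1, tzinfo=timezone.utc)
first_replication = datetime(2025, 2, 4, 11, 23, 12, tzinfo=timezone.utc)

upstream = Upstream()
assert should_sync(upstream, upstream_changed)      # first run: nothing recorded yet

# Assumption: the timestamp is stored although one child sync was later canceled.
upstream.last_replication = first_replication
skipped = not should_sync(upstream, upstream_changed)   # second run: "no changes", no sync

# Remedy 1: the upstream distribution is updated after the failed run.
later_change = datetime(2025, 2, 5, tzinfo=timezone.utc)
resync_after_update = should_sync(upstream, later_change)

# Remedy 2: clear last_replication.
upstream.last_replication = None
resync_after_clear = should_sync(upstream, upstream_changed)

print(skipped, resync_after_update, resync_after_clear)
```

Under these assumptions, the second replication is skipped because nothing upstream looks newer than the recorded timestamp, and either remedy named above makes the check pass again.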

mdellweg added a commit to mdellweg/pulpcore that referenced this issue Feb 6, 2025
mdellweg linked a pull request Feb 6, 2025 that will close this issue
mdellweg added a commit to mdellweg/pulpcore that referenced this issue Feb 6, 2025
mdellweg added a commit to mdellweg/pulpcore that referenced this issue Feb 6, 2025
Author

BiniKP commented Feb 7, 2025

> OK, looking into this, I can confirm that the replica optimization logic fails here. Either updating the upstream distribution or recreating the UpstreamPulp object should allow for resyncing (in fact, clearing the last_replication field should suffice).

While investigating the functionality of UpstreamPulp, I noticed that the Pulp client has no "destroy" option for it, although one exists in the API. Unfortunately, I discovered the hard way that using it will orphan everything replicated on the server, and you will likely have to destroy the cluster if you want to continue. I'm not sure if this is a known issue or if it could affect your modifications.

mdellweg added a commit to mdellweg/pulpcore that referenced this issue Feb 7, 2025
Member

mdellweg commented Feb 7, 2025

Yes, making it an exact clone (deleting everything else in the domain) is part of the design. I just noticed that I cannot find any documentation for it either.
Also, there is work going on to improve the deletion options: #6247
