Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(pypi): lookup simple api first #31771

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

samgiz
Copy link

@samgiz samgiz commented Oct 3, 2024

Changes

This rewrites the getReleases function of pypi datasources:
First, the simple (pypi) api is queried, then the pypijson api is retried to enrich the metadata.

Previously the simple api was only used if pypijson returned 404.

The unit tests stayed mostly the same, but the simple api routes needed to be mocked. Also added testcases to check that both simple and json pypi apis are used, dependency resolution succeeds if json api is unavailable, and dependencies from the two endpoints get merged correctly.

Context

Building on top of #26505.
Closes #26483.

As pypijson is used additionally, #26383 is resolved as well.

Documentation (please check one with an [x])

  • I have updated the documentation, or
  • No documentation update is required

How I've tested my work (please select one)

I have verified these changes via:

  • Code inspection only, or
  • Newly added/modified unit tests, or
  • No unit tests but ran on a real repository, or
  • Both unit tests + ran on the pytorch pypi repository

lib/modules/datasource/pypi/index.ts Outdated Show resolved Hide resolved
lib/modules/datasource/pypi/index.ts Outdated Show resolved Hide resolved
lib/modules/datasource/pypi/index.ts Outdated Show resolved Hide resolved
lib/modules/datasource/pypi/index.ts Outdated Show resolved Hide resolved
lib/modules/datasource/pypi/index.ts Show resolved Hide resolved
lib/modules/datasource/pypi/index.ts Show resolved Hide resolved
lib/modules/datasource/pypi/index.ts Outdated Show resolved Hide resolved
Co-authored-by: Rhys Arkins <rhys@arkins.net>
lib/modules/datasource/pypi/index.ts Outdated Show resolved Hide resolved
lib/modules/datasource/pypi/index.ts Show resolved Hide resolved
Comment on lines 73 to 76
if (simpleDependencies === null) {
// Both Simple and API lookups failed
throw err;
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this throw an error or return null?

Copy link
Author

@samgiz samgiz Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll delete it for now, since I check for either of the values being null below. Let me know if this was actually expected behaviour.

@samgiz samgiz requested a review from rarkins October 17, 2024 11:41
@rarkins
Copy link
Collaborator

rarkins commented Oct 18, 2024

I just realized also that this PR unfortunately doesn't follow the approach in the rubygems datasource (see https://github.com/renovatebot/renovate/tree/main/lib/modules/datasource/rubygems) as was suggested in #26483

In short:

  • Use the simple index by default
  • Any time a new version is found which wasn't seen before, query the JSON API (if present) for metadata, and heavily cache it

In other words, the goal was to avoid hitting the JSON API every time.

@rarkins
Copy link
Collaborator

rarkins commented Oct 18, 2024

I expanded the description in #26483

#26383 needs a reproduction so it can be tested

@samgiz
Copy link
Author

samgiz commented Oct 18, 2024

#26383 needs a reproduction so it can be tested

I think that's covered by this test, it tests a 403 return value:

    it.each([404, 403])(
      'process data from simple api with pypijson unavailable',
...

@rarkins
Copy link
Collaborator

rarkins commented Oct 18, 2024

I meant a reproduction repo, not a unit test

@samgiz
Copy link
Author

samgiz commented Oct 18, 2024

I meant a reproduction repo

Something like this? https://github.com/samgiz/renovate-26383-reproducer

@samgiz
Copy link
Author

samgiz commented Oct 18, 2024

I just realized also that this PR unfortunately doesn't follow the approach in the rubygems datasource

Is the assumption here that the simple api is more expensive to call than the json api? That might be true, but not by a lot, and I'm not sure such an optimization is worth the extra complexity in the code.

This is a bit anecdotal since I only have one datapoint, but I just ran this:

import { PypiDatasource } from '.';
(new PypiDatasource()).getReleases({
  packageName: 'azure-cli-monitor',
  registryUrl: 'https://pypi.org/pypi/'
})

while logging the content-length of the json and simple endpoint responses, and got 9239 for simple and 14560 for json. I'm not very convinced that it's worth adding the cache for an optimisation of such size, unless I'm misunderstanding what it does. Though you likely have better judgement about this than I do.

It also seems like a lot of work to implement the cache, and I'm mostly just interested in the torch pypi repository working correctly with renovate. Would it make sense to reopen #26386 instead?

@rarkins
Copy link
Collaborator

rarkins commented Oct 19, 2024

Thanks for your patience and helping me understand it better. Based on your last suggestion I have implemented #32024

Please let me know if you think that solves the problem.

BTW I needed to make some config changes to the reproducer to get it to work: https://github.com/renovate-reproductions/26383/blob/master/renovate.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PyPI datasource fall back to simple API if JSON API fails
3 participants