Add support for listing HTTP cached packages in pip cache list #13587

a-sajjad72 · 2025-09-20T19:17:21Z

This PR implements support for listing HTTP cached packages in pip cache list, addressing issue #10460.

Problem

Currently, pip cache list only shows locally built wheels stored in the wheels/ cache directory, but ignores HTTP cached packages stored in the http-v2/ or http directory. This leads to confusing behavior where users see "No locally built wheels cached" even when pip has cached wheel files from PyPI downloads.

$ pip cache info
Package index page cache size: 89 kB
Number of HTTP files: 6
Locally built wheels size: 0 bytes
Number of locally built wheels: 0

$ pip cache list
No locally built wheels cached.  # Misleading - there are cached packages!

Solution

This PR extends pip cache list to parse HTTP cached responses and extract package information from PyPI's x-pypi-file-* headers, as suggested in the issue discussion. The implementation:

Parses HTTP cache files using the existing cachecontrol.Serializer to deserialize cached responses
Extracts package metadata from PyPI headers (x-pypi-file-project, x-pypi-file-version, etc.)
Displays HTTP cached packages alongside wheel files with clear labeling
Maintains backward compatibility while adding new functionality

New Features

`--cache-type` Option

Users can now control which cache types to list:

pip cache list                    # Show all cached packages (default)
pip cache list --cache-type=http  # Show only HTTP cached packages  
pip cache list --cache-type=wheels # Show only locally built wheels

Enhanced Output

HTTP cached packages are clearly labeled and show accurate size information:

$ pip cache list
Cache contents:

 - requests-2.32.5-py3-none-any.whl (65 kB) [HTTP cached]
 - colorama-0.4.6-py3-none-any.whl (26 kB) [HTTP cached]

Pattern Matching

Pattern matching now works with HTTP cached packages:

$ pip cache list requests
Cache contents:

 - requests-2.32.5-py3-none-any.whl (65 kB) [HTTP cached]

Implementation Details

Minimal changes: Only modifies src/pip/_internal/commands/cache.py
Robust error handling: Silently skips corrupted or non-wheel cache files
Efficient processing: Only deserializes cache files when needed
Proper size calculation: Includes both header and body file sizes
CLI validation: Validates new option choices at the command line level

Testing

The implementation has been thoroughly tested with:

Real pip cache data from package installations
Pattern matching with various package names
Error scenarios (corrupted cache files, invalid options)
Empty cache conditions
Mixed cache type scenarios
Help text and CLI option validation

This change significantly improves the user experience for cache management by providing complete visibility into pip's caching behavior.

Feedback & Discussion

All suggestions, reviews, and discussions are welcome. If there are any concerns about naming, option design, or consistency with the existing pip CLI and API, I am happy to refactor or adjust the implementation. The goal is to make this feature both intuitive for end users and maintainable for contributors going forward.

Related Issues

Closes #10460.

This PR directly addresses the problem reported in #10460. If there are other related issues that overlap with this functionality, please feel free to reference them here so they can be resolved by this change as well.

… cases.

notatallshaw · 2025-09-20T20:05:15Z

Hi @a-sajjad72, thanks for submitting a PR to pip, please be aware all pip maintainers are currently supporting pip on a volunteer basis and therefore it may be some time before someone can review.

That said I have an early comment:

This PR extends pip cache list to parse HTTP cached responses and extract package information from PyPI's x-pypi-file-* headers, as suggested in the issue discussion.

Pip will not accept a PyPI specific implementation, as it's not a Python packaging standard it won't work on arbitrary indexes and there is no guarantee PyPI will continue to support it in the future.

a-sajjad72 · 2025-09-21T23:49:33Z

Hi @notatallshaw , thanks for the early feedback.

I understand your concern about the current approach being considered “PyPI-specific” because it relies on the x-pypi-file-* headers. My intention wasn’t to hard‑code behavior for PyPI, but I see now that depending on those headers for core functionality effectively ties the feature to PyPI.

I do have an alternative, more index-agnostic idea in mind that would not depend on those headers as the primary source. I suggest we let an initial review happen first (so I know if there are any broader objections), and then we can discuss whether shifting to that alternative approach is the right next step.

If you prefer, I can outline that alternative sooner. Just let me know.

Thanks again for the clarification and your time. Let me know how you would like to proceed.

notatallshaw · 2025-09-22T00:07:55Z

For myself, I won't be reviewing this PR while it is tied to PyPI specific features, as I would not accept it, and I don't know how much of a change is required to make it index agnostic. Though I won't speak for other maintainers.

a-sajjad72 · 2025-09-22T00:48:05Z

Thanks again. I’ll convert this PR to Draft and refactor it to be index-agnostic before asking for further review.

Planned minimal first step:

Generic wheel detection (ZIP magic + dist-info/METADATA Name and Version).
Skip non-artifact / HTML / JSON responses.
Use any x-pypi-file-* headers only as optional enrichment (never required).
Placeholder entry if name/version can’t be inferred (or simply skip if you prefer—let me know).
Keep HTTP listing behind the existing --cache-type flag initially.

If any maintainer would prefer an even smaller scope (e.g. wheels only, no placeholders), please let me know; otherwise I’ll proceed on this basis and update the PR description with a concise design note.

notatallshaw · 2025-09-22T00:56:05Z

If any maintainer would prefer an even smaller scope

I would advise that the scope be kept as small as possible while still providing a helpful user experience, to be more likely to be accepted. For example, I do not think there should be any use of PyPI only features, even as optional enrichment.

I'm sorry I can't contribute more to a design discussion right now, I don't have much experience here with the design of the cache. Which contributes to why a smaller scope will be easier for a maintainer to start a review.

pfmoore · 2025-09-22T07:19:50Z

I agree with everything @notatallshaw said. Furthermore, I’d like some discussion of the correctness of the whole approach. The HTTP cache is just that - a cache of HTTP requests, not a cache of downloaded files. The cache includes simple index responses and possibly other information pip has requested - presenting it as just holding wheels is misleading. Also, an index has no obligation to provide any information that a downloaded file comes from a wheel - so we know that accurate data is impossible to achieve, the best we can do is provide a guess. That guess will be accurate in many cases, but we should present it clearly as a guess, and not tempt people to rely on it.

Finally, I’m concerned about the cost of this. Wheels can be big. Have you done any testing of performance, on a large HTTP cache, with some big wheels (multiple copies of PyTorch would be a good start!) in it?

a-sajjad72 · 2025-09-28T20:16:10Z

Thanks @pfmoore for providing your insights on this.

The HTTP cache is just that - a cache of HTTP requests, not a cache of downloaded files. The cache includes simple index responses and possibly other information pip has requested - presenting it as just holding wheels is misleading.

Yeah, I totally agree with you that HTTP caches are just saved HTTP responses and also our required files cached wheels are one of them.

Also, an index has no obligation to provide any information that a downloaded file comes from a wheel - so we know that accurate data is impossible to achieve, the best we can do is provide a guess. That guess will be accurate in many cases, but we should present it clearly as a guess, and not tempt people to rely on it.

When I started working on it, I came to know that some of the cached directories contains responses that are .body and from which many of them are valid archive files.
As I tested, I found total 361 .body responses in the caches from which 134 were valid archive files. And these files include the sdists and bdists collectively.

Finally, I’m concerned about the cost of this. Wheels can be big. Have you done any testing of performance, on a large HTTP cache, with some big wheels (multiple copies of PyTorch would be a good start!) in it?

Yes I tested it, and it takes approximately the same time as pip cache info take`, maybe a slighter more. but wouldn't take much time. I will test it thoroughly with different metrics to ensure the time cost.

What will be revised approach?

The core of the revised approach is to identify packages from the .body responses in the HTTP cache, which I've found are often cached wheels (bdist) and source distributions (sdist).

More reliable metadata: Instead of parsing the METADATA file, I will extract the package name and version from the normalized .dist-info or .egg-info directory name. This is far more robust as it relies on a consistent, mandatory packaging standard.
Support for sdists: The revised approach will ensure that it handles binary distributions (bdists) as well as source distributions (sdists) formats. This provides a more comprehensive view of the cached packages.
Performance: I have verified that this method is very efficient, as it avoids extracting the full archive, even for very large files.
Unknown files: Any .body responses that are not valid wheel or sdist archives will be ignored, so the output will only contain reliably identified packages.

This approach offers a practical and significantly more reliable way to list cached packages without making incorrect assumptions about the cache's contents.

Please let me know, I will start working on it and update the PR's description.

a-sajjad72 added 4 commits September 20, 2025 23:50

Implement HTTP cached packages support in pip cache list

5b7c4e1

Update documentation and finalize HTTP cache list feature

1fc67f5

Fixed linting errors

747a1b2

Refactor cache command and the its test cases to resolve failing test…

eefc20d

… cases.

a-sajjad72 changed the title ~~New feature~~ Add support for listing HTTP cached packages in pip cache list Sep 20, 2025

a-sajjad72 added 2 commits September 21, 2025 00:28

style(cache): apply Black formatting to cache command

650f562

news: add 10460.feature fragment for HTTP cache listing

9f19e23

psf-chronographer bot added the bot:chronographer:provided label Sep 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add support for listing HTTP cached packages in pip cache list #13587

Add support for listing HTTP cached packages in pip cache list #13587

a-sajjad72 commented Sep 20, 2025

Uh oh!

notatallshaw commented Sep 20, 2025

Uh oh!

a-sajjad72 commented Sep 21, 2025

Uh oh!

notatallshaw commented Sep 22, 2025

Uh oh!

a-sajjad72 commented Sep 22, 2025

Uh oh!

notatallshaw commented Sep 22, 2025 •

edited

Loading

Uh oh!

pfmoore commented Sep 22, 2025

Uh oh!

a-sajjad72 commented Sep 28, 2025 •

edited

Loading

Uh oh!

Uh oh!

Add support for listing HTTP cached packages in pip cache list #13587

Are you sure you want to change the base?

Add support for listing HTTP cached packages in pip cache list #13587

Conversation

a-sajjad72 commented Sep 20, 2025

Problem

Solution

New Features

--cache-type Option

Enhanced Output

Pattern Matching

Implementation Details

Testing

Feedback & Discussion

Related Issues

Uh oh!

notatallshaw commented Sep 20, 2025

Uh oh!

a-sajjad72 commented Sep 21, 2025

Uh oh!

notatallshaw commented Sep 22, 2025

Uh oh!

a-sajjad72 commented Sep 22, 2025

Uh oh!

notatallshaw commented Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pfmoore commented Sep 22, 2025

Uh oh!

a-sajjad72 commented Sep 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What will be revised approach?

Uh oh!

Uh oh!

`--cache-type` Option

notatallshaw commented Sep 22, 2025 •

edited

Loading

a-sajjad72 commented Sep 28, 2025 •

edited

Loading