
@adrianimboden adrianimboden commented Oct 2, 2025

Hi rules_python team

I have been using your rules for a long time now. With the WORKSPACE style, I use pip as follows:

pip_parse(
    name = "py_deps",
    extra_pip_args = [
        "--index-url=/home/user/local_pip_mirror",
        "--no-cache-dir",
    ],
    python_interpreter = "python3",
    python_interpreter_target = interpreter,
    requirements_lock = ":requirements_lock.txt",
)

The folder /home/user/local_pip_mirror gets populated with pypi-mirror download --requirement requirements_lock.txt

This gives me a nice and clean way to do offline builds. I could never get them running directly with bazel fetch and the like, but this solution works very well because no internet access is involved at all.

I am in the process of upgrading to bzlmod. I saw many bug reports about making offline builds work, but I honestly gave up on bazel vendor and similar approaches again. The simplest way, in my opinion, is to just use a local pip mirror. For that I tried the following:

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(         
    experimental_index_url = "/home/user/local_pip_mirror",
    hub_name = "py_deps",
    python_version = python_version,
    requirements_lock = "//:requirements_lock.txt",
)                        
use_repo(pip, "py_deps")

At the moment, experimental_index_url must be an https:// URL to work.

This PR changes that so local paths are also accepted as experimental_index_url.

For my project, the proposed changes are in effect and working great.

It is not perfect, but I think it is an important addition to aid the bzlmod migration.

@gemini-code-assist
Contributor

Summary of Changes

Hello @adrianimboden, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial feature that allows the experimental_index_url parameter within rules_python's pip.parse extension to accept local file system paths, in addition to traditional HTTPS URLs. This enhancement is designed to facilitate robust offline dependency resolution by enabling the use of local pip mirrors, which is particularly beneficial for users migrating to bzlmod and seeking a more flexible and reliable method for managing Python dependencies without requiring internet access during the build process.

Highlights

  • Local Paths for experimental_index_url: The experimental_index_url parameter in pip.parse now supports local file system paths, allowing users to specify local pip mirrors for package resolution.
  • Enhanced Offline Build Support for bzlmod: This change significantly improves support for offline builds when using bzlmod by enabling the use of local pip mirrors, addressing a common challenge for users migrating from WORKSPACE to bzlmod.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a useful feature to enable local paths for pip indexes, which is great for offline builds. However, the current implementation introduces a few critical issues and regressions. I've found a critical regression in simpleapi_download.bzl that breaks fetching from multiple indexes. Additionally, the logic for handling local paths in both modified files has several bugs, including incorrect URL detection, lack of error handling, and a crash-inducing bug in whl_library.bzl due to a missing sha256 field. I've provided detailed comments and suggestions to address these issues, including a significant refactoring for whl_library.bzl to use a more robust approach with file:// URLs.

Comment on lines 298 to 320
if len(urls) == 1 and not urls[0].startswith("https://"):
    local_url = urls[0]
    local_path = rctx.path(local_url)

    if local_path.is_dir:
        rctx.file(filename, rctx.read(local_url + "index.html"))
        result = struct(
            output = filename,
            success = True,
        )
    else:
        rctx.file(filename, rctx.read(local_url))
        result = struct(
            output = filename,
            success = True,
        )
else:
    result = rctx.download(
        url = urls,
        output = filename,
        sha256 = rctx.attr.sha256,
        auth = get_auth(rctx, urls),
    )

critical

This block for handling local paths has several issues, including incorrect logic for wheel URLs (the is_dir check), a missing sha256 in the result which will cause a crash, a too-broad condition for identifying local paths, and lack of error handling. A much simpler and more robust approach is to convert the local path to a file:// URL and use rctx.download, which correctly handles all these aspects.

        download_urls = urls
        if len(urls) == 1 and not (urls[0].startswith("https://") or urls[0].startswith("http://")):
            # It's a local path, convert to a file:// URL for rctx.download
            abs_path = rctx.path(urls[0]).realpath
            # realpath is a path object, so convert it to a string before concatenating
            download_urls = ["file://" + str(abs_path)]

        result = rctx.download(
            url = download_urls,
            output = filename,
            sha256 = rctx.attr.sha256,
            auth = get_auth(rctx, download_urls),
        )
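
The path-to-URL conversion suggested above can be tried outside Bazel. The following is a minimal sketch in plain Python rather than Starlark; the helper name `to_download_url` and the set of recognized prefixes are assumptions made for illustration and are not part of rules_python. It treats anything that already starts with `https://`, `http://` or `file://` as a URL and converts everything else to an absolute `file://` URL.

```python
from pathlib import Path

# Prefixes treated as "already a URL"; anything else is assumed to be a local path.
_URL_PREFIXES = ("https://", "http://", "file://")


def to_download_url(index_url: str) -> str:
    """Return index_url unchanged if it is a URL, else an absolute file:// URL."""
    if index_url.startswith(_URL_PREFIXES):
        return index_url
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(index_url).resolve().as_uri()
```

Checking for explicit URL prefixes, rather than only "not https://", avoids misclassifying `http://` indexes as local paths.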

Comment on lines 220 to 238
if not real_url.startswith("https://"):
    normalized_url = real_url
    local_path = ctx.path(normalized_url)

    def local_read_index_result():
        if local_path.is_dir:
            ctx.file(output, ctx.read(normalized_url + "index.html"))
            return struct(
                output = output,
                success = True,
            )
        else:
            ctx.file(output, ctx.read(normalized_url))
            return struct(
                output = output,
                success = True,
            )

    download = struct(wait = lambda: local_read_index_result())

high

This logic for handling local paths has a few issues:

  1. The condition not real_url.startswith("https://") is too broad. It will incorrectly treat http:// URLs as local paths.
  2. The local_read_index_result function lacks error handling. If ctx.read fails because a file doesn't exist, it will cause the entire rule to fail, which prevents the logic in simpleapi_download from trying other index URLs.
  3. The implementation of local_read_index_result can be simplified to avoid code duplication.

Here is a suggested implementation that addresses these points:

Suggested change
- if not real_url.startswith("https://"):
-     normalized_url = real_url
-     local_path = ctx.path(normalized_url)
-     def local_read_index_result():
-         if local_path.is_dir:
-             ctx.file(output, ctx.read(normalized_url + "index.html"))
-             return struct(
-                 output = output,
-                 success = True,
-             )
-         else:
-             ctx.file(output, ctx.read(normalized_url))
-             return struct(
-                 output = output,
-                 success = True,
-             )
-     download = struct(wait = lambda: local_read_index_result())
+ if not (real_url.startswith("https://") or real_url.startswith("http://")):
+     normalized_url = real_url
+     local_path = ctx.path(normalized_url)
+     def local_read_index_result():
+         path_to_read = normalized_url
+         if local_path.is_dir:
+             path_to_read += "index.html"
+         if not ctx.path(path_to_read).exists:
+             return struct(success = False)
+         ctx.file(output, ctx.read(path_to_read))
+         return struct(
+             output = output,
+             success = True,
+         )
+     download = struct(wait = lambda: local_read_index_result())
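
The error-handling point in this suggestion (report failure instead of aborting, so that other indexes can still be tried) can be illustrated with a small sketch in plain Python. The names `ReadResult`, `read_local_index` and `first_available` are hypothetical and only model the idea; they are not rules_python APIs.

```python
from pathlib import Path
from typing import Iterable, NamedTuple, Optional


class ReadResult(NamedTuple):
    success: bool
    output: Optional[str] = None


def read_local_index(path: str) -> ReadResult:
    """Read a local index page, reporting failure instead of raising."""
    target = Path(path)
    if target.is_dir():
        # A directory stands for its index.html, as in the mirror layout.
        target = target / "index.html"
    if not target.is_file():
        return ReadResult(success=False)
    return ReadResult(success=True, output=target.read_text(encoding="utf-8"))


def first_available(paths: Iterable[str]) -> ReadResult:
    """Return the first successful read, mirroring a multi-index fallback."""
    for path in paths:
        result = read_local_index(path)
        if result.success:
            return result
    return ReadResult(success=False)
```

Joining with `/` also sidesteps the trailing-slash assumption hidden in `normalized_url + "index.html"`.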

)
if len(urls) == 1 and not urls[0].startswith("https://"):
    local_url = urls[0]
    local_path = rctx.path(local_url)
Collaborator

Ideally, the whl_library should consume locally available paths through labels, because then it makes things easier to thread through and whl_library already accepts a label. Would it be possible to create a repository_rule that exposes these paths as labels?

@aignas
Collaborator

aignas commented Oct 2, 2025

This looks interesting - we have certainly discussed using a local path for the experimental_index_url in the past. A few things that come to mind:

  • Having a unit test that ensures that this feature works would be great. They can be added in tests/pypi/simpleapi_download directory.

How do you update the local mirror?

@adrianimboden
Author

I populate the mirror like this:

pypi-mirror download --requirement requirements_lock.txt --download-dir /tmp/download
pypi-mirror create --download-dir /tmp/download --mirror-dir /path/to/mirror --copy

The folder looks like this then:

├── index.html
├── aiohappyeyeballs
│   ├── aiohappyeyeballs-2.6.1-py3-none-any.whl
│   └── index.html
├── aiohttp
│   ├── aiohttp-3.12.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
│   └── index.html
├── aiosignal
│   ├── aiosignal-1.4.0-py3-none-any.whl
│   └── index.html
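
A layout like this is easy to inspect programmatically. As a minimal sketch in plain Python, assuming the one-directory-per-package structure shown above, the hypothetical helper `list_wheels` maps each package directory to the wheel files it contains.

```python
from pathlib import Path
from typing import Dict, List


def list_wheels(mirror_dir: str) -> Dict[str, List[str]]:
    """Map each package directory name to its sorted wheel file names."""
    wheels: Dict[str, List[str]] = {}
    for pkg_dir in sorted(Path(mirror_dir).iterdir()):
        if not pkg_dir.is_dir():
            continue  # skip the top-level index.html
        names = sorted(p.name for p in pkg_dir.glob("*.whl"))
        if names:
            wheels[pkg_dir.name] = names
    return wheels
```

For the tree above, the result would contain one entry each for aiohappyeyeballs, aiohttp and aiosignal.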

I am not sure about the label stuff. Did you think about something like this?

new_local_repository = use_repo_rule("@bazel_tools//tools/build_defs/repo:local.bzl", "new_local_repository")

new_local_repository(
    name = "pip_deps_mirror",
    build_file_content = "exports_files(glob(['**']))",
    path = "/home/thingdust/deps/pip_deps",
)

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    experimental_index_url = Label("@pip_deps_mirror"),
    hub_name = "py_deps",
    python_version = python_version,
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "py_deps")

I am not sure how easy that will be. It seems like a completely new code path to me if the value may be either a URL or a label. Or am I missing something?

@adrianimboden adrianimboden force-pushed the allow-local-paths-for-experimental-index-url branch from 104d221 to 4b93661 Compare October 2, 2025 08:40
@adrianimboden adrianimboden force-pushed the allow-local-paths-for-experimental-index-url branch from 4b93661 to f7cee05 Compare October 2, 2025 08:42
@adrianimboden
Author

After looking at it again, I saw that there is a much simpler solution that improves things in the meantime.

I wrongly assumed that the download functions don't work for local files. I always had the problem that file:// URLs did not work. I found out that this is because the URLs get normalized first. A small addition to strip_empty_path_segments makes it work with local URLs.

Making it work with labels would be nice though. Probably for another time?

@adrianimboden
Author

It did not work before because strip_empty_path_segments does the following: file:///path/to/folder -> file://path/to/folder. For local paths, empty segments should not be a problem I presume.
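
The effect described here can be reproduced with a small model. This is a sketch in plain Python under stated assumptions: both function names are hypothetical, and the code reproduces neither the actual strip_empty_path_segments in rules_python nor the change made in this PR. It only shows how dropping every empty segment turns `file:///path` into `file://path`, and one possible way to keep the root slash.

```python
def strip_empty_segments(url: str) -> str:
    """Naive normalization: drop every empty path segment after the scheme."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return url  # no scheme, leave the value untouched
    stripped = "/".join(part for part in rest.split("/") if part)
    if rest.endswith("/") and stripped:
        stripped += "/"  # keep a trailing slash if the input had one
    return scheme + "://" + stripped


def strip_empty_segments_keep_root(url: str) -> str:
    """Same normalization, but keep the leading "/" of a file:// path."""
    normalized = strip_empty_segments(url)
    if url.startswith("file:///"):
        return "file:///" + normalized[len("file://"):]
    return normalized
```

With the naive version, `file:///path/to/folder` loses its root slash, so `path` ends up in the position where a URL parser would expect a host.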

@rickeylev
Collaborator

I really like the idea of being able to point to a local path using a label, for several reasons.

It'd be really convenient for testing our pip integration -- we can easily construct arbitrary index states and have a more end-to-end verification.

It also seems like a really flexible and powerful way for customizing where pip is getting stuff from. You could write a repo rule to make the pip index look however you want, and be populated however you want.

@groodt
Collaborator

groodt commented Oct 3, 2025

Yes, this sort of thing is great. It's often called a "wheel house" and is a very common and useful pattern for offline builds, avoiding sdist in deployment scenarios, etc. Very supportive of this. Tools in a similarish space are: https://github.com/chriskuehl/dumb-pypi

@aignas
Collaborator

aignas commented Oct 5, 2025

Thinking out loud a little bit about how this could be designed. This might be a train of thought, but I'll just write it out as I think.

  1. The idea of passing in a local path or something sounds good, but so far in bazel I've seen this work only if you pass an absolute path or a label. Hence I thought it would be nice to pass a label.
  2. If one has labels for each whl file, then we can pass them to whl_library whl_file attribute: https://rules-python.readthedocs.io/en/latest/api/rules_python/python/private/pypi/whl_library.html#whl_library.whl_file
  3. This means that the code in parse_requirements.bzl needs to inject those labels in some way.
  4. In the future we may want to write the URLs into the lock file, so if they have absolute file:/// in them, this will not age well, so it is best to treat the local index as one that has the right format.
  5. parse_requirements.bzl is called from hub_builder and gets the get_index_urls function as a parameter. We could have a separate implementation of that function that returns labels instead of URLs; however, the label mapping would need to be present there in some way.
  6. pip.parse could create a local index repository on the fly (i.e. a repository where we can access whls by using a scheme of @local_index_repo_name//<whl_name>:<file_name>.whl). The extension reads the local directory structure and finds all whl files, then creates a repo and passes the whls as a list of paths/labels. The HTML files are only processed in the extension to avoid circular dependencies in the extension/Starlark evaluation.

So to sum up, the files that would need to be touched:

  • whl_library - stays the same.
  • hub_builder.bzl - needs some extra handling of a different get_index_urls function. It should handle the case well where the whl (or dist) struct has whl_file but does not have url set.
  • parse_requirements.bzl - needs some minor fixing to accommodate a more generic getting of the wheels.
  • local_whl_repo.bzl - a new repository that contains the files.
  • simpleapi_local.bzl - a new file that handles traversing the local index.html tree.

There are probably ways to optimize this approach.
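
To make the label scheme from point 6 concrete, here is a rough sketch in plain Python, not Starlark. It assumes each package's index.html lists its wheels as anchor tags, as in a PEP 503 style simple index; the helper `wheel_labels` and the repository name used in the example are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class _AnchorCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def wheel_labels(repo_name: str, package: str, index_html: str) -> List[str]:
    """Turn wheel links in a package index page into @repo//package:file labels."""
    parser = _AnchorCollector()
    parser.feed(index_html)
    parser.close()
    labels = []
    for href in parser.hrefs:
        # Drop any "#sha256=..." fragment and any directory prefix.
        filename = href.split("#", 1)[0].rsplit("/", 1)[-1]
        if filename.endswith(".whl"):
            labels.append("@{}//{}:{}".format(repo_name, package, filename))
    return labels
```

For example, a page for aiosignal that links to aiosignal-1.4.0-py3-none-any.whl would yield a single label under `@local_index//aiosignal`, if the repository were named `local_index`.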
