
Runfiles API: obtaining the symlink path within runfiles_dir #24929



Open
jwnimmer-tri opened this issue Dec 12, 2024 · 8 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts

Comments

@jwnimmer-tri
Contributor

jwnimmer-tri commented Dec 12, 2024

Description of the problem / feature request:

This is a feature request for an API addition to the runfiles library. I would like a way to get a runfiles path that lives inside $RUNFILES_DIR (where it will be a symlink to its actual home), rather than having the runfiles library return the symlink's destination.

Chasing the symlink inside the Runfiles library loses information. Details below.

Feature requests: what underlying problem are you trying to solve with this feature?

Consider this demo program: https://github.com/jwnimmer-tri/repro/tree/bazel-runfiles-paths/demo

The program uses two runfiles: mesh.obj, which is part of the source tree, and mesh.mtl, which is the output of a genrule. As is conventional for obj files, the obj text names its material library by saying mtllib mesh.mtl, which refers to a file in the same directory. For any software to load that file, the two files (obj and mtl) must be in the same directory.
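To make the same-directory requirement concrete, here is a minimal C++17 sketch of how an OBJ loader might resolve an mtllib reference. ResolveMtllib is a hypothetical helper, not taken from any particular loader; it assumes the referenced name is interpreted relative to the directory containing the .obj file, which is the convention described above.

```cpp
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: scans OBJ text for an "mtllib <name>" statement and
// resolves <name> relative to the directory that contains the .obj file.
std::optional<fs::path> ResolveMtllib(const fs::path& obj_path,
                                      const std::string& obj_text) {
  std::istringstream lines(obj_text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream tokens(line);
    std::string keyword, name;
    if (tokens >> keyword >> name && keyword == "mtllib") {
      // The material library is expected to be a sibling of the .obj file.
      return obj_path.parent_path() / name;
    }
  }
  return std::nullopt;  // No material library referenced.
}
```

Applied to the source-tree mesh_path shown in the output below, such a loader would look for mesh.mtl next to the source mesh.obj, not under bazel-out where the genrule wrote it.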

However, if I run the program I see this output (bazel 8.0.0, rules_cc 0.0.17):

$ bazel run //demo:program
The goal is for all of these paths to be in the same directory:

mesh_path: /home/jwnimmer/jwnimmer-tri/repro/demo/mesh.obj
mtl_path: /mnt/nobackup/cache/bazel/_bazel_jwnimmer/d99b0c65cbdeed7837d9eb064003636b/execroot/_main/bazel-out/k8-fastbuild/bin/demo/mesh.mtl

RUNFILES_DIR=/mnt/nobackup/cache/bazel/_bazel_jwnimmer/d99b0c65cbdeed7837d9eb064003636b/execroot/_main/bazel-out/k8-fastbuild/bin/demo/program.runfiles and contains these files:
  MANIFEST
  _main
  _main/demo
  _main/demo/mesh.mtl
  _main/demo/mesh.obj
  _main/demo/program
  _repo_mapping

When the program calls Rlocation, the result is a path to the actual file: for the obj we get the source tree path, and for the generated file we get the output path. That's fine in many cases, but when we have groups of inter-related files where some are source files and others are build outputs, we often need them in a single logical directory even if they are stored elsewhere. Many file formats that weave multiple files into a single entity require their sub-asset files to be in the same directory as the main file.

It turns out we already have a reasonable directory with logical filenames no matter where they physically came from -- the files under RUNFILES_DIR are laid out logically, as seen in the RUNFILES_DIR walk above.
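A walk like the one shown above can be reproduced with a short C++17 sketch that lists every entry under $RUNFILES_DIR relative to it. ListRunfilesTree is a hypothetical name; the demo program may have produced its listing differently.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns every entry under runfiles_dir as a path
// relative to runfiles_dir, sorted for stable output. Directory symlinks
// are not followed (the iterator's default), so the logical layout is kept.
std::vector<std::string> ListRunfilesTree(const fs::path& runfiles_dir) {
  std::vector<std::string> entries;
  for (const auto& entry : fs::recursive_directory_iterator(runfiles_dir)) {
    entries.push_back(
        entry.path().lexically_relative(runfiles_dir).generic_string());
  }
  std::sort(entries.begin(), entries.end());
  return entries;
}
```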

I would like to be able to call Rlocation and get paths like this:

  • /mnt/nobackup/cache/bazel/_bazel_jwnimmer/d99b0c65cbdeed7837d9eb064003636b/execroot/_main/bazel-out/k8-fastbuild/bin/demo/program.runfiles/_main/demo/mesh.obj
  • /mnt/nobackup/cache/bazel/_bazel_jwnimmer/d99b0c65cbdeed7837d9eb064003636b/execroot/_main/bazel-out/k8-fastbuild/bin/demo/program.runfiles/_main/demo/mesh.mtl

Proposed solution

If the Runfiles class had some kind of flag or option to opt in to returning RUNFILES_DIR-relative paths instead of the manifest paths, that would solve the problem. It could be either a constructor argument or an option argument to Rlocation.
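A minimal sketch of the proposed opt-in, assuming a heavily simplified runfiles object that holds only a manifest map and a runfiles directory. SimpleRunfiles and LookupMode are hypothetical and are not part of the rules_cc runfiles API; repository mapping via _repo_mapping is deliberately left out.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical opt-in switch for the behavior requested in this issue.
enum class LookupMode {
  kManifest,     // Return the symlink's destination (today's behavior).
  kRunfilesDir,  // Return the logical path inside RUNFILES_DIR.
};

// Hypothetical, heavily simplified stand-in for a runfiles object.
class SimpleRunfiles {
 public:
  SimpleRunfiles(std::map<std::string, std::string> manifest,
                 std::string runfiles_dir)
      : manifest_(std::move(manifest)),
        runfiles_dir_(std::move(runfiles_dir)) {}

  // Maps a runfiles path such as "_main/demo/mesh.obj" to a filesystem path.
  std::string Rlocation(const std::string& path,
                        LookupMode mode = LookupMode::kManifest) const {
    if (mode == LookupMode::kManifest) {
      auto it = manifest_.find(path);
      if (it != manifest_.end()) return it->second;
    }
    // Either the caller opted in, or the path is not in the manifest.
    return runfiles_dir_ + "/" + path;
  }

 private:
  std::map<std::string, std::string> manifest_;
  std::string runfiles_dir_;
};
```

Defaulting the mode to the manifest keeps existing callers unchanged; only code that explicitly opts in gets paths inside the runfiles directory.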

Work-arounds

Prior to bzlmod, I could scrape the EnvVars for RUNFILES_DIR and then append the workspace name and resource path to get the shape of path I need. Now with bzlmod, the _repo_mapping rewriting also happens, which I cannot realistically re-implement myself. My current work-around is to #define private public and then clear() the private runfiles_map_ member, in which case the Runfiles object always returns paths inside the runfiles dir.
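The pre-bzlmod workaround can be sketched as follows, assuming the caller already knows the main repository's directory name under RUNFILES_DIR (_main in the listing above; the workspace name before bzlmod). DirectoryBasedPath is a hypothetical helper. As noted, it does not apply _repo_mapping, so it is only trustworthy for paths in the main repository.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper for the pre-bzlmod workaround: build a logical path
// by appending the repository directory and the resource path to
// $RUNFILES_DIR. Repository mapping is NOT applied.
std::optional<std::string> DirectoryBasedPath(const std::string& repo_dir,
                                              const std::string& resource) {
  const char* runfiles_dir = std::getenv("RUNFILES_DIR");
  if (runfiles_dir == nullptr || *runfiles_dir == '\0') {
    return std::nullopt;  // No runfiles directory advertised.
  }
  return std::string(runfiles_dir) + "/" + repo_dir + "/" + resource;
}
```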

What operating system are you running Bazel on?

Ubuntu 22.04

What's the output of bazel info release?

release 8.0.0

What version of rules_cc do you use? Can you paste the workspace rule used to fetch rules_cc? What other relevant dependencies does your project have?

See example link above for the MODULE.bazel of the reproducer. Only rules_cc == 0.0.17.

What Bazel options do you use to trigger the issue? What C++ toolchain do you use?

See example link above for reproducer. Local toolchain.

Have you found anything relevant by searching the web?

The https://groups.google.com/g/bazel-discuss/c/DsVivJhU7Bw discussion is loosely relevant.

Any other information, logs, or outputs that you want to share?

N/A

@jwnimmer-tri
Contributor Author

If the team has guidance on what kind of API would be suitable, I'm happy to write the PR and tests. Also, if you have guidance on whether and how to coordinate with the runfiles libraries for other languages (Java, Python, etc.), let me know. For my part I only need rules_cc, but I could also push the change to rules_python.

@jwnimmer-tri
Contributor Author

It looks like rules_python already has an API to create a Runfiles object that is always directory-based:

https://github.com/bazelbuild/rules_python/blob/66a8b5b595710bd107c31ad5d449593536effb76/python/runfiles/runfiles.py#L390-L391

That would also be a solution here. I'll work on a pull request to copy that same API into C++.

@fmeum
Collaborator

fmeum commented Dec 22, 2024

It's worth keeping in mind that there can be situations in which the runfiles directory isn't available or isn't up-to-date (e.g. on Windows or on Unix with --noenable_runfiles). Forcing its use may thus end up resulting in incorrect and potentially non-hermetic behavior. But of course this is totally fine if your code isn't used as a dependency by other Bazel projects.

If a precise file layout in runfiles is important, I would recommend adding a rule that collects all relevant files in a declared output directory (ctx.declare_directory) in the relevant layout. You can then look up one runfile and expect all the others to be in the correct relative location on all OSes.
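With such a directory output, only one runfile needs to be resolved and the rest are found relative to it. A minimal C++17 sketch, assuming anchor is the path already returned by Rlocation for one file inside the declared directory; SiblingOf is a hypothetical helper.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: given the resolved path of one file in a declared
// output directory, return the path of another file in that same tree.
// `relative` is interpreted relative to the anchor file's directory.
fs::path SiblingOf(const fs::path& anchor, const fs::path& relative) {
  return (anchor.parent_path() / relative).lexically_normal();
}
```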

@pzembrod
Contributor

You raise a good point, @fmeum. Especially since runfiles is a relatively central concept in Bazel, I'm somewhat reluctant to enhance its API with something that doesn't work on all platforms.
@comius @trybka do you have thoughts on this?

@trybka
Contributor

trybka commented Jan 14, 2025

Just to ensure I understand: the request here is for an API to say, "get me the logical, runfiles-relative path of this artifact" rather than the current behavior which effectively is like realpath?

@jwnimmer-tri
Contributor Author

... request here is for an API to say ...

Yes, a runfiles-relative path would be okay. An absolute path consisting of RUNFILES_DIR plus the relative path would also be okay. The invariant I'm looking for is that all runfiles from the same package differ only in the filename part of the returned path, no matter whether they are source files or generated files. It doesn't matter so much what the directory part of the path looks like (relative or absolute), only that it's consistent within a package.
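The invariant can be stated as a small check: for a set of resolved paths from one package, the directory part must be identical. A minimal C++17 sketch; SharesOneDirectory is a hypothetical name.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical check for the invariant: all resolved runfiles from one
// package differ only in their filename component.
bool SharesOneDirectory(const std::vector<fs::path>& resolved) {
  for (const fs::path& p : resolved) {
    if (p.parent_path() != resolved.front().parent_path()) return false;
  }
  return true;  // Vacuously true for an empty input.
}
```

With the mesh_path and mtl_path printed by the demo program, this check fails; with the two desired paths under program.runfiles/_main/demo, it passes.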

Windows or on Unix with --noenable_runfiles

My project doesn't support Windows at all, and also doesn't support running the programs which need this feature under --noenable_runfiles. So, I could live with that compromise (until I port to declare_directory?).

... potentially non-hermetic behavior ...

Hmm. I was aware that the runfiles dir could be absent. I was not aware that $RUNFILES_DIR could be set but would point to stale data. It seems to me like Bazel shouldn't provide that variable if it's filled with lies?

... I would recommend adding a rule that collects all relevant files in a declared output directory (ctx.declare_directory) in the relevant layout ...

Thanks, I will explore that, but it doesn't seem like an easy rewrite. In my case, there are many files within a package, but any given program declares data = ... for only a small subset of them (widening data to all files in the package would cause too many spurious rebuilds and retests). A non-trivial number of the files also refer to each other, so they could not easily be carved up into declare_directory molecules without a lot of rework to accommodate the many new sub-directories needed to cover all the cuts.

@fmeum
Collaborator

fmeum commented Jan 15, 2025

Hmm. I was aware that the runfiles dir could be absent. I was not aware that $RUNFILES_DIR could be set but would point to stale data. It seems to me like Bazel shouldn't provide that variable if it's filled with lies?

The top-level process won't have any variables set and needs to decide whether it should use the manifest or the directory. The directory always exists, but may be empty (except for the workspace directory). That's why the manifest needs to be preferred if available.
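The preference described here can be modeled as a tiny decision function. This is a simplified, hypothetical sketch, not the actual discovery procedure of any runfiles library; ChooseRunfilesSource and RunfilesSource are made-up names, and the filesystem checks are passed in as booleans to keep the logic visible.

```cpp
// Hypothetical result of the (simplified) discovery step.
enum class RunfilesSource { kManifest, kDirectory, kNone };

// Hypothetical, simplified model: the manifest wins whenever it is present,
// because the runfiles directory always exists but may be empty (apart from
// the workspace directory) and therefore cannot be trusted on its own.
RunfilesSource ChooseRunfilesSource(bool manifest_exists,
                                    bool directory_exists) {
  if (manifest_exists) return RunfilesSource::kManifest;
  if (directory_exists) return RunfilesSource::kDirectory;
  return RunfilesSource::kNone;
}
```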

I am planning to simplify the runfiles discovery procedure so that a runfiles library can reliably detect whether the runfiles directory is usable. When that has happened, the library can prefer the directory over the manifest and your use case should just work, without additional API.

@comius comius transferred this issue from bazelbuild/rules_cc Jan 15, 2025
@comius comius added the team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts label Jan 15, 2025
@jwnimmer-tri
Contributor Author

The top-level process won't have any variables set ...

Right, thanks. I confused $TEST_SRCDIR with the non-test case.

When that has happened, ...

Makes sense, thanks.

@comius comius added the P3 We're not considering working on this, but happy to review a PR. (No assignee) label Jan 23, 2025

5 participants