Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New audit: cache poisoning #261

Closed
woodruffw opened this issue Dec 9, 2024 · 8 comments
Closed

New audit: cache poisoning #261

woodruffw opened this issue Dec 9, 2024 · 8 comments
Assignees
Labels
new-audit New audits
Milestone

Comments

@woodruffw
Copy link
Owner

Cache poisoning is difficult/impossible for us to detect in the general case, since it's a runtime aspect.

However, we can flag certain patterns that we know are likely to result in exploitable cache poisoning, e.g.:

  1. The workflow has a release/tag trigger, indicating that it's a release process
  2. The workflow has one or more steps that include caching, such as setup-python with the cache: key or setup-node with similar. There are probably many of these, and we need to figure them out/enumerate them.
@woodruffw woodruffw added the new-audit New audits label Dec 9, 2024
@woodruffw woodruffw mentioned this issue Dec 9, 2024
28 tasks
@ubiratansoares
Copy link
Contributor

@woodruffw If you don't have plans to take this one, I'd love to give it a try 🙂

@woodruffw
Copy link
Owner Author

woodruffw commented Dec 11, 2024

@woodruffw If you don't have plans to take this one, I'd love to give it a try 🙂

Go for it! This one has some interesting edge cases (e.g. there might be a top-level release trigger but then an if: that limits it on a job with a cache), but we can figure those out as you iterate 🙂

Off the top of my head, here's an initial set of actions we want to catch here:

  • actions/setup-python: has cache:
  • actions/setup-node: has cache:
  • ruby/setup-ruby: has bundler-cache:
  • astral-sh/setup-uv has enable-cache:
  • actions-rust-lang/setup-rust-toolchain: has cache: plus some others: https://github.com/actions-rust-lang/setup-rust-toolchain#inputs
    • cache appears to be on by default for this one!
  • any use of Swatinem/rust-cache at all (in a release workflow)?
  • any use of actions/cache or actions/cache/restore at all (in a release workflow)?

@woodruffw woodruffw added this to the 1.0 milestone Dec 11, 2024
@ubiratansoares
Copy link
Contributor

Thanks for sharing your thoughts. Those considerations definitely help 🚀

I'm particular interested having some upfront chat around the set of Actions we want to check.

My first take would be having the Actions provided by Github as the hardcoded ones to go for. These are not going anywhere (hopefully 😂), and they look like a good place to start.

Those Actions would be :

However, I really like the idea of dynamically retrieve a list of Actions we want to inspect for cacheability, using a schema that exposes at least

  • the GHA coordinate (e.g. astral-sh/setup-uv)
  • the input we want to evaluate (e.g. enable-cache) along with a hint on the default value (if any)

That would allow an user running a zizmor version a behind the latest to also benefit from new Actions being listed as suitable for a cache-poisoning attack. Perhaps we can maintain such a list as a .json file along with the website assets and deploy it over Github Pages, hence retrieving it easily without any Github PAT.

In addition, adding a new GHA to the list of ones we want to check is a matter of reviewing/merging a PR without any code changes (in general). We could even have a CI check to validate the .json we'll consume from the CLI. 😄

This approach could be a second iteration, since there is some design to consider for the data schema (e.g., should we track versions for those Actions as well?), or we can for it right away.

In the end of the, either having a static or dynamic list of suitable Actions to check, it should be straightforward add them, because the ecosystem is huge, and new additions surely will come! For example, as a long term JVM user, I'm aware that gradle/setup-gradle also caches and restores on top of actions/toolkit by default 🙈

I'd love to learn your thoughts on this 🙂

  • Should we be pedantic for this audit ? If yes, only for persona=auditor?
  • Do we want to this check to be 100% offline for all Actions we support?
  • On the other way around: do we want to be 100% online for all Actions we support, without any hardcoded coordinates?
  • Alternatively in between: do we want to have at least some verifications purely based on harcoded GHA coordinates?

@woodruffw
Copy link
Owner Author

Re: data files: you read my mind! This is something I've been thinking about more generally, as a way to improve zizmor's online audit performance (since repeated paginated calls to GH's APIs are not fast) and avoid having to release new versions to release data changes (like you said).

The big challenge there IMO is making sure that things gracefully degrade in an offline setting. In practice, I think that means:

  1. Each zizmor version should ship some version of the data it needs
  2. If the user is online, we should opportunistically try to fetch a newer copy, but fail gracefully (with a warning) if the fetch fails
  3. If the local version of the data gets "too" stale (>48h?), we should still use it but with a warning encouraging the user to go online to get a newer copy

TL;DR: I think this is a great idea, and that we should do it. However, I think it's a separate set of tasks than this audit itself, so I think I'd like to start things here with just a hard-coded list, and then we can do follow-ups for designing a static API.

Should we be pedantic for this audit ? If yes, only for persona=auditor?

IMO this should be non-pedantic, since it'll be pretty filtered by default (there should be a pretty low noise floor on "has caching enabled and is a release workflow).

Do we want to this check to be 100% offline for all Actions we support?

100% offline, yep 🙂

Alternatively in between: do we want to have at least some verifications purely based on harcoded GHA coordinates?

Long term I think we'll want to collapse the distinction between the two: the audit itself will always be offline, but the age of the data it uses will depend on whether the user is online or offline 🙂

@woodruffw
Copy link
Owner Author

I've opened #278 as a separate tracker for designing a static API.

@ubiratansoares
Copy link
Contributor

Thanks @woodruffw ! I fully agree with having a separated task for that, meanwhile let's get started with the offline take 🙂

@ubiratansoares
Copy link
Contributor

@woodruffw Just raised a first draft!

@woodruffw
Copy link
Owner Author

#294

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-audit New audits
Projects
None yet
Development

No branches or pull requests

2 participants