Conversation

@timvisher-dd timvisher-dd commented Feb 7, 2026

Closes 99designs#1275 (abandoned)

I run aws-vault as a credentials_provider for SSO-backed AWS profiles, often in parallel contexts with dozens to hundreds of invocations at once.

In that context, when credentials needed to be refreshed, aws-vault would open an unconstrained number of browser tabs in parallel, usually triggering HTTP 500 responses on the AWS side and failing to acquire creds.

To mitigate this I developed a small wrapper around aws-vault that used a Bash dir lock (Wooledge BashFAQ 45) whenever there was a possibility that the credentials would need to be refreshed. This worked, but it was also quite slow, since it locked the entire aws-vault execution rather than just the step that might launch a browser tab. The dir-locking strategy was also sensitive to being killed by process managers like terraform, and so had to do things like die voluntarily after a magic number of seconds to avoid being SIGKILLed and leaving a stale lock around.

This changeset introduces a cross-platform process lock that is tolerant of all forms of process death (even SIGKILL), and wires it into the SSO/OIDC browser flow so only one process refreshes at a time. This keeps parallel invocations fast while avoiding the browser storm that triggers HTTP 500s.

While stress testing I also found that the session cache could race across processes, leading to "item already exists" errors from the keychain. This branch adds a session cache refresh lock so only one process writes back to the same cache entry at once, while others wait and re-check.

Because the parallelism is now safe and fast, I also hit SSO OIDC rate limits (429). This change adds Retry-After support with jittered backoff so the SSO token exchange is resilient under heavy load.

In a stress test across 646 AWS profiles in 9 SSO Directories I'm able to retrieve creds successfully in ~36 seconds on my box. This fails on upstream HEAD because the browser storm overwhelms the IAM/SSO APIs and the keychain cache update races.

Testing

Unit tests have been added for the SSO/OIDC lock, the session cache lock, and the OIDC retry logic. I also used some local integration test scripts that clear out the creds, run aws-vault export --format=json across different sets of my profiles, and assert that it succeeds. Finally, I've converted my local tooling to use this fork of aws-vault and have been exercising it there without issue.

Colophon

I did not write any of this code. Codex did. That said I have read through it in some detail and it looks reasonable to me.

Co-authored-by: Codex <noreply@openai.com>

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Feb 7, 2026
@timvisher-dd timvisher-dd changed the title Allow aws-vault to safely be run in parallel feat: Allow aws-vault to safely be run in parallel Feb 7, 2026
@github-actions github-actions bot added the feat label Feb 7, 2026
@timvisher-dd timvisher-dd marked this pull request as ready for review February 7, 2026 19:31
@timvisher-dd timvisher-dd requested a review from mbevc1 as a code owner February 7, 2026 19:31
@timvisher-dd
Author

Heads up, I've seen some issues since running with this where, in rare cases, aws-vault will error with something along the lines of 'can't update keychain item: it already exists'. I'm going to drop this back to draft and let it bake a bit more.

@timvisher-dd timvisher-dd marked this pull request as draft February 9, 2026 14:41
@mbevc1

mbevc1 commented Feb 9, 2026

Hey @timvisher-dd , no worries let me know when it's ready for a review.

@timvisher-dd timvisher-dd marked this pull request as ready for review February 10, 2026 17:14
@timvisher-dd
Author

I haven't been able to reproduce any issues yet. I think it may have been a failure in my small build script that creates a throwaway keychain to start from scratch.

@mbevc1

mbevc1 commented Feb 10, 2026

> I haven't been able to reproduce any issues yet. I think it may have been a failure in my small build script that creates a throwaway keychain to start from scratch.

No worries, we can mark this as Draft while you investigate, if you want to look more into it.

@timvisher-dd
Author

Aha! Found it. It's a legit bug: if more than one aws-vault instance is trying to refresh creds for a single profile, they fight over setting the keychain item. I can figure that out. Dropping this back to Draft.

@timvisher-dd timvisher-dd marked this pull request as draft February 10, 2026 18:48
timvisher-dd and others added 4 commits February 10, 2026 15:55
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
@timvisher-dd timvisher-dd marked this pull request as ready for review February 10, 2026 21:24
@timvisher-dd timvisher-dd marked this pull request as draft February 11, 2026 13:13
@timvisher-dd
Author

I think I've found another area where parallelism needs to be coordinated: the keychain Get/Set paths. I'd never noticed this before because I use the login keychain, which is always unlocked while I'm logged in. My test environment (which I think is closer to the aws-vault recommendations) needs to unlock the keychain before it can be used, and that's causing a storm of prompts asking for permission to use the keychain.

@mbevc1

mbevc1 commented Feb 11, 2026

> I think I've found another area where parallelism needs to be coordinated: the keychain Get/Set paths. I'd never noticed this before because I use the login keychain, which is always unlocked while I'm logged in. My test environment (which I think is closer to the aws-vault recommendations) needs to unlock the keychain before it can be used, and that's causing a storm of prompts asking for permission to use the keychain.

Could you talk more about your use case here and what problem you're trying to solve? Am I reading this correctly that you're getting locking issues on the keychain backend, or is it something else?
