fix(limit-req): Make Redis path atomic via EVAL + use hash key with TTL #12605
falvaradorodriguez wants to merge 12 commits into apache:master
Conversation
Force-pushed from 25d03d5 to 3530220
- Switch Redis storage to a single hash key (cluster compatibility)
- Perform read/compute/write atomically with EVAL
- Keep first-hit behavior (no cost on missing state)
- Add EX-based TTL to avoid key buildup
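For context on what the EVAL script computes atomically: the state update follows the usual leaky-bucket arithmetic from resty.limit.req (rate and burst scaled by 1000, timestamps in milliseconds). Below is a minimal Python sketch of a single step, for illustration only — the function name and return shape are made up here, and the real logic lives in the plugin's Lua script:

```python
def incoming(prev_excess, last_ms, now_ms, rate, burst):
    """One leaky-bucket step (values scaled by 1000, as in resty.limit.req).

    Illustrative sketch only; not the plugin's actual Lua code.
    """
    elapsed = abs(now_ms - last_ms)
    # Drain the bucket for the elapsed time, then add this request's cost (1000).
    excess = max(prev_excess - rate * elapsed / 1000 + 1000, 0)
    if excess > burst:
        # Over burst: reject and leave the stored state unchanged.
        return "rejected", prev_excess
    return "allowed", excess

# rate = 1000 (1 req/s scaled), burst = 0:
# a request 1s after the last one is allowed...
print(incoming(0, 0, 1000, 1000, 0))
# ...but an immediate second request at the same millisecond is rejected.
print(incoming(0, 1000, 1000, 1000, 0))
```

Doing this read → compute → write inside one Redis-side script is what removes the race: no other client can interleave between reading `excess`/`last` and writing the new state.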
Force-pushed from 3530220 to d60f099
Hi @falvaradorodriguez, we need to add a test case for this fix.
Force-pushed from 88ed813 to 74d8587
Hi @falvaradorodriguez, there are failing CI checks that need fixing.
Force-pushed from 40a7f03 to af4d670
Force-pushed from af4d670 to 6399925
Hi @Baoyuantop! The failures are not related to the current changes; the modified plugin's tests pass. Could you please review? Thanks
apisix/plugins/limit-req/util.lua
Outdated
```lua
if commit == 1 then
    redis.call("HMSET", state_key, "excess", new_excess, "last", now)
    redis.call("EXPIRE", state_key, 60)
```
Can TTL be dynamically configured based on the rate-limiting window period, or allow user configuration?
In the limit-req plugin, there is no time window like in the limit-count plugin.
This TTL exists simply to prevent dead keys from lingering in Redis. Currently, if a consumer stops making requests, their key remains in Redis until it is deleted by some action independent of APISIX.
It could be made configurable, but in my opinion it would not add much value to this plugin. It would also be an extra feature, not related to this fix.
In my opinion, since limit-req always operates per second, a one-minute margin in Redis is acceptable. If the consumer makes a request after one minute, the flow is handled the same way as a first request.
When will this PR be merged? I am having the same issue that this PR fixes 🙏

I will promptly urge other community maintainers to review.
Precompute the Lua script SHA1 locally and always execute scripts via EVALSHA to avoid repeated SCRIPT LOAD operations. Add a robust NOSCRIPT fallback to EVAL to ensure compatibility with both resty.redis and resty.rediscluster, especially in Redis Cluster setups where scripts are cached per node. This improves performance and makes script execution resilient to Redis node restarts, failovers, and resharding.
There was a node-management issue with Redis Cluster. Loading the script does not guarantee it will be available for the next EVALSHA call. I have changed the fallback to EVAL to ensure the script is executed; once it has run via EVAL, subsequent calls can use the SHA1.
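The resulting pattern — EVALSHA fast path with an EVAL fallback that also repopulates a node's script cache — can be sketched against an in-memory stand-in for Redis. All names here (`FakeRedis`, `ScriptRunner`) are illustrative, not the plugin's actual API:

```python
import hashlib

class FakeRedis:
    """In-memory stand-in: caches scripts by SHA1, like one Redis node."""
    def __init__(self):
        self.script_cache = {}

    def eval(self, script, numkeys, *args):
        # EVAL both executes the script and populates the node's cache.
        sha = hashlib.sha1(script.encode()).hexdigest()
        self.script_cache[sha] = script
        return ("ran", sha)

    def evalsha(self, sha, numkeys, *args):
        if sha not in self.script_cache:
            raise RuntimeError("NOSCRIPT No matching script. Please use EVAL.")
        return ("ran", sha)

class ScriptRunner:
    """Precompute the SHA1 locally; no SCRIPT LOAD round-trip needed."""
    def __init__(self, client, script):
        self.client = client
        self.script = script
        self.sha = hashlib.sha1(script.encode()).hexdigest()

    def run(self, numkeys, *args):
        try:
            return self.client.evalsha(self.sha, numkeys, *args)
        except Exception as e:
            if "NOSCRIPT" in str(e):
                # Fall back to EVAL, which also caches the script,
                # so later EVALSHA calls on this node succeed.
                return self.client.eval(self.script, numkeys, *args)
            raise

client = FakeRedis()
runner = ScriptRunner(client, "return 1")
first = runner.run(0)   # cache empty -> NOSCRIPT -> EVAL fallback
second = runner.run(0)  # script now cached -> EVALSHA fast path
```

This is why the fallback survives node restarts, failovers, and resharding: any node that has lost the script answers NOSCRIPT once, then gets it re-cached via EVAL.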
Hi @falvaradorodriguez, could you please fix the lint bug?
```lua
if commit == 1 then
    redis.call("HMSET", state_key, "excess", new_excess, "last", now)
    local ttl = math.ceil(burst / rate) + 1
```
apisix/plugins/limit-req/util.lua
Outdated
```lua
local s
if type(err) == "table" then
    s = tostring(err[1] or err.err or err.message or err.msg or err)
```
Last time I checked, the type of err was always a string. Did you notice something different that led you to add err.err or err.message or err.msg?
Thanks for the note. In resty.redis the error is indeed typically returned as a plain string.
However, since this code path is meant to work with both resty.redis and resty.rediscluster, I added a small defensive handling for the case where errors may be propagated as a structured table (e.g. containing err/message fields) instead of a raw string.
This doesn’t change the behavior for the common case, but makes the NOSCRIPT detection more robust across Redis standalone and Redis Cluster setups, especially during redirects/failovers where error formats may differ.
With the changes you mentioned here, what I wrote above no longer applies. I have replicated the error handling you use in the limit-conn plugin.
apisix/plugins/limit-req/util.lua
Outdated
```lua
else
    excess = 0
-- If the script isn't cached on this Redis node, fall back to EVAL.
if not res and is_noscript_error(err) then
```
I observed that the reliability of Redis EVALSHA with Redis Cluster was very low.
#12872 (comment)
So I decided not to use EVALSHA when the policy is set to rediscluster. You can refer to the PR I just linked to see how I did it.
Let me know if you have any better ideas, though.
Makes sense, and it is also good practice to keep the logic of the plugins aligned. I have tried to replicate the functionality you mention here: 05657f6
shreemaan-abhishek left a comment:
Please resolve the conflicts too.
Pull request overview
This PR fixes race conditions in the limit-req Redis/Redis-Cluster policies by moving the limiter state into a single Redis hash key and updating it atomically via a single Lua EVAL script (with EVALSHA fast-path for standalone Redis), aligning behavior with the intended rate-limit semantics under concurrency.
Changes:
- Replaces multi-key GET/SET updates with an atomic Redis Lua script operating on one hash key (excess + last) plus TTL.
- Enables EVALSHA optimization for the standalone Redis policy; uses EVAL for Redis Cluster for reliability.
- Adds test cases for Redis and Redis Cluster to validate hash structure usage, TTL presence, and basic limiting behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| apisix/plugins/limit-req/util.lua | Implements atomic Redis script (hash-based state + TTL) with EVALSHA fallback logic. |
| apisix/plugins/limit-req/limit-req-redis.lua | Enables use_evalsha for standalone Redis limiter instances. |
| apisix/plugins/limit-req/limit-req-redis-cluster.lua | Disables use_evalsha for cluster limiter instances (uses EVAL). |
| t/plugin/limit-req-redis.t | Adds tests for hash-key state + TTL and a rapid-request rejection case for Redis policy. |
| t/plugin/limit-req-redis-cluster.t | Adds analogous tests for Redis Cluster policy. |
```lua
local rate = tonumber(ARGV[1])  -- req/s
local now = tonumber(ARGV[2])   -- ms
local burst = tonumber(ARGV[3]) -- req/s
```
The Lua script comments label rate/burst as "req/s", but in this limiter implementation they are passed around in the same scaled units used by the original algorithm (rate = configured_rate * 1000, burst = configured_burst * 1000). Updating these comments to reflect the actual units would reduce the risk of future logic changes introducing subtle math/TTL bugs.
Suggested change:
```lua
local rate = tonumber(ARGV[1])  -- scaled request rate (configured_rate * 1000)
local now = tonumber(ARGV[2])   -- ms
local burst = tonumber(ARGV[3]) -- scaled burst (configured_burst * 1000)
```
```lua
local vals = red:hmget("limit_req:{test_key}:state", "excess", "last")
if vals[1] and vals[2] then
```
resty.redis:hmget returns (vals, err). This test only captures the first return value and then unconditionally indexes vals[1]/vals[2], which will throw a Lua runtime error if hmget fails and returns nil. Capture err and guard for not vals (and ideally print the error) to make the test robust and failures diagnosable.
Suggested change:
```lua
local vals, err = red:hmget("limit_req:{test_key}:state", "excess", "last")
if not vals then
    ngx.say("failed to hmget: ", err)
elseif vals[1] and vals[2] then
```
moonming left a comment:
Hi @falvaradorodriguez, thank you for making the Redis limit-req path atomic via EVAL!
Using a Lua script with EVAL to ensure atomicity of the rate limiting operation is the correct approach — the current non-atomic multi-command flow can have race conditions under high concurrency. With 12 reviews, this has been thoroughly discussed.
To confirm readiness:
- Are all 12 review comments addressed?
- Has the EVAL script been tested under concurrent load to verify it resolves the race condition?
- The hash key with TTL approach — does it handle key expiration correctly for sliding windows?
This is an important correctness fix. Let's get it finalized! Thank you.
Hi @falvaradorodriguez, I see no problems with the code here. Could you please fix the failing test?
Description

The current limit-req Redis implementation uses two separate keys (excess and last) and updates them with multiple GET/SET operations.

Solution
- Store both values (excess and last) under a single Redis hash key, so the state is managed as one unit.
- Use a single EVAL script that performs read → compute → write atomically inside Redis, removing race conditions. This approach is consistent with how the limit-count plugin already works.
- Add a TTL to the key to avoid buildup of stale state.
- Preserve existing semantics: the first request with no prior state does not consume tokens.
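On the TTL point: the review thread shows a later revision deriving the TTL from the limiter parameters as `math.ceil(burst / rate) + 1` instead of a fixed 60 seconds. The intuition, sketched below in Python for illustration (the function name is made up), is that the stored excess can never exceed `burst`, so the state becomes irrelevant once enough time has passed to fully drain it, plus one second of slack:

```python
import math

def state_ttl_seconds(rate, burst):
    """Seconds until stored limiter state is stale (sketch of the PR's
    `math.ceil(burst / rate) + 1`): worst-case drain time plus 1s slack."""
    return math.ceil(burst / rate) + 1

print(state_ttl_seconds(1, 3))  # 1 req/s, burst 3
print(state_ttl_seconds(2, 3))  # 2 req/s, burst 3
print(state_ttl_seconds(5, 0))  # no burst: state is stale after 1s
```

Either way, the TTL only garbage-collects idle keys; an active consumer refreshes it on every request, so it does not change limiting behavior.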
Which issue(s) this PR fixes:
Fixes #12592
Checklist