Caffeine (Async) Cache seems to ignore 'maximum size' on a certain scenario #43382
Comments
/cc @gwenneg (cache)
Any chance you can put together a sample that reproduces the problem? If not, more details on how you use Quarkus' caching would be great. Thanks
Yes, we would at least need to know the cache implementation you are using. Also, if it's always the same caches causing problems, that might be useful information.
I had a look at our cache code and I don't see us doing anything obviously crazy; we are mostly relying on Caffeine's behavior. Could you by any chance check a few things when the problem happens:
Anyway, if we can't get a reproducer, we will need to try to diagnose more precisely what's going on and whether it's Quarkus misconfiguring Caffeine or a Caffeine issue. Probably also a good idea to check that
It's probably also good to capture a thread dump (jstack) or application profile (JFR). It sounds like the futures never complete so they pile up over time, which implies the executor is starved due to blocking calls. The default is ForkJoinPool.commonPool().
See my comment and @ben-manes's comment above. We will need a lot more information to be able to track down what's going on.
@israelstmz any updates based on the comments above?
Closing for lack of feedback.
Describe the bug
We have multiple local Caffeine caches in our web service. Per web request, our application performs 6 external API calls, and we use 6 caches to cache the web client responses (Uni).
Usage:
@CacheResult(cacheName = "helper-service", lockTimeout = 20)
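(For context, a minimal sketch of how such a cached reactive method might look; class, method, and type names below are illustrative, not the reporter's actual code.)

```java
import io.quarkus.cache.CacheResult;
import io.smallrye.mutiny.Uni;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class HelperServiceClient {

    // Placeholder for the deserialized web client response.
    public record HelperResponse(String payload) {}

    // The asynchronous result of the Uni is cached, so the remote call only
    // runs on a cache miss; lockTimeout bounds how long concurrent callers
    // wait for a computation that is already in progress.
    @CacheResult(cacheName = "helper-service", lockTimeout = 20)
    public Uni<HelperResponse> fetch(String key) {
        // Stands in for the reactive web client call to the external API.
        return Uni.createFrom().item(new HelperResponse("response for " + key));
    }
}
```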
Config:
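(The actual properties are not shown in the report; below is a minimal sketch assuming the standard Quarkus Caffeine configuration keys. Only the 50K maximum size is taken from the report; any other settings are unknown.)

```properties
# Per-cache Caffeine configuration; maximum-size value taken from the report.
quarkus.cache.caffeine."helper-service".maximum-size=50000
```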
(we have 6 of those)
Out of dozens of app instances, we noticed a few of them crashing with OOM. Analysis led us to a growing cache. In the cache metrics (Prometheus) we can see that some caches exceed the 50K limit and keep rising to well above 200K:
These are the metrics of the cache sizes:
As can be seen, only one cache is increasing sharply. It's not a specific one; each time it is a different cache.
Over time this fills the old-generation heap space until GC becomes inefficient, causing an OOM:
Our traffic per app instance is around 30 requests per second, with occasional spikes doubling that amount.
We even added a Quarkus scheduler that manually invalidates all the caches once an hour:
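(The original scheduler code is not included here; the following is a minimal sketch of such a job, assuming the standard Quarkus CacheManager and @Scheduled APIs. Class and method names are illustrative.)

```java
import io.quarkus.cache.CacheManager;
import io.quarkus.scheduler.Scheduled;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class HourlyCacheEviction {

    @Inject
    CacheManager cacheManager;

    // Once per hour, invalidate every named cache registered with the cache manager.
    @Scheduled(every = "1h")
    void invalidateAllCaches() {
        cacheManager.getCacheNames().forEach(name ->
                cacheManager.getCache(name)
                        .ifPresent(cache -> cache.invalidateAll().await().indefinitely()));
    }
}
```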
But this, too, seems to clean only some of the caches some of the time; occasionally a certain cache still piles up and exhausts the memory.
We are running a native build.
Expected behavior
We expect the cache size to stay within/around the 'maximum-size' limit consistently.
Actual behavior
On some seemingly random app instances, an occasional cache grows without bound until an OOM occurs.
How to Reproduce?
Unfortunately, we weren't able to reproduce this, although we tried. It might be related to the high throughput/spikes in our production environment. As stated, even in our production it only happens on a few pods out of dozens.
Output of uname -a or ver
Linux 5.10.215-203.850.amzn2.x86_64 x86_64
Output of java -version
quarkus-micro-image:2.0
Quarkus version or git rev
3.14.4
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
No response