HTTP cache size limit environment variables #11530
public DiskSpaceGetter() {
  super(() -> getRootPath().getUsableSpace());
}
Feel free to ignore, as it's just a personal style preference, but IMHO it would be more readable here if `Mockable` just had a `protected abstract computeValue();` that would be overridden in a subclass, instead of passing lambdas around.
Done
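For illustration, the subclass-override shape suggested above might look roughly like this. It is a minimal sketch, not the PR's actual `Mockable`: the `mocked`/`unmocked` test hooks are assumed names, and the working directory stands in for the Enso project root.

```java
import java.io.File;

/** Hypothetical base class: subclasses compute the real value, tests can pin a fake one. */
abstract class Mockable<T> {
  private T mockedValue = null;

  /** Computes the real value; each subclass overrides this instead of passing a lambda. */
  protected abstract T computeValue();

  /** Returns the mocked value if one has been set, otherwise the real value. */
  public T get() {
    return mockedValue != null ? mockedValue : computeValue();
  }

  /** Test hook (assumed name): forces get() to return the given value. */
  public void mocked(T value) {
    mockedValue = value;
  }

  /** Test hook (assumed name): restores the real computation. */
  public void unmocked() {
    mockedValue = null;
  }
}

class DiskSpaceGetter extends Mockable<Long> {
  @Override
  public Long computeValue() {
    // Assumption: the working directory stands in for the Enso project root used in the PR.
    return new File(".").getUsableSpace();
  }
}
```

Tests can then call `mocked(...)` to simulate a nearly full disk without touching the real file system.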
## PRIVATE
Returns the path of the project root.
root_path : Text
root_path self = self.root.path
It feels a bit redundant, since we can just call `invokeMember` twice on the Java side, right?
I did this because `invokeMember` didn't work here; I get:
Exception occurred in target VM: Unsupported operation Value.invoke(path, Object...) for '(File /Users/gmt/dev/enso/enso/test/Table_Tests)'(language: Java, type: org.graalvm.polyglot.Value). You can ensure that the operation is supported using Value.canInvoke(String).
Perhaps because it's a builtin Enso type? `root.hasMember("path")` returns `false`.
Oh, I see. In that case, the workaround seems justified.
@@ -28,22 +28,46 @@
 * deleting entries to make space for new ones. All cache files are set to be deleted automatically
 * on JVM exit.
 *
 * <p>Limits should be set with environment variables: ENSO_LIB_HTTP_CACHE_MAX_FILE_SIZE_MEGS --
 * single file size, in megs ENSO_LIB_HTTP_CACHE_MAX_TOTAL_CACHE_LIMIT -- total cache size, in megs
Can we use full names instead of abbreviations in the docs? Or at least let's use the standard `MB` abbreviation. `Megs` sounds very casual and may not be understandable to everyone IMHO.
Done
/**
 * An upper limit on the total cache size. If the cache size limit specified by the other
 * parameters goes over this value, then this value is used.
 */
private static final long MAX_TOTAL_CACHE_SIZE_FREE_SPACE_UPPER_BOUND = 100L * 1024 * 1024 * 1024;
I don't understand the `FREE_SPACE` part of this name.
Removed.
public long getMaxTotalCacheSize(long currentlyUsed) {
  var totalCacheSize =
      switch (settings.getTotalCacheLimit()) {
        case TotalCacheLimit.Bytes bytes -> bytes.bytes();
        case TotalCacheLimit.Percentage percentage -> {
          long usableSpace = diskSpaceGetter.get() + currentlyUsed;
          yield (long) (percentage.percentage() * usableSpace);
        }
      };
  return Long.min(MAX_TOTAL_CACHE_SIZE_FREE_SPACE_UPPER_BOUND, totalCacheSize);
A few edge cases that I think need to be handled here:
- (1) In `Bytes` mode we completely skip checking the available disk space. What if our cache limit is set to 1000MB but the device has only 100MB of available space? This code currently returns 1000MB as the max total cache size, so the cache logic will happily perform writes for up to 1GB, exceeding the available 100MB. The first cache write that happens once we run out of disk space will then probably fail with some `IOException: Out of storage space`, and we have no logic to handle this. Running out of storage space for a cache should not make an HTTP request fail: from the user's perspective it has nothing to do with disk storage, so the details of cache handling should be opaque to the user and not cause errors. Rather than catching this kind of error, we should proactively ensure that our cache writes do not exceed the available disk space.
- (2) Moreover, I don't think it is good practice for a cache to take up the last bytes available on the disk, as that may cause other applications to stop working properly. This is maybe less important, but to be a good citizen of the OS, an optional cache should not use up the last resources. Thus I propose keeping a margin that is never exceeded: e.g. if the available disk space is less than 100MB (or 1GB, up to us), then regardless of the configured total cache size, we should not take up those last few bytes.

(2) is optional, but I think (1) is really needed to make this cache usable: running out of disk space should not cause HTTP requests to fail. Since we need to ensure the cache never grows larger than the available disk space anyway, we could additionally add a small margin of ~100MB to satisfy (2) as well.
When working on either (1) or (2), we need to take into account `ENSO_LIB_HTTP_CACHE_MAX_FILE_SIZE`, as a single download can grow the cache by at most this amount. So we need to ensure that, before downloading any file to a cache location, we have at least this much free disk space.
Of course, there are always edge cases like other applications writing to disk at the same time as us, so we could run out of space even with all the necessary margins in place. IMHO such cases are probably OK to ignore for now, but keeping the margins is rather easy, so IMO it is worth doing. For the other cases, we may want to intercept the `IOException` and wrap it in a more Enso-friendly error, e.g. `Illegal_State.Error`, telling the user that we ran out of disk space while trying to cache the file and that they may disable caching for this request or just re-run the node to try again.
Instead of a fixed margin, I capped the total allowed size at 90% (`MAX_PERCENTAGE`) of the free disk space, regardless of whether the limit was specified in MB or as a percentage. I think this provides the protection we need.
For the max file size: previously, it cleared enough space for the download if there was a content-length, but not otherwise. Now, if there is no content-length, it clears out space for the largest allowed file. This is likely to be too much, but this case is probably much less common.
As for running out of space anyway, because of other applications writing to disk (or for any other reason), there is already logic to handle that case. If it encounters an `IOException` in `makeRequestAndCache`, it removes any partially downloaded file and re-issues the request without caching. I don't attach a warning in this case, because the failure is very likely to be transient. Since it's unlikely to happen repeatedly, I don't think there's any value in bothering the user with it.
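The fallback described in this reply (drop the partial cache file, then re-issue the request without caching) can be sketched roughly as follows. `BodySource` and `fetchWithCache` are hypothetical names, not the PR's `makeRequestAndCache`; here "re-issuing the request" is modelled as opening the body source a second time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CacheFallback {
  /** Hypothetical source of a response body; each call stands for issuing the request again. */
  interface BodySource {
    InputStream open() throws IOException;
  }

  /** Downloads into cacheFile; if that fails, deletes the partial file and fetches without caching. */
  static byte[] fetchWithCache(BodySource source, Path cacheFile) {
    try (InputStream in = source.open()) {
      Files.copy(in, cacheFile, StandardCopyOption.REPLACE_EXISTING);
      return Files.readAllBytes(cacheFile);
    } catch (IOException cacheFailure) {
      // E.g. the disk filled up mid-download: no warning, since this is likely transient.
      try {
        Files.deleteIfExists(cacheFile);
        try (InputStream in = source.open()) {
          return in.readAllBytes();
        }
      } catch (IOException retryFailure) {
        // Only a failure of the uncached request itself reaches the caller.
        throw new UncheckedIOException(retryFailure);
      }
    }
  }
}
```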
Ah sorry, I did not notice the 90% cap. Okay, I think that is good indeed.
Looks great then.
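Putting this thread's conclusion together, the limit computation might look like the sketch below: the configured limit (absolute bytes, or a fraction of usable space) is capped both at 90% of the space the cache could occupy and at the fixed 100 GiB upper bound. The names are simplified assumptions (the constant presumably lost its `FREE_SPACE` part after the earlier comment), and `usableSpace` mirrors `diskSpaceGetter.get() + currentlyUsed` from the diff; whether the PR applies the 90% cap to exactly this quantity is an assumption.

```java
/** Simplified model of the two ways a total cache limit can be configured. */
sealed interface TotalCacheLimit permits TotalCacheLimit.Bytes, TotalCacheLimit.Percentage {
  /** An absolute limit in bytes. */
  record Bytes(long bytes) implements TotalCacheLimit {}

  /** A fraction (0.0 to 1.0) of the space the cache could occupy. */
  record Percentage(double percentage) implements TotalCacheLimit {}
}

final class CacheLimits {
  /** Fixed upper bound: 100 GiB = 107,374,182,400 bytes. */
  static final long MAX_TOTAL_CACHE_SIZE_UPPER_BOUND = 100L * 1024 * 1024 * 1024;

  /** Assumed value of MAX_PERCENTAGE: never use more than 90% of the available space. */
  static final double MAX_PERCENTAGE = 0.9;

  static long maxTotalCacheSize(TotalCacheLimit limit, long freeDiskSpace, long currentlyUsed) {
    // Space the cache could occupy: what is still free plus what the cache already holds.
    long usableSpace = freeDiskSpace + currentlyUsed;
    long configured;
    if (limit instanceof TotalCacheLimit.Bytes b) {
      configured = b.bytes();
    } else if (limit instanceof TotalCacheLimit.Percentage p) {
      configured = (long) (p.percentage() * usableSpace);
    } else {
      throw new IllegalArgumentException("Unknown limit: " + limit);
    }
    // Cap the configured limit regardless of how it was specified.
    long diskCap = (long) (MAX_PERCENTAGE * usableSpace);
    return Math.min(MAX_TOTAL_CACHE_SIZE_UPPER_BOUND, Math.min(diskCap, configured));
  }
}
```

For the reviewer's example (a 1000MB `Bytes` limit with only 100MB free and an empty cache), this returns about 90MB instead of 1000MB.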
  return (long) (maxFileSizeMegs * 1024 * 1024);
}

/** Uses the environment variable if set, otherwise uses a default percentage. */
The comment on this function is wrong; it does not do what it says.
Do we need this separate static function here? It feels like we can replace it with a direct call to `TotalCacheLimit.parse`. IMO it adds an unnecessary layer of abstraction: essentially an alias with no real purpose. It would make sense if it could be overridden or contained other logic, but if all it does is delegate, like an alias, I'd rather remove it.
Done
Overall this looks great; the new approach to setting parameters for testing is a big improvement.
IMO we should try to address the case of the cache size possibly exceeding the available disk space.
Other than that, just some small comments.
// If we have a content-length, clear up enough space for that not. If not,
// then clear up enough space for the largest allowed file size.
There's a typo in the comment (`not`).
Suggested change:
- // If we have a content-length, clear up enough space for that not. If not,
- // then clear up enough space for the largest allowed file size.
+ // If we have a content-length, clear up enough space for that. If not,
+ // then clear up enough space for the largest allowed file size.
Done
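The reservation rule in the fixed comment (reserve the content-length when known, otherwise the maximum file size), combined with least-recently-used eviction, could be sketched like this. It is a simplified in-memory model, not the PR's `LRUCache` API: entries live in a `LinkedHashMap` whose iteration order is oldest first, sizes are in bytes, and `bytesToReserve`/`evictFor` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

final class SpaceReservation {
  /** Bytes to free before a download: the content-length if known, else the largest allowed file. */
  static long bytesToReserve(OptionalLong contentLength, long maxFileSize) {
    return contentLength.orElse(maxFileSize);
  }

  /**
   * Removes least-recently-used entries (iteration order = oldest first) until `needed`
   * more bytes fit under `maxTotal`. Returns the evicted keys in eviction order.
   */
  static List<String> evictFor(LinkedHashMap<String, Long> entries, long maxTotal, long needed) {
    long used = 0;
    for (long size : entries.values()) {
      used += size;
    }
    List<String> evicted = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
    while (used + needed > maxTotal && it.hasNext()) {
      Map.Entry<String, Long> oldest = it.next();
      used -= oldest.getValue();
      evicted.add(oldest.getKey());
      it.remove();
    }
    return evicted;
  }
}
```

Constructing the map with `new LinkedHashMap<>(16, 0.75f, true)` makes it access-ordered, so entries read via `get` move to the end and the oldest-first iteration matches LRU order.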
@@ -4,8 +4,8 @@
import org.enso.base.CurrentEnsoProject;

public class DiskSpaceGetter extends Mockable<Long> {
-  public DiskSpaceGetter() {
-    super(() -> getRootPath().getUsableSpace());
+  public Long computeValue() {
I'd probably add `@Override` at the top.
Done
@@ -22,6 +22,7 @@

/** Makes HTTP requests with secrets in either header or query string. */
public final class EnsoSecretHelper extends SecretValueResolver {
  private static final EnsoHTTPResponseCache cache = new EnsoHTTPResponseCache();
I think we should create the instance the first time `getCache` is called.
Creating it statically has some problems.
We check environment variables when instantiating `LRUCache` inside the `EnsoHTTPResponseCache` constructor.
If I recall correctly, in Native Image builds static initialization is done at image build time, so the environment variables read during static initialization would be those of the build machine, not of the machine we are later run on. Deferring the initialization may therefore prepare us better for Native Image.
Additionally, I saw that exceptions are thrown when the env variables are out of range. Again, I don't think it's a good idea to do that inside a static initializer. To be honest, I have no idea what will happen when such an exception is thrown from a static initializer while we are running the GUI. My guess is that the Language Server would fail to boot, which is bad for users, especially since we are unlikely to have good reporting for such errors.
I'd move to 'lazy' initialization.
Overall, I'm also wondering whether we want to throw an exception in this case at all. It would make HTTP requests unusable because of a misconfiguration. Logging an error or warning and falling back to defaults, or clamping the values, may be good enough: it informs the user of the misconfiguration but still lets them continue using the product.
Done
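A minimal sketch of the lazy initialization suggested above: the cache is constructed, and the environment read, on the first `getCache()` call rather than in a static initializer. `ResponseCacheStub` and `SecretHelperSketch` are hypothetical stand-ins for `EnsoHTTPResponseCache` and `EnsoSecretHelper`, and a `synchronized` getter is just one simple way to make the first call thread-safe.

```java
/** Hypothetical stand-in for EnsoHTTPResponseCache. */
final class ResponseCacheStub {
  static int instancesCreated = 0;
  final String rawMaxFileSize;

  ResponseCacheStub() {
    instancesCreated++;
    // Read at first use (i.e. at run time), not when the enclosing class is initialized.
    rawMaxFileSize = System.getenv("ENSO_LIB_HTTP_CACHE_MAX_FILE_SIZE");
  }
}

final class SecretHelperSketch {
  // No static initializer creates the cache; the field starts out unset.
  private static ResponseCacheStub cache;

  /** Creates the cache on first use; synchronized so concurrent first calls share one instance. */
  static synchronized ResponseCacheStub getCache() {
    if (cache == null) {
      cache = new ResponseCacheStub();
    }
    return cache;
  }
}
```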
Reverting to default instead of throwing.
Also simplified the testing methods: added `LRUCacheSettings` to hold settings, and mocks for free disk space and current time.
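The "revert to default instead of throwing" behaviour might look like the sketch below: a malformed or out-of-range value is logged as a warning and replaced by the default rather than raising an exception. `parseMegabytes`, its range parameters, and the use of `java.util.logging` are assumptions for illustration; the MB-to-bytes conversion mirrors the `maxFileSizeMegs * 1024 * 1024` line in the diff.

```java
import java.util.logging.Logger;

final class EnvSettingParser {
  private static final Logger LOGGER = Logger.getLogger(EnvSettingParser.class.getName());

  /**
   * Parses a size given in MB and returns it in bytes. Missing, malformed or
   * out-of-range values fall back to defaultBytes (with a warning) instead of throwing.
   */
  static long parseMegabytes(String name, String raw, double minMb, double maxMb, long defaultBytes) {
    if (raw == null) {
      return defaultBytes; // Variable not set: silently use the default.
    }
    double mb;
    try {
      mb = Double.parseDouble(raw.trim());
    } catch (NumberFormatException e) {
      LOGGER.warning(name + "=" + raw + " is not a number; using the default.");
      return defaultBytes;
    }
    // Written this way so that NaN is rejected as well.
    if (!(mb >= minMb && mb <= maxMb)) {
      LOGGER.warning(name + "=" + raw + " is outside [" + minMb + ", " + maxMb + "]; using the default.");
      return defaultBytes;
    }
    return (long) (mb * 1024 * 1024);
  }
}
```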