HTTP cache size limit environment variables #11530

GregoryTravis · 2024-11-11T20:58:59Z

Also simplified testing methods. Added LRUCacheSettings to hold settings, and mocks for free disk space and current time.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated, if necessary.
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the Scala, Java, TypeScript, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
Unit tests have been written where possible.
If meaningful changes were made to logic or tests affecting Enso Cloud integration in the libraries,
or the Snowflake database integration, a run of the Extra Tests has been scheduled.
- If applicable, it is suggested to paste a link to a successful run of the Extra Tests.

radeusgd · 2024-11-12T11:36:23Z

std-bits/base/src/main/java/org/enso/base/cache/DiskSpaceGetter.java

+  public DiskSpaceGetter() {
+    super(() -> getRootPath().getUsableSpace());
+  }


Feel free to ignore this, as it's just a personal style preference, but IMHO it would be more readable if Mockable just had a protected abstract computeValue() that is overridden in a subclass, instead of passing lambdas around.
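
For illustration, a minimal sketch of the subclass-override shape suggested here. The real Mockable API is not shown in this thread, so the mocked/unmocked hooks and the DiskSpaceGetterSketch name are hypothetical, and the working directory stands in for getRootPath().

```java
import java.io.File;
import java.util.Optional;

/** Hypothetical base class: a value source that tests can override. */
abstract class Mockable<T> {
  private Optional<T> override = Optional.empty();

  /** Subclasses supply the real value here instead of passing a lambda to super(). */
  protected abstract T computeValue();

  /** Test hook (assumed name): force a fixed value. */
  public void mocked(T value) {
    override = Optional.of(value);
  }

  /** Test hook (assumed name): return to the real value. */
  public void unmocked() {
    override = Optional.empty();
  }

  public T get() {
    return override.orElseGet(this::computeValue);
  }
}

class DiskSpaceGetterSketch extends Mockable<Long> {
  @Override
  protected Long computeValue() {
    // Stand-in for getRootPath().getUsableSpace(); uses the working directory.
    return new File(".").getUsableSpace();
  }
}
```

A test would call mocked(...) to pin the free disk space and unmocked() to restore the real lookup.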

radeusgd · 2024-11-12T15:54:34Z

distribution/lib/Standard/Base/0.0.0-dev/src/Meta/Enso_Project.enso

+    ## PRIVATE
+       Returns the path of the project root.
+    root_path : Text
+    root_path self = self.root.path


It feels a bit redundant, since we can just call invokeMember twice on the Java side, right?

I did this because invokeMember didn't work here; I get:

Exception occurred in target VM: Unsupported operation Value.invoke(path, Object...) for '(File /Users/gmt/dev/enso/enso/test/Table_Tests)'(language: Java, type: org.graalvm.polyglot.Value). You can ensure that the operation is supported using Value.canInvoke(String).

Perhaps because it's a builtin Enso type?

root.hasMember("path") false

Oh, I see. Well, in that case the workaround seems justified.

radeusgd · 2024-11-12T15:56:26Z

std-bits/base/src/main/java/org/enso/base/cache/LRUCache.java

@@ -28,22 +28,46 @@
 * deleting entries to make space for new ones. All cache files are set to be deleted automatically
 * on JVM exit.
 *
+ * <p>Limits should be set with environment variables: ENSO_LIB_HTTP_CACHE_MAX_FILE_SIZE_MEGS --
+ * single file size, in megs ENSO_LIB_HTTP_CACHE_MAX_TOTAL_CACHE_LIMIT -- total cache size, in megs


Can we use full names instead of abbreviations in the docs? Or at least let's use the standard MB abbreviation. Megs sounds very casual and may not be understandable to everyone IMHO.

radeusgd · 2024-11-12T16:00:31Z

std-bits/base/src/main/java/org/enso/base/cache/LRUCache.java

+  /**
+   * An upper limit on the total cache size. If the cache size limit specified by the other
+   * parameters goes over this value, then this value is used.
+   */
+  private static final long MAX_TOTAL_CACHE_SIZE_FREE_SPACE_UPPER_BOUND = 100L * 1024 * 1024 * 1024;


I don't understand the FREE_SPACE part of this name

radeusgd · 2024-11-12T16:13:14Z

std-bits/base/src/main/java/org/enso/base/cache/LRUCache.java

+  public long getMaxTotalCacheSize(long currentlyUsed) {
+    var totalCacheSize =
+        switch (settings.getTotalCacheLimit()) {
+          case TotalCacheLimit.Bytes bytes -> bytes.bytes();
+          case TotalCacheLimit.Percentage percentage -> {
+            long usableSpace = diskSpaceGetter.get() + currentlyUsed;
+            yield (long) (percentage.percentage() * usableSpace);
+          }
+        };
+    return Long.min(MAX_TOTAL_CACHE_SIZE_FREE_SPACE_UPPER_BOUND, totalCacheSize);


A few edge cases that I think need to be handled here:

1. In Bytes mode we completely skip checking the available disk space. What if our cache limit is set to 1000MB but the device has only 100MB of available space? This code currently returns 1000MB as the max total cache size, so the cache logic will happily perform writes for up to 1GB, exceeding the available 100MB. The first cache write that happens after we run out of disk space will probably fail with some IOException: Out of storage space. But we have no logic to handle this. Running out of storage space for a cache should not mean that an HTTP request fails. From the user's perspective the request has nothing to do with disk storage, so the details of cache handling should be opaque to the user and not cause errors. But instead of catching this kind of error, we should proactively ensure that our cache writes do not exceed the available disk space.

2. Moreover, I don't think it is good practice for a cache to take up the last bytes available on the disk, as that may cause other applications to stop working properly. This is maybe less important, but to be a good citizen of the OS, an optional cache should not use up the last resources. Thus I was proposing to keep a margin that is never exceeded. E.g. if the available disk space is less than 100MB (or 1GB, up to us), then regardless of the total cache size set, we probably should not take up these last few bytes.

(2) is optional, but I think (1) is really needed to make this cache usable: running out of disk space should not be a cause for failing HTTP requests. But since we need to ensure we don't make the cache larger than the available disk space, we could probably also add a little margin of ~100MB to satisfy (2) as well.

When working on either (1) or (2), we need to take into account the ENSO_LIB_HTTP_CACHE_MAX_FILE_SIZE, as the cache can grow by at most this amount. So we need to ensure that before downloading any file to a cache location, we have at least this much free disk space.

Of course there's always the possibility of edge cases like other applications writing to disk at the same time as us, so that we run out of space even if we had all the necessary margins. IMHO such cases are probably ok to ignore for now. But keeping the margins is really rather easy, so IMO it would be worth doing. For the other cases, we may want to intercept the IOException and wrap it into some more Enso-friendly error, e.g. Illegal_State.Error, telling the user that we ran out of disk space when trying to cache the file and that they may try to disable caches for this request or just re-run the node to try again.

Instead of a fixed margin, I capped the total allowed size at 90% (MAX_PERCENTAGE) of the free disk space, regardless of whether the limit was specified in MB or as a percentage. I think this provides the protection we need.

For the max file size -- previously, it was clearing enough space for the download if there was a content-length, but not otherwise. Now, if there is no content-length, it clears out space for the largest allowed file. This is likely to be too much, but this case is probably a lot less common.

As for running out of space anyway, because of other applications writing to disk (or for any other reason) -- there is already logic to handle that case. If it encounters an IOException in makeRequestAndCache, it removes any partially-downloaded file, and re-issues the request without caching. I don't attach a warning in this case, because this is very likely to be transient. Since it's unlikely to happen repeatedly, I don't think there's any value in bothering the user with it.
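
A rough sketch of the fallback described in the last paragraph, not the actual makeRequestAndCache code. IOSupplier, withCacheFallback and the partialFile parameter are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CacheFallbackSketch {
  /** Hypothetical stand-in for an operation that may fail with an IOException. */
  interface IOSupplier<T> {
    T get() throws IOException;
  }

  /**
   * Tries the caching path first. On an IOException, deletes any partially-downloaded
   * file and re-issues the request without caching. No warning is attached.
   */
  static <T> T withCacheFallback(IOSupplier<T> cached, IOSupplier<T> uncached, Path partialFile)
      throws IOException {
    try {
      return cached.get();
    } catch (IOException e) {
      Files.deleteIfExists(partialFile);
      return uncached.get();
    }
  }
}
```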

Ah sorry, I did not notice the 90% cap. Okay, I think that is good indeed.

Looks great then.
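
To make the outcome of this thread concrete, here is a sketch of the limit calculation with the 90% cap applied in both modes. It is not the merged code: the class and method names are made up, and it assumes the cap is taken over free space plus the space the cache already occupies, mirroring the usableSpace computation in the hunk above.

```java
class CacheLimitSketch {
  /** A limit given either as an absolute byte count or as a fraction of disk space. */
  interface TotalCacheLimit {}

  record Bytes(long bytes) implements TotalCacheLimit {}

  record Percentage(double percentage) implements TotalCacheLimit {}

  // Values taken from the discussion: a 90% cap and a 100 GiB upper bound.
  static final double MAX_PERCENTAGE = 0.9;
  static final long UPPER_BOUND = 100L * 1024 * 1024 * 1024;

  static long maxTotalCacheSize(TotalCacheLimit limit, long freeSpace, long currentlyUsed) {
    // Space the cache could occupy: what is free now plus what the cache already holds.
    long usableSpace = freeSpace + currentlyUsed;
    long requested;
    if (limit instanceof Bytes b) {
      requested = b.bytes();
    } else if (limit instanceof Percentage p) {
      requested = (long) (p.percentage() * usableSpace);
    } else {
      throw new IllegalArgumentException("Unknown limit type: " + limit);
    }
    // Assumed reading of the 90% rule: never exceed MAX_PERCENTAGE of the usable space.
    long cap = (long) (MAX_PERCENTAGE * usableSpace);
    return Math.min(UPPER_BOUND, Math.min(cap, requested));
  }
}
```

With the example from the first comment (a 1000MB limit on a device with 100MB free and an empty cache), the requested 1000MB is cut down to about 90MB.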

radeusgd · 2024-11-12T16:19:49Z

std-bits/base/src/main/java/org/enso/base/cache/LRUCacheSettings.java

+    return (long) (maxFileSizeMegs * 1024 * 1024);
+  }
+
+  /** Uses the environment variable if set, otherwise uses a default percentage. */


The comment on this function is wrong, it does not do what it says.

Do we need this separate static function here? It feels like we can replace it with just a call to TotalCacheLimit.parse. IMO it adds an unnecessary layer of abstraction, essentially an alias that does not really have a purpose. It would make sense if it could be overridden or had any other logic, but if all it does is delegate like an alias, I'd rather remove it.

radeusgd

Overall looks great, the new approach to setting parameters for testing is a great improvement.

IMO we should try to address the case of the cache size possibly exceeding all disk space.

Other than that, just some small comments.

radeusgd · 2024-11-12T19:56:42Z

std-bits/base/src/main/java/org/enso/base/cache/LRUCache.java

+    // If we have a content-length, clear up enough space for that not. If not,
+    // then clear up enough space for the largest allowed file size.


Some typo in the comment (not)

Suggested change

-    // If we have a content-length, clear up enough space for that not. If not,
-    // then clear up enough space for the largest allowed file size.
+    // If we have a content-length, clear up enough space for that. If not,
+    // then clear up enough space for the largest allowed file size.
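
As a sketch of what that comment describes, assuming cache entries are tracked in a map ordered from least to most recently used. MakeRoomSketch, spaceToReserve and makeRoom are illustrative names, not the LRUCache methods.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

class MakeRoomSketch {
  /** Bytes to reserve before a download: the declared length, else the largest allowed file. */
  static long spaceToReserve(OptionalLong contentLength, long maxFileSize) {
    return contentLength.orElse(maxFileSize);
  }

  /**
   * Evicts entries in iteration order (assumed to be least recently used first) until
   * `needed` more bytes fit under `maxTotal`. Returns the number of bytes freed.
   */
  static long makeRoom(LinkedHashMap<String, Long> sizesByKey, long needed, long maxTotal) {
    long used = sizesByKey.values().stream().mapToLong(Long::longValue).sum();
    long freed = 0;
    Iterator<Map.Entry<String, Long>> it = sizesByKey.entrySet().iterator();
    while (used + needed > maxTotal && it.hasNext()) {
      long size = it.next().getValue();
      it.remove();
      used -= size;
      freed += size;
    }
    return freed;
  }
}
```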

radeusgd · 2024-11-12T20:00:43Z

std-bits/base/src/main/java/org/enso/base/cache/DiskSpaceGetter.java

@@ -4,8 +4,8 @@
 import org.enso.base.CurrentEnsoProject;

 public class DiskSpaceGetter extends Mockable<Long> {
-  public DiskSpaceGetter() {
-    super(() -> getRootPath().getUsableSpace());
+  public Long computeValue() {


I'd probably add @Override at the top.

radeusgd · 2024-11-12T20:09:11Z

std-bits/base/src/main/java/org/enso/base/enso_cloud/EnsoSecretHelper.java

@@ -22,6 +22,7 @@

 /** Makes HTTP requests with secrets in either header or query string. */
 public final class EnsoSecretHelper extends SecretValueResolver {
+  private static final EnsoHTTPResponseCache cache = new EnsoHTTPResponseCache();


I think we should create an instance the first time getCache is called.

Creating it statically has some problems.

We check environment variables when instantiating LRUCache inside of EnsoHTTPResponseCache constructor.

If I recall correctly, in Native Image builds static initialization is done at image build time, so the environment variables read during static initialization would be those of the build machine, not of the machine on which we are actually running. So deferring the initialization may prepare us better for Native Image.

Additionally, I saw that exceptions are thrown when the env variables are out of range. Again, I don't think it's a good idea to do that inside a static initializer. Tbh I have no idea what will happen when we throw such an exception from a static initializer while running the GUI. My guess is that the Language Server will fail to boot in such a case, which is not a good thing for users, especially as we are unlikely to have good reporting for such errors.

I'd move to 'lazy' initialization.

And overall I'm wondering whether we want to throw an exception in this case at all. It will make HTTP requests unusable due to misconfiguration. It feels like logging an error/warning and falling back to some defaults or clamping the values may be good enough: it will inform the user of the misconfiguration but will allow them to continue using the product.

Reverting to default instead of throwing.
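
A minimal sketch of the two changes discussed in this thread: creating the value on first use and falling back to a default on bad input. LazyCacheSketch, Lazy and parseOrDefault are hypothetical names, and the default of 2000 and the accepted range are assumed values, not taken from the PR.

```java
import java.util.function.Supplier;

class LazyCacheSketch {
  /** Parses a numeric setting; logs and returns the default when it is missing or invalid. */
  static double parseOrDefault(String raw, double defaultValue, double min, double max) {
    if (raw == null) {
      return defaultValue; // variable not set
    }
    try {
      double value = Double.parseDouble(raw.trim());
      if (value >= min && value <= max) {
        return value;
      }
    } catch (NumberFormatException e) {
      // Fall through to the warning below.
    }
    System.err.println("Ignoring invalid setting '" + raw + "', using default " + defaultValue);
    return defaultValue;
  }

  /** Creates its value on the first call to get(), not when the enclosing class loads. */
  static final class Lazy<T> {
    private final Supplier<T> factory;
    private T value;

    Lazy(Supplier<T> factory) {
      this.factory = factory;
    }

    synchronized T get() {
      if (value == null) {
        value = factory.get();
      }
      return value;
    }
  }

  // The environment is read only when get() is first called. The default (2000) and the
  // range are assumed values for illustration.
  static final Lazy<Double> MAX_FILE_SIZE_MB =
      new Lazy<>(
          () ->
              parseOrDefault(
                  System.getenv("ENSO_LIB_HTTP_CACHE_MAX_FILE_SIZE_MEGS"), 2000.0, 0.0, 1e6));
}
```

Because the supplier runs inside get(), the environment is consulted on the machine that runs the code, and a malformed value produces a log line instead of an exception.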

GregoryTravis added 30 commits October 31, 2024 16:19

wip

d3eb7fa

Merge branch 'develop' into wip/gmt/11410-cache-env

3b3848c

make EHRC static

8064cfb

Merge branch 'develop' into wip/gmt/11410-cache-env

65d6141

EHRC singleton

12c3733

wip

8f7c0ab

wip

54cad98

Merge branch 'develop' into wip/gmt/11410-cache-env

eac9637

tests pass

c071438

use enso project root

232c88b

docs, rename

c744a58

docs, test for changing disk space

ad57bd7

doc

5742258

doc

859459a

prevent raising test disk space

875aa1f

upper bound test

36b3867

wip

e5dc3a5

wip

878d992

Merge branch 'develop' into wip/gmt/11410-cache-env

55379f4

Merge branch 'develop' into wip/gmt/11410-cache-env

6a896f0

wip

0d9da62

wip

d62ffa9

one passes

0321544

17

a1be3e6

double lambda

31a8f02

fix now

4a87a25

more

0e44c66

green

edc18cb

do not have to set both env vars

3b079f9

download not limited check checks that fetch throws

3d2fcf9

GregoryTravis added 2 commits November 11, 2024 15:54

Merge branch 'develop' into wip/gmt/11410-cache-env

e31c107

doc

f794059

GregoryTravis marked this pull request as ready for review November 11, 2024 20:59

GregoryTravis requested review from jdunkerley, radeusgd, AdRiley and marthasharkey as code owners November 11, 2024 20:59

GregoryTravis added the "CI: No changelog needed" label Nov 11, 2024

fmt

476a04c

radeusgd reviewed Nov 12, 2024


GregoryTravis added 3 commits November 12, 2024 11:42

Merge branch 'develop' into wip/gmt/11410-cache-env

c1da246

90% test, move a test

e559a06

review

08be111

radeusgd reviewed Nov 12, 2024


review

115e7c7

radeusgd approved these changes Nov 12, 2024


radeusgd reviewed Nov 12, 2024


GregoryTravis added 4 commits November 12, 2024 15:12

review

6d679dc

create cache on first use

4e347e9

default instead of exception for bad env vars

c5ab512

fmt

d2d7b29

GregoryTravis merged commit fb50a8f into develop Nov 13, 2024
36 checks passed

GregoryTravis deleted the wip/gmt/11410-cache-env branch November 13, 2024 18:40
