TODO: persistent disk cache #204
Comments
If implemented, this should be placed behind a big warning about performance issues compared to the in-memory cache.
With sendfile(), disk cache can be faster than in-memory cache. OpenSSL has SSL_sendfile. For rustls, see https://github.com/rustls/ktls. By all means document any trade-offs (e.g. slower warmup on reboot in exchange for faster cache serving), but benchmark it first. And then make sure nginx is configured with kTLS in the benchmark too.
Hey, I believe the "persistence" problem should be handled separately from the use of kTLS. Yes, rustls using Kernel TLS would be faster than vanilla rustls, and SSL_sendfile could also be a silver bullet for quick I/O operations. But but but! How should we manage the mapping between requested URLs and the exact files? If we want the files to persist, we may need to manage that mapping with a database or some other data structure. Also, if persistent caches have to be updated dynamically (insertion of files), the problem becomes more complex. Currently, I have taken a hybrid caching approach with in-memory and disk caches, both of which are ephemeral (deleted on restart). So the management of the mapping is really simple and fast, just by using a kind of hash table directly managed by On the other hand, persistent caches with dynamic updates require extra mapping management, external to rpxy. So, from this point of view, I agree with @Gamerboy59, since it must involve the overhead of search operations in the external table. (But this might be negligible with an appropriate database.)
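To make the "ephemeral hybrid cache" idea above concrete, here is a minimal sketch. It is not rpxy's actual code: the names `HybridCache`, `Stored`, `max_in_memory` and the file naming scheme are all hypothetical, and it assumes small bodies stay in memory while larger ones go to a scratch directory. The only index is an in-process hash table, and the scratch directory is removed on drop, so nothing survives a restart.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Where a cached body lives (hypothetical type, for illustration only).
enum Stored {
    Memory(Vec<u8>),
    Disk(PathBuf),
}

/// Ephemeral hybrid cache: the URL-to-object mapping is a plain hash table.
struct HybridCache {
    index: HashMap<String, Stored>,
    dir: PathBuf,
    max_in_memory: usize, // assumed threshold in bytes
    next_id: u64,
}

impl HybridCache {
    fn new(dir: PathBuf, max_in_memory: usize) -> io::Result<Self> {
        fs::create_dir_all(&dir)?;
        Ok(Self { index: HashMap::new(), dir, max_in_memory, next_id: 0 })
    }

    /// Store a body in memory if it is small enough, otherwise in a file.
    fn put(&mut self, key: &str, body: Vec<u8>) -> io::Result<()> {
        let stored = if body.len() <= self.max_in_memory {
            Stored::Memory(body)
        } else {
            let path = self.dir.join(format!("{}.bin", self.next_id));
            self.next_id += 1;
            fs::write(&path, &body)?;
            Stored::Disk(path)
        };
        // If an older disk entry was replaced, delete its file (best effort).
        if let Some(Stored::Disk(old)) = self.index.insert(key.to_string(), stored) {
            let _ = fs::remove_file(old);
        }
        Ok(())
    }

    /// Look the key up in the hash table and load the body from wherever it lives.
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        match self.index.get(key) {
            None => Ok(None),
            Some(Stored::Memory(body)) => Ok(Some(body.clone())),
            Some(Stored::Disk(path)) => fs::read(path).map(Some),
        }
    }
}

impl Drop for HybridCache {
    /// Ephemeral by design: the scratch directory goes away with the cache.
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.dir);
    }
}
```

A persistent variant would have to keep `index` itself on disk (or rebuild it by scanning the directory at startup), which is the extra mapping management discussed above.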
Caching can be crucial to some applications but if it's implemented in e.g. for a WordPress website: e.g. for other websites that don't require caching: This way you could deploy
Excellent points.
For what it might be worth, both nginx and Apache use filesystem for this. Apache is particularly explicit about its implementation: ‘To store items in the cache, mod_cache_disk creates a 22 character hash of the URL being requested. This hash incorporates the hostname, protocol, port, path and any CGI arguments to the URL, as well as elements defined by the Vary header to ensure that multiple URLs do not collide with one another. Each character may be any one of 64-different characters, which mean that overall there are 64^22 possible hashes. For example, a URL might be hashed to xyTGxSMO2b68mBCykqkp1w. This hash is used as a prefix for the naming of the files specific to that URL within the cache, however first it is split up into directories as per the CacheDirLevels and CacheDirLength directives. CacheDirLevels specifies how many levels of subdirectory there should be, and CacheDirLength specifies how many characters should be in each directory. With the example settings given above, the hash would be turned into a filename prefix as /var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w. The overall aim of this technique is to reduce the number of subdirectories or files that may be in a particular directory, as most file-systems slow down as this number increases. With setting of "1" for CacheDirLength there can at most be 64 subdirectories at any particular level. With a setting of 2 there can be 64 * 64 subdirectories, and so on. Unless you have a good reason not to, using a setting of "1" for CacheDirLength is recommended. Setting CacheDirLevels depends on how many files you anticipate to store in the cache. With the setting of "2" used in the above example, a grand total of 4096 subdirectories can ultimately be created. With 1 million files cached, this works out at roughly 245 cached URLs per directory.’ nginx copies this structure almost entirely.
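The directory-splitting step in the quoted Apache documentation can be sketched in a few lines. This is only an illustration of the layout, not Apache's or nginx's implementation: the function names `cache_path` and `leaf_dirs` are hypothetical, and the 22-character hash is taken as an input rather than computed.

```rust
use std::path::PathBuf;

/// Split `hash` into `levels` directory components of `length` characters
/// each and use the remainder as the file-name prefix.
/// Returns None if the hash is too short or the parameters are unusable.
fn cache_path(root: &str, hash: &str, levels: usize, length: usize) -> Option<PathBuf> {
    let split = levels.checked_mul(length)?;
    // Require ASCII so that byte offsets are valid character boundaries.
    if length == 0 || !hash.is_ascii() || hash.len() <= split {
        return None;
    }
    let mut path = PathBuf::from(root);
    for i in 0..levels {
        path.push(&hash[i * length..(i + 1) * length]);
    }
    path.push(&hash[split..]);
    Some(path)
}

/// Number of leaf directories for a 64-character alphabet:
/// 64^(levels * length), e.g. 64 * 64 = 4096 for levels = 2, length = 1.
fn leaf_dirs(levels: u32, length: u32) -> u64 {
    64u64.pow(levels * length)
}
```

With the values from the quote, `cache_path("/var/cache/apache", "xyTGxSMO2b68mBCykqkp1w", 2, 1)` yields `/var/cache/apache/x/y/TGxSMO2b68mBCykqkp1w`, and `leaf_dirs(2, 1)` is 4096, so one million cached files spread out to a few hundred per directory.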
Right, a cache lock is usually employed.
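Assuming "cache lock" here means what nginx calls `proxy_cache_lock` (only one request populates a missing entry while concurrent requests for the same key wait), a minimal standard-library sketch could look like this. `LockedCache` and `get_or_fetch` are hypothetical names, and a real proxy would use async primitives, timeouts and eviction rather than blocking threads.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Cache in which each key is filled by at most one caller.
struct LockedCache {
    entries: Mutex<HashMap<String, Arc<OnceLock<Vec<u8>>>>>,
}

impl LockedCache {
    fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    /// Return the cached body for `key`, running `fetch` only if no other
    /// caller has filled (or is currently filling) that entry.
    fn get_or_fetch<F>(&self, key: &str, fetch: F) -> Vec<u8>
    where
        F: FnOnce() -> Vec<u8>,
    {
        // Hold the map lock only long enough to find or create the slot,
        // so a slow fetch for one key does not block other keys.
        let slot = {
            let mut map = self.entries.lock().unwrap();
            map.entry(key.to_string()).or_default().clone()
        };
        // OnceLock runs the initializer once; concurrent callers block
        // until it finishes and then see the same value.
        slot.get_or_init(fetch).clone()
    }
}
```

The design choice is a per-key slot behind a short-lived global lock: the upstream is hit once per key, and requests for unrelated keys proceed independently.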
Is there any reason to prefer this setup over the simpler rpxy -> nginx? nginx is itself a caching proxy, and as a web server handles both dynamic and static content faster than Apache because of its event loop, which Apache’s event MPM can’t really match in terms of being non-blocking and consuming very little memory.
A persistent disk cache appears to be one of the remaining gaps to be filled before rpxy reaches baseline feature parity with traditional reverse proxies such as Nginx, Apache and Caddy. Since it’s already on your TODO list, I figure it might not be out of place to have a thread tracking this.
Would the cacache crate fit the bill here?
https://github.com/zkat/cacache-rs