Is there any way to make cached data persistent? I feel it would be nice to have some tools to inspect which data are cached and where.
@Beforerr, the cached data is actually persistent (stored on disk), but for each data request Speasy checks whether there is any update on the remote server providing the data. The main issue here is that this mechanism can be tricky to get right. The current implementation can discard cache entries too eagerly because, by default, it is safer to refresh data than to keep an outdated product.

The pitfall here is that cache entries are fixed-size (12h) data fragments that are stored at the time of the last server request. At any moment your cache is likely to hold a mix of up-to-date and outdated entries for a given product. It means that using …

Manipulating the cache directly

This is not documented in the user documentation, but the API is there. You can actually play with the cache manually in Python. You can list entries with this method:

    import speasy as spz

    entries = spz.core.cache.entries()
    # drop internal entries
    entries = list(filter(lambda e: '__internal__' not in e, entries))
    print(entries[:100])

It should print a list of cache keys. These are the keys you can use to retrieve data from the cache. The names are either CDF file URLs (entries for the archive module) or strings in the format of the key used in this example:

    entry = spz.core.cache.get_item('amda/wnd_swe_n/2003-10-05T12:00:00+00:00')
    v = spz.SpeasyVariable.from_dictionary(entry.data)

Let's say you want to merge several cache entries:

    entries = [
        spz.core.cache.get_item('amda/wnd_swe_n/2003-10-05T00:00:00+00:00'),
        spz.core.cache.get_item('amda/wnd_swe_n/2003-10-05T12:00:00+00:00'),
    ]
    v = spz.products.variable.merge(
        [spz.SpeasyVariable.from_dictionary(e.data) for e in entries]
    )

Accessing the files directly

Speasy uses DiskCache, which stores data with both SQLite and pickle: some data are stored in files that use hashes for paths and filenames, and small values are stored directly in the database. This makes it almost impossible to reason about the files themselves.
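
To get an overview of what is cached for each product, the keys listed above can be grouped with plain string handling. This is only a sketch and it rests on assumptions: the key layout `provider/product/ISO start time` and the alignment of the 12h fragments on 00:00 and 12:00 UTC are inferred from the two example keys in this thread, not from Speasy's documentation. The helper names `parse_key`, `group_by_product` and `fragment_keys` are made up for this example, and the URL in the sample data is a placeholder.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

FRAGMENT = timedelta(hours=12)  # fragment size mentioned in this thread


def parse_key(key):
    """Split 'provider/product/ISO-start' (assumed layout); None if it does not match."""
    try:
        provider, rest = key.split("/", 1)
        product, start = rest.rsplit("/", 1)
        return provider, product, datetime.fromisoformat(start)
    except ValueError:
        # e.g. archive entries, whose keys are CDF file URLs
        return None


def group_by_product(keys):
    """Map (provider, product) to the sorted fragment start times found in keys."""
    groups = defaultdict(list)
    for key in keys:
        parsed = parse_key(key)
        if parsed is not None:
            provider, product, start = parsed
            groups[(provider, product)].append(start)
    return {k: sorted(v) for k, v in groups.items()}


def fragment_keys(provider, product, start, stop):
    """Keys of the 12h fragments overlapping [start, stop), assuming 00:00/12:00 alignment."""
    t = start.replace(hour=0 if start.hour < 12 else 12,
                      minute=0, second=0, microsecond=0)
    keys = []
    while t < stop:
        keys.append(f"{provider}/{product}/{t.isoformat()}")
        t += FRAGMENT
    return keys


# Sample keys; in practice this list would come from spz.core.cache.entries()
keys = [
    "amda/wnd_swe_n/2003-10-05T00:00:00+00:00",
    "amda/wnd_swe_n/2003-10-05T12:00:00+00:00",
    "https://example.org/data/file.cdf",  # placeholder archive-style key
]
groups = group_by_product(keys)
for (provider, product), starts in groups.items():
    print(provider, product, len(starts), "fragments")
```

The keys produced by `fragment_keys` could then be passed to `spz.core.cache.get_item` as in the merge example, provided those fragments are really present in the cache; how `get_item` behaves for a missing key is not covered here.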
Thank you @jeandet for your explanation of the cache mechanism. The API is quite useful for inspecting some data.
In my TODO list, I planned to add a `no_refresh` keyword to the `get_data` function to take data from the cache without checking whether remote servers have a newer version of a given product.