Many datasets are small enough to keep the full collection in memory. For some, such as accept and deny lists, this is crucial, because a traditional cache would need to store negative results as well. Others, such as per-client overrides, might be small enough to store in configs, but keeping them there necessitates a deploy whenever they change. Storing them in a database instead would again require caching negative results.
Both cases result in caches that are often larger than the dataset backing them, while producing a bimodal latency distribution because some calls require an external lookup. This library presents a set, map, or user-defined interface over a structure guaranteed to hold the full backing dataset in memory, with a background thread polling the backend for changes and atomically swapping them in. Should the backing store become unavailable, operation continues as normal, albeit with stale values.
This library is best suited to problems where the underlying data changes infrequently and there is a tolerance for slightly stale values, similar to any caching use-case.
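The polling-and-swap approach described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `SnapshotCache`, `start`, and `snapshot` are hypothetical names, and the `fetch` closure stands in for a source plus processor.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a full-dataset cache; not this library's real type.
pub struct SnapshotCache<K, V> {
    current: Arc<RwLock<Arc<HashMap<K, V>>>>,
}

impl<K, V> SnapshotCache<K, V>
where
    K: Eq + Hash + Send + Sync + 'static,
    V: Send + Sync + 'static,
{
    /// `fetch` returns Ok(Some(map)) for new data, Ok(None) when nothing changed,
    /// and Err(_) when the backend could not be reached or parsed.
    pub fn start<F>(initial: HashMap<K, V>, interval: Duration, mut fetch: F) -> Self
    where
        F: FnMut() -> Result<Option<HashMap<K, V>>, String> + Send + 'static,
    {
        let current = Arc::new(RwLock::new(Arc::new(initial)));
        let writer = Arc::clone(&current);
        // Background poller: runs for the life of the process in this sketch.
        thread::spawn(move || loop {
            thread::sleep(interval);
            match fetch() {
                Ok(Some(fresh)) => {
                    // Build the complete map first, then swap the pointer under a
                    // short write lock so readers never see a half-built map.
                    let next = Arc::new(fresh);
                    *writer.write().unwrap() = next;
                }
                // Unchanged: nothing to do.
                Ok(None) => {}
                // Failure: keep serving the last known (stale) snapshot.
                Err(_) => {}
            }
        });
        SnapshotCache { current }
    }

    /// Readers take a cheap clone of the current snapshot pointer.
    pub fn snapshot(&self) -> Arc<HashMap<K, V>> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }
}
```

Every lookup is served from memory, so reads never wait on the backend, and a failed fetch only delays the next swap.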
//TODO: - Arbitrary https client support
//TODO: - A separate project: A proxy server that allows only a few instances to maintain data
//TODO: direct from the source and serves the stored data out for usage by service instances
//TODO: - Make as many configuration values as possible live-configurable, to enable wrapping
//TODO: one in another, updating configs of the primary caching service live.
Cache instances are constructed using a builder, which is retrieved by calling one of
MirrorCache::<UpdatingMap<$Version, $Key, $Value>>::map_builder(),
MirrorCache::<UpdatingSet<$Version, $Value>>::set_builder(), or
MirrorCache::<UpdatingObject<$Version, $Value>>::object_builder(),
depending on the desired collection type. The code won't compile if required fields are unset. See the appropriate section below for more details on each of the builder functions.
fn main() {
    let source = LocalFileConfigSource::new("my.config");
    let processor = RawLineMapProcessor::new(|line| { /* Parsing! */ });

    // Bind the built cache to a variable; `main` itself cannot return it.
    let cache = MirrorCache::<UpdatingMap<VersionType, KeyType, ValueType>>::map_builder()
        // These are required. Failing to specify any of these will cause type-checker errors.
        .with_source(source)
        .with_processor(processor)
        .with_fetch_interval(Duration::from_secs(10))
        // These are optional.
        .with_name("my-cache")
        .with_fallback(Fallback::with_value(HashMap::new()))
        .with_update_callback(OnUpdate::with_fn(|_, v, _| println!("Updated to version {}", v)))
        .with_failure_callback(OnFailure::with_fn(|e, _| println!("Failed with error: {}", e)))
        .with_metrics(ExampleMetrics::new())
        .build()
        .unwrap();
}
While users may implement their own, a number of sources are provided:
- LocalFileConfigSource exposes a file on the local file system. Provided with the core library.
- HttpConfigSource wraps a reqwest client and fetches data over the network via HTTP(S). Requires features = ["http"].
- S3ConfigSource exposes an object in S3. Requires features = ["s3"].
- GitHubConfigSource exposes a file on GitHub. Requires features = ["github"].
Suggestions for other sources are welcome. Google Cloud Storage is not included due to a dependency conflict with the GitHub client used. Ideally, backends will support some get-if-newer functionality. Those that don't can still be used, but implementations will have to issue an unconditional fetch every time and care should be taken when choosing the fetch interval.
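As a rough sketch of what get-if-newer can look like for a local file, the helper below uses the file's modification time as the version and skips the read when nothing has changed. The function name and signature are hypothetical and are not taken from this library's source trait.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical helper, not part of this library: returns the file contents and
/// its modification time only when the file is newer than `last_seen`.
pub fn fetch_if_newer(
    path: &Path,
    last_seen: Option<SystemTime>,
) -> io::Result<Option<(SystemTime, String)>> {
    // A cheap metadata call decides whether the full read is needed.
    let modified = fs::metadata(path)?.modified()?;
    if let Some(seen) = last_seen {
        if modified <= seen {
            // Unchanged since the last successful fetch: skip the read.
            return Ok(None);
        }
    }
    let contents = fs::read_to_string(path)?;
    Ok(Some((modified, contents)))
}
```

A backend without such a check has to return the full payload on every poll, which is why the fetch interval deserves more care there.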
Two processors are provided in the core library. Both consume a Reader and pass each line to a
specified parse function, which transforms it into a value or a (key, value) tuple depending on
whether a set or a map is being constructed. The set values and map keys must be Eq + Hash,
and they, as well as the map values, must be Send + Sync.
When implementing your parse function, remember to allow blank lines and comments. It's also
recommended to be forgiving about whitespace. The function can return Ok(None) to indicate that
nothing is wrong with the stream and that the line in question just didn't translate to an
entry in the collection.
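A parse function following the advice above might look like the sketch below, written for key=value lines with i32 values. The `#` comment prefix and the `String` error type are assumptions made for illustration; the exact signature the provided processors expect may differ.

```rust
/// Hypothetical parse function for `key=value` lines with `i32` values.
/// Returns Ok(None) for lines that are valid but produce no entry.
pub fn parse_line(line: &str) -> Result<Option<(String, i32)>, String> {
    // Be forgiving about surrounding whitespace.
    let line = line.trim();
    // Blank lines and comments (assumed here to start with '#') are skipped.
    if line.is_empty() || line.starts_with('#') {
        return Ok(None);
    }
    // Anything else must contain a separator.
    let (key, value) = line
        .split_once('=')
        .ok_or_else(|| format!("missing '=' in line: {line}"))?;
    // A value that is not a valid i32 is reported as an error, not skipped.
    let value: i32 = value.trim().parse().map_err(|e| format!("{e}"))?;
    Ok(Some((key.trim().to_string(), value)))
}
```

Skipping only blank lines and comments, while still rejecting malformed entries, keeps a typo in the file from silently dropping a key.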
The name is only passed to the thread scheduler for use as a component of the thread label. It is not present in the async version.
Two callbacks may be specified, one to be invoked any time a new backing datasource state is swapped in, and one to be invoked any time a fetch or process fails. If nothing else, it's recommended to at least log the occurrence of any errors, but logging the fact that the dataset has updated can help with debugging.
The callback traits may be implemented directly, or the OnUpdate::with_fn() and
OnFailure::with_fn() convenience methods may be used. Both accept a closure or anything
implementing the appropriate Fn type.
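The general pattern behind such convenience methods, a trait plus an adapter that lets any matching closure stand in for it, can be sketched as follows. The trait, the adapter, and the single `version` argument are simplified and hypothetical; the library's real callback traits take different arguments.

```rust
/// Hypothetical, simplified callback trait; not this library's real trait.
pub trait UpdateHook<V> {
    fn updated(&self, version: &V);
}

/// Adapter that wraps a closure so it can be used wherever a hook is expected.
pub struct FnHook<F>(F);

impl<V, F: Fn(&V)> UpdateHook<V> for FnHook<F> {
    fn updated(&self, version: &V) {
        // Delegate straight to the wrapped closure.
        (self.0)(version)
    }
}

/// Convenience constructor in the spirit of a `with_fn()` method.
pub fn with_fn<V, F: Fn(&V)>(f: F) -> FnHook<F> {
    FnHook(f)
}
```

Implementing the trait directly is the better fit when the callback needs its own state or several methods; the closure adapter covers the common logging case.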
In order to make useful guarantees, the cache must complete an initial fetch when build()
is invoked. By default, if that fetch fails, build() will return an error. If a fallback
is provided, it will be used until a successful fetch can be completed. This can be
important, as a backing data source becoming unavailable can prevent new service instances
from coming up if they just unwrap() after build().
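The decision made at build time can be summarized in a few lines. This sketch only illustrates the behavior described above; the function name and the returned flag are hypothetical and do not mirror the library's internals.

```rust
/// Hypothetical illustration of the initial-fetch decision.
/// Returns the starting value and whether the fallback was used.
pub fn initial_state<T, E>(
    first_fetch: Result<T, E>,
    fallback: Option<T>,
) -> Result<(T, bool), E> {
    match (first_fetch, fallback) {
        // The first fetch succeeded: any fallback is ignored.
        (Ok(value), _) => Ok((value, false)),
        // The fetch failed but a fallback exists: start with it and keep polling.
        (Err(_), Some(value)) => Ok((value, true)),
        // The fetch failed and there is no fallback: surface the error to the caller.
        (Err(e), None) => Err(e),
    }
}
```

With a fallback configured, a backend outage at startup degrades to serving the fallback value instead of failing the whole instance.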
It's optional to collect metrics, but it's strongly recommended to do so if running this code
in production. Particular care should be given to last_successful_check(), as it exposes how
stale the data might be. It's recommended to alert on this value if it exceeds tolerable
staleness.
See metrics.rs for other metrics that can be collected.
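A staleness alert built on the timestamp of the last successful check could be as simple as the sketch below. Both function names and the threshold parameter are hypothetical, and how the timestamp is obtained depends on the metrics implementation in use.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical helper: how long ago the last successful check happened.
/// Clock skew that puts the check in the future is treated as zero staleness.
pub fn staleness(last_successful_check: SystemTime, now: SystemTime) -> Duration {
    now.duration_since(last_successful_check)
        .unwrap_or(Duration::ZERO)
}

/// Hypothetical alert condition: true once staleness exceeds the tolerated maximum.
pub fn should_alert(
    last_successful_check: SystemTime,
    now: SystemTime,
    max_staleness: Duration,
) -> bool {
    staleness(last_successful_check, now) > max_staleness
}
```

A threshold of a few fetch intervals is a reasonable starting point, so that a single failed poll does not page anyone.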
The following is a log of the provided example, edited with comments.
The example sets up a cache backed by a local file of key=value pairs, where value is an
i32, then loops forever printing the value of the key C. It's very noisy, as the example
metrics implementation just calls println!().
Fallback invoked! #Initial fetch failed, fallback value of an empty map used
C=0 #Example defaults to 0 if key isn't in map
Last successful update is now at 2022-11-03 00:46:09.121312 UTC #Successful fetch completed
Update fetch took 0ms and process took 0ms
Updated to version 1667435391751
C=3
Last successful check is now at 2022-11-03 00:46:11.122168 UTC
File hasn't changed. Check in 0ms
C=3
Last successful update is now at 2022-11-03 00:46:17.126346 UTC #File updated with new value
Update fetch took 0ms and process took 0ms
Updated to version 1667436376483
C=2
Last successful check is now at 2022-11-03 00:46:21.124309 UTC
File hasn't changed. Check in 0ms
C=2
Process failed with: 'invalid digit found in string' #'2' in configs replaced with 'five'
Failed with error: invalid digit found in string
C=2 #Client code is still able to read the last known value
Last successful update is now at 2022-11-03 00:46:31.124433 UTC #Config fixed
Update fetch took 0ms and process took 0ms
Updated to version 1667436390340
C=5
Last successful check is now at 2022-11-03 00:46:33.125560 UTC
File hasn't changed. Check in 0ms
C=5
Fetch failed with: 'No such file or directory (os error 2)' #File moved or deleted
Failed with error: No such file or directory (os error 2)
C=5 #Client code is still able to read the last known value
Last successful update is now at 2022-11-03 00:46:45.125176 UTC #File restored
Update fetch took 0ms and process took 0ms
Updated to version 1667436404799
C=5