Lean event persister.
The project provides a lean event persister that does not need additional queues. Storage is currently file-based and supports the local file system as well as AWS S3 and GCP GCS. It offers configuration options to partition events by group and by time bucket (such as hourly) and to limit the size of single event files. Further, it provides additional "flush criteria" that determine when a new event file is written to storage. When the service goes down, it attempts to persist any leftover event stores, which should avoid data loss in situations such as nodes being shut down during dynamic scaling.
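To make the flush and partitioning ideas more concrete, here is a minimal Scala sketch. It assumes that a file is flushed once either the accumulated byte size or the event count reaches its limit (using the defaults documented below for MAX_FILE_SIZE_IN_MB and MAX_NUMBER_OF_EVENTS), that 1 MB means 1024 * 1024 bytes, and that the partition path puts the group before an hourly UTC time bucket. All names (`FlushSketch`, `BufferState`, `partitionPath`) are hypothetical and not the project's API.

```scala
import java.time.{ZoneOffset, ZonedDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical sketch: names and structure are illustrative, not the project's API.
object FlushSketch {

  // Limits mirroring the documented defaults of MAX_FILE_SIZE_IN_MB (2) and MAX_NUMBER_OF_EVENTS (5000).
  // Assumption: 1 MB = 1024 * 1024 bytes.
  final case class FlushLimits(maxBytes: Long, maxEvents: Int)

  val defaultLimits: FlushLimits = FlushLimits(maxBytes = 2L * 1024 * 1024, maxEvents = 5000)

  // Running totals for the event file currently being filled.
  final case class BufferState(bytes: Long, events: Int) {
    def add(eventBytes: Long): BufferState = BufferState(bytes + eventBytes, events + 1)
  }

  // A file is flushed once either limit is met or exceeded, which is why the
  // written file can end up slightly larger than maxBytes.
  def shouldFlush(state: BufferState, limits: FlushLimits): Boolean =
    state.bytes >= limits.maxBytes || state.events >= limits.maxEvents

  // Hourly time bucket rendered as yyyy/MM/dd/HH, matching the documented default
  // PARTITIONING of YEAR_MONTH_DAY_HOUR with separator "/".
  private val hourly = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH")

  // Assumption: the group directory comes first and the time bucket is computed in UTC.
  def partitionPath(group: String, time: ZonedDateTime): String =
    s"$group/${time.withZoneSameInstant(ZoneOffset.UTC).format(hourly)}"
}
```

Summing sizes and checking after each append keeps the check cheap, at the cost of the slight overshoot the configuration table mentions.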
- POST: `event/[eventType]/[group]`
  - Posts events in the format specified for the event type.
  - `[eventType]`: see the config description of `EVENT_ENDPOINT_STRUCTDEF_PAIRS` below. The first element of each derived (event type, structural definition) pair specifies a single endpoint that only accepts events matching that structural definition.
  - `[group]`: must either be a value configured in `VALID_GROUPS` (see below), or can be any value if `VALID_GROUPS` is set to `*`.
- GET: `metrics`
  - Endpoint to be scraped by Prometheus.
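As an illustration, a client could build a POST to the event endpoint with the JDK 11+ `java.net.http` client, as in the Scala sketch below. It assumes the service runs at `http://localhost:6001` (the default HTTP_SERVER_PORT) with the endpoints at the root path, and uses the default `store` event type. `PostEventSketch` is a hypothetical name, and the group and JSON payload are placeholders that must match your VALID_GROUPS and structural definition.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical client-side sketch; host, port, event type, group and payload are illustrative.
object PostEventSketch {

  // Builds a POST to event/[eventType]/[group]. The eventType must be one configured via
  // EVENT_ENDPOINT_STRUCTDEF_PAIRS and the group must be allowed by VALID_GROUPS.
  def buildRequest(baseUrl: String, eventType: String, group: String, jsonBody: String): HttpRequest =
    HttpRequest.newBuilder()
      .uri(URI.create(s"$baseUrl/event/$eventType/$group"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
      .build()

  // Sends the request and returns the HTTP status code (requires a running service).
  def send(request: HttpRequest): Int =
    HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
      .statusCode()
}
```

For example, `PostEventSketch.send(PostEventSketch.buildRequest("http://localhost:6001", "store", "group1", payload))` returns the status code, where `payload` is a JSON string matching the structural definition bound to `store`.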
NOTE: you will only be able to build the project if you locally publish `kolibri-storage`, which can be found in the project https://github.com/awagen/kolibri. Release of the jar to a public repository is planned. In the meantime, you can publish it locally by running `sbt kolibri-storage/publishLocal` in the root folder of the kolibri project.
- Recompile and test: `sbt clean test`
- Build the jar within `target/scala-2.13/`: `sbt assembly`
- Create the docker image: `docker build . -t eyvent:0.0.1`
The configuration properties can be set via environment variables, without needing to alter the config file. This project contains an example `docker-compose.yml` file. The tables below give an overview of the configuration properties.
Env Variables - Optional configurations | Usage |
---|---|
NODE_HASH | (Optional) Every node needs a unique identifier, which is also reflected in the names of the files written by that node to avoid clashes (clashes are unlikely even without it, since event file names also carry their creation timestamp). If not set, a random value is generated. |
HTTP_SERVER_PORT | (Optional) Port under which the service is started. Default: 6001. |
NON_BLOCKING_POOL_THREADS | (Optional) In case you want to deviate from the default ZIO settings, you can set the number of non-blocking threads used here. |
BLOCKING_POOL_THREADS | (Optional) In case you want to deviate from the default ZIO settings, you can set the number of blocking threads used here. |
STRUCT_DEF_SUB_FOLDER | (Optional) Folder from which the structural definitions of allowed events (JSON files) are picked. Default: "eventStructDefs". Note that, like all paths, this is relative to the configured base path. |
EVENT_STORAGE_SUB_FOLDER | (Optional) Sub-folder (relative to the configured base folder) where the event partition directories are created in which the single event log files are persisted. Default: "events" |
PARTITIONING | (Optional) Json giving the definition of the partitioning to be applied. See NamingPatternJsonProtocol for allowed values. Default: {"type": "YEAR_MONTH_DAY_HOUR", "separator": "/"} (which causes partitioning to be applied by yyyy/mm/dd/hh). |
VALID_GROUPS | (Optional) Comma-separated values of groups allowed in the event post endpoint. If the values contain *, all groups are accepted. Default: * |
EVENT_ENDPOINT_STRUCTDEF_PAIRS | (Optional) Comma-separated values that are read in consecutive pairs. The first value of each pair defines the type of the event endpoint (and is used in the generated endpoints), and the second defines the name of the JSON file that specifies what an event must look like (e.g. which fields are needed and of which type). For a description of the format, see below. Note that the configured JSON files are searched for in the folder defined in STRUCT_DEF_SUB_FOLDER. This effectively binds an endpoint type to the type of JSON it accepts. Default: store,simpleEvent1.json (where simpleEvent1.json can be found in the examples folder and is a very simplified example) |
MAX_FILE_SIZE_IN_MB | (Optional) Maximal size in MB of any single event log file. Note that the resulting size can slightly exceed this value, since the sizes of the added events are summed up and the file is flushed to storage once the limit is met or exceeded. Default: 2. |
MAX_NUMBER_OF_EVENTS | (Optional) Maximal number of events in any single event log file. Default: 5000. |
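The two comma-separated settings VALID_GROUPS and EVENT_ENDPOINT_STRUCTDEF_PAIRS could be interpreted roughly as in the following Scala sketch. It is a hypothetical illustration (`ConfigSketch` and its methods are made-up names), assuming values are trimmed and empty entries ignored; it is not the project's parsing code.

```scala
// Hypothetical sketch of how the documented comma-separated env values could be interpreted;
// it is not the project's parsing code.
object ConfigSketch {

  // EVENT_ENDPOINT_STRUCTDEF_PAIRS: values 1+2, 3+4, ... form (eventType, structDefFile) pairs.
  def endpointStructDefPairs(raw: String): Seq[(String, String)] = {
    val values = raw.split(",").map(_.trim).filter(_.nonEmpty).toSeq
    require(values.size % 2 == 0, s"expected an even number of values, got ${values.size}")
    values.grouped(2).map(pair => (pair(0), pair(1))).toSeq
  }

  // VALID_GROUPS: any group is accepted if one of the values is "*".
  def groupValidator(raw: String): String => Boolean = {
    val groups = raw.split(",").map(_.trim).filter(_.nonEmpty).toSet
    if (groups.contains("*")) (_: String) => true
    else (group: String) => groups.contains(group)
  }
}
```

With the defaults, `endpointStructDefPairs("store,simpleEvent1.json")` yields a single pair, so only the `store` endpoint is generated, and `groupValidator("*")` accepts every group.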
Env Variables - Storage configuration | Usage |
---|---|
PERSISTENCE_MODE | The persistence mode used. Can be: AWS (s3), GCP (gcs), LOCAL (local file system), RESOURCE (local resources), CLASS (if selected, the PERSISTENCE_MODULE_CLASS property must be set to the fully qualified name of the persistence module class to use). |
PERSISTENCE_MODULE_CLASS | If PERSISTENCE_MODE is set to CLASS, set here the fully qualified name of the persistence module class to use, such as de.awagen.kolibri.fleet.zio.config.di.modules.persistence.LocalPersistenceModule (which refers to the same persistence module as setting PERSISTENCE_MODE to LOCAL). |
AWS_PROFILE | If PERSISTENCE_MODE is AWS (or CLASS with the AWS module referenced above), specify the profile to use here. |
AWS_S3_BUCKET | If AWS storage is used, define the bucket to store the state / result data in. |
AWS_S3_PATH | If AWS storage is used, define the path within the above bucket to use as base path. |
AWS_S3_REGION | If AWS storage is used, define the region here. |
GCP_GS_BUCKET | If GCP storage is used, define the bucket to store the state / result data in. |
GCP_GS_PATH | If GCP storage is used, define the path within the above bucket to use as base path. |
GCP_GS_PROJECT_ID | If GCP storage is used, define the project id under which you created the bucket. |
LOCAL_STORAGE_WRITE_BASE_PATH | If LOCAL storage is used, define the base path here under which to store the data. |
LOCAL_STORAGE_READ_BASE_PATH | If LOCAL storage is used, define the base path here from which data is read (should usually be the same as the write base path). |
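Before starting the service, it can help to check that the storage variables for the chosen mode are present. The following hypothetical Scala sketch derives the expected variables per PERSISTENCE_MODE from the table above; the mapping, the treatment of RESOURCE as needing no extra variables, and all names are assumptions, and the service's own validation may differ. It does not cover CLASS pointing at the AWS module.

```scala
// Hypothetical pre-flight check: which storage-related env variables this sketch expects per
// PERSISTENCE_MODE. The grouping follows the storage table; the service's own validation may differ.
object StorageEnvSketch {

  val requiredByMode: Map[String, Seq[String]] = Map(
    "AWS" -> Seq("AWS_PROFILE", "AWS_S3_BUCKET", "AWS_S3_PATH", "AWS_S3_REGION"),
    "GCP" -> Seq("GCP_GS_BUCKET", "GCP_GS_PATH", "GCP_GS_PROJECT_ID"),
    "LOCAL" -> Seq("LOCAL_STORAGE_WRITE_BASE_PATH", "LOCAL_STORAGE_READ_BASE_PATH"),
    "CLASS" -> Seq("PERSISTENCE_MODULE_CLASS"),
    // Assumption: RESOURCE mode needs no additional variables.
    "RESOURCE" -> Seq.empty
  )

  // Left: the mode itself is missing or unknown. Right: names of expected variables that are
  // missing or blank in `env` (an empty Seq means all expected variables are set).
  def missingVars(env: Map[String, String]): Either[String, Seq[String]] =
    env.get("PERSISTENCE_MODE") match {
      case None => Left("PERSISTENCE_MODE is not set")
      case Some(mode) =>
        requiredByMode.get(mode.trim.toUpperCase) match {
          case None        => Left(s"no expectations defined for mode '$mode'")
          case Some(names) => Right(names.filter(n => env.get(n).forall(_.trim.isEmpty)))
        }
    }
}
```

Calling `StorageEnvSketch.missingVars(sys.env)` at startup would surface a forgotten bucket or base path before the first flush attempt.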
First of all, you need to define the format of the expected events for each endpoint that accepts event messages. The format is given by the JSON format of the distinct fields as represented by `StructDef` instances (see `JsonStructDefs` for the definitions and `JsonStructDefsJsonProtocol` for the JSON format of these definitions). Here is a short tour of the format:
- The `StructDef` information is represented as a JSON object.
- The `type` of the top-level element is `NESTED`, which refers to the fact that the expected event is itself a JSON object with single fields in it.
- Since the top-level element is of type `NESTED`, it has the additional attributes `fields` (the fields that do not depend on any other field's value) and `conditionalFieldsSeq`, which represents a sequence of fields that depend on the value of any of the fields specified in `fields`.
- Within an element of type `NESTED`, each single field has the following attributes:
  - `nameFormat`: defines how the key needs to look (in the case of a constant identifier, which will be the most common case, it is of type `STRING_CONSTANT`).
  - `valueFormat`: defines how the value for the key specified by `nameFormat` is supposed to look (e.g. which type, which restrictions).
  - `required`: boolean flag that defines whether the field must be set or can be left out. Note that validations on a field with `required=false` are only applied if the value is set; leaving it out altogether counts as valid.
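The `required` semantics described above can be illustrated with a minimal Scala sketch. It assumes events are already decoded into a `Map[String, Any]` and models each field by a constant name, a value check and the `required` flag; `FieldSpec` and `validate` are hypothetical names, not the `StructDef` classes.

```scala
// Hypothetical sketch of the documented `required` semantics; types are illustrative only.
object RequiredFieldSketch {

  // A field spec: the constant key name, a value check and the required flag.
  final case class FieldSpec(name: String, isValid: Any => Boolean, required: Boolean)

  // Returns error messages; an empty result means the event is valid.
  def validate(event: Map[String, Any], specs: Seq[FieldSpec]): Seq[String] =
    specs.flatMap { spec =>
      event.get(spec.name) match {
        // A missing field is only an error if it is required.
        case None if spec.required => Seq(s"missing required field '${spec.name}'")
        case None                  => Seq.empty
        // A present field is always validated, whether required or not.
        case Some(value) if !spec.isValid(value) => Seq(s"invalid value for '${spec.name}': $value")
        case Some(_)                             => Seq.empty
      }
    }
}
```

So an optional `count` field may be absent, but if it is sent with a wrong type, the event is rejected just as if the field were required.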
The attributes that go into `nameFormat` and `valueFormat` refer to the same set of value definitions, and the attributes to set depend on the value of the respective `type` attribute. Here is what is available:
Type | Description | Fields / Examples |
---|---|---|
INT | Any integer value. | {"type": "INT"} |
CHOICE_INT | Any of a selection of integer values. | {"type": "CHOICE_INT", "choices": [0, 1, 2]} |
MIN_MAX_INT | Any integer within [min, max]. | {"type": "MIN_MAX_INT", "min": 0, "max": 5} |
STRING_CONSTANT | String of exactly the value defined by the value attribute. | {"type": "STRING_CONSTANT", "value": "const1"} |
STRING | Any string. | {"type": "STRING"} |
CHOICE_STRING | Any of a selection of string values. | {"type": "CHOICE_STRING", "choices": ["str1", "str2"]} |
REGEX | Any string matching the regex given by the regex attribute. | {"type": "REGEX", "regex": ".*"} |
FLOAT | Any float. | {"type": "FLOAT"} |
CHOICE_FLOAT | Any of a selection of float values. | {"type": "CHOICE_FLOAT", "choices": [0.4, 0.5, 0.6]} |
MIN_MAX_FLOAT | Any float within [min, max]. | {"type": "MIN_MAX_FLOAT", "min": 0.1, "max": 0.5} |
DOUBLE | Any double. | {"type": "DOUBLE"} |
CHOICE_DOUBLE | Any of a selection of double values. | {"type": "CHOICE_DOUBLE", "choices": [0.4, 0.5, 0.6]} |
MIN_MAX_DOUBLE | Any double within [min, max]. | {"type": "MIN_MAX_DOUBLE", "min": 0.1, "max": 0.5} |
BOOLEAN | Any boolean. | {"type": "BOOLEAN"} |
EITHER_OF | Any value matching any of the formats given by the attribute formats. | {"type": "EITHER_OF", "formats": [{"type": "STRING"}, {"type": "DOUBLE"}]} |
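A subset of the single-value formats can be modelled as a small sealed hierarchy, as in the hypothetical Scala sketch below. It is not the project's `JsonStructDefs` code; it assumes decoded Scala values as input, treats bounds as inclusive per the [min, max] notation, and assumes REGEX requires the whole string to match.

```scala
import scala.util.matching.Regex

// Hypothetical model of a subset of the single-value formats;
// not the project's JsonStructDefs classes.
object ValueFormatSketch {

  sealed trait Format { def isValid(value: Any): Boolean }

  case object IntFormat extends Format {
    def isValid(value: Any): Boolean = value.isInstanceOf[Int]
  }
  final case class ChoiceInt(choices: Set[Int]) extends Format {
    def isValid(value: Any): Boolean = value match {
      case i: Int => choices.contains(i)
      case _      => false
    }
  }
  final case class MinMaxInt(min: Int, max: Int) extends Format {
    // Both bounds are inclusive, as in [min, max].
    def isValid(value: Any): Boolean = value match {
      case i: Int => i >= min && i <= max
      case _      => false
    }
  }
  final case class StringConstant(constant: String) extends Format {
    def isValid(value: Any): Boolean = value == constant
  }
  final case class RegexFormat(regex: Regex) extends Format {
    // The whole string must match (full-match semantics are an assumption of this sketch).
    def isValid(value: Any): Boolean = value match {
      case s: String => regex.pattern.matcher(s).matches()
      case _         => false
    }
  }
  case object DoubleFormat extends Format {
    def isValid(value: Any): Boolean = value.isInstanceOf[Double]
  }
  final case class EitherOf(formats: Seq[Format]) extends Format {
    // Valid if at least one of the alternatives accepts the value.
    def isValid(value: Any): Boolean = formats.exists(_.isValid(value))
  }
}
```

Because `EitherOf` only holds other formats, alternatives compose freely, mirroring the nested `formats` array in the JSON example.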
Type | Description | Fields / Examples |
---|---|---|
INT_SEQ | Sequence of any integers. | {"type": "INT_SEQ"} |
SEQ_CHOICE_INT | Sequence where each element is one of the given integer choices. | {"type": "SEQ_CHOICE_INT", "choices": [0, 1, 2]} |
SEQ_CHOICE_FLOAT | Sequence where each element is one of the given float choices. | {"type": "SEQ_CHOICE_FLOAT", "choices": [0.1, 0.2, 1.2]} |
SEQ_CHOICE_DOUBLE | Sequence where each element is one of the given double choices. | {"type": "SEQ_CHOICE_DOUBLE", "choices": [0.1, 0.2, 1.2]} |
STRING_SEQ | Sequence of any strings. | {"type": "STRING_SEQ"} |
SEQ_CHOICE_STRING | Sequence where each element is one of the given string choices. | {"type": "SEQ_CHOICE_STRING", "choices": ["str1", "str2"]} |
SEQ_REGEX | Sequence where each element is a string matching the given regex. | {"type": "SEQ_REGEX", "regex": ".*"} |
GENERIC_SEQ_FORMAT | Sequence where each element matches the format given by perElementFormat. | {"type": "GENERIC_SEQ_FORMAT", "perElementFormat": {"type": "INT"}} |
SEQ_MIN_MAX_INT | Sequence where each element is an integer within [min, max]. | {"type": "SEQ_MIN_MAX_INT", "min": 4, "max": 10} |
SEQ_MIN_MAX_FLOAT | Sequence where each element is a float within [min, max]. | {"type": "SEQ_MIN_MAX_FLOAT", "min": 0.4, "max": 10} |
SEQ_MIN_MAX_DOUBLE | Sequence where each element is a double within [min, max]. | {"type": "SEQ_MIN_MAX_DOUBLE", "min": 0.4, "max": 10} |
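The sequence types can all be read as one element format applied to every entry, which is what GENERIC_SEQ_FORMAT states explicitly. The hypothetical Scala sketch below builds a few of the specialised types from a generic combinator; the names are illustrative, and treating an empty sequence as valid is an assumption.

```scala
// Hypothetical sketch: each SEQ_* type applies one element check to every entry,
// which is what GENERIC_SEQ_FORMAT expresses via perElementFormat.
object SeqFormatSketch {

  type ElementCheck = Any => Boolean

  // GENERIC_SEQ_FORMAT: valid if the value is a sequence and every element passes the check
  // (an empty sequence therefore counts as valid in this sketch).
  def genericSeq(perElement: ElementCheck): ElementCheck = {
    case values: Seq[_] => values.forall(v => perElement(v))
    case _              => false
  }

  // Element checks used to express some of the specialised sequence types.
  val anyInt: ElementCheck = _.isInstanceOf[Int]

  def minMaxInt(min: Int, max: Int): ElementCheck = {
    case i: Int => i >= min && i <= max
    case _      => false
  }

  def choiceString(choices: Set[String]): ElementCheck = {
    case s: String => choices.contains(s)
    case _         => false
  }

  // INT_SEQ, SEQ_MIN_MAX_INT and SEQ_CHOICE_STRING as special cases of the generic form.
  val intSeq: ElementCheck = genericSeq(anyInt)
  def seqMinMaxInt(min: Int, max: Int): ElementCheck = genericSeq(minMaxInt(min, max))
  def seqChoiceString(choices: Set[String]): ElementCheck = genericSeq(choiceString(choices))
}
```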
Type | Description | Fields |
---|---|---|
NESTED | Specifies the format of all unconditional fields, plus mappings from specific values of selected unconditional fields to conditional fields. This expresses a direct dependency of the conditional fields on the value selected for the respective unconditional field. | {"type": "NESTED", "fields": [{"nameFormat": {"type": "STRING_CONSTANT", "value": "field1"}, "valueFormat": {"type": "REGEX", "regex": ".*"}}], "conditionalFieldsSeq": [{"conditionalFieldId": "field1", "mapping": {"value1": [{"nameFormat": {"type": "STRING_CONSTANT", "value": "condField1"}, "required": true, "valueFormat": {"type": "INT"}}]}]} |
MAP | Specifies formats for the keys and values in a map. | {"type": "MAP", "keyFormat": {"type": "STRING"}, "valueFormat": {"type": "INT"}} |
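To show how `conditionalFieldsSeq` ties conditional fields to the value of an unconditional field, here is a hypothetical Scala sketch mirroring the NESTED example above (field1 with value value1 enabling condField1). It assumes decoded events as `Map[String, Any]` and looks up the mapping by the field value's string form; the types are illustrative, not the project's.

```scala
// Hypothetical sketch of conditionalFieldsSeq: the value of an unconditional field selects
// which additional (conditional) fields apply. Types are illustrative only.
object ConditionalFieldsSketch {

  final case class Field(name: String, isValid: Any => Boolean, required: Boolean)

  // conditionalFieldId names the unconditional field; mapping goes from its value to extra fields.
  final case class ConditionalFields(conditionalFieldId: String, mapping: Map[String, Seq[Field]])

  final case class Nested(fields: Seq[Field], conditionalFieldsSeq: Seq[ConditionalFields])

  // Same required semantics as before: absent optional fields are fine, present ones are checked.
  private def check(event: Map[String, Any], field: Field): Option[String] =
    event.get(field.name) match {
      case None if field.required               => Some(s"missing required field '${field.name}'")
      case Some(value) if !field.isValid(value) => Some(s"invalid value for '${field.name}'")
      case _                                    => None
    }

  // Validates the unconditional fields, then the conditional fields selected by their values.
  def validate(event: Map[String, Any], nested: Nested): Seq[String] = {
    val conditional = nested.conditionalFieldsSeq.flatMap { cond =>
      event.get(cond.conditionalFieldId)
        .flatMap(value => cond.mapping.get(value.toString))
        .getOrElse(Seq.empty)
    }
    (nested.fields ++ conditional).flatMap(field => check(event, field))
  }
}
```

With this structure, an event with `field1 = "value1"` must also carry an integer `condField1`, while any other value of `field1` adds no extra requirements.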
The service tries to persist all leftover event stores when the app is shut down, via an `.onExit` hook. This should avoid losing any events.
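The service relies on ZIO's `.onExit`. As a plain-Scala analogue (not the project's implementation), the sketch below buffers events per partition and flushes all non-empty buffers from a JVM shutdown hook registered via `sys.addShutdownHook`. The buffer layout, the partition key format and the `persist` callback are hypothetical. Note that JVM shutdown hooks do not run on a hard kill (e.g. SIGKILL), so this approach reduces rather than eliminates the risk of data loss.

```scala
import scala.collection.mutable

// Hypothetical plain-Scala analogue of flushing leftover buffers on shutdown. The service itself
// uses ZIO's .onExit; this sketch uses a JVM shutdown hook via sys.addShutdownHook instead.
object ShutdownFlushSketch {

  // Pending events per partition key, e.g. "group1/2023/05/07/14" (illustrative key format).
  private val buffers = mutable.Map.empty[String, mutable.ArrayBuffer[String]]

  def append(partition: String, event: String): Unit = synchronized {
    buffers.getOrElseUpdate(partition, mutable.ArrayBuffer.empty[String]) += event
    ()
  }

  // Writes every non-empty buffer through `persist` and clears all buffers;
  // returns the number of flushed partitions.
  def flushAll(persist: (String, Seq[String]) => Unit): Int = synchronized {
    val nonEmpty = buffers.filter { case (_, events) => events.nonEmpty }.toSeq
    nonEmpty.foreach { case (partition, events) => persist(partition, events.toSeq) }
    buffers.clear()
    nonEmpty.size
  }

  // Registers a JVM shutdown hook that flushes leftovers; a failing persist is only logged here.
  def registerHook(persist: (String, Seq[String]) => Unit): Unit = {
    sys.addShutdownHook {
      try {
        flushAll(persist)
        ()
      } catch {
        case e: Exception => System.err.println(s"flush on shutdown failed: ${e.getMessage}")
      }
    }
    ()
  }
}
```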