index.json
[{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/3-step-by-step/0-overview/","title":"Overview","tags":[],"description":"","content":"Overview The sketch below shows the high-level flow of processing within Kolibri.\nDefine the samples to operate on either via OrderedMultiValues (composed of OrderedValues) and use provided implicit conversions to create IndexedGenerator or create IndexedGenerator directly. Use batching strategy to split in single batches that are processed through computations defined in RunnableGraph. Each data sample results in a tagged ProcessingMessage[T], which is handled by AggregatingActor (created by RunnableExecutionActor that runs the RunnableGraph). If the expectation in the AggregatingActor is fulfilled or failed, write partial result if writer is defined, and provide AggregationState[V] upstream. Dependent on flag whether full results shall be passed upstream after execution, this state will either just indicate finished execution or additionally provide the aggregation. The following provides an overview of the distinct steps to define computations and the data structures involved.\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/3-step-by-step/1-values/","title":"Values","tags":[],"description":"","content":"Values In the following let\u0026rsquo;s look into which structures Kolibri provides to simplify definitions of values.\nOrderedValues trait OrderedValues[+T] extends KolibriSerializable { val name: String val totalValueCount: Int def getNthZeroBased(n: Int): Option[T] def getNFromPositionZeroBased(position: Int, n: Int): Seq[T] def getAll: Seq[T] } Two distinct implementations:\nUsing explicitly passed values: case class DistinctValues[+T](name: String, values: Seq[T]) extends OrderedValues[T] Range with defined start, end and stepSize case class RangeValues[T:Fractional](name: String, start:T, end:T, stepSize:T)(implicit v: Numeric[T]) extends OrderedValues[T] OrderedMultiValues Container for multiple OrderedValues with methods to edit (remove, add) values and methods to retrieve the n-th element out of the permutation over all contained OrderedValues.\ntrait OrderedMultiValues extends KolibriSerializable { def values: Seq[OrderedValues[Any]] def removeValue(valueName: String): (OrderedMultiValues, Boolean) def originalValueIndexOf(n: Int): Int = n def addValue(values: OrderedValues[Any], prepend: Boolean): OrderedMultiValues def addValues(values: Seq[OrderedValues[Any]], prepend: Boolean): OrderedMultiValues def addValues(values: OrderedMultiValues, prepend: Boolean): OrderedMultiValues def stepsForNthElementStartingFromFirstParam(n: Int): List[(Int, Int)] def getParameterNameSequence: Seq[String] def numberOfCombinations: Int def findNthElement(n: Int): Option[Seq[Any]] def findNNextElementsFromPosition(startElement: Int, nrOfElements: Int): Seq[Seq[Any]] } /** * Implementation of OrderedMultiValues assuming a value grid, providing the methods to find n-th permutations * @param values - Seq of OrderedValues of any type */ case class GridOrderedMultiValues(values: Seq[OrderedValues[Any]]) extends OrderedMultiValues Batching /** * Trait adding batchSize, batchNr and batchStartElement (shift of indices relative to the original OrderedMultiValues, * specifying at which element of the original the batch starts) to OrderedMultiValues */ trait OrderedMultiValuesBatch extends OrderedMultiValues { val batchSize: Int val batchNr: Int val batchStartElement: Int override def originalValueIndexOf(n: Int): Int = { batchStartElement + n 
Batching

/**
 * Trait adding batchSize, batchNr and batchStartElement (shift of indices relative to the original OrderedMultiValues,
 * specifying at which element of the original the batch starts) to OrderedMultiValues
 */
trait OrderedMultiValuesBatch extends OrderedMultiValues {
  val batchSize: Int
  val batchNr: Int
  val batchStartElement: Int
  override def originalValueIndexOf(n: Int): Int = {
    batchStartElement + n
  }
}

/**
 * Batch representing at most batchSize elements of its input values. Starts at position batchSize * (batchNr - 1) + 1
 * of the original values.
 *
 * @param multiValues The original values this batch represents a part of
 * @param batchSize   The maximum size of this batch (might contain less for the last batch of values)
 * @param batchNr     The number of this batch. Used to determine which part of the values this batch represents. 1-based
 */
case class GridOrderedMultiValuesBatch(multiValues: OrderedMultiValues, batchSize: Int, batchNr: Int) extends OrderedMultiValuesBatch

Implicit Conversions

Further conversions are defined within the OrderedMultiValuesImplicits class via an implicit class. On import, these functions become directly available on any instance of type OrderedMultiValues.
Those are (for the complete list see the mentioned class):

/**
 * from OrderedMultiValues generate iterator over sequence of named values (_._1 is name of parameter, _._2 is value)
 */
def toNamedParamValuesIterator: Iterator[Seq[(String, Any)]] = new Iterator[Seq[(String, Any)]]

/**
 * from OrderedMultiValues generate iterator over Map of key, values pairs, where key is the name of
 * the parameter in OrderedMultiValues, values is Seq over all values (e.g. Seq in case any parameter name
 * is contained multiple times, as might be the case e.g. for non-unique url parameters)
 *
 * @return
 */
def toParamNameValuesMapIterator: Iterator[Map[String, Seq[Any]]] = new Iterator[Map[String, Seq[Any]]]

/**
 * from OrderedMultiValues generate iterator over sequence of values (the corresponding parameter name
 * for a value would be given by the element with the same index as the respective value in the
 * multiValues.getParameterNameSequence sequence of parameter names)
 *
 * @return
 */
def toParamValuesIterator: Iterator[Seq[Any]] = new Iterator[Seq[Any]]

/**
 * transform OrderedMultiValues instance to IndexedGenerator of value sequence (without parameter names)
 *
 * @return
 */
def toParamValuesIndexedGenerator: IndexedGenerator[Seq[Any]] = new IndexedGenerator[Seq[Any]]

/**
 * transform OrderedMultiValues instance to IndexedGenerator of mappings of parameter name
 * to values for that parameter
 *
 * @return
 */
def toParamNameValuesMapIndexedGenerator: IndexedGenerator[Map[String, Seq[Any]]]

/**
 * transform OrderedMultiValues instance to IndexedGenerator of Seq of (parameterName, parameterValue) pairs
 *
 * @return
 */
def toNamedParamValuesIndexedGenerator: IndexedGenerator[Seq[(String, Any)]]
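The index shift that batching introduces is easiest to see in a small, self-contained sketch. The Batch case class below is an illustrative stand-in, not GridOrderedMultiValuesBatch; it assumes zero-based element indices, consistent with the 1-based "position batchSize * (batchNr - 1) + 1" noted above, and only demonstrates how originalValueIndexOf maps a batch-local index back to the original grid.

// Illustrative stand-in for the batching idea: a batch of size batchSize with (1-based) batchNr
// covers the global element indices [batchSize * (batchNr - 1), batchSize * batchNr), and a
// batch-local index n maps back to the original index via batchStartElement + n.
object BatchingSketch {

  final case class Batch(batchSize: Int, batchNr: Int) {
    val batchStartElement: Int = batchSize * (batchNr - 1)
    def originalValueIndexOf(n: Int): Int = batchStartElement + n
  }

  def main(args: Array[String]): Unit = {
    val totalElements = 10
    val batchSize = 4
    val nrOfBatches = math.ceil(totalElements.toDouble / batchSize).toInt   // 3 batches: 4 + 4 + 2 elements
    val batches = (1 to nrOfBatches).map(nr => Batch(batchSize, nr))

    // batch-local index 1 of batch nr 2 refers to global element 5 (zero-based)
    println(batches(1).originalValueIndexOf(1))   // 5
  }
}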
Data Samples and Aggregations

DataPoint

trait DataPoint[+T] extends KolibriSerializable {
  def weight: Double
  def data: T
}

case class DataSample[+T](weight: Double, data: T) extends DataPoint[T]

AggregateValues

The AggregateValue trait provides a general interface for values that are generated from multiple single values. It keeps a record of the number of samples used to calculate the value and the weight of the value, and provides functions to add other values as well as a function to retrieve a weighted version of the AggregateValue.

trait AggregateValue[A] extends KolibriSerializable {
  def numSamples: Int
  def weight: Double
  def value: A
  def weighted(weight: Double): AggregateValue[A]
  def add(other: AggregateValue[A]): AggregateValue[A]
  def add(otherValue: DataPoint[A]): AggregateValue[A]
}

Implementation of AggregateValue providing methods to combine separate values:

case class RunningValue[A](weight: Double, numSamples: Int, value: A, weightFunction: (Double, Double) => Double, addFunc: (AggregateValue[A], AggregateValue[A]) => A) extends AggregateValue[A]

A running value of two distinct types: it can, for example, be used to record occurring errors and successful computation values in a single record, such as when your computation returns Either[SomeFailType, SomeComputationValue] or in similar settings where two values are in some way connected. The AggregateValue keeps the count of samples aggregated and the current value of the aggregation:

case class BiRunningValue[A, B](value1: AggregateValue[A], value2: AggregateValue[B])
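The weighting and merge behavior is easiest to see in a small example. Below is a minimal, self-contained sketch in plain Scala of a weighted running average in the spirit of AggregateValue[Double]. It is simplified compared to RunningValue, which delegates these decisions to its weightFunction and addFunc; RunningAvg and WeightedSample are illustrative names, not Kolibri types.

// Illustrative stand-in: adding a weighted data point updates the average incrementally,
// so partial results can be merged without keeping all samples.
object RunningAverageSketch {

  final case class WeightedSample(weight: Double, data: Double)

  final case class RunningAvg(numSamples: Int, weight: Double, value: Double) {
    // add a single weighted sample to the running (weighted) average
    def add(sample: WeightedSample): RunningAvg = {
      val newWeight = weight + sample.weight
      val newValue =
        if (newWeight == 0.0) 0.0
        else (value * weight + sample.data * sample.weight) / newWeight
      RunningAvg(numSamples + 1, newWeight, newValue)
    }

    // merge another running average (e.g. the aggregation of another batch)
    def add(other: RunningAvg): RunningAvg = {
      val newWeight = weight + other.weight
      val newValue =
        if (newWeight == 0.0) 0.0
        else (value * weight + other.value * other.weight) / newWeight
      RunningAvg(numSamples + other.numSamples, newWeight, newValue)
    }
  }

  def main(args: Array[String]): Unit = {
    val empty = RunningAvg(0, 0.0, 0.0)
    val agg = empty.add(WeightedSample(1.0, 0.4)).add(WeightedSample(1.0, 0.8))
    println(agg)   // numSamples = 2, weight = 2.0, value ≈ 0.6
  }
}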
Metric Representations

A MetricValue is a simple container keeping state with a BiRunningValue that tracks the occurring error types with their respective counts alongside some aggregated value type that tracks the successful computations aggregated in the MetricValue.

case class MetricValue[A](name: String, biValue: BiRunningValue[Map[ComputeFailReason, Int], A])

Helper functions are provided in the MetricValue object to create frequently used values:

def createAvgFailSample(metricName: String, failMap: Map[ComputeFailReason, Int]): MetricValue[Double]
def createAvgSuccessSample(metricName: String, value: Double, weight: Double): MetricValue[Double]
def createEmptyAveragingMetricValue(name: String): MetricValue[Double]

Metrics are represented by MetricRecord instances.

trait MetricRecord[A, B] {
  def getMetricsValue(key: A): Option[MetricValue[B]]
  def addMetricDontChangeCountStore(metric: MetricValue[B]): MetricRecord[A, B]
  def addFullMetricsSampleAndIncreaseSampleCount(metrics: MetricValue[B]*): MetricRecord[A, B]
  def addRecordAndIncreaseSampleCount(record: MetricRecord[A, B]): MetricRecord[A, B]
  def metricNames: Seq[A]
  def metricValues: Seq[MetricValue[B]]
  def containsMetric(key: A): Boolean
}

A metric row is identified by a set of parameters and the metric values that hold for those parameters.

case class MetricRow(countStore: ResultCountStore, params: Map[String, Seq[String]], metrics: Map[String, MetricValue[Double]]) extends MetricRecord[String, Double]

The MetricRow companion object provides multiple helper methods for easier composition.
A MetricDocument represents a map of parameter set to MetricRow. The implementation uses a mutable map; a single document will only be modified within a single actor, thus by a single thread at a time.

case class MetricDocument[A <: AnyRef](id: A, rows: mutable.Map[ParamMap, MetricRow])

Usually the Tag type would be used as type A, to group the results based on some grouping criteria. It comes with methods to add other results or to generate a weighted copy.

Metric Aggregation

/**
 * MetricAggregation that keeps track of full MetricDocuments for keys of defined type.
 * Each key stands for a separate aggregation, which can be used for selectively aggregating subsets of results
 *
 * @param aggregationStateMap - map with key = key of defined type A, value = MetricDocument, which maps a ParamMap to
 *                              a MetricRow, which carries all relevant parameters and corresponding metrics
 * @param keyMapFunction - optional function to map result keys to before adding to aggregation. E.g. can be used
 *                         in case all incoming results shall only be aggregated under a single "ALL"
 *                         aggregation instead of keeping track of distinct results per key
 * @tparam A - type of the keys that describe the aggregation groups
 */
case class MetricAggregation[A <: AnyRef](aggregationStateMap: mutable.Map[A, MetricDocument[A]] = mutable.Map.empty[A, MetricDocument[A]], keyMapFunction: SerializableFunction1[A, A] = identity) extends WithCount
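To keep the nesting of the metric containers straight, here is a small self-contained sketch with simplified stand-in types (SimpleMetricValue etc. are illustrative, not the Kolibri classes): a metric value belongs to a row identified by a parameter combination, rows of one tag/group form a document, and documents per tag form the overall aggregation.

// Illustrative stand-ins mirroring the containment hierarchy described above.
object MetricHierarchySketch {

  type ParamMap = Map[String, Seq[String]]

  final case class SimpleMetricValue(name: String, value: Double)                              // ~ MetricValue[Double]
  final case class SimpleMetricRow(params: ParamMap, metrics: Map[String, SimpleMetricValue])  // ~ MetricRow
  final case class SimpleMetricDocument(id: String, rows: Map[ParamMap, SimpleMetricRow])      // ~ MetricDocument[Tag]
  final case class SimpleMetricAggregation(docs: Map[String, SimpleMetricDocument])            // ~ MetricAggregation[Tag]

  def main(args: Array[String]): Unit = {
    val params: ParamMap = Map("q" -> Seq("q1"), "a" -> Seq("a5"))
    val row = SimpleMetricRow(params, Map("NDCG_10" -> SimpleMetricValue("NDCG_10", 0.68)))
    val doc = SimpleMetricDocument(id = "(q=q1)", rows = Map(params -> row))
    val agg = SimpleMetricAggregation(Map(doc.id -> doc))
    println(agg.docs("(q=q1)").rows(params).metrics("NDCG_10").value)   // 0.68
  }
}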
Aggregators

The aggregation of partial results is provided by an instance of type Aggregator.

abstract class Aggregator[-U: TypeTag, V: TypeTag] extends KolibriSerializable {
  def add(sample: U): Unit
  def aggregation: V
  def addAggregate(aggregatedValue: V): Unit
}

See the Aggregators class for the Aggregators object containing the distinct aggregator types. Some examples are given below:

/**
 * Aggregator taking start value generator, aggregation and merge functions to define the aggregation behavior
 * @param aggFunc - Given sample of type U and aggregation of type V, generate new value of type V
 * @param startValueGen - supplier of start value of aggregation, type V
 * @param mergeFunc - merge function of aggregations, taking two values of type V and providing new value of type V
 * @tparam U - type of single data points
 * @tparam V - type of aggregation
 */
class BaseAggregator[U: TypeTag, V: TypeTag](aggFunc: SerializableFunction2[U, V, V], startValueGen: SerializableSupplier[V], mergeFunc: SerializableFunction2[V, V, V]) extends Aggregator[U, V]

/**
 * @param aggFunc - aggregation function, taking single data point of type TT, aggregated value of type V, providing new aggregation of type V
 * @param startValueForKey - function giving an initial aggregation value, given a Tag
 * @param mergeFunc - merge function of aggregation values
 * @param keyMapFunction - function mapping value of type Tag to value of type Tag (in case the Tag shall not be mapped, just use identity)
 * @tparam TT - type of single data point, needs to extend TaggedWithType
 * @tparam V - type of aggregation
 */
class BasePerClassAggregator[TT <: TaggedWithType : TypeTag, V: TypeTag](aggFunc: SerializableFunction2[TT, V, V], startValueForKey: SerializableFunction1[Tag, V], mergeFunc: SerializableFunction2[V, V, V], keyMapFunction: SerializableFunction1[Tag, Tag]) extends Aggregator[TT, Map[Tag, V]]

/**
 * Aggregator that aggregates (running) averages per class
 * @param keyMapFunction - function mapping value of type Tag to value of type Tag (in case the Tag shall not be mapped, just use identity)
 */
class TagKeyRunningDoubleAvgPerClassAggregator(keyMapFunction: SerializableFunction1[Tag, Tag]) extends BasePerClassAggregator[TaggedWithType with DataPoint[Double], AggregateValue[Double]](
  aggFunc = (x, y) => y.add(x),
  startValueForKey = _ => doubleAvgRunningValue(weightedCount = 0.0, count = 0, value = 0.0),
  mergeFunc = (x, y) => x.add(y),
  keyMapFunction) { }

/**
 * Aggregator aggregating to (running) averages overall
 */
class TagKeyRunningDoubleAvgAggregator() extends BaseAggregator[DataPoint[Double], AggregateValue[Double]](
  aggFunc = (x, y) => y.add(x),
  startValueGen = () => doubleAvgRunningValue(weightedCount = 0.0, count = 0, value = 0.0),
  mergeFunc = (x, y) => x.add(y)) { }

/**
 * In case of a mapping function that alters original tags, ignoreIdDiff would need to be true to avoid conflicts.
 * Setting this attribute to true enables aggregating data for the original tag to data for the mapped tag.
 *
 * @param keyMapFunction - mapping function of Tag of input sample data
 * @param ignoreIdDiff - determines whether merging aggregations for different IDs is allowed
 */
class TagKeyMetricDocumentPerClassAggregator(keyMapFunction: SerializableFunction1[Tag, Tag], ignoreIdDiff: Boolean = false) extends BasePerClassAggregator[TaggedWithType with DataPoint[MetricRow], MetricDocument[Tag]](
  aggFunc = (x, y) => { y.add(x.data); y },
  startValueForKey = x => MetricDocument.empty[Tag](x),
  mergeFunc = (x, y) => { x.add(y, ignoreIdDiff = ignoreIdDiff); x },
  keyMapFunction) {}

/**
 * In case of a mapping function that alters original tags, ignoreIdDiff would need to be true to avoid conflicts.
 * Setting this attribute to true enables aggregating data for the original tag to data for the mapped tag.
 *
 * @param keyMapFunction - mapping function of Tag of input sample data
 * @param ignoreIdDiff - determines whether merging aggregations for different IDs is allowed
 */
class TagKeyMetricAggregationPerClassAggregator(keyMapFunction: SerializableFunction1[Tag, Tag], ignoreIdDiff: Boolean = false) extends Aggregator[TaggedWithType with DataPoint[MetricRow], MetricAggregation[Tag]]

/**
 * Wrapper for typed aggregators to accept any message and aggregate only those matching the type
 *
 * @param aggregator
 * @tparam T
 * @tparam V
 */
case class BaseAnyAggregator[T: TypeTag, V: TypeTag](aggregator: Aggregator[T, V]) extends Aggregator[Any, V]
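The Aggregator contract above (add folds in single samples, addAggregate merges partial aggregations from other batches, aggregation exposes the current state) can be illustrated with a small self-contained sketch. SimpleAggregator and PerTagCountAggregator are stand-in names, not Kolibri classes; the real per-class aggregators work on TaggedWithType data points and AggregateValue/MetricDocument aggregations as shown above.

import scala.collection.mutable

// Illustrative stand-in for the Aggregator contract: here a per-tag count of samples.
object AggregatorSketch {

  trait SimpleAggregator[U, V] {
    def add(sample: U): Unit
    def aggregation: V
    def addAggregate(aggregatedValue: V): Unit
  }

  final class PerTagCountAggregator extends SimpleAggregator[String, Map[String, Int]] {
    private val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
    override def add(sample: String): Unit = counts(sample) += 1
    override def aggregation: Map[String, Int] = counts.toMap
    override def addAggregate(aggregatedValue: Map[String, Int]): Unit =
      aggregatedValue.foreach { case (tag, count) => counts(tag) += count }
  }

  def main(args: Array[String]): Unit = {
    val aggregator = new PerTagCountAggregator
    Seq("q=q1", "q=q1", "q=q2").foreach(s => aggregator.add(s))   // single samples
    aggregator.addAggregate(Map("q=q2" -> 3))                     // partial result of another batch
    println(aggregator.aggregation)                               // counts: q=q1 -> 2, q=q2 -> 4
  }
}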
Typed Maps

Typed maps make assumptions about the type of the values and thus allow using a map structure for distinct types of values with a type guarantee. The assumptions are either baked into the key values or specified in the get call used to retrieve the values.
Distinct implementations are provided, which mainly vary on the dimensions mutable/immutable and strong/weak typing. The strongly typed maps make use of TypeTag, causing use of reflection, and thus might slow down executions if invoked frequently. Weakly typed maps will return an empty result (None) on get if the value cannot be cast to the expected type.

Strong typing (available as mutable or immutable implementation, refer to the implementation for details):

trait TypeTaggedMap extends KolibriSerializable {
  def isOfType[T: TypeTag](data: T, typeInstance: Type): Boolean = {
    typeOf[T] =:= typeInstance
  }
  def put[T: TypeTag, V](key: ClassTyped[V], value: T): (Option[T], TypeTaggedMap)
  def remove[T](key: ClassTyped[T]): (Option[T], TypeTaggedMap)
  def get[V](key: ClassTyped[V]): Option[V]
  def keys: Iterable[ClassTyped[Any]]
  def keySet: collection.Set[ClassTyped[Any]]
}

Weakly typed (refer to the implementation for details):

trait WeaklyTypedMap[T] extends KolibriSerializable {
  def put[U](key: T, value: U): Unit
  def remove(key: String): Option[Any]
  def get[U](key: T): Option[U]
  def keys: Iterable[T]
  def keySet: collection.Set[T]
}

Other useful structures

PriorityStores

Given some ordering, keeps the top n elements, utilizing a priority queue. Elements can be added continuously and the queue state will be updated.

/**
 * Priority store backed by PriorityQueue. Allows adding of elements and, by some ordering criterion,
 * only keeps the n elements that come first according to the ordering
 */
abstract class PriorityStore[T, U] {
  def keep_n: Int
  def ordering: Ordering[U]
  def elementToKey: U => T
  def queueReversed: mutable.PriorityQueue[U]
  def addEntry(entry: U): Unit
}

case class BasePriorityStore[T, U](keep_n: Int, ordering: Ordering[U], elementToKey: U => T) extends PriorityStore[T, U]

AtomicMapPromiseStore

Thread-safe store that uses the first request for a resource to load the data. This can involve more time-consuming loads, as each retrieve call is answered by a Promise that gets fulfilled as soon as resource loading is finished.

/**
 * Implementation ensuring thread safety of the value storage and also ensuring
 * that no more than one request leads to the creation of the stored resource, which
 * could potentially be expensive (e.g. in case multiple experiment batch processing
 * actors on a single node try to request the resource at once).
 * E.g. used to load some data expensive to load within an object to have only one data instance per node
 *
 * @tparam U - the key used to identify a value
 * @tparam V - the corresponding value
 */
trait AtomicMapPromiseStore[U,V]

Judgement Files (http://awagen.github.io/kolibri/2-config-details/4-file-formats/1-judgements/)

COMING SOON

Kolibri Documentation (http://awagen.github.io/kolibri/)

Kolibri - The Execution Engine that loves E-Commerce Search

Kolibri is the German word for hummingbird. I picked it as the project name to reflect the general aim to do many smaller things fast. And this still describes the batch processing logic quite well.
Built in Scala, based on the ZIO framework, Kolibri provides easy-to-use mechanisms to compose computing tasks, define how to batch the tasks, do grouped aggregations and represent these aspects in such a way that the jobs can easily be distributed among the worker nodes, which do not need to be tightly coupled in a cluster setting but perform all necessary synchronization via the used storage. The worker nodes do not need to live in the same environment, so whether it is purely local testing on one machine, testing with colleagues on a combination of local machines, cloud deployments or a mixture thereof - the only thing that matters is that the nodes have read and write access to the storage. This can either be file-based, such as the local file system, s3, gcs or any other cloud storage that simulates a file system, or a database such as redis. Right now only the local file system and s3 are implemented; more are expected to follow.
Further, while the framework is usable for all kinds of computations, its main focus in terms of pre-defined jobs is on e-commerce search related functionality to ease the evaluation of results. For this, certain tasks are already implemented, such as:
Flexible definition of permutations, including the possibility to restrict the range of values based on the value of another parameter, restricting grid-search computations to the actually needed set of combinations. Changeable parameters in requests include all aspects, such as url parameters, headers, body.
Requesting a service via http/https and parsing needed fields out of it, providing easy-to-define syntax for what to parse instead of needing to add any parsing logic yourself.
Judgement-list based calculation of common information retrieval metrics (such as DCG, NDCG, ERR, Precision, Recall) Requesting and comparing distinct search systems in terms of result overlap (jaccard distance). Computing the results with attached tags, where partial results are written on the go instead of waiting for the whole. Aggregations of partial results with the option to apply a weight to different results (such as weighting down results belonging to lower-traffic queries) Why the \u0026hellip; should you use this?\nYou might want to consider Kolibri if you:\nNeed a lean mechanism to process many samples but do not want to deploy any additional setup such as queues or databases (using them might become an option at some point, but you won\u0026rsquo;t need them) Need a convenient mechanism to define permutations of parameters, including making values conditional on the value sampled for another parameter, to effectively limit the computation needed for grid-search. Want a UI coming with it that gives you control over the tooling Want to try it out from local, from many machines in a group of co-workers or in a combination of cloud-deployed machines and local machines. There is no distinction made between where a machine sits, as long as it has access to the resources needed for computation and the configured storage over which state information is synced. Are working in search / e-commerce search and do not want to write the Nth framework for evaluation of search system results, such as distributions of result attribute values as in STRING_SEQUENCE_VALUE_OCCURRENCE_HISTOGRAM or comparing results of two search systems with a JACCARD metric, or information retrieval metrics such as DCG, NDCG, PRECISION, RECALL, ERR, general metrics such as IDENTITY (when extracting single attributes, such as numDocs), FIRST_TRUE, FIRST_FALSE, TRUE_COUNT, FALSE_COUNT, BINARY_PRECISION_TRUE_AS_YES, BINARY_PRECISION_FALSE_AS_YES also want convenient support in specifying experiments, by defining the target system, permutations of different types of request modifiers (allowing adjustment of url parameters, headers, body), the fields that need to be parsed from the responses and what to do with them (e.g which metrics to calculate) also need tagging/grouping of partial results want to aggregate partial results for an overall look at the data and be able to apply distinct weights per result group (such as weighting down the results of less frequent queries) need a nice visualization of results (pending to be activated and extended now that the rewrite is complete) Are not working in the above field but need a service that conveniently allows definition of all kinds of processing steps to define workflows. \u0026hellip; jeah, just use it! Schematic overview: A short description of the single libraries is given in the following. 
In the further sub-sections of this documentation you will find step-by-step descriptions of how to use kolibri.

Kolibri DataTypes

This library contains basic datatypes to simplify common tasks in batch processing and async state keeping.
Kolibri-DataTypes on Github

Kolibri Storage

This library contains the storage implementations, such as file-based local disc and cloud-based such as AWS and GCP, and will likely contain non-file-based implementations such as redis at some point.
Kolibri-Storage on Github

Kolibri Definitions

Contains the actual job definitions without the actual execution mechanism, to provide the parts that can be utilized / processed in a respective service such as kolibri-fleet-zio.
Kolibri-Fleet-ZIO on Github

Kolibri Fleet ZIO

Kolibri Fleet ZIO provides a multi-node batch execution setup. Batch definitions are flexible and make use of Akka-Streams, allowing the definition of flexible execution flows. Results are aggregated per batch and on demand aggregated to an overall result.
Features include:
Storage-based task queue, no need for additional deployments of a queue system or databases.
Synchronization of nodes via storage, thus it does not matter where nodes are located. They only need access to the selected storage. No need for direct node-to-node connections / tight clustering. This means you can just spin up your own local machine or several local machines in your office or spin up multiple nodes in the cloud or run all of them at the same time. In case some job shall not be computed using the resources of all connected nodes, this can be achieved by modifying the stored directive (which indicates "processing for all", "only process on node X", "stop processing" and similar directives).
Easy definition of datasets / permutations / tags / groups.
Mechanisms to split those sets into smaller batches.
Storage-based negotiation logic of single nodes to claim rights to process batches / single actions (such as cleaning up after a node that went offline), including state handling and collection of partial results.
Use case job definitions include:
Search parameter grid evaluation with flexible tagging based on the request (e.g. by request parameter), the result (e.g. size of result set, other characteristics of the search response) or the actually derived metric result (given by a MetricRow object). Tagging allows separation into distinct aggregations based on the concept a tag represents.
Kolibri-Fleet-ZIO on Github
Kolibri-Fleet-ZIO on DockerHub

Kolibri Watch (UI)

Kolibri Watch provides a UI for the Kolibri project, allowing monitoring of job execution progress, definition of the executions and submission to the Kolibri backend for execution.
Kolibri-Watch on Github
Kolibri-Watch on DockerHub

Response Juggler

Response Juggler is only listed here as a mock service for trying out kolibri. You will find an example configuration in the docker-compose file in the root of the kolibri-project. It is used there to mimic a search service by sampling results that are then processed according to the job definition posted via kolibri-fleet-zio. This way you can, directly out of the box, try out whether handling of a specific response format is correct and/or execute benchmarks.
It allows definition of a main template that defines how a response is supposed to look, plus definitions of the placeholders (enclosed in {{ and }}), for example:

{ "response": { "docs": {{DOCS}}, "numFound": {{NUM_FOUND}} } }

The placeholders need to have a sampling mechanism configured. For example, {{DOCS}} could be configured to map to a partial-file that defines the following json format:

{ "product_id": {{PID}}, "bool": {{BOOL}}, "string_val_1": {{STRING_VAL1}} }

Since this last json snippet is already at the leaf level and no placeholder comes below it, we can configure specific values for these placeholders instead of further json values with other placeholders. A specific value sampling definition can look similar to this one (the values to the left just refer to set environment variables, while the right side describes the set value for that variable):

RESPONSE_FIELD_IDENT_STRING_VAL1: "{{STRING_VAL1}}"
RESPONSE_FIELD_SAMPLER_TYPE_STRING_VAL1: "SINGLE"
RESPONSE_FIELD_SAMPLER_ELEMENT_CAST_STRING_VAL1: "STRING"
RESPONSE_FIELD_SAMPLER_SELECTION_STRING_VAL1: "p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19"

The above specifies that the placeholder {{STRING_VAL1}} corresponds to a single (SINGLE) value, which is of type string (STRING) and is sampled out of a comma-separated selection among the values p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19.
For further information on usage, refer to the project page referenced below.
Response-Juggler on Github
Response-Juggler on DockerHub

Resource Directives (http://awagen.github.io/kolibri/2-config-details/3-config-options/1-resourcedirectives/)

Resource directives describe a resource, provide instructions on how to load it and assign the resource an identifier such that it can be referenced and loaded. In job definitions they are used to load resources upfront that are big or repeatedly accessed, such as judgement lists.
Note: the supplier option VALUES_FROM_NODE_STORAGE is usually not valid since it assumes that a resource with the defined identifier has already been loaded when this resource is accessed, which might not be the case.

Resource Directive Types

JUDGEMENT_PROVIDER: Judgement provider resource. A judgement provider is used to retrieve judgement values for a query-productId pair.
MAP_STRING_TO_DOUBLE_VALUE: Map[String, Double] resource.
MAP_STRING_TO_STRING_VALUES: Map[String, List[String]] resource.
STRING_VALUES: List[String] resource.

Let's walk over the provided load options per type.

JUDGEMENT_PROVIDER

Load Options:
JUDGEMENTS_FROM_FILE: Provide a file path (relative to the configured base path) from which judgements are loaded. Note that the format (CSV or JSON_LINES) depends on your configuration (see the docker-compose documentation). For the formats of the different types, refer to the file-formats section of the documentation.
VALUES_FROM_NODE_STORAGE: Provide an identifier that refers to a resource of type JUDGEMENT_PROVIDER. Not recommended, since this assumes the referenced resource has already been loaded when this resource is accessed.

MAP_STRING_TO_DOUBLE_VALUE

The options here are currently restricted and thus should not be used right now.
Other ways of defining resources of this type are planned.\nLoad Option VALUES_FROM_NODE_STORAGE Provide identifier that refers to a resource of type MAP_STRING_TO_DOUBLE_VALUE. Not recommended, since this would assume the resource this resource refers to would need to be already loaded when this resource is accessed. MAP_STRING_TO_STRING_VALUE Load Option JSON_VALUES_MAPPING_TYPE Selecting this will provide the option to enter key values and assign one or multiple values to it. Thus this option means manual configuration of the key-value assignments. JSON_VALUES_FILES_MAPPING_TYPE Selecting this will provide the option to enter (key, value) pairs, where the value for each key (only single value) gives the path to a file that contains one single value per line (file path relative to configured base path). Thus on creation of the resource, all lines of the referenced files for each key will be read to compose all values for all keys. JSON_SINGLE_MAPPINGS_TYPE Set a file path (relative to the configured base path) to a json file containing single mappings, that is each key maps to a plain string value. JSON_ARRAY_MAPPINGS_TYPE Same as JSON_SINGLE_MAPPINGS_TYPE, but instead of plain string values contains a List of values per key. FILE_PREFIX_TO_FILE_LINES_TYPE Given a directory path (relative to the defined base path) and a file suffix, use the file names with suffix removed as keys and each line of the respective file as value. CSV_MAPPING_TYPE Given a file path (relative to configured base path), specify a csv file, the column delimiter, the index of the key column and the index of the value column. Will assign a List of values to each key value. Thus, if multiple rows assign a value to the same key, all those values will appear in the assigned values for that key. VALUES_FROM_NODE_STORAGE Provide identifier that refers to a resource of type MAP_STRING_TO_STRING_VALUE. Not recommended, since this would assume the resource this resource refers to would need to be already loaded when this resource is accessed. 
Example for supplier config for JSON_VALUES_FILES_MAPPING_TYPE:\n{ \u0026#34;type\u0026#34;: \u0026#34;JSON_VALUES_FILES_MAPPING_TYPE\u0026#34;, \u0026#34;values\u0026#34;: { \u0026#34;p1\u0026#34;: \u0026#34;data/parameters/values1.txt\u0026#34;, \u0026#34;p2\u0026#34;: \u0026#34;data/parameters/values2.txt\u0026#34; } } Example for supplier config for JSON_SINGLE_MAPPINGS_TYPE:\n{ \u0026#34;type\u0026#34;: \u0026#34;JSON_SINGLE_MAPPINGS_TYPE\u0026#34;, \u0026#34;values\u0026#34;: \u0026#34;data/parameters/singleValueMapping.json\u0026#34; } Example for supplier config for JSON_ARRAY_MAPPINGS_TYPE:\n{ \u0026#34;type\u0026#34;: \u0026#34;JSON_ARRAY_MAPPINGS_TYPE\u0026#34;, \u0026#34;values\u0026#34;: \u0026#34;data/parameters/multiValueMapping.json\u0026#34; } Example for supplier config for FILE_PREFIX_TO_FILE_LINES_TYPE:\n{ \u0026#34;type\u0026#34;: \u0026#34;FILE_PREFIX_TO_FILE_LINES_TYPE\u0026#34;, \u0026#34;directory\u0026#34;: \u0026#34;data/parameters/fileNameToValues\u0026#34;, \u0026#34;files_suffix\u0026#34;: \u0026#34;.txt\u0026#34; } Example for supplier config for CSV_MAPPING_TYPE:\n{ \u0026#34;type\u0026#34;: \u0026#34;CSV_MAPPING_TYPE\u0026#34;, \u0026#34;values\u0026#34;: \u0026#34;data/parameters/mapping.csv\u0026#34;, \u0026#34;column_delimiter\u0026#34;: \u0026#34;\\\\t\u0026#34;, \u0026#34;key_column_index\u0026#34;: 0, \u0026#34;value_column_index\u0026#34;: 1 } STRING_VALUES Define a sequence of values.\nLoad Option FROM_ORDERED_VALUES_TYPE Contains a range of options to define the data. For details see table below. PARAMETER_VALUES_TYPE Manually add each value. VALUES_FROM_NODE_STORAGE Provide identifier that refers to a resource of type MAP_STRING_TO_DOUBLE_VALUE. Not recommended, since this would assume the resource this resource refers to would need to be already loaded when this resource is accessed. FROM_ORDERED_VALUES_TYPE sources FROM_FILENAME_KEYS_TYPE Given a directory path (relative to the defined base path) and a file suffix, use the file names with removed suffix as values. FROM_FILES_LINES_TYPE Specify a file path (relative to the configured base path) from which each line is picked as a single value. FROM_VALUES_TYPE Manually define a sequence of values, same as the PARAMETER_VALUES_TYPE option above. FROM_RANGE_TYPE Define start, end and stepSize values. This will generate a sequence of floating point numbers. NOTE: in case you intend to use this as a *_REPLACE parameter (see request parameter section), take into account that in string replace a value would appear as floating point, not as integer. "},{"uri":"http://awagen.github.io/kolibri/2-config-details/1-task-definitions/","title":"Task Configurations","tags":[],"description":"","content":"Task Definition Types When you open kolibri-watch and navigate to the CREATE screen, you will see the following types of processing definitions offered:\nProcessing Definition Selection: What is the difference here?\nProcessing Definition Types TASK Provides a range of tasks the receiving node will directly execute, since they represent a single processing step (as opposed to jobDefinitions, which define batches). Current options are: AGGREGATE_FROM_DIR_BY_REGEX, AGGREGATE_FILES, AGGREGATE_GROUPS (see below for descriptions) JOB_SUMMARY Simply generates a summary over all available results for a job (partial or full aggregates) and persist it in the job\u0026rsquo;s result folder. As for a Task, this represent a single processing step and thus is processed directly by the node that receives the request. 
JOB_DEFINITION: Definition of full jobs. Submitting a job definition will not directly start processing it, but you will find the job listed under OPEN JOBS on the STATUS page of the UI. To start processing, you need to press the START button. This will place a job-level directive into the job definition folder for the selected job, informing the nodes that they can start processing.

In this section we will look at the tasks that consist of a single processing step, TASK and JOB_SUMMARY. We will look closer at JOB_DEFINITION in the next section.
Let's look at the variants in detail:

TASK Type

AGGREGATE_FROM_DIR_BY_REGEX:

Fields:
regex: Regular expression applied to the file names in the defined folder. The files corresponding to the file names that match the expression are selected for aggregation. Example: .*[(]q=.+[)]-.*, which would match file names such as (q=trousers)-asjhkh.
outputFilename: The file name under which the aggregation result shall be stored.
readSubDir: The directory from which to pick the files. Note that the paths are relative to the defined base path. Example: test-results/2023-08-09/testJob1
writeSubDir: The directory in which the result is to be stored. Note that the paths are relative to the defined base path. Example: test-results/2023-08-09/testJob1
weightProvider: By specifying the weight provider, we can assign different weights to different results. A possible use case is the down-weighting of results for queries that are relatively unimportant / occur rarely. The weightProvider can either be set to a constant value for equal weight for all results or to FROM_PER_QUERY_FILE.

The FROM_PER_QUERY_FILE option requires you to specify a file with the configured query weights in a csv format such as in the example below:

q1 0.1
q2 0.4
q3 1.0
q4 0.3
q5 0.5
q6 0.8

Note that the removePrefix and removeSuffix options are provided to allow cleaning up the file name of each result to a value provided in the query weights file. Example: if results are in the format (q=q1)-abc1-dskdasjkh, here abc1 represents the node-hash, identifying the node that produced the result, and dskdasjkh is a random hash to avoid file overwrites in case batching does not match the actual result tagging (e.g. if tagging happens by query parameter and batching is such that each batch refers to multiple queries, in which case we would have multiple partial results for a single query). Important here is that both removePrefix and removeSuffix refer to the base result name, which in the above is the remaining part after removing both hash suffixes, thus (q=q1) in the example above. Thus a removePrefix of (q= and removeSuffix of ) would result in q1 as identifier, which matches a key in the example csv weight file above (see the sketch below).
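The following small, self-contained Scala sketch only illustrates the name cleanup just described; it is not Kolibri code, the exact cleanup Kolibri applies may differ in detail, and the file name pattern and the removePrefix / removeSuffix values are taken from the example above.

// Illustrative sketch: reduce a partial-result file name to the key used in the query weight file.
// "(q=q1)-abc1-dskdasjkh" -> base name "(q=q1)" -> weight key "q1"
object ResultNameToWeightKeySketch {

  def weightKey(fileName: String, removePrefix: String, removeSuffix: String): String = {
    val baseName = fileName.split("-").head          // drop "-<nodeHash>-<randomHash>"
    baseName.stripPrefix(removePrefix).stripSuffix(removeSuffix)
  }

  def main(args: Array[String]): Unit = {
    val weights = Map("q1" -> 0.1, "q2" -> 0.4, "q3" -> 1.0)   // as in the csv example above
    val key = weightKey("(q=q1)-abc1-dskdasjkh", removePrefix = "(q=", removeSuffix = ")")
    println(key -> weights.getOrElse(key, 1.0))                 // (q1, 0.1)
  }
}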
AGGREGATE_FILES:

Here we need to specify all files we would like to aggregate, while the other settings work analogously to the above example.

AGGREGATE_GROUPS:

Here we aggregate over defined groups and generate one result per group. Note that as of now, the groups need to refer to the file's base name as described above, e.g. for (q=q1)-abc1-dskdasjkh this would be (q=q1). For the group assignment there is no cleanup / normalization of this base name, but you can use it as described above for the definition of a weight provider.
A group json looks as follows:

{
  "group1": ["(q=q1)", "(q=q2)", "(q=q3)", "(q=q4)", "(q=q5)", "(q=q6)"],
  "group2": ["(q=q7)", "(q=q8)", "(q=q9)", "(q=q10)", "(q=q11)", "(q=q12)", "(q=q13)", "(q=q14)", "(q=q15)", "(q=q16)", "(q=q17)", "(q=q18)"],
  "group3": ["(q=q19)", "(q=q20)", "(q=q21)", "(q=q22)", "(q=q23)", "(q=q24)"],
  "group4": ["(q=q1)", "(q=q10)", "(q=q12)", "(q=q18)", "(q=q20)", "(q=q22)", "(q=q24)"]
}

Similarly, by selecting the FROM_JSON option instead of FROM_JSON_FILE, you can enter the group assignments manually.
In the above example configuration we specified a query weight file in the following format:

q1 0.1
q2 0.4
q3 1.0

Thus, with the given settings, the weight keys would match. They do not if we do not provide removePrefix and removeSuffix. In that case we would have to change the key column format to:

(q=q1) 0.1
(q=q2) 0.4
(q=q3) 1.0

A few moments after submitting, you should see files [groupName].csv for all specified group names.
NOTE: there was a bug in the group aggregation up to version v0.2.4; it is fixed in the main branch and the fix will be included from release v0.2.5 on.

JobSummary Type

If you select this option, you get to choose from a range of dateIds for which results exist. After selecting one, a list of jobIds is shown. Select the jobId for which to create a summary, and submit the task.
A few moments later you will find a summary for that job in the result folder for that job in a summary subfolder.\nA summary contains a range of information:\nthe estimated effect each parameter has on the result quality (calculated for all metrics for all available (partial or full) results) candidates for good / bad configs { \u0026#34;NDCG_10\u0026#34;: { \u0026#34;metric\u0026#34;: \u0026#34;NDCG_10\u0026#34;, \u0026#34;results\u0026#34;: { \u0026#34;((q=q7))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q7\u0026#34; ] }, 0.6855983189912663 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a9\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q7\u0026#34; ] }, 0.6750531211165659 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.40, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q1))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] }, 0.6779543908079728 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a7\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] }, 0.6673867087869434 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.7, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.1 } } }, \u0026#34;((q=q8))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a7\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q8\u0026#34; ] }, 0.6728646200189348 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a4\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q8\u0026#34; ] }, 0.6622830079624596 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.2 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q10))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a4\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q10\u0026#34; ] }, 0.6876636420598735 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q10\u0026#34; ] }, 0.6768444825159147 ] }, 
\u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.75, \u0026#34;q\u0026#34;: 1.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.9 } } }, \u0026#34;((q=q3))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, 0.70330263096681 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a6\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, 0.6932150559196751 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.010087575047134867, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q9))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a3\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q9\u0026#34; ] }, 0.7322057892429067 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q9\u0026#34; ] }, 0.7211842714934189 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01102, \u0026#34;k1\u0026#34;: 1.0, \u0026#34;q\u0026#34;: 0.1 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.2, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q4))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a6\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, 0.6533867258914473 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, 0.6432054161998944 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.013206512220618305, \u0026#34;k1\u0026#34;: 0.09, \u0026#34;q\u0026#34;: 0.8 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.1, \u0026#34;k1\u0026#34;: 0.2, \u0026#34;q\u0026#34;: 0.3 } } }, \u0026#34;((q=q6))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q6\u0026#34; ] }, 0.6589173727055805 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q6\u0026#34; ] }, 0.6449092328371309 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.9, 
\u0026#34;k1\u0026#34;: 0.8, \u0026#34;q\u0026#34;: 0.7 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.4, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q2))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a9\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, 0.6243645948746191 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, 0.6126035952414102 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01176099963320898, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q5))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, 0.6439163905223448 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, 0.6439163905223448 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.007013323687080963, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } } } }, \u0026#34;PRECISION_k=4\u0026amp;t=0.1\u0026#34;: { \u0026#34;metric\u0026#34;: \u0026#34;PRECISION_k=4\u0026amp;t=0.1\u0026#34;, \u0026#34;results\u0026#34;: { \u0026#34;((q=q7))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q7\u0026#34; ] }, 0.6855983189912663 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a9\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q7\u0026#34; ] }, 0.6750531211165659 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.40, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q1))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] }, 0.6779543908079728 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a7\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] }, 0.6673867087869434 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { 
\u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.7, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.1 } } }, \u0026#34;((q=q8))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a7\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q8\u0026#34; ] }, 0.6728646200189348 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a4\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q8\u0026#34; ] }, 0.6622830079624596 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.2 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q10))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a4\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q10\u0026#34; ] }, 0.6876636420598735 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q10\u0026#34; ] }, 0.6768444825159147 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.75, \u0026#34;q\u0026#34;: 1.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.9 } } }, \u0026#34;((q=q3))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, 0.70330263096681 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a6\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, 0.6932150559196751 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.010087575047134867, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q9))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a3\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q9\u0026#34; ] }, 0.7322057892429067 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q9\u0026#34; ] }, 0.7211842714934189 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01102, \u0026#34;k1\u0026#34;: 1.0, \u0026#34;q\u0026#34;: 0.1 }, 
\u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.2, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q4))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a6\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, 0.6533867258914473 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, 0.6432054161998944 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.013206512220618305, \u0026#34;k1\u0026#34;: 0.09, \u0026#34;q\u0026#34;: 0.8 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.1, \u0026#34;k1\u0026#34;: 0.2, \u0026#34;q\u0026#34;: 0.3 } } }, \u0026#34;((q=q6))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q6\u0026#34; ] }, 0.6589173727055805 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q6\u0026#34; ] }, 0.6449092328371309 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.9, \u0026#34;k1\u0026#34;: 0.8, \u0026#34;q\u0026#34;: 0.7 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.4, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q2))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a9\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, 0.6243645948746191 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, 0.6126035952414102 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01176099963320898, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q5))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, 0.6439163905223448 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, 0.6439163905223448 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.007013323687080963, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, 
\u0026#34;q\u0026#34;: 0.0 } } } } }, \u0026#34;RECALL_k=4\u0026amp;t=0.1\u0026#34;: { \u0026#34;metric\u0026#34;: \u0026#34;RECALL_k=4\u0026amp;t=0.1\u0026#34;, \u0026#34;results\u0026#34;: { \u0026#34;((q=q7))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q7\u0026#34; ] }, 0.6855983189912663 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a9\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q7\u0026#34; ] }, 0.6750531211165659 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.40, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q1))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] }, 0.6779543908079728 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a7\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] }, 0.6673867087869434 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.7, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.1 } } }, \u0026#34;((q=q8))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a7\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q8\u0026#34; ] }, 0.6728646200189348 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a4\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q8\u0026#34; ] }, 0.6622830079624596 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.2 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q10))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a4\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q10\u0026#34; ] }, 0.6876636420598735 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q10\u0026#34; ] }, 0.6768444825159147 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.75, \u0026#34;q\u0026#34;: 1.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, 
\u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.9 } } }, \u0026#34;((q=q3))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, 0.70330263096681 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a6\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, 0.6932150559196751 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.010087575047134867, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q9))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a3\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q9\u0026#34; ] }, 0.7322057892429067 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q9\u0026#34; ] }, 0.7211842714934189 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01102, \u0026#34;k1\u0026#34;: 1.0, \u0026#34;q\u0026#34;: 0.1 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.2, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q4))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a6\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, 0.6533867258914473 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, 0.6432054161998944 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.013206512220618305, \u0026#34;k1\u0026#34;: 0.09, \u0026#34;q\u0026#34;: 0.8 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.1, \u0026#34;k1\u0026#34;: 0.2, \u0026#34;q\u0026#34;: 0.3 } } }, \u0026#34;((q=q6))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q6\u0026#34; ] }, 0.6589173727055805 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q6\u0026#34; ] }, 0.6449092328371309 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.9, \u0026#34;k1\u0026#34;: 0.8, \u0026#34;q\u0026#34;: 0.7 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.4, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q2))\u0026#34;: { 
\u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a9\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, 0.6243645948746191 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, 0.6126035952414102 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01176099963320898, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q5))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, 0.6439163905223448 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, 0.6439163905223448 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.007013323687080963, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } } } }, \u0026#34;NDCG_4\u0026#34;: { \u0026#34;metric\u0026#34;: \u0026#34;NDCG_4\u0026#34;, \u0026#34;results\u0026#34;: { \u0026#34;((q=q7))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q7\u0026#34; ] }, 0.6855983189912663 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a9\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q7\u0026#34; ] }, 0.6750531211165659 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.40, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q1))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] }, 0.6779543908079728 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a7\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] }, 0.6673867087869434 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.7, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.1 } } }, \u0026#34;((q=q8))\u0026#34;: { 
\u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a7\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q8\u0026#34; ] }, 0.6728646200189348 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a4\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q8\u0026#34; ] }, 0.6622830079624596 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.2 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q10))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a4\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q10\u0026#34; ] }, 0.6876636420598735 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q10\u0026#34; ] }, 0.6768444825159147 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.75, \u0026#34;q\u0026#34;: 1.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.9 } } }, \u0026#34;((q=q3))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, 0.70330263096681 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a6\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, 0.6932150559196751 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.010087575047134867, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q9))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a3\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q9\u0026#34; ] }, 0.7322057892429067 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q9\u0026#34; ] }, 0.7211842714934189 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01102, \u0026#34;k1\u0026#34;: 1.0, \u0026#34;q\u0026#34;: 0.1 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.2, \u0026#34;k1\u0026#34;: 0.3, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q4))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a6\u0026#34; ], 
\u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, 0.6533867258914473 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, 0.6432054161998944 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.013206512220618305, \u0026#34;k1\u0026#34;: 0.09, \u0026#34;q\u0026#34;: 0.8 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.1, \u0026#34;k1\u0026#34;: 0.2, \u0026#34;q\u0026#34;: 0.3 } } }, \u0026#34;((q=q6))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a10\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q6\u0026#34; ] }, 0.6589173727055805 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q6\u0026#34; ] }, 0.6449092328371309 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.9, \u0026#34;k1\u0026#34;: 0.8, \u0026#34;q\u0026#34;: 0.7 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.4, \u0026#34;k1\u0026#34;: 0.5, \u0026#34;q\u0026#34;: 0.6 } } }, \u0026#34;((q=q2))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a9\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, 0.6243645948746191 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a1\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, 0.6126035952414102 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.01176099963320898, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } }, \u0026#34;((q=q5))\u0026#34;: { \u0026#34;bestAndWorstConfigs\u0026#34;: { \u0026#34;best\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, 0.6439163905223448 ], \u0026#34;worst\u0026#34;: [ { \u0026#34;a\u0026#34;: [ \u0026#34;a5\u0026#34; ], \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, 0.6439163905223448 ] }, \u0026#34;parameterEffectEstimate\u0026#34;: { \u0026#34;maxMedianShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.007013323687080963, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 }, \u0026#34;maxSingleResultShift\u0026#34;: { \u0026#34;a\u0026#34;: 0.0, \u0026#34;k1\u0026#34;: 0.0, \u0026#34;q\u0026#34;: 0.0 } } } } } } "},{"uri":"http://awagen.github.io/kolibri/1-first-steps/1-getting-started/","title":"Configure / Run","tags":[],"description":"","content":"After checking out the repo, you will find an example docker-compose.yml file in it. 
It contains an example setup of prometheus, grafana, kolibri-fleet-zio instances, dummy search service instances (response juggler) and kolibri-watch (the kolibri UI).\nWhy these?\nPrometheus for pulling metrics from the kolibri service Grafana for displaying the dashboard representing the service state kolibri-fleet-zio for the actual service taking care of computations and state keeping, which needs to run on every node that is intended to take part in the distributed processing response-juggler: simulating responses by a search system. Used to test request and parsing tasks defined in kolibri-fleet-zio kolibri-watch: providing the UI to monitor available nodes, their resource consumption / utilization and controls to load job templates, create new ones, store job definitions and starting / stopping processing of the jobs We will first look at the configurations that need setting before being able to start up the service, focussing on kolibri-fleet-zio:\nkolibri-zio-1: image: awagen/kolibri-fleet-zio:0.2.0 cpu_count: 12 mem_limit: 6144m mem_reservation: 4096m ports: - \u0026#34;8001:8001\u0026#34; user: \u0026#34;1000:1000\u0026#34; environment: JVM_OPTS: \u0026gt; -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xms4096m -Xmx4096m PROFILE: prod NODE_HASH: \u0026#34;abc1\u0026#34; HTTP_SERVER_PORT: 8001 RUNNING_TASK_PER_JOB_MAX_COUNT: 20 RUNNING_TASK_PER_JOB_DEFAULT_COUNT: 3 MAX_NR_BATCH_RETRIES: 2 PERSISTENCE_MODE: \u0026#39;CLASS\u0026#39; PERSISTENCE_MODULE_CLASS: \u0026#39;de.awagen.kolibri.fleet.zio.config.di.modules.persistence.LocalPersistenceModule\u0026#39; AWS_PROFILE: \u0026#39;developer\u0026#39; AWS_S3_BUCKET: \u0026#39;kolibri-dev\u0026#39; AWS_S3_PATH: \u0026#39;kolibri_fleet_zio_test\u0026#39; AWS_S3_REGION: \u0026#39;EU_CENTRAL_1\u0026#39; # the file path in the job definitions are to be given relative to the path (or bucket path) defined # for the respective configuration of persistence LOCAL_STORAGE_WRITE_BASE_PATH: \u0026#39;/app/test-files\u0026#39; LOCAL_STORAGE_READ_BASE_PATH: \u0026#39;/app/test-files\u0026#39; # JOB_TEMPLATES_PATH must be relative to the base path or bucket path, depending on the persistence selected JOB_TEMPLATES_PATH: \u0026#39;templates/jobs\u0026#39; OUTPUT_RESULTS_PATH: \u0026#39;test-results\u0026#39; JUDGEMENT_FILE_SOURCE_TYPE: \u0026#39;CSV\u0026#39; # if judgement file format set to \u0026#39;JSON_LINES\u0026#39;, need to set \u0026#39;DOUBLE\u0026#39; in case judgements are numeric in the json, # if the numeric value is represented as string, use \u0026#39;STRING\u0026#39;. This purely refers to how the json value is # interpreted, later this will be cast to double either way JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST: \u0026#39;STRING\u0026#39; ALLOWED_TIME_PER_ELEMENT_IN_MILLIS: 4000 ALLOWED_TIME_PER_BATCH_IN_SECONDS: 3600 ALLOWED_TIME_PER_JOB_IN_SECONDS: 36000 MAX_RESOURCE_DIRECTIVES_LOAD_TIME_IN_MINUTES: 10 MAX_PARALLEL_ITEMS_PER_BATCH: 16 CONNECTION_POOL_SIZE_MIN: 100 CONNECTION_POOL_SIZE_MAX: 100 CONNECTION_TTL_IN_SECONDS: 1200 MAX_NR_JOBS_PROCESSING: 5 MAX_NR_JOBS_CLAIMED: 5 NETTY_HTTP_CLIENT_THREADS_MAX: 4 BLOCKING_POOL_THREADS: 4 NON_BLOCKING_POOL_THREADS: 4 volumes: - ./tmp_data:/app/test-files - ${HOME}/.aws/credentials:/home/kolibri/.aws/credentials:ro Configuration Options in Detail General Setup Settings PROFILE The config file suffix for the config to be loaded on startup. Will try to find application-[PROFILE].conf in the resource folder. NODE_HASH The hash that identifies this specific node. 
If not set, a hash will be set randomly. Note: nodes are identified by the node_hash, so it should be a unique identifier. HTTP_SERVER_PORT Port to reach the kolibri-fleet-zio API under. ALLOWED_TIME_PER_ELEMENT_IN_MILLIS Just a takeover from the initial job definitions for search evaluation which can still be used as a format. Yet right now this attribute does not have any effect, thus it will be removed. ALLOWED_TIME_PER_BATCH_IN_SECONDS Just a takeover from the initial job definitions for search evaluation which can still be used as a format. Yet right now this attribute does not have any effect, thus it will be removed. ALLOWED_TIME_PER_JOB_IN_SECONDS Just a takeover from the initial job definitions for search evaluation which can still be used as a format. Yet right now this attribute does not have any effect, thus it will be removed. MAX_RESOURCE_DIRECTIVES_LOAD_TIME_IN_MINUTES Defines how much time loading of a global node-resource (such as judgement lists, parameters and the like) is allowed to take. MAX_PARALLEL_ITEMS_PER_BATCH Defines how many items are processed in parallel per batch at any given time. MAX_NR_JOBS_PROCESSING Defines the maximal number of batches that are allowed in progress per node at any given time. MAX_NR_JOBS_CLAIMED Defines the maximal number of batches that can be claimed for execution at any given time. Storage configuration PERSISTENCE_MODE The persistence mode used. Can be: AWS (s3), GCP (gcs), LOCAL (local file system), RESOURCE (local resources), CLASS (if selected, the PERSISTENCE_MODULE_CLASS property needs to be defined, specifying the fully qualified name of the persistence module class to use). PERSISTENCE_MODULE_CLASS If PERSISTENCE_MODE is set to CLASS, set here the fully qualified name of the persistence module class to use, such as de.awagen.kolibri.fleet.zio.config.di.modules.persistence.LocalPersistenceModule (which happens to refer to the same persistence module as just specifying PERSISTENCE_MODE as LOCAL). AWS_PROFILE If PERSISTENCE_MODE is AWS (or CLASS and the AWS module is referenced above), specify here the profile to use. AWS_S3_BUCKET If AWS storage is used, define here the bucket to store the state / result data in. AWS_S3_PATH If AWS storage is used, define here the path within the above defined bucket to use as base path. AWS_S3_REGION If AWS storage is used, define the region here. GCP_GS_BUCKET If GCP storage is used, define here the bucket to store the state / result data in. GCP_GS_PATH If GCP storage is used, define here the path within the above defined bucket to use as base path. GCP_GS_PROJECT_ID If GCP storage is used, define here the project id under which you created the bucket. LOCAL_STORAGE_WRITE_BASE_PATH If LOCAL storage is used, define the base path here under which to store the data. LOCAL_STORAGE_READ_BASE_PATH If LOCAL storage is used, define the base path here from which data is read (should usually be the same as the write base path). JOB_TEMPLATES_PATH Relative subpath (relative to the defined base paths) under which job templates are found / stored. OUTPUT_RESULTS_PATH Relative subpath (relative to the defined base paths) under which results are persisted. Judgement file format configuration JUDGEMENT_FILE_SOURCE_TYPE Type of the utilized judgement file. 
Possible values: CSV (per line: query, product and judgement score, each separated by the configured delimiter (see below)) or JSON_LINES (one line per query in format {\u0026quot;query\u0026quot;: \u0026quot;q2\u0026quot;, \u0026quot;products\u0026quot;: [{\u0026quot;productId\u0026quot;: \u0026quot;aa\u0026quot;, \u0026quot;score\u0026quot;: 0.30}, {\u0026quot;productId\u0026quot;: \u0026quot;bb\u0026quot;, \u0026quot;score\u0026quot;: 0.11}]}) JUDGEMENT_FILE_COLUMN_DELIMITER Delimiter used if the JUDGEMENT_FILE_SOURCE_TYPE is set to CSV. Default value: \\u0000 JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST If JUDGEMENT_FILE_SOURCE_TYPE is set to JSON_LINES, defines how to parse the score attribute from the above format. Options: STRING (if the number is wrapped in string delimiters; this was initially only a workaround) or DOUBLE (if the score is a number in the used json). Http Connection / Connection Pool Settings CONNECTION_POOL_TYPE Specifies whether to use a fixed (FIXED) or a dynamic (DYNAMIC) connection pool. CONNECTION_POOL_SIZE_MIN If pool type is dynamic, this gives the minimum number of connections. If it is fixed, gives the fixed number of connections. CONNECTION_POOL_SIZE_MAX If pool type is dynamic, gives the maximum number of connections. If pool type is fixed, this setting is not used. CONNECTION_TTL_IN_SECONDS If pool type is dynamic, gives the TTL of a connection. If pool type is fixed, this setting is not used. CONNECTION_TIMEOUT_IN_SECONDS In either pool type, this specifies the connection timeout. Thread Pool Settings NETTY_HTTP_CLIENT_THREADS_MAX Specifies the maximal number of netty http client threads. BLOCKING_POOL_THREADS Defines the number of threads assigned to the thread pool used for blocking computations. NON_BLOCKING_POOL_THREADS Defines the number of threads assigned to the thread pool used for non-blocking computations. Volume Mounts Volumes Some mounts needed to access data within the docker container ./tmp_data:/app/test-files mounting the project root tmp_data folder to the /app/test-files folder within the container [absolute-path-containing-your-aws-config-folder]/.aws/credentials:/home/kolibri/.aws/credentials:ro read-only mount of the folder on the local machine containing aws credentials into the standard location in the container where it\u0026rsquo;s picked up automatically by the aws lib Run It This section is gonna be short: docker-compose up within the project root.\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/3-step-by-step/2-generators/","title":"Generators","tags":[],"description":"","content":"Generators Definition of the data is done via instances of type IndexedGenerator. Possible values for those can be found in the package de.awagen.kolibri.datatypes.collections.generators (GitHub Link). These can hold single collections of values or combinations of multiple collections. By using the right combination of those, all kinds of permutations of values can be composed.\nSo let\u0026rsquo;s say you had multiple stores (say, identified by storeIds), and they are classified into certain types, and you wanted to request certain queries per store type. In this case you can create a generator of queries for each store-type, and per store-type a generator of storeIds. You could create a PermutatingIndexedGenerator (see below) for each query-per-type generator and store-per-type generator, and combine all generators in a OneAfterAnotherIndexedGenerator. This now combines all stores with the right queries. 
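To make this composition concrete, the following is a minimal Scala sketch under assumptions: the store types, store ids and queries are made-up example data, and the constructor shapes of PermutatingIndexedGenerator and OneAfterAnotherIndexedGenerator are assumed to roughly match the descriptions further below (only ByFunctionNrLimitedIndexedGenerator.createFromSeq is taken directly from this documentation).
import de.awagen.kolibri.datatypes.collections.generators._
// hypothetical example data: queries and storeIds grouped by store type
val queriesByType: Map[String, Seq[String]] = Map(\u0026#34;typeA\u0026#34; -\u0026gt; Seq(\u0026#34;q1\u0026#34;, \u0026#34;q2\u0026#34;), \u0026#34;typeB\u0026#34; -\u0026gt; Seq(\u0026#34;q3\u0026#34;))
val storesByType: Map[String, Seq[String]] = Map(\u0026#34;typeA\u0026#34; -\u0026gt; Seq(\u0026#34;s1\u0026#34;, \u0026#34;s2\u0026#34;), \u0026#34;typeB\u0026#34; -\u0026gt; Seq(\u0026#34;s3\u0026#34;))
// per store type: permutate the storeId generator with the query generator of that type, yielding elements of the form Seq(storeId, query)
val perTypeGenerators: Seq[IndexedGenerator[Seq[String]]] = queriesByType.keys.toSeq.map(storeType =\u0026gt; PermutatingIndexedGenerator(Seq(ByFunctionNrLimitedIndexedGenerator.createFromSeq(storesByType(storeType)), ByFunctionNrLimitedIndexedGenerator.createFromSeq(queriesByType(storeType)))))
// chain the per-type generators so that each store is only combined with the queries of its own type
val storeAndQueryGen: IndexedGenerator[Seq[String]] = OneAfterAnotherIndexedGenerator(perTypeGenerators)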
The resulting generator of (storeId, query) pairs could now again be combined with any other generator to provide the needed combinations.\nAn IndexedGenerator has the following signature:\ntrait IndexedGenerator[+T] extends KolibriSerializable { /** partitioning of the overall data, where each partition is a generator itself, specific for the distinct generator types **/ def partitions: IndexedGenerator[IndexedGenerator[T]] /** iterator over all elements generated by the generator **/ def iterator: Iterator[T] /** number of elements generated by the generator **/ def size: Int /** partial generator between start (inclusive) and end index (exclusive) **/ def getPart(startIndex: Int, endIndex: Int): IndexedGenerator[T] /** get element specified by given index if index within valid range, otherwise None **/ def get(index: Int): Option[T] /** new generator generating elements by retrieving elements of this generator and applying the map function on it **/ def mapGen[B](f: SerializableFunction1[T, B]): IndexedGenerator[B] } Generator Types ByFunctionNrLimitedIndexedGenerator Takes the number of elements and a function mapping the index to a value. Helper-function ByFunctionNrLimitedIndexedGenerator.createFromSeq takes any Seq and provides a generator generating the elements of the Seq.\nBatchBySizeIndexedGenerator Takes a generator of Seq[T] and a batch size and generates partial generators which in turn generate at most the number of Seq[T] elements given by the batch size (the last element generated might contain less).\nBatchByGeneratorIndexedGenerator Takes Seq of IndexedGenerators and a batchByIndex, and generates generators of Seq[T], where each partial generator corresponds to a single value of the generator provided at index given by batchByIndex. The values for the remaining generators are generated by all possible combinations of the values of each generator.\nMergingIndexedGenerator Takes two generators, generates all possible combinations of values of those generators and applies a mapping function to the pair of values to yield the generated values.\nNthIsNthForEachIndexedGenerator Generator that yields for index n the Seq of values made of one value per generator, while for each generator its n-th element is chosen. Thus no permutations here.\nOneAfterAnotherIndexedGenerator Generator that just starts picking elements from the next generator when the requested element exceeds its own elements, e.g just sequentially provides the elements of the distinct generators.\nPartitionByGroupIndexedGenerator Generator that takes a sequence of generators and acts like a normal OneAfterAnotherIndexedGenerator, e.g will generate the elements of each contained generator sequentially, thus the number of overall elements is the sum of the elements of the single generators. What differs here is the partitions function, which will keep the groups. 
This way the partitioning of this generator still keeps the logical groups within it, e.g where each generator passed reflects such a logical grouping.\nPermutatingIndexedGenerator Takes a number of generators of the same type and generates all permutations of all the values within the distinct generators, where the values generated by the distinct generators will have the same Seq-index as its generator.\nExample Use-Cases TODO: give some basic example use cases showing the composition.\nHelper Structures - Define Your Data The structures below extend the ModifierGeneratorProvider trait, providing two basic defs\ntrait ModifierGeneratorProvider extends KolibriSerializable { def partitions: IndexedGenerator[IndexedGenerator[Modifier[RequestTemplateBuilder]]] def modifiers: Seq[IndexedGenerator[Modifier[RequestTemplateBuilder]]] } RequestPermutation The signature is as follows:\ncase class RequestPermutation(params: OrderedMultiValues, headers: OrderedMultiValues, bodies: Seq[String], bodyContentType: ContentType = ContentTypes.`application/json`) extends ModifierGeneratorProvider Mappings: Simplifying composition of data that belongs together MappingModifier The definition of MappingModifier can be found in de.awagen.kolibri.base.processing.modifiers.RequestPermutations. It simplifies the logical grouping of data by keys and provides the appropriate generators, combining the data within the group boundaries.\nThe signature of the MappingModifier is the following:\ncase class MappingModifier(keyGen: IndexedGenerator[String], paramsMapper: ParamsMapper, headersMapper: HeadersMapper, bodyMapper: BodyMapper) extends ModifierGeneratorProvider It simply is a combination of a key generator and distinct mappers for parameters, headers and bodies. Per generated key, each of the mappers is checked for existence of a matching generator of type Modifier[RequestTemplateBuilder]. All of those generators for the specific key are then combined in a PermutatingIndexedGenerator. Thus this leads to a Seq of generators, where each generator reflects all permutations (including all varied parameters, headers, bodies) for a single key.\nThe generator provided by the partitions supplier just provides each of those per-key generators in sequence. The Seq of generators provided by the modifiers supplier contains a single element, namely a single OneAfterAnotherIndexedGenerator containing the per-key generators in sequence; that is, it will not mix the generators for the single keys, but fully generate the values for a single key before going to the next. In this manner it generates all values for all keys.\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-watch/1-ui/1-overview/","title":"Overview","tags":[],"description":"","content":"Overview The below gives an overview of the current screens provided by Kolibri Watch. Additional screens for analysis of the results of the executions are planned to follow up soon.\nCurrent Main Screens\nStatus overview of cluster Creating job definitions from templates and starting jobs Finished job history "},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-watch/1-ui/","title":"UI - Kolibri Watch","tags":[],"description":"","content":"Chapter 1 UI - Kolibri-Watch The following provides an overview of the user interface for Kolibri, which goes by the name of Kolibri Watch. It is written in JavaScript, utilizing the Vue framework. 
As of now, it provides an overview of the following:\navailable nodes and consumed resources (CPU, memory) running jobs, their progress, and the option to kill any job running batches per job node, their progress history of finished jobs (currently only the top N held in memory of the Kolibri Backend App) overview of job types for which job execution definition templates are available, editing and saving a new template and submitting it for execution "},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/2-mechanisms/1-overview/","title":"Basics","tags":[],"description":"","content":"Basics First, all batches are represented by an ActorRunnable object that on execution provides a tuple of a KillSwitch and Future[Done] which completes when the execution completes. The KillSwitch allows killing the execution if indicated. The criteria for whether an execution is to be stopped are represented by an ExecutionExpectation instance, which can contain multiple failedWhenMetExpectations, e.g whether a certain rate or number of failed data sample processings or a timeout is exceeded.\nThe RunnableGraph is provided by an ActorRunnable instance, which is the representation of each batch execution. An ActorRunnable has the following signature:\ncase class ActorRunnable[U, V, V1, Y \u0026lt;: WithCount](jobId: String, batchNr: Int, supplier: IndexedGenerator[U], transformer: Flow[U, ProcessingMessage[V], NotUsed], processingActorProps: Option[Props], aggregatorConfig: AggregatorConfig[V1, Y], expectationGenerator: Int =\u0026gt; ExecutionExpectation, sinkType: job.ActorRunnableSinkType.Value, waitTimePerElement: FiniteDuration, maxExecutionDuration: FiniteDuration, sendResultsBack: Boolean) extends KolibriSerializable with WithBatchNr The arguments are as follows:\nName:Type What for? jobId: String job identifier of which the ActorRunnable represents a batch batchNr: Int number of the batch of the job identified by jobId supplier: IndexedGenerator[U] generator providing the single elements to process, each of which is of type U (covariant) transformer: Flow[U, ProcessingMessage[V], NotUsed] a processing flow from the initial element of type U (covariant) to ProcessingMessage[V] holding the data of type V (covariant). ProcessingMessage is essentially a data container allowing to specify a weight for the data sample and typed application of tags (e.g for grouping and group-wise aggregations) processingActorProps: Option[Props] if specified, sends the output of the transformer to an actor created from the provided Props via ask, expecting back an element of type V1 (with the specified ask timeout and counting processing of a single data point as failed if the type doesn\u0026rsquo;t match V1). If not specified, the ProcessingMessage[V] as input from the transformer is not modified, thus type V = V1 aggregatorConfig: AggregatorConfig[V1, Y] The AggregatorConfig provides an AggregationSupplier and distinct FilteringMapper instances, one specifying which single elements (type V1) are aggregated and how they\u0026rsquo;re modified (if so) before aggregating, another one to decide this on the partial aggregations (type Y), and another one acting as modifier / decider on how / if to send a particular partial aggregation to the JobManager (e.g for an overall job aggregation). 
Note that such an aggregation doesn\u0026rsquo;t need to be composed like this; usually it is more effective to write partial results directly from the nodes where they are composed and later execute another aggregation on the partial results, instead of sending all partial results to the JobManager after being serialized expectationGenerator: Int =\u0026gt; ExecutionExpectation Given the number of single data points in the batch, provide an ExecutionExpectation that determines success/fail criteria for the respective batch sinkType: job.ActorRunnableSinkType.Value Either \u0026lsquo;IGNORE_SINK\u0026rsquo; or \u0026lsquo;REPORT_TO_ACTOR_SINK\u0026rsquo;. If the former, results are ignored. If the latter, the ActorRef is picked from JobActorConfig by key ActorType.ACTOR_SINK. Usually this should refer to an AggregatingActor. This is the actor the results will be sent to. This happens either per element or in grouping fashion (if useResultElementGrouping=true) waitTimePerElement: FiniteDuration Determines the timeout, which only applies if processingActorProps is not empty and only refers to the processing time of the ASK to that specific actor maxExecutionDuration: FiniteDuration Not used in the runnable itself, but utilized in the creation of the RunnableExecutionActor that effectively runs the RunnableGraph that is provided by the ActorRunnable sendResultsBack: Boolean Not used in the runnable itself, but utilized in the creation of the AggregatingActor created by the RunnableExecutionActor. If set to false, no results will be provided back to the RunnableExecutionActor and via this actor to the jobSender (JobManagerActor) Logic in the ActorRunnable: The RunnableGraph is executed within a RunnableExecutionActor, which starts the execution and creates the AggregatingActor to handle the single results.\nThe job definition is received by the SupervisorActor, which creates a JobManagerActor per job and sends the job definition to it. Note that both SupervisorActor and JobManagerActor run on the same node, which is the node marked as httpserver. The JobManagerActor takes care of batch distribution, sending new batches to process when previous ones finish. The distribution is done via a Distributor and batches are sent via a router across all nodes that have the role compute (note that each node can have multiple roles, yet there should only be one with the role httpserver).\nThe basic schema is as follows:\nIn Progress: More coming shortly\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-datatypes/1-types/1-categories/","title":"DataType Categories","tags":[],"description":"","content":"Let\u0026rsquo;s have a look at the distinct categories of data structures provided by the kolibri-datatypes project (might not be fully exhaustive).\nIndexed Generators Indexed Generators allow generating elements on demand by index. 
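Before going through the distinct types, a minimal usage sketch (hedged: only the factory and the methods listed for IndexedGenerator in this documentation are used, the element values are made up, and mapGen formally expects a SerializableFunction1, so the plain lambda shown may need wrapping):
import de.awagen.kolibri.datatypes.collections.generators.ByFunctionNrLimitedIndexedGenerator
// generator over an explicit sequence of elements
val gen = ByFunctionNrLimitedIndexedGenerator.createFromSeq(Seq(1, 2, 3, 4, 5))
val nrOfElements = gen.size // 5, known without iterating over the elements
val third = gen.get(2) // element on demand by index (presumably Some(3) with zero-based indexing)
val firstThree = gen.getPart(0, 3) // partial generator for start index 0 (inclusive) to end index 3 (exclusive)
val doubled = gen.mapGen(i =\u0026gt; i * 2) // generator applying the mapping lazily per generated element
doubled.iterator.foreach(println) // iterate over all generated elements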
Such a generator provides its size without having to iterate over the elements, provides an Iterator over its contained elements, methods to retrieve generators of subparts of the original generator, and a mapping that transforms each generated element by the specified mapping function.\nThe distinct types (subject to change) are the following:\nType Description BatchByGeneratorIndexedGenerator Providing a Seq of IndexedGenerators[T] and an index of the generator to batch by, provides a generator of generators of Seq of the elements of the generators that reflect the permutations of the generators not batched by BatchBySizeIndexedGenerator Providing a generator of Seq[T], provides a generator of generators, each of maximally the max size ByFunctionNrLimitedIndexedGenerator Passing the nr of elements and the generator function from index i to type T, provides the respective generator of type T MergingIndexedGenerator Merges two generators, applying a mergeFunction on the distinct types to retrieve the respective element. Behavior is such that the combinations of generator1 and generator2 are permuted and on calculation of elements from both generators those are mapped to the needed type via the mergeFunc NthIsNthForEachIndexedGenerator IndexedGenerator that yields for index n the Seq of values made of one value per generator, while for each generator its n-th element is chosen. Thus no permutations here. OneAfterAnotherIndexedGenerator Generator that just starts picking elements from the next generator when the requested element exceeds its own elements, e.g just sequentially provides the elements of the distinct generators PartitionByGroupIndexedGenerator Generator that takes a sequence of generators and acts like a normal OneAfterAnotherIndexedGenerator, e.g will generate the elements of each contained generator sequentially, thus the number of overall elements is the sum of the elements of the single generators. What differs here is the partitions function, which will keep the groups. This way the partitioning of this generator still keeps the logical groups within it, e.g where each generator passed reflects such a logical grouping PermutatingIndexedGenerator Takes a number of generators of the same type and returns a generator that generates all permutations of all the values within the distinct generators, keeping the position in the resulting Seq. BatchIterable Passing an iterable and some maximal element size, iterate through batch iterators of at most the passed maximal size.\nType Description BaseBatchIterable Base implementation that on each next call requests the batchSize next elements from the iterator corresponding to the initially passed iterable. GeneratingBatchIterable Implementation that on each next call only provides the IndexedGenerator of the next batch, meaning this iterable will itself create elements only on demand when elements are requested, since IndexedGenerator assumes a mechanism to calculate the i-th element instead of holding all elements in memory CombinedIterator Passing two Iterables, provides for each element of iterable1 an Iterator that iterates over all elements of iterable2 and provides the value resulting from applying the mergeFunc to the current elements of iterable1 and iterable2.\nTyped Maps Type Description TypeTaggedMap Strongly typed map utilizing TypeTags to check element type and ClassTyped[T] keys that also provide the respective type casting. Cannot add values of a wrong type for a key, getting around type erasure by TypeTags WeaklyTypedMap Map reducing the strict type assumptions. 
Allows only adding correct type, but only for top level type, thus suffers from type erasure Aggregation Type Description MetricAggregation MetricAggregation that keeps track of full MetricDocuments for keys of defined type. Each key stands for a separate aggregation, which can be used for selectively aggregating subsets of results AggregateValue Keeps track of current value and count of samples the current value is based on Aggregators BaseAggregator Takes aggregation function of new element, current aggregation value yielding new aggregation value ((U, V) =\u0026gt; V), a start value supplier, and a merge function of two aggregation values BasePerClassAggregator Similar to BaseAggregator, but keeps aggregation state per Tag TagKeyRunningDoubleAvgPerClassAggregator Keeps track of averages per Tag TagKeyRunningDoubleAvgAggregator Keeps track of overall average TagKeyMetricDocumentPerClassAggregator Per class aggregates MetricRow elements into MetricDocument TagKeyMetricAggregationPerClassAggregator Per class aggregates MetricRow elements into MetricAggregation BaseAnyAggregator Wrapper for typed aggregators to accept any message and aggregate only those matching the type Multiple Values Type Description OrderedMultiValues Container for multiple OrderedValues[Any]. Provides methods to find the n-th permutation and nr of index per value for a given overall element index Fail Reasons Type Description ComputeFailReason Representing a fail type for a computation with a description Metric Stores Type Description MetricRow Single metric row, where each row is identified by set of parameters and metric values that hold for the parameters MetricRecord Storage of MetricValue for given key MetricDocument MetricDocument representing a map of parameter set to MetricRow. Implementation uses a mutable map; not threads-safe, thus access with single thread at a time. Tagging Type Description TaggedWithType Trait mapping TagTypes to a Set of Tags Tags ParameterMultiValueTag Tag defined by Map[String, Seq[String]] mapping ParameterSingleValueTag Tag defined by Map[String, String] mappings AggregationTag Tag consisting of id, a ParameterTag for the varied parameters and a ParameterTag for the fixed parameters MultiTag A tag that can hold multiple other tags StringTag Tag defined by string value NamedTag Wrapper containing name and the actual Tag Permutations Type Description PermutationUtils Helper functions to simplify permutation calculations PriorityStores Type Description BasePriorityStore providing how many elements to keep, an ordering, a function to derive key from each element, allows preserving only top elements using a PriorityQueue Values Type Description DistinctValues Simply a name for the parameter and a Seq of the distinct values RangeValues Defined by name, start value, end value and step size, generates all the values within the boundaries MetricValue Simple container keeping state with BiRunningValue that keeps track of occurring error types and the respective counts and some aggregated value type to keep track of the successful computations aggregated in the MetricValue RunningValue Keeps count of the nr of elements the current value is made of and functions to add other single values or AggregateValues BiRunningValue Running value of two distinct types, e.g can be used to record occurring errors and successful computation values in a single record, e.g in case your computation returns Either[SomeFailType, SomeComputationValue] or similar settings where two values are in some way connected. 
AggregateValue keeps the count of samples aggregated and the current value of the aggregation Threadsafe Async Data Loading Type Description AtomicMapPromiseStore Implementation ensuring thread safety of the value storage and also ensuring that no more than one request leads to the creation of the stored resource, which could potentially be expensive (e.g in case multiple experiment batch processing actors on a single node try to request the resource at once). E.g used to load some data that is expensive to load within an object, to have only one data-instance per node ConcurrentUpdateMapOps Update functions for AtomicReference[Map[U, V]] "},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-datatypes/","title":"Kolibri-DataTypes Documentation","tags":[],"description":"","content":"The following gives a detailed overview of usage and inner workings of the kolibri-datatypes project.\nKolibri-DataTypes on Github\nKolibri-DataTypes on Maven Central\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-datatypes/1-types/","title":"Types","tags":[],"description":"","content":"Chapter 1 Kolibri DataTypes The following gives an overview of the types provided by the kolibri-datatypes project.\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/1-basics/1-runningit/","title":"Getting it started","tags":[],"description":"","content":"Let\u0026rsquo;s first dive into how to get the project started on your machine. There are multiple configuration options available, which will be detailed in the following.\nStarting locally with docker-compose Find the docker-compose file in the project root. If you\u0026rsquo;re referencing an existing image, you don\u0026rsquo;t need to build anything beforehand. In case you want to start a local version, make sure to package the jar, create the properly tagged docker image and reference this in the docker-compose. The steps for this would be (assuming you\u0026rsquo;re in the project root folder):\n(Optional; the docker-compose file references a public image provided in a public dockerhub repo) Build the docker image: ./scripts/buildJar.sh docker build . -t kolibri-base:[versionTag] (e.g versionTag = 0.1.0-rc0) set the correct version in the docker-compose.yml kolibri image setting awagen/kolibri-base:[versionTag] (for publicly available images) kolibri-base:[versionTag] (in case of custom local build) (Optional) repeat the above steps for the kolibri-watch project and response-juggler (only needed in case you don\u0026rsquo;t want to use the referenced public docker images) start up the kolibri cluster along with prometheus and grafana, kolibri-watch and response-juggler: docker-compose up Note that this is a complete setup where response-juggler just creates fake responses to mock a real search system according to some sampling criteria (check the project itself for more details, https://github.com/awagen/response-juggler). 
If you want to execute the jobs on an actual search system, you only need to reference the right connections in your job definition (and you can comment out the response juggler from the docker-compose).\nLets go over the settings configurable by passing env vars, taking the docker-compose file as example:\nkolibri1: image: awagen/kolibri-base:0.1.0-rc0 ports: - \u0026#34;8000:8000\u0026#34; - \u0026#34;5266:5266\u0026#34; - \u0026#34;9095:9095\u0026#34; user: \u0026#34;1000:1000\u0026#34; environment: JVM_OPTS: \u0026gt; -XX:+UseG1GC -Xms1024m -Xmx4096m PROFILE: prod ROLES: httpserver KAMON_PROMETHEUS_PORT: 9095 KAMON_STATUSPAGE_PORT: 5266 CLUSTER_NODE_HOST: kolibri1 CLUSTER_NODE_PORT: 8001 HTTP_SERVER_INTERFACE: kolibri1 HTTP_SERVER_PORT: 8000 MANAGEMENT_HOST: kolibri1 MANAGEMENT_PORT: 8558 MANAGEMENT_BIND_HOSTNAME: \u0026#39;0.0.0.0\u0026#39; MANAGEMENT_BIND_PORT: 8558 CLUSTER_NODE_BIND_HOST: \u0026#39;0.0.0.0\u0026#39; CLUSTER_NODE_BIND_PORT: 8001 DISCOVERY_SERVICE_NAME: kolibri-service KOLIBRI_ACTOR_SYSTEM_NAME: KolibriAppSystem DISCOVERY_METHOD: config REQUEST_PARALLELISM: 16 USE_CONNECTION_POOL_FLOW: \u0026#39;false\u0026#39; RUNNING_TASK_BASELINE_COUNT: 2 KOLIBRI_DISPATCHER_PARALLELISM_MIN: 8 KOLIBRI_DISPATCHER_PARALLELISM_FACTOR: 8.0 KOLIBRI_DISPATCHER_PARALLELISM_MAX: 32 KOLIBRI_DISPATCHER_THROUGHPUT: 10 DEFAULT_DISPATCHER_PARALLELISM_FACTOR: 1.0 DEFAULT_DISPATCHER_PARALLELISM_MAX: 2 DEFAULT_DISPATCHER_PARALLELISM_MIN: 1 HTTP_CLIENT_CONNECTION_TIMEOUT: \u0026#39;5s\u0026#39; HTTP_CLIENT_IDLE_TIMEOUT: \u0026#39;10s\u0026#39; HTTP_CONNECTION_POOL_MAX_OPEN_REQUESTS: 1024 HTTP_CONNECTION_POOL_MAX_RETRIES: 3 HTTP_CONNECTION_POOL_MAX_CONNECTIONS: 1024 HTTP_CONNECTION_POOL_SUBSCRIPTION_TIMEOUT: \u0026#39;60 seconds\u0026#39; USE_RESULT_ELEMENT_GROUPING: \u0026#39;true\u0026#39; RESULT_ELEMENT_GROUPING_COUNT: 2000 RESULT_ELEMENT_GROUPING_INTERVAL_IN_MS: 1000 RESULT_ELEMENT_GROUPING_PARALLELISM: 1 USE_AGGREGATOR_BACKPRESSURE: \u0026#39;true\u0026#39; AGGREGATOR_RECEIVE_PARALLELISM: 32 MAX_NR_BATCH_RETRIES: 2 # persistence mode is one of [\u0026#39;AWS\u0026#39;, \u0026#39;GCP\u0026#39;, \u0026#39;LOCAL\u0026#39;, \u0026#39;CLASS\u0026#39;] PERSISTENCE_MODE: \u0026#39;CLASS\u0026#39; PERSISTENCE_MODULE_CLASS: \u0026#39;de.awagen.kolibri.base.config.di.modules.persistence.LocalPersistenceModule\u0026#39; # properties in case PERSISTENCE_MODE is \u0026#39;AWS\u0026#39; (or \u0026#39;CLASS\u0026#39; and AwsPersistenceModule is referenced in PERSISTENCE_MODULE_CLASS) AWS_PROFILE: \u0026#39;developer\u0026#39; AWS_S3_BUCKET: \u0026#39;kolibri-dev\u0026#39; AWS_S3_PATH: \u0026#39;metric_test\u0026#39; AWS_S3_REGION: \u0026#39;EU_CENTRAL_1\u0026#39; # properties in case PERSISTENCE_MODE is \u0026#39;LOCAL\u0026#39; (or \u0026#39;CLASS\u0026#39; and LocalPersistenceModule is referenced in PERSISTENCE_MODULE_CLASS) LOCAL_STORAGE_WRITE_BASE_PATH: : \u0026#39;/app/data\u0026#39; LOCAL_STORAGE_WRITE_RESULTS_SUBPATH: \u0026#39;test-results\u0026#39; LOCAL_STORAGE_READ_BASE_PATH: : \u0026#39;/app/data\u0026#39; # properties in case PERSISTENCE_MODE is \u0026#39;GCP\u0026#39; (or \u0026#39;CLASS\u0026#39; and GCPPersistenceModule is referenced in PERSISTENCE_MODULE_CLASS) GCP_GS_BUCKET: [bucket name without gs:// prefix] GCP_GS_PATH: [path from bucket root to append to all paths that are requested] GCP_GS_PROJECT_ID: [the project id for which the used service account is defined and for which the gs bucket was created] # JOB_TEMPLATES_PATH must be relative to the base path or bucket path, depending on the 
persistence selected JOB_TEMPLATES_PATH: \u0026#39;templates/jobs\u0026#39; JUDGEMENT_FILE_SOURCE_TYPE: \u0026#39;CSV\u0026#39; # if judgement file format set to \u0026#39;JSON_LINES\u0026#39;, need to set \u0026#39;DOUBLE\u0026#39; in case judgements are numeric in the json, # if the numeric value is represented as string, use \u0026#39;STRING\u0026#39;. This purely refers to how the json value is interpreted, # later this will be cast to double either way JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST: \u0026#39;STRING\u0026#39; # if ure requesting via https yet ne server provides no valid certificate USE_INSECURE_SSL_ENGINE: \u0026#39;true\u0026#39; # properties if discovery mode is kubernetes-api # (Optional) path to file where namespath is written in file. Available by default when setting namespace in k8s charts for the deployment / pod K8S_DISCOVERY_POD_NAMESPACE_PATH: \u0026#39;/var/run/secrets/kubernetes.io/serviceaccount/namespace\u0026#39; # namespace to which pods are assigned, if not set properly k8s access rights will not allow them finding each other. # make sure this corresponds to the namespace you deploy in. If set, K8S_DISCOVERY_POD_NAMESPACE_PATH doesnt have effect. K8S_DISCOVERY_POD_NAMESPACE: \u0026#39;kolibri\u0026#39; # label selector by which to identify the right pods K8S_DISCOVERY_POD_LABEL_SELECTOR: \u0026#39;app=%s\u0026#39; volumes: - ./test-files:/app/data - [absolute-path-containing-your-aws-config-folder]/.aws:/home/kolibri/.aws:ro - [path-to-dir-containing-key-file-on-local-machine]/:/home/kolibri/gcp:ro Configuration Options in Detail Exposed Port Settings see ports definition above HTTP_SERVER_PORT the port under which to reach the endpoints exposed by the application KAMON_PROMETHEUS_PORT port under which to scrape metrics in prometheus format KAMON_STATUSPAGE_PORT kamon statuspage port. Kamon exposes this endpoint as status overview of which metrics are collected General Setup Settings PROFILE determines which application-[profile].conf file is picked up for config settings ROLES either httpserver, compute or both httpserver,compute (comma-separated). If httpserver is within the option, node starts http server on the defined HTTP_SERVER_PORT and exposes the application endpoints. If compute is set, the node is used for computations. KAMON_PROMETHEUS_PORT port under which kamon exposes prometheus metrics to be scraped KAMON_STATUSPAGE_PORT port on which kamon status page is exposed CLUSTER_NODE_HOST cluster node host CLUSTER_NODE_PORT cluster node port HTTP_SERVER_INTERFACE http server interface for the http server (only needed if httpserver is one of the above defined ROLES) HTTP_SERVER_PORT in case one of the node roles is \u0026lsquo;httpserver\u0026rsquo;, the routes are exposed on that port MANAGEMENT_HOST management host MANAGEMENT_PORT management port MANAGEMENT_BIND_HOSTNAME management bind host name MANAGEMENT_BIND_PORT management bind port CLUSTER_NODE_BIND_HOST bind host for cluster node CLUSTER_NODE_BIND_PORT bind port for cluster node, should be same for all nodes in the cluster DISCOVERY_SERVICE_NAME the service name used for node discovery. Must be same for all nodes of the cluster KOLIBRI_ACTOR_SYSTEM_NAME the name of the ActorSystem to use for the Kolibri application DISCOVERY_METHOD cluster node discovery method to use. 
Can be \u0026lsquo;aws-api-ec2-tag-based\u0026rsquo; (based on ec2 instance tags in AWS), \u0026lsquo;config\u0026rsquo; (defining the endpoints per config), \u0026lsquo;dns\u0026rsquo; or \u0026lsquo;kubernetes-api\u0026rsquo; (see examples for \u0026lsquo;config\u0026rsquo; in the docker-compose.yml and \u0026lsquo;kubernetes-api\u0026rsquo; in the example helm setup). Refer to the akka discovery documentation for details on the different modes REQUEST_PARALLELISM parallelism with which http requests are executed USE_CONNECTION_POOL_FLOW \u0026rsquo;true\u0026rsquo;/\u0026lsquo;false\u0026rsquo;. If false, uses the single-request API (which should use a connection pool under the hood); if true, requests are sent through the connection pool flow (supposed to be more efficient than the single-request API). Note that it is essential to consume responses directly when they\u0026rsquo;re available to avoid running into timeouts. Note that each \u0026lsquo;.via(someFlow)\u0026rsquo; call is another processing stage whose processing can be delayed relative to the one before, and might cause responses not to be consumed within the timeout if backpressure is applied. The single-request usage seems to be safer in this regard. RUNNING_TASK_BASELINE_COUNT the initial baseline count of concurrently processed batches (this number can be increased via the exposed API per job) Kolibri dispatcher settings Used for the processing provided by Kolibri. Should use the majority of resources KOLIBRI_DISPATCHER_PARALLELISM_MIN minimal parallelism KOLIBRI_DISPATCHER_PARALLELISM_FACTOR factor applied to number of available processors to determine number of threads KOLIBRI_DISPATCHER_PARALLELISM_MAX maximal parallelism KOLIBRI_DISPATCHER_THROUGHPUT dispatcher throughput, i.e. the number of messages processed per actor before the dispatcher moves on to the next actor Default dispatcher settings Only used for some internals and kamon metrics handling. Should use only a fraction of resources since the majority should be reserved for the kolibri dispatcher (settings above) DEFAULT_DISPATCHER_PARALLELISM_FACTOR factor applied to number of available processors to determine number of threads DEFAULT_DISPATCHER_PARALLELISM_MAX maximal parallelism DEFAULT_DISPATCHER_PARALLELISM_MIN minimal parallelism Http client/connection pool settings HTTP_CLIENT_CONNECTION_TIMEOUT http client connection timeout, in the format \u0026lsquo;5s\u0026rsquo; (or \u0026lsquo;1m\u0026rsquo; or similar) HTTP_CLIENT_IDLE_TIMEOUT http client idle timeout, in the format \u0026rsquo;10s\u0026rsquo; (or \u0026lsquo;1m\u0026rsquo; or similar) HTTP_CONNECTION_POOL_MAX_OPEN_REQUESTS max concurrently open requests in the connection pool HTTP_CONNECTION_POOL_MAX_RETRIES max retries when executing a request HTTP_CONNECTION_POOL_MAX_CONNECTIONS max nr of connections for the connection pool HTTP_CONNECTION_POOL_SUBSCRIPTION_TIMEOUT a FiniteDuration, e.g \u0026lsquo;60 seconds\u0026rsquo;, to be used as connection pool subscription timeout Partial result grouping These \u0026lsquo;RESULT_ELEMENT*\u0026rsquo; settings determine how many single results are at most collected over a given timespan in an \u0026ldquo;aggregator buffer\u0026rdquo; after which the ones aggregated so far are sent to the actual aggregator. This reduces the number of single messages sent to the aggregator quite a bit. USE_RESULT_ELEMENT_GROUPING \u0026rsquo;true\u0026rsquo;/\u0026lsquo;false\u0026rsquo;. Turns on/off the partial result buffering. 
If turned off, the other \u0026lsquo;RESULT_ELEMENT_GROUPING_*\u0026rsquo; parameters below dont have any effect RESULT_ELEMENT_GROUPING_COUNT max elements to group after which the buffer result is sent to the actual aggregator RESULT_ELEMENT_GROUPING_INTERVAL_IN_MS maximal interval in ms after which the buffer result, irrespective of how many elements were buffered yet, are sent to the actual aggregator RESULT_ELEMENT_GROUPING_PARALLELISM the grouping parallelism Aggregators, retries, persistence mode USE_AGGREGATOR_BACKPRESSURE \u0026rsquo;true\u0026rsquo;/\u0026lsquo;false\u0026rsquo;. If set to true, uses ACK messages from the aggregator to apply backpressure on the processing if aggregator can not aggregate fast enough AGGREGATOR_RECEIVE_PARALLELISM parallelism with which result messages are sent to the aggregator MAX_NR_BATCH_RETRIES defines the number of retries that are executed for failed batches till they succeed PERSISTENCE_MODE defines where to write to and read from. Valid values: \u0026lsquo;LOCAL\u0026rsquo;, \u0026lsquo;AWS\u0026rsquo;. If \u0026lsquo;LOCAL\u0026rsquo;, set \u0026lsquo;LOCAL_STORAGE_DIR\u0026rsquo;, if \u0026lsquo;AWS\u0026rsquo; set below \u0026lsquo;AWS_*\u0026rsquo; vars PERSISTENCE_MODULE_CLASS if PERSISTENCE_MODE is \u0026lsquo;CLASS\u0026rsquo;, full class path for the module to be loaded (needs a non-args constructor and extend PersistenceDIModule), e.g \u0026lsquo;de.awagen.kolibri.base.config.di.modules.persistence.LocalPersistenceModule\u0026rsquo; The following settings are just valid if PERSISTENCE_MODE is \u0026lsquo;AWS\u0026rsquo; AWS_PROFILE this name should match a profile for which there exists a configuration in the .aws folder volume-mounted in the docker-compose definition (e.g \u0026lsquo;developer\u0026rsquo; if such a profile exists) AWS_S3_BUCKET the bucket name (without s3:// - prefix). The AWS_PROFILE selected should have rights to read from and write to the bucket AWS_S3_PATH the \u0026ldquo;directory\u0026rdquo; path within the bucket defined by \u0026ldquo;AWS_S3_BUCKET\u0026rdquo;. E.g \u0026lsquo;metric_test\u0026rsquo; or \u0026lsquo;folder1/folder2\u0026rsquo; (yep, I know, the conception of \u0026ldquo;directories\u0026rdquo; as such does not exist in s3, but you can use it analogous) AWS_S3_REGION the AWS region to utilize. Check com.amazonaws.regions.Regions enum in the utilized AWS lib to see valid values, e.g \u0026lsquo;EU_CENTRAL_1\u0026rsquo; The following settings are just valid if PERSISTENCE_MODE is \u0026lsquo;LOCAL\u0026rsquo; LOCAL_STORAGE_WRITE_BASE_PATH should be set to any subPath that the local volume on the host machine is mounted to. Files are written relative to this path LOCAL_STORAGE_WRITE_RESULTS_SUBPATH path relative to LOCAL_STORAGE_WRITE_BASE_PATH where results are written in subFolders corresponding to the result output identifiers used in the jobs LOCAL_STORAGE_READ_BASE_PATH should be set to any subPath that the local volume on the host machine is mounted to. 
Files are read relative to this path The following settings are just valid if PERSISTENCE_MODE is \u0026lsquo;GCP\u0026rsquo; GCP_GS_BUCKET bucket name without gs:// prefix GCP_GS_PATH path from bucket root to append to all paths that are requested GCP_GS_PROJECT_ID the project id for which the used service account is defined and for which the gs bucket was created GOOGLE_APPLICATION_CREDENTIALS Full path within the container to the serviceaccount key json file (see below volume mount), e.g \u0026lsquo;/home/kolibri/gcp/[sa-key-file-name].json\u0026rsquo; The following settings are just valid if DISCOVERY_METHOD is \u0026lsquo;kubernetes-api\u0026rsquo; K8S_DISCOVERY_POD_NAMESPACE_PATH (Optional) path to the file where the namespace is written. Available by default when setting the namespace in the k8s charts for the deployment / pod K8S_DISCOVERY_POD_NAMESPACE namespace to which pods are assigned; if not set properly, k8s access rights will not allow them to find each other. Make sure this corresponds to the namespace you deploy in. If set, K8S_DISCOVERY_POD_NAMESPACE_PATH doesn\u0026rsquo;t have any effect. K8S_DISCOVERY_POD_LABEL_SELECTOR label selector by which to identify the right pods The following are additional general settings JOB_TEMPLATES_PATH must be relative to the base path or bucket path, depending on the persistence selected JUDGEMENT_FILE_SOURCE_TYPE format the judgement file is given in JUDGEMENT_FILE_JSON_LINES_JUDGEMENT_VALUE_TYPE_CAST gives the type to cast the judgement value to in case JUDGEMENT_FILE_SOURCE_TYPE is a json. E.g if numerical values are wrapped in strings, such as in \u0026ldquo;0.55\u0026rdquo;, select STRING; in case they\u0026rsquo;re given as numbers, use DOUBLE USE_INSECURE_SSL_ENGINE if you\u0026rsquo;re requesting via https yet the server provides no valid certificate Volumes Some mounts needed to access data within the docker container ./test-files:/app/data mounting the workspace test-files folder to the /app/data folder within the container [absolute-path-containing-your-aws-config-folder]/.aws:/home/kolibri/.aws:ro read-only mount of the folder on the local machine containing aws credentials into the standard location in the container where it\u0026rsquo;s picked up automatically by the aws lib [path-to-dir-containing-key-file-on-local-machine]/:/home/kolibri/gcp:ro read-only mount of the json key file for the gcp service account on the local machine into the location in the container where it\u0026rsquo;s picked up by setting the env variable GOOGLE_APPLICATION_CREDENTIALS to it "},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/1-basics/","title":"Basics","tags":[],"description":"","content":"Chapter 1 Kolibri Basics Kolibri is a clusterable processing framework based on Akka and written in Scala. It is especially focussed on use-cases related to e-commerce search such as search result optimization. The framework is a general one though, allowing distinct types of distributed, clustered processing / state handling.\nThe following gives an overview of the general mechanics and detailed descriptions about usage.\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/3-config-options/2-requestparameters/","title":"Request Parameters","tags":[],"description":"","content":"Request parameters can be configured in different ways and types. This page describes how single parameters are defined and composed to create permutations (e.g needed in extensive offline evaluations where a wide range of parameters is permutated to use in requests to the target system).\nThere are two general types of parameters, which are STANDALONE and MAPPING. 
STANDALONE just stands for a single parameter (with one or more values), while MAPPING allows specifying relationships between different parameters. Every MAPPING specifies a parameter that provides key values and one or more mapped values. These mapped values can either map to the key values or to any mapped value that has been defined before the mapped value of interest (since mapped values are defined as a sequence). To define what to map to, key-mapping-assignments are specified. Here the index 0 refers to the key values, while any other index j \u0026gt; 0 refers to the mapped value that can be found at the (j-1)th index of the mapped values sequence.\nSo what is this actually good for? Using these mappings, we can significantly reduce the number of combinations processed in a grid search. If we know that some parameters are specific to, let\u0026rsquo;s say, each query, it does not make sense to do the calculations over the whole parameter set for each query.\nEvery parameter (whether standalone or key or value in a mapping) is assigned a value type that clarifies what effect the parameter has on the request composition. Note that all types with suffix _REPLACE are actually not values used as parameters themselves, but provide values by which defined placeholders in the actual value type (given by the prefix of the parameter value type with suffix _REPLACE) are replaced. So let\u0026rsquo;s say you define a json payload and you\u0026rsquo;d like to set a single parameter value in the payload to 100 different values, and generate results for all of these variants: you can define the json payload once (BODY type) and place some placeholder at the position where the value shall be set. Then you define a BODY_REPLACE with the name set to the placeholder value you defined in the parameter of type BODY. During generation of request permutations those placeholders will be substituted by the current value generated by the BODY_REPLACE parameter.\nParameter Value Types BODY Values of the parameter represent request body payloads. BODY_REPLACE Replaces the placeholder string sequence that equals this parameter name with the generated values (per permutation only the currently generated value is used for replacement). HEADER Values of the parameter represent header values. HEADER_REPLACE Same as BODY_REPLACE, but acting on HEADER parameters. URL_PARAMETER Values of the parameter represent an url-parameter. URL_PARAMETER_REPLACE Same as BODY_REPLACE, but acting on URL_PARAMETER parameters. In the following we will give an overview of the configuration options for the STANDALONE and MAPPING parameter types.\nSTANDALONE The value definitions are analogous to the descriptions of FROM_ORDERED_VALUES_TYPE, PARAMETER_VALUES_TYPE and VALUES_FROM_NODE_STORAGE from the resource-directives section in this documentation.\nMAPPING Mappings always consist of a parameter that serves to provide the key values and one or more mapped values, which are mappings of key values to the actual parameter values. The actual mapping value types are described in detail in the section on resource directives, thus please refer to that part of the documentation. A minimal sketch of the two parameter types is shown below. 
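To make the distinction more tangible, the following is a minimal sketch of a STANDALONE and a MAPPING parameter as they could appear in the requestParameters section of a job definition. The field names used here (values_type, key_values, mapped_values, key_mapping_assignments) are assumptions for illustration only and may deviate from the exact schema the backend expects; use the UI form and existing templates as the authoritative reference.

[
  {
    "type": "STANDALONE",
    "values": {
      "name": "a1",
      "values_type": "URL_PARAMETER",
      "values": ["0.45", "0.32"]
    }
  },
  {
    "type": "MAPPING",
    "values": {
      "key_values": {
        "name": "q",
        "values_type": "URL_PARAMETER",
        "values": ["q1", "q2", "q3"]
      },
      "mapped_values": [
        {
          "name": "filterParam",
          "values_type": "URL_PARAMETER",
          "values": {"q1": ["f1"], "q2": ["f2"], "q3": ["f3"]}
        }
      ],
      "key_mapping_assignments": [[0, 1]]
    }
  }
]

In this sketch the assignment [0, 1] maps the key values (index 0, the q parameter) to the first mapped value (index 1, filterParam), so filterParam is resolved per query instead of being permutated against all queries.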
The available mapping value types are: JSON_VALUES_MAPPING_TYPE, JSON_VALUES_FILES_MAPPING_TYPE, JSON_SINGLE_MAPPINGS_TYPE, JSON_ARRAY_MAPPINGS_TYPE, FILE_PREFIX_TO_FILE_LINES_TYPE, CSV_MAPPING_TYPE, VALUES_FROM_NODE_STORAGE.\nNote that the key-mapping assignment is given by a list of 2-element arrays where the first element specifies the index of the values used as keys and the second element gives the index of the mapped parameter. Note that the key values relate to index 0, while the mapped parameters start at index 1. The first element always has to be smaller than the second, yet depending on the needs each mapped parameter can either be mapped to the values of the key value parameter or to any mapped parameter (that has a smaller index than the mapped value itself).\nExample configuration: "},{"uri":"http://awagen.github.io/kolibri/2-config-details/4-file-formats/2-resultformat/","title":"Result Files","tags":[],"description":"","content":"COMING SOON\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/2-job-definitions/","title":"Job Configurations","tags":[],"description":"","content":"Currently, there are four types of job definitions:\nJob Definition Types JUST_WAIT Dummy job. Only use this if you want to test the claiming of single batches by the nodes. The batches here only implement a wait-interval. SEARCH_EVALUATION Legacy search evaluation definition with all configuration options. Main format of definition till v0.1.5, thus left in for compatibility reasons. The recommended and most flexible way is REQUESTING_TASK_SEQUENCE. QUERY_BASED_SEARCH_EVALUATION Legacy search evaluation definition with reduced configuration options and some predefined settings. Main shortened format of definition till v0.1.5, thus left in for compatibility reasons. The recommended and most flexible way is REQUESTING_TASK_SEQUENCE. REQUESTING_TASK_SEQUENCE Build your processing pipeline by defining a sequence of tasks. Provides most flexibility. Recommended way of defining jobs. Actual UI selection screen: Note that for the attribute excludeParamColumns you need to fill in something, which can also just be an empty list. In case you want to keep an empty list, add an entry and delete it again in the UI such that an empty array is generated in the json representation to the right. This is a temporary workaround due to the fact that the current config sees this field as mandatory.\nLet\u0026rsquo;s walk over the job definitions defining actual computation. In the below, those attributes that are purely legacy fields and do not cause any effect (you will still need to set them in some cases) are marked with (Not used). Fields with a separate section here in the documentation for more details are marked with (*).\nSEARCH_EVALUATION This definition exists to provide backward-compatibility to job definitions that were composed up to kolibri v0.1.5, when the task-sequence configuration was not yet provided. For new configurations we recommend using the REQUESTING_TASK_SEQUENCE type, as it provides the most flexibility. Definitions submitted for the SEARCH_EVALUATION type are converted internally to a task sequence.\nThe single fields are as follows:\nSEARCH_EVALUATION fields jobName Simply the name of the job. Prohibited character: underscore. requestTasks (Not used): Integer, defining how many batches are processed in parallel. fixedParams Definition of parameter settings that are applied on every request. You can assign multiple values to a single parameter name; they will all be set in the request. 
contextPath The url context path used for requests. httpMethod The http method to use, choose between GET, PUT, POST. connections Define or or more target urls by defining the host (without protocol prefix, e.g not \u0026lsquo;http://search-service\u0026rsquo; but \u0026lsquo;search-service\u0026rsquo;), the port, whether to use http or https). If multiple connections are defined, load will be distributed among them. (*) resourceDirectives Resource directives are configurations of resources that shall be loaded centrally on each node, and are only removed after the node does not have any batch running anymore that relates to a job for which the resources were defined. Larger resources should be loaded this way, such as judgement lists or extensive parameter mappings. Right now the options are available: JUDGEMENT_PROVIDER, MAP_STRING_TO_DOUBLE_VALUE, MAP_STRING_TO_STRING_VALUES, STRING_VALUES (*) requestParameters Defines the sequence of parameters (STANDALONE or MAPPING) that are permutated to generate the whole range of requests to evaluate. While parameters of type STANDALONE are just iterated over, MAPPINGS provide the option to limit the permutation space by defining a parameter generating key values and one or more mappings that are either mapped to the key value or any other mapped parameter configured before. More details in the separate documentation section. batchByIndex Specifies the 0-based index of the above requestParameters list to define the parameter by which to batch the job (e.g the query-parameter is a natural parameter to batch by usually). (*) parsingConfig The parsing configuration specifies which fields to parse from each response (response is assumed to be of json format). excludeParamColumns Sometimes it makes sense to exclude certain parameters from the parameters that are used to group results for each row in the calculation result. One use case is to exclude the query parameter if the batching is done by each query to avoid redundant entries. Another case is if we make use of the *_REPLACE parameter types, which are used to replace substrings in other parameters. (*) calculations Sequence of different types of calculations to perform per request. Options include: IR_METRICS, IDENTITY, FIRST_TRUE, FIRST_FALSE, TRUE_COUNT, FALSE_COUNT, BINARY_PRECISION_TRUE_AS_YES, BINARY_PRECISION_FALSE_AS_YES, STRING_SEQUENCE_VALUE_OCCURRENCE_HISTOGRAM. For a detailed description see the separate section in this documentation. (*) metricNameToAggregationTypeMapping Assignment of metric name to the applicable aggregation type. Common IR metrics should be assigned to DOUBLE_AVG, and histogram would be of type NESTED_MAP_UNWEIGHTED_SUM_VALUE (just summing up occurrences without using sample weights). If nothing is defined, default value assigned will be DOUBLE_AVG, so do not forget to assign the right types in case this does not apply. You will usually not need any types other than the mentioned ones, but the full range of options are: DOUBLE_AVG, SEQUENCE_KEEP_FIRST, MAP_UNWEIGHTED_SUM_VALUE, MAP_WEIGHTED_SUM_VALUE, NESTED_MAP_UNWEIGHTED_SUM_VALUE, NESTED_MAP_WEIGHTED_SUM_VALUE. (*) taggingConfiguration Set distinct tagging options. Per tagger you need to select whether to make it extending, that is whether a separate tag shall be added or the tag shall extend existing ones (see extend) attribute. 
Allows definition for the distinct stages of processing as initTagger (tagging on the request template), processedTagger (acting on the response) and resultTagger (acting on the generated result). Yet right now only two options are implemented, one by request parameter and the other by result length. (*) wrapUpFunction Allows definition of processing steps to execute after the job is finished (this is part of the job-wrap-up task that is claimed by the nodes). See the section on task definitions for more details. allowedTimePerElementInMillis (Not used): Timeout for each single element per batch. allowedTimePerBatchInSeconds (Not used): Timeout for the whole batch (e.g for the sum of time the single elements in a batch take up). allowedTimeForJobInSeconds (Not used): Timeout for the whole job (that is, processing of all batches contained in the job). expectResultsFromBatchCalculations (Not used): Was used in the akka-based variant to decide whether the central supervisor needs the result (e.g for an overall aggregation) or an empty confirmation is enough (such as when results of batches are stored in central storage anyways). A fully configured form can look like this:\nQUERY_BASED_SEARCH_EVALUATION Note that this configuration option contains some pre-defined configurations. This includes a range of IR metrics (more details below). Due to this, there is a predefined key under which the judgements list is loaded, and you will need to use the same key in the definition of otherCalculations in case you want to add more IR metrics. The key format is KOLIBRI_JUDGEMENTS-job=[jobName], where jobName is the name configured in the jobName field. Further, the setting productIdsKey needs to be set to the value productIds.\nQUERY_BASED_SEARCH_EVALUATION fields jobName Simply the name of the job. Prohibited character: underscore. connections Define one or more target urls by defining the host (without protocol prefix, e.g not \u0026lsquo;http://search-service\u0026rsquo; but \u0026lsquo;search-service\u0026rsquo;), the port, and whether to use http or https. If multiple connections are defined, load will be distributed among them. fixedParams Definition of parameter settings that are applied on every request. You can assign multiple values to a single parameter name; they will all be set in the request. contextPath The url context path used for requests. queryParameter The name of the parameter defined under requestParameters that provides the queries. httpMethod The http method to use; choose between GET, PUT, POST. (*) productIdSelector The selector to extract the productIds from the response json. See the separate documentation section on selectors. (*) otherSelectors Allows definition of further extractors to include additional fields. (*) otherCalculations Define additional calculations besides the pre-defined ones (see below). otherMetricNameToAggregationTypeMapping Define the aggregation type mappings for the additional metrics you configured in otherCalculations. judgementFilePath The file path where the judgements reside (relative path compared to the configured base path). (*) requestParameters Defines the sequence of parameters (STANDALONE or MAPPING) that are permutated to generate the whole range of requests to evaluate. While parameters of type STANDALONE are just iterated over, MAPPINGS provide the option to limit the permutation space by defining a parameter generating key values and one or more mappings that are either mapped to the key value or any other mapped parameter configured before. 
More details in the separate documentation section. excludeParamColumns Sometimes it makes sense to exclude certain parameters from the parameters that are used to group results for each row in the calculation result. One use case is to exclude the query parameter if the batching is done by each query to avoid redundant entries. Another case is if we make use of the *_REPLACE parameter types, which are used to replace substrings in other parameters. Pre-Defined Settings Some of the settings are already predefined in this configuration type. Let\u0026rsquo;s see which.\nproductIds are stored under the key productIds. the judgement data is loaded in a resource with the key KOLIBRI_JUDGEMENTS-job=[jobName], where [jobName] corresponds to the job name configured. batching happens on index 0, thus in case you want to batch on your query parameter, make sure you configure it at first position in the parameter list. a bunch of IR metrics are pre-defined, those are: NDCG@2 NDCG@4 NDCG@8 NDCG@12 NDCG@24 PRECISION@k=2,t=0.2 (t=0.2 stands for threshold to decide between relevant / non-relevant is 0.2) PRECISION@k=4,t=0.2 RECALL@k=2,t=0.2 RECALL@k=4,t=0.2 judgement handling strategy is set to EXIST_RESULTS_AND_JUDGEMENTS_MISSING_AS_ZEROS validation done based on existence of results (if no results exist (0 result hit) the result is marked as failed and recorded as such) in case a product does not have an existing judgement for a query, the judgement value of 0.0 will be used. Note that in the SEARCH_EVALUATION definition format you can select an alternative (such as average of non-missing judgements, to see unknown products for the query as of average quality) tagging is only done based on query parameter the configured wrapUpFunction (executed after all batches completed) aggregates all generated partial results into an overall aggregation, where each sample gets equal weight (1.0) A completed form could look like this:\nREQUESTING_TASK_SEQUENCE This is the recommended and most flexible way of composing computations.\nThe fields are:\nREQUESTING_TASK_SEQUENCE fields jobName Simply the name of the job. Prohibited character: underscore. (*) resourceDirectives Resource directives are configurations of resources that shall be loaded centrally on each node, and are only removed after the node does not have any batch running anymore that relates to a job for which the resources were defined. Larger resources should be loaded this way, such as judgement lists or extensive parameter mappings. Right now the options are available: JUDGEMENT_PROVIDER, MAP_STRING_TO_DOUBLE_VALUE, MAP_STRING_TO_STRING_VALUES, STRING_VALUES (*) requestParameters Defines the sequence of parameters (STANDALONE or MAPPING) that are permutated to generate the whole range of requests to evaluate. While parameters of type STANDALONE are just iterated over, MAPPINGS provide the option to limit the permutation space by defining a parameter generating key values and one or more mappings that are either mapped to the key value or any other mapped parameter configured before. More details in the separate documentation section. batchByIndex Specifies the 0-based index of the above requestParameters list to define the parameter by which to batch the job (e.g the query-parameter is a natural parameter to batch by usually). (*) taskSequence List of actual tasks to compute. Tasks defined at index i will have access to data generated in the tasks from index [0, i-1]. For a more detailed description of the options, see the respective section. 
metricRowResultKey The key of the generated result under which to find the end result of type MetricRow. For this to be available, any task in the task sequence needs to compute such an object and store it in the result map under the configured key. See METRIC CALCULATION task for an example. Let\u0026rsquo;s look at a completed example that provides a sequence of REQUEST_PARSE and METRIC_CALCULATION tasks to request one or many target systems and calculate metrics from the retrieved results:\n"},{"uri":"http://awagen.github.io/kolibri/1-first-steps/","title":"1. First Steps","tags":[],"description":"","content":"The following describes the first steps to take when using Kolibri-Fleet-ZIO and important concepts for the understanding of how everything works together.\nThis covers the aspects:\nHow to configure, run it Test job processing (no need for additional external services) How to configure your own jobs and make use of tagging and batching How to aggregate partial results (with or without relative weights) "},{"uri":"http://awagen.github.io/kolibri/1-first-steps/2-using-the-ui/","title":"Usage via UI","tags":[],"description":"","content":"The simplest way to utilize Kolibri is by using the UI. This section shows how to use it and how it is connected to the processing mechanism in the background.\nStatus Page The status page displays several types of information:\nAvailable nodes, time of last update received and their resource utilization Open Jobs: either in progress or waiting to be started including controls to Start (adds a directive to the job folder that marks the job for processing), Stop (removes the directive that marks the job for processing, such that the job definition and state information will be kept but ignored for now), Delete (removes whole job data including job definition and batch states, which effectively stops all processing; does not remove already persisted partial results). Also lists the directives placed in the job folder and the batch count per status. Batch Status: listing of all batches currently on progress over all available nodes including the current processing state. History Page Listing of completed jobs including the batch count per status. Deleting any entry here only causes the job definition and status information to be removed, the generated results will not be deleted, thus this functionality can safely be used for housekeeping.\nCreate Page This is the page that allows creation of new jobs. This page has tabs for two different editing modes:\nFORM mode FORM: provides form fields per selected job type / name. If a template is available and selected, the form will be pre-filled with those details. Note that this form is not manually added per type, but the backend sends a structural definition of the expected fields in json format to the frontend, which then generates the needed fields. While adding information, you will see to the right the resulting json to iteratively grow with the information you entered. The resulting json is exactly what will either be persisted (SAVE TEMPLATE button) or added to the existing job definitions (RUN TEMPLATE) when using the respective controls. 
In case of saving, do not forget to enter a template filename (make sure it ends with .json and does not clash with an existing job definition template name, otherwise the action will not succeed and you will be shown a warning).\nSimple form, not pre-filled: Simple form, pre-filled: More complex form, pre-filled: RAW mode RAW: after selecting a template type and a specific template, this form allows free input per field name. By using the APPLY CHANGES button under the input box, the json to the right gets updated. Note that nothing is persisted before you use the controls as described above.\nRunning the first Job Our Hello World example here is a boring wait job. It actually does not do anything except block a thread for a while and then be done at some point. Have a look at the above example for jobDefinition selected for the field Select Job Name. Input needed:\ntype: select JUST_WAIT jobName: give your job any name (trying to run the job will only fail if there is already a job with the same name in the open jobs. Note that the job name there is actually of the form [jobName]_[timePlacedTimeStamp], and only the jobName is used for comparison here). nrBatches: the number of batches you would like to run (don\u0026rsquo;t select too many, since they are actually not doing anything useful), durationInMillis: time you want each batch to wait before finishing Now click on RUN TEMPLATE. After a few moments you should see the job appear on the Status Page, find it and press Start. Congratulations, your first job is running! :). Also, you should find in your configured base folder a jobs subfolder, where you should find in the open subfolder another folder that is named according to [jobName]_[timePlacedTimeStamp]. This is the subfolder for the respective job, containing the job definition, process status information and processing directives (single empty files starting with KDIR_ that define whether / how a job shall be processed). Note that after processing is done the job subfolder gets fully moved to the done subfolder.\n"},{"uri":"http://awagen.github.io/kolibri_archive/","title":"Kolibri Documentation (Archive / Deprecated)","tags":[],"description":"","content":" Kolibri - The Execution Engine that loves E-Commerce Search\nKolibri is the German word for hummingbird. I picked it as the project name to reflect the general aim to do many smaller things fast. And this still describes the batch processing logic quite well, while the overall conception grew broader. Built in Scala, based on Akka with the many functionalities it provides, the areas of possible applications are diverse. 
The feature-set provided by Akka makes it very suitable for a range of tasks, whether it is cluster-orchestration, flexible and efficient execution definitions with akka-streams or simply (distributed/sharded) state keeping or mixtures thereof.\nThe focus currently is on batched, clustered multi-node execution with central supervision, with the intention to iteratively provide functionalities commonly needed in (e-commerce) search in a scalable (e.g multi-node) and efficient manner, such as\nsearch result evaluations (judgement-list based) efficient indexing load-testing The first use-case provided is batched grid evaluations of search result order based on judgement lists, varying the involved parameters (url parameters, request bodies, headers) and allowing partial result writing, grouping and aggregation of selected partial results and analysis based on single results / groups, such as analysing which queries show improvement or decline for specific settings.\nA short description of the single libraries is given in the following.\nKolibri DataTypes This library contains basic datatypes to simplify common tasks in batch processing and async state keeping.\nKolibri-DataTypes on Github\nKolibri-DataTypes on Maven Central\nKolibri Base Kolibri Base provides a clusterable multi-node batch execution setup. Batch definitions are flexible and make use of Akka-Streams, allowing the definition of flexible execution flows. Results are aggregated per batch and on demand aggregated to an overall result.\nFeatures include:\nCluster forming / node discovery Definition of datasets Mechanisms to split those sets into smaller batches Distribution logic of single batches on the single nodes including state handling and collection of partial results Definition of expectations per batch, including maximal allowed runtime per batch, per batch element, fraction of failed executions to consider batch as failed Retry mechanism on batch failure Use case job definitions include:\nSearch parameter grid evaluation with flexible tagging based on request (e.g by request parameter), result (e.g size of result set, other characteristics of the search response) or by MetricRow result. Tagging allows separation into distinct aggregations based on the concept a tag represents. Kolibri-Base on Github\nKolibri-Base on Maven Central\nKolibri-Base on DockerHub\nKolibri Watch (UI) Kolibri Watch provides a UI for the Kolibri project, allowing monitoring of job execution progress, definition of the executions and submission to the Kolibri backend for execution.\nKolibri-Watch on Github\nKolibri-Watch on DockerHub\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/","title":"Kolibri-Base Documentation","tags":[],"description":"","content":"The following gives a detailled overview of usage and inner workings of the kolibri-base project.\nKolibri-Base on Github\nKolibri-Base on Maven Central\nKolibri-Base on DockerHub\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/3-step-by-step/3-tags/","title":"Tagging","tags":[],"description":"","content":"Tagging Coming shortly :)\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/1-basics/2-monitoring/","title":"Monitoring","tags":[],"description":"","content":"Metrics with Kamon You\u0026rsquo;ll find the kamon configuration file within the resources/metrics folder (kamon.conf). 
It contains instrumentation configuration including filters for which elements metrics shall be collected as well as the configuration for the exposed server providing the status page mentioned above.\nAn example dashboard can be found in the grafana/dashboards folder. It provides general metrics regarding system performance. The below provides a description of the distinct displays in the example dashboard, for screenshot of the dashboard see below.\nMetric Display Meaning Dead Letters Messages that could not be delivered to the actor they were sent to. This can be normal, e.g in case the is a shutdown message sent but actor already shut down or similar. If happening in unexpected cases, they might indicate a problem with the workflow. Unhandled Messages sent that were received but not handled (e.g were missing handling in the receive function of the receiving actor) tracked Processed and tracked messages (tracked as per kamon.conf filters) untracked Processed but untracked messages (untracked as per kamon.conf filters) Active Actors Nr of active actors per node Actor Errors Nr of errors per actor class Mailbox Sizes Mailbox size per actor class. Refers to the nr of messages in the mailbox queue waiting to be processed. If this number increases in an actor critical for processing this might indicate a bottleneck. Time in Mailbox Avg time a message spends in the mailbox to be processed per actor class. Long times in mailbox can indicate a processing bottleneck. Actor Processing Times Avg message processing times per actor class. High numbers can indicate extensive workflows or long processing times of single elements or a combination. Job Manager Actor Processing Times Avg processing times for Job Manager Actor. In Kolibri, each submission of new job creates a new Job Manager Actor which handles distribution of batches across the nodes. Runnable Execution Actor Processing Times Avg processing times for Runnable Execution Actors. Those actors start the RunnableGraph on the single nodes, which means executing a single batch. Aggregating Actor Processing Times Avg processing times for Aggregating Actors. For each batch execution as executed by a Runnable Execution Actor there is one Aggregating Actor to aggregate the single results to an overall per-batch result Requests/min Client requests to external systems in /min avg Client Request Time The time needed by the requested external service to answer the requests sent by Kolibri. CPU Load Avg, Min, Max CPU Load of the whole cluster Nr of GCs Number of occurring GCs Avg GC times Avg time a single GC ran GC time Overall avg time spent in GC JVM memory Overview of memory boundaries and used memory per node "},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/2-mechanisms/","title":"Mechanisms","tags":[],"description":"","content":"Chapter 2 Mechanisms The following will describe the key aspects of the processing logic. While the cluster setup is based on akka-cluster with node-discovery (akka-discovery), processing logic mainly consists on processing / distribution / expectation logic added on top where batch processing is definable via akka streams, allowing flexible processing topologies. The processing definition is described in the ActorRunnable class, providing the RunnableGraph to execute on single worker nodes.\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/","title":"2. 
Configuring Jobs / Tasks","tags":[],"description":"","content":"The following describes the details regarding configuration of jobs in kolibri.\nThis covers the aspects:\nDifference of tasks, jobSummary and jobs The different task types The jobSummary definition The different job types "},{"uri":"http://awagen.github.io/kolibri/2-config-details/3-config-options/","title":"Configuration Options","tags":[],"description":"","content":"This section gives a more detailed overview of some of the job configuration options.\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/3-config-options/3-parsingconfig/","title":"Parsing Configuration","tags":[],"description":"","content":"A parsing configuration consists of the following parts:\nselector that defines which fields to pick from a json name under which the extracted data is stored castType that defines what the value is cast to (Note: for a recursive selector that extracts a sequence of single-value fields, use the single-value cast type, that is if you use a recursive selector and each single extracted element is a string, you will use castType \u0026lsquo;STRING\u0026rsquo;, not \u0026lsquo;SEQ[STRING]\u0026rsquo;. If every element is a list of strings, ud use \u0026lsquo;SEQ[STRING]\u0026rsquo;) The selector syntax is straight-forward. Let\u0026rsquo;s use the following json as example:\n{ \u0026#34;response\u0026#34;: { \u0026#34;numFound\u0026#34;: 10, \u0026#34;docs\u0026#34;: [ { \u0026#34;product_id\u0026#34;: \u0026#34;id1\u0026#34;, \u0026#34;description\u0026#34;: \u0026#34;yummy yummy\u0026#34;, \u0026#34;title\u0026#34;: \u0026#34;yummy\u0026#34;, \u0026#34;innerJson\u0026#34;: { \u0026#34;key1\u0026#34;: \u0026#34;value1\u0026#34; } } ] } } Now we distinguish between plain and recursive selectors, while both selectors can be combined:\nplain: \\ is the selector. Can apply multiple to navigate deeper into a structure. Example: response \\ numFound (in this case castType should be set to INT). recursive: \\\\ is the selector. Is used to extract sequential values from a list of jsons. Example: response \\ docs \\\\ product_id (in this case castType should be set to STRING, although the result of applying the selector will be a list of strings). If your recursive selector picks up elements that are themselves json objects, you can pick a field by just applying another plain selector, as in response \\ docs \\\\ innerJson \\ key1 "},{"uri":"http://awagen.github.io/kolibri/2-config-details/4-file-formats/3-summaryformat/","title":"Summary Format","tags":[],"description":"","content":"COMING SOON\n"},{"uri":"http://awagen.github.io/kolibri/1-first-steps/3-useful-examples-pt1/","title":"Actually useful examples Pt1 - Composing a job definition / Aggregation","tags":[],"description":"","content":"When you recall the job definition types from the last section, you see that there is a variation of job types:\nJUST_WAIT: dummy example that does not do anything useful, the Hello World example for job definitions. Just for testing. SEARCH_EVALUATION: legacy format representing a search system evaluation (here for backward-compatibility for now). Is mapped to REQUESTING_TASK_SEQUENCE job definition. QUERY_BASED_SEARCH_EVALUATION: Simplified / reduced definition compared to SEARCH_EVALUATION due to already pre-configured fields. Legacy format representing a search system evaluation (here for backward-compatibility for now). Internally is mapped to a SEARCH_EVALUATION definition, which is then mapped to a REQUESTING_TASK_SEQUENCE. 
REQUESTING_TASK_SEQUENCE: allows configuration of a sequence of tasks where later tasks can reference values generated by previous tasks via keys. This is the usual type to go for, and other types are actually mapped to this type before they are processed. In the following we will describe an example configuration of a task sequence that includes\na) definition of parameter permutations b) batching based on single parameters c) requesting of a target system per parameter combination d) evaluation of metrics per result e) persisting of partial results f) aggregation of partial results to an overall summary Configuring the Job We will be using the FORM mode for configuration here. This is usually the most convenient, since it is the most guided method.\nGo to the CREATE page, select jobDefinition for the field Select Job Name type: select REQUESTING_TASK_SEQUENCE You will now see relatively few fields (since we actually have not added anything yet):\nNow let\u0026rsquo;s enter some details:\njobName: testJob1 resourceDirectives: resource directives define which data should be loaded globally per node. This makes sense for resources shared between different batches, such as judgement lists. We will later be able to reference node-storage resources via the herein defined identifier. Now we assume that you have a judgement-file relative to the base directory in the following path: data/test_judgements.txt. Let\u0026rsquo;s continue by specifying the permutations of parameters:\nAdd the first parameter with a manually entered value list (there are other ways, don\u0026rsquo;t worry, this will be covered in the Sources section). Here we name it q. Add another one. Here we name it param1 Add a range value. Here we name it range1 Select the parameter index to form the batches by. Here we select the index 0, which refers to the first parameter defined in the above list, which is the q-Parameter. We could have picked any other parameter, and the 0-based index always refers to the order in which the parameters were defined (in case a mapping is defined, the index refers to its key-values, not their assigned mapped values).\nNow we need to define a taskSequence. This means we have to compose what will be done with the above permutations applied to an http request as initial processing elements. Thus we will define the following task types in the given order: a) REQUEST_PARSE: to request a target system and extract information b) METRIC_CALCULATION: to use the extracted information to calculate the desired metrics REQUEST_PARSE task definition METRIC_CALCULATION task definition Now all that is left is to configure the key under which the result of the metric calculations can be found. Since we did not specify any value, we enter the default (NOTE: there is a glitch in the screenshots as the fields should be contained in the input form. This will be corrected shortly. Also, you will need to press the + beside excludeParamsFromMetricRow to expand the selection so that the attribute appears in the resulting json format, but you do not need to enter anything if you don\u0026rsquo;t want to exclude any field from the metric results)\nNow we can store the template and select to transition it to an open task. If we navigate to the STATUS page, we should now see the job if we used the RUN TEMPLATE button on the CREATE page. If you made use of the docker-compose file, you should see two instances of the response-juggler service set up and running. This will be needed to actually run the job, since we reference both of these endpoints there. 
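For orientation, the requestParameters part of the json generated by the form for the three parameters above (q, param1, range1) might look roughly like the sketch below, with batchByIndex set to 0 so that batching happens by q. The field names (values_type, values, start, end, stepSize) are assumptions for illustration; the json shown to the right of the form is the authoritative representation:

"requestParameters": [
  {"type": "STANDALONE", "values": {"name": "q", "values_type": "URL_PARAMETER", "values": ["q1", "q2", "q3"]}},
  {"type": "STANDALONE", "values": {"name": "param1", "values_type": "URL_PARAMETER", "values": ["v1", "v2"]}},
  {"type": "STANDALONE", "values": {"name": "range1", "values_type": "URL_PARAMETER", "start": 0.0, "end": 10.0, "stepSize": 1.0}}
],
"batchByIndex": 0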
You can now press on the Start button. You should see a change in the Directives field shortly after, showing the existence of a PROCESS directive. Also, you should see entries in the Batch Status section after a few moments, showing the progress of the batches.\nYou will also see the negotiation process in the service logs (yes, here it is a bit pointless since we only started one node :)): Now you can observe the processing status on the STATUS page. After completion you should find the result files in json and csv format in the respective results subfolder.\nCONGRATULATIONS, your first job definition looks lovely!\nAggregation Options Defining whole jobs is not the only option we have. We can also post single tasks that are directly executed on the receiving node. This might include functions such as aggregating partial results to an overall summary or other computations.\nThe currently provided options are:\nAGGREGATE_FROM_DIR_BY_REGEX: aggregate result files from a specified folder after filtering them by matching the files in that folder against a regex and store the result file (name given by outputFilename) into the configured writeSubDir (relative to the configured base path). AGGREGATE_FILES: specify single specific files to aggregate AGGREGATE_GROUPS: specify groups either manually via UI (groupId → [query1, query2,\u0026hellip;]) mappings or via a provided group file. Further, the setting weightProvider either sets a constant weight for all queries or specific weights for each query. Below we see an example of an aggregation that picks result files from the defined folder (see below), given a regex that is matched against the files in that folder.\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/3-step-by-step/4-processing-messages/","title":"Processing Messages / Aggregation States","tags":[],"description":"","content":"Processing Messages And AggregationStates Single processing units are represented by instances of type ProcessingMessage. This allows enriching of values with tags, which can be used for selective result handling, such as result writing, aggregations and selective handling of values.\nCompletion of a single batch is signalled by a message of type AggregationState. This can be of two types:\nAggregationStateWithoutData: provides info about the completed batch without the generated data AggregationStateWithData: provides info about the completed batch with the generated data trait AggregationState[+T] extends KolibriSerializable with TaggedWithType { val jobID: String val batchNr: Int val executionExpectation: ExecutionExpectation } case class AggregationStateWithoutData[+V](containedElementCount: Int, jobID: String, batchNr: Int, executionExpectation: ExecutionExpectation) extends AggregationState[V] case class AggregationStateWithData[+V](data: V, jobID: String, batchNr: Int, executionExpectation: ExecutionExpectation) extends AggregationState[V] "},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/3-step-by-step/","title":"Step-by-Step Guide","tags":[],"description":"","content":"Chapter 3 Step-By-Step Guide In the following we will describe the composition of elements going into a full job definition. 
This entails definitions of:\ndata set to process processing tagging aggregation persistence post-analysis Enjoy :)\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-watch/","title":"Kolibri-Watch Documentation","tags":[],"description":"","content":"The following gives an overview of the Kolibri UI by the name of Kolibri Watch.\nKolibri-Watch on Github\nKolibri-Watch on DockerHub\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/1-basics/3-executeexamplejob/","title":"Executing examples","tags":[],"description":"","content":"An example job definition for a parameter grid search evaluating search metrics against a given endpoint is provided within the scripts-folder. The definition is contained in the file testSearchEval.json, that can be send to the respective Kolibri endpoints (see start_searcheval.sh). Where the response is written is configured via properties/env variables (see respective part of the documentation). A simpler way is to start up the app along with the UI (Kolibri Watch, see respective section of this doc), and navigate to the CREATE menu, select the search evaluation type and choose a job execution definition template. From this UI you can directly edit an existing template, save it and start the execution. Note that the execution below makes a few assumptions for the processing to work.\nLets have a look at the definition and then define the meaning of the distinct sections of the used json. You\u0026rsquo;ll see something like this:\n{ \u0026#34;jobName\u0026#34;: \u0026#34;testJob\u0026#34;, \u0026#34;fixedParams\u0026#34;: { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;k2\u0026#34;: [ \u0026#34;v3\u0026#34; ] }, \u0026#34;contextPath\u0026#34;: \u0026#34;search\u0026#34;, \u0026#34;connections\u0026#34;: [ { \u0026#34;host\u0026#34;: \u0026#34;search-service\u0026#34;, \u0026#34;port\u0026#34;: 80, \u0026#34;useHttps\u0026#34;: false }, { \u0026#34;host\u0026#34;: \u0026#34;search-service1\u0026#34;, \u0026#34;port\u0026#34;: 81, \u0026#34;useHttps\u0026#34;: false } ], \u0026#34;requestPermutation\u0026#34;: [ { \u0026#34;type\u0026#34;: \u0026#34;ALL\u0026#34;, \u0026#34;value\u0026#34;: { \u0026#34;params\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;FROM_FILES_LINES\u0026#34;, \u0026#34;values\u0026#34;: { \u0026#34;q\u0026#34;: \u0026#34;/app/test-files/test-paramfiles/test_queries.txt\u0026#34; } } } }, { \u0026#34;type\u0026#34;: \u0026#34;ALL\u0026#34;, \u0026#34;value\u0026#34;: { \u0026#34;params\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;GRID_FROM_VALUES_SEQ\u0026#34;, \u0026#34;values\u0026#34;: [ { \u0026#34;name\u0026#34;: \u0026#34;a1\u0026#34;, \u0026#34;values\u0026#34;: [ 0.45, 0.32 ] }, { \u0026#34;name\u0026#34;: \u0026#34;o\u0026#34;, \u0026#34;start\u0026#34;: 0.0, \u0026#34;end\u0026#34;: 2000.0, \u0026#34;stepSize\u0026#34;: 1.0 } ] } } } ], \u0026#34;batchByIndex\u0026#34;: 0, \u0026#34;parsingConfig\u0026#34;: { \u0026#34;singleSelectors\u0026#34;: [], \u0026#34;seqSelectors\u0026#34;: [ { \u0026#34;name\u0026#34;: \u0026#34;productIds\u0026#34;, \u0026#34;castType\u0026#34;: \u0026#34;STRING\u0026#34;, \u0026#34;selector\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;PLAINREC\u0026#34;, \u0026#34;path\u0026#34;: \u0026#34;\\\\ response \\\\ docs \\\\\\\\ product_id\u0026#34; } } ] }, \u0026#34;excludeParamsFromMetricRow\u0026#34;: [ \u0026#34;q\u0026#34; ], \u0026#34;taggingConfiguration\u0026#34;: { \u0026#34;initTagger\u0026#34;: { \u0026#34;type\u0026#34;: 
\u0026#34;REQUEST_PARAMETER\u0026#34;, \u0026#34;parameter\u0026#34;: \u0026#34;q\u0026#34;, \u0026#34;extend\u0026#34;: false }, \u0026#34;processedTagger\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;NOTHING\u0026#34; }, \u0026#34;resultTagger\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;NOTHING\u0026#34; } }, \u0026#34;requestTemplateStorageKey\u0026#34;: \u0026#34;requestTemplate\u0026#34;, \u0026#34;mapFutureMetricRowCalculation\u0026#34;: { \u0026#34;functionType\u0026#34;: \u0026#34;IR_METRICS\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;irMetrics\u0026#34;, \u0026#34;queryParamName\u0026#34;: \u0026#34;q\u0026#34;, \u0026#34;requestTemplateKey\u0026#34;: \u0026#34;requestTemplate\u0026#34;, \u0026#34;productIdsKey\u0026#34;: \u0026#34;productIds\u0026#34;, \u0026#34;judgementProvider\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;FILE_BASED\u0026#34;, \u0026#34;filename\u0026#34;: \u0026#34;/app/test-files/test-judgements/test_judgements.txt\u0026#34; }, \u0026#34;metricsCalculation\u0026#34;: { \u0026#34;metrics\u0026#34;: [ {\u0026#34;name\u0026#34;: \u0026#34;DCG_10\u0026#34;, \u0026#34;function\u0026#34;: {\u0026#34;type\u0026#34;: \u0026#34;DCG\u0026#34;, \u0026#34;k\u0026#34;: 10}}, {\u0026#34;name\u0026#34;: \u0026#34;NDCG_10\u0026#34;, \u0026#34;function\u0026#34;: {\u0026#34;type\u0026#34;: \u0026#34;NDCG\u0026#34;, \u0026#34;k\u0026#34;: 10}}, {\u0026#34;name\u0026#34;: \u0026#34;PRECISION_4\u0026#34;, \u0026#34;function\u0026#34;: {\u0026#34;type\u0026#34;: \u0026#34;PRECISION\u0026#34;, \u0026#34;k\u0026#34;: 4, \u0026#34;threshold\u0026#34;: 0.1}}, {\u0026#34;name\u0026#34;: \u0026#34;ERR_10\u0026#34;, \u0026#34;function\u0026#34;: {\u0026#34;type\u0026#34;: \u0026#34;ERR\u0026#34;, \u0026#34;k\u0026#34;: 10}} ], \u0026#34;judgementHandling\u0026#34;: { \u0026#34;validations\u0026#34;: [ \u0026#34;EXIST_RESULTS\u0026#34;, \u0026#34;EXIST_JUDGEMENTS\u0026#34; ], \u0026#34;handling\u0026#34;: \u0026#34;AS_ZEROS\u0026#34; } }, \u0026#34;excludeParams\u0026#34;: [ \u0026#34;q\u0026#34; ] }, \u0026#34;singleMapCalculations\u0026#34;: [], \u0026#34;allowedTimePerElementInMillis\u0026#34;: 1000, \u0026#34;allowedTimePerBatchInSeconds\u0026#34;: 6000, \u0026#34;allowedTimeForJobInSeconds\u0026#34;: 720000, \u0026#34;expectResultsFromBatchCalculations\u0026#34;: false, \u0026#34;wrapUpFunction\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;AGGREGATE_FROM_DIR_BY_REGEX\u0026#34;, \u0026#34;weightProvider\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;CONSTANT\u0026#34;, \u0026#34;weight\u0026#34;: 1.0 }, \u0026#34;regex\u0026#34;: \u0026#34;[(]q=.+[)]\u0026#34;, \u0026#34;outputFilename\u0026#34;: \u0026#34;(ALL1)\u0026#34;, \u0026#34;readSubDir\u0026#34;: \u0026#34;testJob\u0026#34;, \u0026#34;writeSubDir\u0026#34;: \u0026#34;testJob\u0026#34; } } Example Job Definition Explained Parameter meaning The above on posting to the search_eval_no_ser endpoint is parsed into an JobMessages.SearchEvaluation instance. Within Kolibri, the parsing of sent data utilizes the spray lib, and all types except base types need a JsonFormat definition that specifies how a passed json is transformed into the specific object and how an object is transformed back to its string representation. Those definitions are always found within the \u0026lsquo;de.awagen.kolibri-[datatypes/base].io.json\u0026rsquo; package and carry the suffix JsonProtocol. 
More details on this in the follow up sections.\nThe actual SearchEvaluation message case class looks like this:\ncase class SearchEvaluation(jobName: String, fixedParams: Map[String, Seq[String]], contextPath: String, connections: Seq[Connection], requestPermutation: Seq[ModifierGeneratorProvider], batchByIndex: Int, parsingConfig: ParsingConfig, excludeParamsFromMetricRow: Seq[String], requestTemplateStorageKey: String, mapFutureMetricRowCalculation: FutureCalculation[WeaklyTypedMap[String], MetricRow], singleMapCalculations: Seq[Calculation[WeaklyTypedMap[String], CalculationResult[Double]]], taggingConfiguration: Option[BaseTaggingConfiguration[RequestTemplate, (Either[Throwable, WeaklyTypedMap[String]], RequestTemplate), MetricRow]], wrapUpFunction: Option[JobWrapUpFunction[Unit]], allowedTimePerElementInMillis: Int = 1000, allowedTimePerBatchInSeconds: Int = 600, allowedTimeForJobInSeconds: Int = 7200, expectResultsFromBatchCalculations: Boolean = true) extends JobMessage Let\u0026rsquo;s summarize what the distinct attributes are used for:\nName:Type What for? jobName: String Job name for the execution. If an execution with the same jobName is running, the request to start another one with the same name will be denied. fixedParams: Map[String, Seq[String]] Parameter name/values mapping for parameters that won\u0026rsquo;t change between requests. contextPath: String Context path for the requests. The host settings specifying where to send those requests are defined within connections. connections: Seq[Connection] Single or multiple connections against which the requests shall be sent. A connection holds host, port, a flag whether to use https or http, and optional credentials. requestPermutation: Seq[ModifierGeneratorProvider] Single or multiple ModifierGeneratorProvider. Each of those providers provides methods to retrieve the Seq of generators of modifiers of RequestTemplateBuilders or a partitioning, which is a generator of generators of mentioned modifiers. For more detail see later sections. batchByIndex: Int Index (0-based) to define which generator of modifiers to batch by. E.g. in the above example specification, setting this value to 0 batches by the generator that generates the modifiers corresponding to the single query-parameter values, i.e. it\u0026rsquo;s the first one in the definition, thus index 0. parsingConfig: ParsingConfig The parsing configuration defining which values to extract as what data type and under which key to place into the result map. The result map can then be utilized to derive metrics / tags or similar. excludeParamsFromMetricRow: Seq[String] Gives the parameter names that shall not be part of the parameter set in the aggregation result (MetricRow[Double]). For the same given tags, results would be aggregated per set of parameters, thus if this shall not happen on the per-query level, or if the overall aggregation shall aggregate values over multiple queries, the query parameter should be added here. If granularity on the per-query level is needed, this should be reflected in the tag attached to the result instead (for more details on tagging see later sections of the doc). requestTemplateStorageKey: String This simply defines an arbitrary storage key used to put the request template in the result map for further reference down the processing chain. mapFutureMetricRowCalculation: FutureCalculation[WeaklyTypedMap[String], MetricRow] Definition of the MetricRow calculation based on WeaklyTypedMap[String], yielding a Future result due to additional steps involved such as loading the judgements. 
singleMapCalculations: Seq[Calculation[WeaklyTypedMap[String], CalculationResult[Double]]] Additional calculations based on the WeaklyTypedMap[String] parsed response, leading to CalculationResult[Double]. taggingConfiguration: Option[BaseTaggingConfiguration[RequestTemplate, (Either[Throwable, WeaklyTypedMap[String]], RequestTemplate), MetricRow]] This specifies a tagging configuration, allowing tagging on the request level (using RequestTemplate), on the response level (using (Either[Throwable, WeaklyTypedMap[String]], RequestTemplate)) and on the final outcome level (using the result MetricRow[Double] object). wrapUpFunction: Option[JobWrapUpFunction[Unit]] Wrap-up function to execute after the execution has finished. This could be the aggregation of all single results to an overall result or similar. This is executed on the node of the Job Manager Actor. In case of many single results, it\u0026rsquo;s beneficial to write results directly from the nodes generating the results and aggregate all to an overall result later, instead of sending all partial results as serialized messages across the cluster. allowedTimePerElementInMillis: Int Specifies the time in milliseconds a single processing element in a batch can take until finishing. allowedTimePerBatchInSeconds: Int Specifies the time a single batch is allowed to take until finishing execution (in seconds). allowedTimeForJobInSeconds: Int Specifies the time a full job is allowed to execute. If it exceeds the time, the job is aborted (time given in seconds). expectResultsFromBatchCalculations: Boolean Specifies whether the job manager actor expects results for single batches back from the single executing nodes. Example definition explained In the above example job definition you can observe the following:\njobName is \u0026ldquo;testJob\u0026rdquo; fixed parameters are set and will be used for each request: k1=v1\u0026amp;k1=v2\u0026amp;k2=v3 contextPath is \u0026ldquo;search\u0026rdquo; connections specifies two distinct connections, one going to host \u0026ldquo;search-service:80\u0026rdquo;, the other to \u0026ldquo;search-service1:81\u0026rdquo;, both using normal http:// requests without credentials set. This makes the assumption that the corresponding services are running and indeed exposing a \u0026ldquo;search\u0026rdquo;-endpoint. The execution flow utilizes a balanced execution across connections, thus about equal load can be expected on both if both have similar latencies. the request permutation permutes only different url parameters, but does not vary headers or bodies. The permuted parameter values include q1-q10 for parameter \u0026ldquo;q\u0026rdquo;, values 0.45 and 0.32 for parameter \u0026ldquo;a1\u0026rdquo; and 2001 parameter values in the range [0.0, 2000.0] in step sizes of 1 for parameter \u0026ldquo;o\u0026rdquo;. Note that the order of parameters plays a role here, since the parameter \u0026ldquo;batchByIndex\u0026rdquo; refers to exactly this ordering, e.g. if set to 0, parameter \u0026ldquo;q\u0026rdquo; is used. Index 1 would refer to parameter \u0026ldquo;a1\u0026rdquo;, index 2 to parameter \u0026ldquo;o\u0026rdquo;. batchByIndex = 0, thus each batch only handles a single setting for parameter \u0026ldquo;q\u0026rdquo;. We want to have per-query granularity here, thus each batch result corresponds to a valid aggregation by itself. These concerns about the smallest granularity needed should go into the decision of which modifiers to batch by. 
parsing config only specifies the parsing of values with key \u0026ldquo;productIds\u0026rdquo;, assuming each element to be of type String. That we are expecting a Seq and not a single value is given by the fact that the selector is defined within \u0026ldquo;seqSelectors\u0026rdquo; and within the \u0026ldquo;selector\u0026rdquo; the type is PLAINREC, meaning plain recursive. The path is set to \\\\ response \\\\ docs \\\\\\\\ product_id, which describes a json path like this: { \u0026#34;response\u0026#34;: { \u0026#34;docs\u0026#34;: [ {\u0026#34;product_id\u0026#34;: \u0026#34;value1\u0026#34;}, {\u0026#34;product_id\u0026#34;: \u0026#34;value2\u0026#34;} ] } } In the result, product ids can be retrieved from the result map via key \u0026ldquo;productIds\u0026rdquo;. There are multiple variants to parse data out of a json, such as (see JsonSelectorJsonProtocol and TypedJsonSelectorJsonProtocol)\nSINGLEREC: single recursive selector, e.g. recursively on the json root without any selectors before PLAINREC: some plain path selectors followed by a recursive selector at the end RECPLAIN: recursive selector (may contain plain path) then mapped to some plain selection (each element from the recursive selection) RECREC: recursive selector (may contain plain path) then flatMapped to another recursive selector (each element from the first recursive selection, e.g. mapping the Seq[JsValue] elements) parameter \u0026ldquo;q\u0026rdquo; is the only parameter given in the \u0026ldquo;excludeParamsFromMetricRow\u0026rdquo; setting, since we don\u0026rsquo;t want the parameter \u0026ldquo;q\u0026rdquo; to occur as a parameter in our aggregated MetricRow[Double] result: aggregations are done per parameter setting later on, and we couldn\u0026rsquo;t aggregate over multiple queries without first removing the parameter \u0026ldquo;q\u0026rdquo; from the single results. As you\u0026rsquo;ll see below, we rather include parameter q in the tagging of the single results. \u0026ldquo;taggingConfiguration\u0026rdquo; specifies that we won\u0026rsquo;t add a tag for the parsed result or the final MetricRow[Double] result, but we will tag based on the RequestTemplate\u0026rsquo;s request parameter \u0026ldquo;q\u0026rdquo; for each request to the external endpoint. This will effectively result in one partial result per query, with the filename given by the Tag toString method, in this case resulting in names (q=[paramValue]), e.g. (q=q1), (q=q2), and so on. \u0026ldquo;requestTemplateStorageKey\u0026rdquo;: simply defines under which key the RequestTemplate value is stored in the result map. As you saw above, we already stored the parsed product IDs with key \u0026ldquo;productIds\u0026rdquo;, and the RequestTemplate used for the request will be stored under \u0026ldquo;requestTemplate\u0026rdquo;. \u0026ldquo;mapFutureMetricRowCalculation\u0026rdquo; specifies which metrics are calculated (in the example DCG@10, NDCG@10, PRECISION@4, ERR@10) and which judgement file the calculation is based on. It also specifies validations on judgements and handling of missing judgements. In the given example, validations include checking that there are productId results and that some judgements exist for the results; if those validations pass, missing judgements are handled by replacement with value 0.0. \u0026ldquo;singleMapCalculations\u0026rdquo; is empty, thus no additional calculations are executed apart from the above defined metrics. 
\u0026ldquo;allowedTimePerElementInMillis\u0026rdquo; is set to 1000, thus we allow up to 1s for each element to finish processing \u0026ldquo;allowedTimePerBatchInSeconds\u0026rdquo;: here we allow 6000 seconds, thus 100 minutes for a single batch to execute \u0026ldquo;allowedTimeForJobInSeconds\u0026rdquo; is set to 720000 seconds, meaning 120 * 100 minutes \u0026ldquo;expectResultsFromBatchCalculations\u0026rdquo; is set to \u0026ldquo;false\u0026rdquo;, thus no results are serialized and sent across the cluster back to the Job Manager. \u0026ldquo;wrapUpFunction\u0026rdquo; is defined such that all results matching the given regex are aggregated to an overall result and written to a file with name \u0026ldquo;(ALL1)\u0026rdquo; into the same folder the single results were picked from (i.e. for this the subDir must be the same as the jobName). Also, a weightProvider can be defined to provide distinct weights per query, or just one of type \u0026ldquo;CONSTANT\u0026rdquo; with the weight to apply for all samples: { \u0026#34;type\u0026#34;: \u0026#34;AGGREGATE_FROM_DIR_BY_REGEX\u0026#34;, \u0026#34;weightProvider\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;CONSTANT\u0026#34;, \u0026#34;weight\u0026#34;: 1.0 }, \u0026#34;regex\u0026#34;: \u0026#34;[(]q=.+[)]\u0026#34;, \u0026#34;outputFilename\u0026#34;: \u0026#34;(ALL1)\u0026#34;, \u0026#34;readSubDir\u0026#34;: \u0026#34;testJob\u0026#34;, \u0026#34;writeSubDir\u0026#34;: \u0026#34;testJob\u0026#34; } Example Aggregation / Analyze Executions Kolibri provides an execution-endpoint, for which examples can be found in the files \u0026lsquo;testAggregation.json\u0026rsquo; (aggregation example, same as used for the wrap-up function above) and \u0026lsquo;testAnalyze.json\u0026rsquo;\n{ \u0026#34;type\u0026#34;: \u0026#34;ANALYZE_BEST_WORST_REGEX\u0026#34;, \u0026#34;directory\u0026#34;: \u0026#34;testJob\u0026#34;, \u0026#34;regex\u0026#34;: \u0026#34;[(]q=.+[)]\u0026#34;, \u0026#34;currentParams\u0026#34;: { \u0026#34;a1\u0026#34;: [\u0026#34;0.45\u0026#34;], \u0026#34;k1\u0026#34;: [\u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34;], \u0026#34;k2\u0026#34;: [\u0026#34;v3\u0026#34;], \u0026#34;o\u0026#34;: [\u0026#34;479.0\u0026#34;] }, \u0026#34;compareParams\u0026#34;: [ { \u0026#34;a1\u0026#34;: [\u0026#34;0.32\u0026#34;], \u0026#34;k1\u0026#34;: [\u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34;], \u0026#34;k2\u0026#34;: [\u0026#34;v3\u0026#34;], \u0026#34;o\u0026#34;: [\u0026#34;1760.0\u0026#34;] }, { \u0026#34;a1\u0026#34;: [\u0026#34;0.45\u0026#34;], \u0026#34;k1\u0026#34;: [\u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34;], \u0026#34;k2\u0026#34;: [\u0026#34;v3\u0026#34;], \u0026#34;o\u0026#34;: [\u0026#34;384.0\u0026#34;] }, { \u0026#34;a1\u0026#34;: [\u0026#34;0.45\u0026#34;], \u0026#34;k1\u0026#34;: [\u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34;], \u0026#34;k2\u0026#34;: [\u0026#34;v3\u0026#34;], \u0026#34;o\u0026#34;: [\u0026#34;1325.0\u0026#34;] } ], \u0026#34;metricName\u0026#34;: \u0026#34;NDCG_10\u0026#34;, \u0026#34;queryParamName\u0026#34;: \u0026#34;q\u0026#34;, \u0026#34;n_best\u0026#34;: 5, \u0026#34;n_worst\u0026#34;: 4 } The latter picks the single result files according to the provided regex, defines the current parameter setting (\u0026lsquo;currentParams\u0026rsquo;) and the variants to compare against (\u0026lsquo;compareParams\u0026rsquo;). 
Further, \u0026lsquo;metricName\u0026rsquo; defines the name of the metric to use for comparison, and \u0026lsquo;n_best\u0026rsquo; and \u0026lsquo;n_worst\u0026rsquo; define how many of the most increasing / most decreasing values are to be kept. \u0026lsquo;queryParamName\u0026rsquo; specifies the parameter name that, in this example, is extracted by regex from the file name of the partial result.\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/3-config-options/4-calculations/","title":"Calculations","tags":[],"description":"","content":"Let\u0026rsquo;s look at the available calculation types:\nCalculation Types IR_METRICS IDENTITY FIRST_TRUE FIRST_FALSE TRUE_COUNT FALSE_COUNT BINARY_PRECISION_TRUE_AS_YES BINARY_PRECISION_FALSE_AS_YES STRING_SEQUENCE_VALUE_OCCURRENCE_HISTOGRAM COMPLETION COMING SOON\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/4-file-formats/","title":"File Formats","tags":[],"description":"","content":"This section describes the file formats used to provide data such as standalone mappings, mapped parameters, judgements and the like.\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/4-file-formats/4-parameters/","title":"Parameters","tags":[],"description":"","content":"COMING SOON\n"},{"uri":"http://awagen.github.io/kolibri/1-first-steps/4-useful-examples-pt2/","title":"Actually useful examples Pt2 - Comparing search systems","tags":[],"description":"","content":"Configuring the Job We have seen a general job configuration in the previous section. Now we want to focus on the configuration for the task of comparing two or more distinct search systems. In this example we will use jaccard metrics for this purpose (a short sketch of the jaccard idea follows after the configuration list below).\nWhat we will need to configure here is:\nwhich parameter settings we need to form the parameter-permutations that define the http requests a REQUEST_PARSE task that specifies the search systems of interest within the connections settings and specifies as REQUEST_MODE the value REQUEST_ALL_CONNECTIONS. This causes requests to be sent to all search systems (as opposed to DISTRIBUTE_LOAD, which balances the load among all defined connections), and stores the results for each system under [successKeyName]-[connectionIndex], where [successKeyName] refers to the value specified in the task definition below and [connectionIndex] is the (1-based!) index of the respective connection the request was sent to a MAP_COMPARISON task that specifies two input keys referring to the results corresponding to the distinct search systems (remember the index-suffix in the previous task definition, thus suffix -1 means the first connection specified in the connection-list, -2 means the second and so on) (Optional): we can define multiple MAP_COMPARISON tasks and compare multiple distinct systems. If we do so, we do not have a single result, but multiple corresponding to the distinct comparisons. In this case we need to use a MERGE_METRIC_ROWS task to fill those values into a single result, whose key can then be referenced in the metricRowResultKey setting of the job definition. The value for this key is actually what will be written as (partial) result for the defined tag (in the below example we tag by a dummy variable since we only want to have a single file, to avoid having one file per query where the content is simply a one-liner, but this is an optional consideration). 
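For orientation on what the jaccard metric expresses here: the comparison treats the two parsed product id lists as sets and relates the size of their intersection to the size of their union. A minimal sketch of that idea (illustrative only, not the actual Kolibri implementation):
// treats both result lists as sets and relates intersection size to union size
def jaccardSimilarity(resultsA: Seq[String], resultsB: Seq[String]): Double = {
  val setA = resultsA.toSet
  val setB = resultsB.toSet
  val unionSize = setA.union(setB).size
  // two empty results are treated as identical here; a real implementation might choose differently
  if (unionSize == 0) 1.0 else setA.intersect(setB).size.toDouble / unionSize
}
E.g. two result lists of 10 products each that share 4 products would yield 4 / 16 = 0.25, which is the order of magnitude of the per-query values in the example results below.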
Specifying general job settings Specifying the REQUEST_PARSE task Specifying the MAP_COMPARISON task After submitting the job and starting it on the STATUS page, you should shortly see results written in the current date folder within the configured results subfolder. Result files come in two flavors, CSV and json. Below are two examples of the resulting format for the job definition above.\nCSV\nNote that the format contains a few comment lines (starting with #). These contain important information for the aggregation of data. In this case we have a DOUBLE as type of the metric jaccard as given in the column value-jaccard. Thus the header information starting with # K_METRIC_AGGREGATOR_MAPPING just specifies that the metric jaccard is to be aggregated as an average over values of type double (DOUBLE_AVG)\n# K_METRIC_AGGREGATOR_MAPPING jaccard DOUBLE_AVG k1\tq\tfail-count-jaccard\tweighted-fail-count-jaccard\tfailReasons-jaccard\tsuccess-count-jaccard\tweighted-success-count-jaccard\tvalue-jaccard v1\u0026amp;v2\tq1\t0\t0.0000\t1\t1.0000\t0.2667 v1\u0026amp;v2\tq5\t0\t0.0000\t1\t1.0000\t0.4118 v1\u0026amp;v2\tq3\t0\t0.0000\t1\t1.0000\t0.1333 v1\u0026amp;v2\tq4\t0\t0.0000\t1\t1.0000\t0.4286 v1\u0026amp;v2\tq2\t0\t0.0000\t1\t1.0000\t0.3333 JSON\n{ \u0026#34;data\u0026#34;: [ { \u0026#34;datasets\u0026#34;: [ { \u0026#34;data\u0026#34;: [ 0.3333333333333333, 0.42857142857142855, 0.4117647058823529, 0.13333333333333333, 0.26666666666666666 ], \u0026#34;failReasons\u0026#34;: [ {}, {}, {}, {}, {} ], \u0026#34;failSamples\u0026#34;: [ 0, 0, 0, 0, 0 ], \u0026#34;name\u0026#34;: \u0026#34;jaccard\u0026#34;, \u0026#34;successSamples\u0026#34;: [ 1, 1, 1, 1, 1 ], \u0026#34;weightedFailSamples\u0026#34;: [ 0.0, 0.0, 0.0, 0.0, 0.0 ], \u0026#34;weightedSuccessSamples\u0026#34;: [ 1.0, 1.0, 1.0, 1.0, 1.0 ] } ], \u0026#34;entryType\u0026#34;: \u0026#34;DOUBLE_AVG\u0026#34;, \u0026#34;failCount\u0026#34;: 0, \u0026#34;labels\u0026#34;: [ { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] } ], \u0026#34;successCount\u0026#34;: 1 } ], \u0026#34;name\u0026#34;: \u0026#34;(dummy=1)\u0026#34;, \u0026#34;timestamp\u0026#34;: \u0026#34;2023-08-09 09:32:55.034\u0026#34; } Troubleshoot In case you experience some issues (e.g. storing a job definition throws an error or form content not displaying correctly after switching between job types), please refer to the troubleshoot section.\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/3-step-by-step/5-writers/","title":"Formats & Writers","tags":[],"description":"","content":"Formats \u0026amp; Writers Coming shortly :)\n"},{"uri":"http://awagen.github.io/kolibri_archive/kolibri-base/4-inputs/","title":"Inputs","tags":[],"description":"","content":"Chapter 4 Inputs Kolibri exposes an API for the batching / job execution mechanism, which heavily relies on the json format. 
The inputs here specify what should be calculated / executed, thus it\u0026rsquo;s important to have a convenient and consistent way of handling those inputs.\nThe following will provide an overview of how those inputs are to be defined and handled internally, and how the existing mechanism can be extended.\nEnjoy :)\ncoming shortly :)\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/3-config-options/5-aggregationtypemappings/","title":"Aggregation Type Mappings","tags":[],"description":"","content":"Aggregation type mappings just specify for each defined metric name the appropriate type of aggregation. If no aggregation type is specified in the job definitions, the fallback will be DOUBLE_AVG (which will lead to problems if the calculated value is not a number and you did not specify the right aggregation type). The aggregation types are:\nAggregation Types DOUBLE_AVG SEQUENCE_KEEP_FIRST MAP_UNWEIGHTED_SUM_VALUE MAP_WEIGHTED_SUM_VALUE NESTED_MAP_UNWEIGHTED_SUM_VALUE NESTED_MAP_WEIGHTED_SUM_VALUE COMPLETION COMING SOON\n"},{"uri":"http://awagen.github.io/kolibri/1-first-steps/5-troubleshoot/","title":"Troubleshoot","tags":[],"description":"","content":"Troubleshoot Below are a few issues that need resolving. They are mostly minor ones but can be annoying. Fixes to be expected shortly.\nUI\nIn case the App displays an error when you set up the examples as in the above, look out for fields for which not a single value is specified. This might still be ok, but those fields will not land in the generated json configuration (see the screen to the right in the CREATE screen), in which case the validation detects this and does not persist the definition. Thus just click on the + sign for the specific field. This might be enough, or you might need to add a value (which you can then delete again); just ensure that the key occurs in the resulting json (e.g. even if it is an empty list or similar). Sometimes when selecting different job types / templates, the form might not reload properly. This is alleviated by also selecting a template (\u0026lsquo;None\u0026rsquo; in case none exists or you need an empty form). If in doubt you can still reload the page (should not be necessary) and just select the job type / template combination you need. Some fields in the job / task definitions might lose focus after typing. This should only affect a subset of fields; pasting into the field helps, or you can click again to regain focus and continue typing. (fixed in main branch, will be fixed from version v0.2.3) BACKEND\nIn case of jobs that are currently not marked as processed but were in progress and thus have single batches in the in-progress state (in the storage), this can prevent actually open ones from being picked up. This is likely due to the number of in-progress states being taken as the decision criterion whether to pull new tasks in, in this case without checking that those are not actually running at this point but are leftover state from when the job was stopped. Clearing these in-progress files (which would be cleared the next time the job is started anyway) helps in those cases. Fix coming. (fixed from version v0.2.3) If the number of tasks allowed to be in progress at any given time is lower than the number of tasks that can be claimed, the task status of those claimed but not yet in progress will be reset, since the other nodes do not see any status updates. This needs the addition of status updates also for tasks that are not yet processed but in status QUEUED. 
(fixed in main branch, will be fixed from version v0.2.4) "},{"uri":"http://awagen.github.io/kolibri/2-config-details/3-config-options/6-taggingconfiguration/","title":"Tagging Configuration","tags":[],"description":"","content":"In Kolibri, tagging is used on results to establish groupings in the result sets. This allows mechanisms such as aggregation of different results based on equal tags. Thus tags effectively define the granularity of your results. Let\u0026rsquo;s say you tag by the query-parameter and you run the evaluation on a range of parameters, all over a set of 1000 queries. Now you will have 1000 single results. By contrast, in case you have a parameter that can only assume two values, and you tag based on this parameter, you will only have two results. Yet tags can also be combined to refine the tagging further. This is what the extend flag is for. If this attribute is set to true, the tag will extend existing tags. If set to false, an additional tag will be added, which defines a grouping separate from already existing tags.\nThe currently available taggers are:\nrequest tagger by request parameter (every distinct value of the given parameter will have a distinct tag, leading to as many partial results as there are queries) parsing result tagger by length (meaning: number of results) none "},{"uri":"http://awagen.github.io/kolibri/1-first-steps/6-api-endpoints/","title":"API Endpoints","tags":[],"description":"","content":"Endpoints and example responses /health: status endpoint. Returns some ignorable text. The important part here is the status code. /resources/global: returns a list of resources currently loaded (such as judgement lists and the like) on a (limited to single nodes) global level, e.g. without taking into account further assignments such as the job they are used for. Example response: { \u0026#34;data\u0026#34;: [ { \u0026#34;resourceType\u0026#34;: \u0026#34;JUDGEMENT_PROVIDER\u0026#34;, \u0026#34;identifier\u0026#34;: \u0026#34;ident1\u0026#34; } ], \u0026#34;errorMessage\u0026#34;: \u0026#34;\u0026#34; } /jobs/open: Returns a list of currently non-completed jobs. Example response: { \u0026#34;data\u0026#34;: [ { \u0026#34;batchCountPerState\u0026#34;: { \u0026#34;INPROGRESS_abc1\u0026#34;: 5 }, \u0026#34;jobId\u0026#34;: \u0026#34;taskSequenceTestJob2_1688830485073\u0026#34;, \u0026#34;jobLevelDirectives\u0026#34;: [ { \u0026#34;type\u0026#34;: \u0026#34;PROCESS\u0026#34; } ], \u0026#34;timePlacedInMillis\u0026#34;: 1688830485073 } ], \u0026#34;errorMessage\u0026#34;: \u0026#34;\u0026#34; } /jobs/batches: Returns a list of batches currently in process. 
{ \u0026#34;data\u0026#34;: [ { \u0026#34;processingInfo\u0026#34;: { \u0026#34;lastUpdate\u0026#34;: \u0026#34;2023-07-09 00:56:22\u0026#34;, \u0026#34;numItemsProcessed\u0026#34;: 129, \u0026#34;numItemsTotal\u0026#34;: 1000, \u0026#34;processingNode\u0026#34;: \u0026#34;abc1\u0026#34;, \u0026#34;processingStatus\u0026#34;: \u0026#34;IN_PROGRESS\u0026#34; }, \u0026#34;stateId\u0026#34;: { \u0026#34;batchNr\u0026#34;: 0, \u0026#34;jobId\u0026#34;: \u0026#34;taskSequenceTestJob2_1688864117702\u0026#34; } }, { \u0026#34;processingInfo\u0026#34;: { \u0026#34;lastUpdate\u0026#34;: \u0026#34;2023-07-09 00:56:22\u0026#34;, \u0026#34;numItemsProcessed\u0026#34;: 129, \u0026#34;numItemsTotal\u0026#34;: 1000, \u0026#34;processingNode\u0026#34;: \u0026#34;abc1\u0026#34;, \u0026#34;processingStatus\u0026#34;: \u0026#34;IN_PROGRESS\u0026#34; }, \u0026#34;stateId\u0026#34;: { \u0026#34;batchNr\u0026#34;: 4, \u0026#34;jobId\u0026#34;: \u0026#34;taskSequenceTestJob2_1688864117702\u0026#34; } }, { \u0026#34;processingInfo\u0026#34;: { \u0026#34;lastUpdate\u0026#34;: \u0026#34;2023-07-09 00:56:22\u0026#34;, \u0026#34;numItemsProcessed\u0026#34;: 129, \u0026#34;numItemsTotal\u0026#34;: 1000, \u0026#34;processingNode\u0026#34;: \u0026#34;abc1\u0026#34;, \u0026#34;processingStatus\u0026#34;: \u0026#34;IN_PROGRESS\u0026#34; }, \u0026#34;stateId\u0026#34;: { \u0026#34;batchNr\u0026#34;: 3, \u0026#34;jobId\u0026#34;: \u0026#34;taskSequenceTestJob2_1688864117702\u0026#34; } }, { \u0026#34;processingInfo\u0026#34;: { \u0026#34;lastUpdate\u0026#34;: \u0026#34;2023-07-09 00:56:22\u0026#34;, \u0026#34;numItemsProcessed\u0026#34;: 131, \u0026#34;numItemsTotal\u0026#34;: 1000, \u0026#34;processingNode\u0026#34;: \u0026#34;abc1\u0026#34;, \u0026#34;processingStatus\u0026#34;: \u0026#34;IN_PROGRESS\u0026#34; }, \u0026#34;stateId\u0026#34;: { \u0026#34;batchNr\u0026#34;: 1, \u0026#34;jobId\u0026#34;: \u0026#34;taskSequenceTestJob2_1688864117702\u0026#34; } }, { \u0026#34;processingInfo\u0026#34;: { \u0026#34;lastUpdate\u0026#34;: \u0026#34;2023-07-09 00:56:22\u0026#34;, \u0026#34;numItemsProcessed\u0026#34;: 128, \u0026#34;numItemsTotal\u0026#34;: 1000, \u0026#34;processingNode\u0026#34;: \u0026#34;abc1\u0026#34;, \u0026#34;processingStatus\u0026#34;: \u0026#34;IN_PROGRESS\u0026#34; }, \u0026#34;stateId\u0026#34;: { \u0026#34;batchNr\u0026#34;: 2, \u0026#34;jobId\u0026#34;: \u0026#34;taskSequenceTestJob2_1688864117702\u0026#34; } } ], \u0026#34;errorMessage\u0026#34;: \u0026#34;\u0026#34; } Deleting all job level directives for a given job (here: job_1688902767685): curl -XDELETE localhost:8001/jobs/job_1688902767685/directives/all Deleting a set of job level directives for a given job: curl -XDELETE --header \u0026#34;Content-Type: application/json\u0026#34; --data \u0026#39;[{\u0026#34;type\u0026#34;: \u0026#34;PROCESS\u0026#34; }]\u0026#39; localhost:8001/jobs/job_1688902767685/directives Adding a list of job level directives for a given job to the persisted state: curl -XPOST --header \u0026#34;Content-Type: application/json\u0026#34; --data \u0026#39;[{\u0026#34;type\u0026#34;: \u0026#34;PROCESS\u0026#34; }]\u0026#39; localhost:8001/jobs/job_1688902767685/directives In all the job level directive endpoints, the response will be a 200 response code with a simple true boolean value as data, or an error code with a respective error message. 
Example for successful call:\n{\u0026#34;data\u0026#34;:true,\u0026#34;errorMessage\u0026#34;:\u0026#34;\u0026#34;} Job Result Retrieval Endpoints results/folders: returns a mapping of date to job identifiers for which results are available { \u0026#34;data\u0026#34;: { \u0026#34;2023-08-07\u0026#34;: [ \u0026#34;taskSequenceTestJob2\u0026#34; ], \u0026#34;2023-08-09\u0026#34;: [ \u0026#34;testJaccard\u0026#34;, \u0026#34;testJob1\u0026#34; ], \u0026#34;2023-07-14\u0026#34;: [ \u0026#34;test1\u0026#34;, \u0026#34;taskSequenceTestJob2\u0026#34;, \u0026#34;taskSequenceTestJob\u0026#34; ], \u0026#34;2023-07-13\u0026#34;: [ \u0026#34;taskSequenceTestJob2\u0026#34; ], \u0026#34;2023-08-10\u0026#34;: [ \u0026#34;testJob2\u0026#34;, \u0026#34;test1\u0026#34;, \u0026#34;taskSequenceTestJob2\u0026#34; ], \u0026#34;2023-08-08\u0026#34;: [ \u0026#34;testJob1\u0026#34; ], \u0026#34;2023-07-16\u0026#34;: [ \u0026#34;test1\u0026#34; ], \u0026#34;2023-07-27\u0026#34;: [ \u0026#34;test1\u0026#34;, \u0026#34;taskSequenceTestJob2\u0026#34; ] }, \u0026#34;errorMessage\u0026#34;: \u0026#34;\u0026#34; } results/[date-in-yyyy-mm-dd format]/[jobId]: returns a list of result files available for the specific date and jobId { \u0026#34;data\u0026#34;: [ \u0026#34;(dummy=1)-abc1-peufg.json\u0026#34; ], \u0026#34;errorMessage\u0026#34;: \u0026#34;\u0026#34; } results/[date-in-yyyy-mm-dd format]/[jobId]/content?file=[result-file-name]: returns the content of the result file { \u0026#34;data\u0026#34;: { \u0026#34;data\u0026#34;: [ { \u0026#34;datasets\u0026#34;: [ { \u0026#34;data\u0026#34;: [ 0.26666666666666666, 0.2222222222222222, 0.1875, 0.2727272727272727, 0.1875 ], \u0026#34;failReasons\u0026#34;: [ {}, {}, {}, {}, {} ], \u0026#34;failSamples\u0026#34;: [ 0, 0, 0, 0, 0 ], \u0026#34;name\u0026#34;: \u0026#34;jaccard\u0026#34;, \u0026#34;successSamples\u0026#34;: [ 1, 1, 1, 1, 1 ], \u0026#34;weightedFailSamples\u0026#34;: [ 0.0, 0.0, 0.0, 0.0, 0.0 ], \u0026#34;weightedSuccessSamples\u0026#34;: [ 1.0, 1.0, 1.0, 1.0, 1.0 ] } ], \u0026#34;entryType\u0026#34;: \u0026#34;DOUBLE_AVG\u0026#34;, \u0026#34;failCount\u0026#34;: 0, \u0026#34;labels\u0026#34;: [ { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q2\u0026#34; ] }, { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q4\u0026#34; ] }, { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q5\u0026#34; ] }, { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q3\u0026#34; ] }, { \u0026#34;k1\u0026#34;: [ \u0026#34;v1\u0026#34;, \u0026#34;v2\u0026#34; ], \u0026#34;q\u0026#34;: [ \u0026#34;q1\u0026#34; ] } ], \u0026#34;successCount\u0026#34;: 1 } ], \u0026#34;name\u0026#34;: \u0026#34;(dummy=1)\u0026#34;, \u0026#34;timestamp\u0026#34;: \u0026#34;2023-08-10 08:02:30.506\u0026#34; }, \u0026#34;errorMessage\u0026#34;: \u0026#34;\u0026#34; } The above is a subset of the available endpoints. The complete list is on the way!\n"},{"uri":"http://awagen.github.io/kolibri/2-config-details/3-config-options/7-tasks/","title":"Tasks","tags":[],"description":"","content":"Tasks are modular descriptions of computations. A job consists of batches of elements that undergo a sequence of tasks. 
These are the currently available task types:\nTask Type REQUEST_PARSE Define target systems along with http method, fixed parameters, contextPath, which fields to parse, on which criteria to tag and which keys are used for storage of successful results and failures. METRIC_CALCULATION Based on parsed results (such as provided by the REQUEST_PARSE task), define which metrics to calculate. MAP_COMPARISON Allows comparison of two distinct results. At the moment the only option is JACCARD_SIMILARITY. MERGE_METRIC_ROWS In case more than one result was generated in the task sequence, we can merge the distinct results here. COMPLETION COMING SOON\n"},{"uri":"http://awagen.github.io/","title":"Home","tags":[],"description":"","content":"About Hi there,\nwelcome to the documentation page covering the projects on my github account (awagen on Github) and home of Kolibri.\n"},{"uri":"http://awagen.github.io/categories/","title":"Categories","tags":[],"description":"","content":""},{"uri":"http://awagen.github.io/tags/","title":"Tags","tags":[],"description":"","content":""}]