Create a file containing an ASCII ModelServerConfig protocol buffer, and pass its path to the server using the --model_config_file flag. (Some useful references: what an ASCII protocol buffer looks like; how to pass flags in Docker.)
You can also reload the model config on the fly, after the server is running, via the HandleReloadConfigRequest RPC endpoint. Models that appear in the new config but not in the old one are loaded, and models that appear in the old config but not in the new one are unloaded. Models present in both configs remain in place and are not transiently unloaded.
For all but the most advanced use cases, you'll want to use the ModelConfigList option, which is a list of ModelConfig protocol buffers. Here's a basic example before we dive into the advanced options below.
model_config_list {
  config {
    name: 'my_first_model'
    base_path: '/tmp/my_first_model/'
  }
  config {
    name: 'my_second_model'
    base_path: '/tmp/my_second_model/'
  }
}
Each ModelConfig specifies one model to be served, including its name and the path where the Model Server should look for versions of the model to serve, as seen in the above example. By default the server will serve the version with the largest version number. This default can be overridden by changing the model_version_policy field.
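As a sketch of what overriding the default might look like: besides "specific", the version policy message is assumed here to offer "latest" and "all" choices, following the ServableVersionPolicy definition. The value 2 is purely illustrative, and the field names are worth checking against the proto shipped with your release. To serve the two highest-numbered versions:
model_version_policy {
  latest {
    num_versions: 2  # illustrative value, not a recommendation
  }
}
Or, under the same assumption, to serve every version found under the base path:
model_version_policy {
  all {}
}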
To serve a specific version of the model, rather than always transitioning to the one with the largest version number, set model_version_policy to "specific" and provide the version number you would like to serve. For example, to pin version 42 as the one to serve:
model_version_policy {
  specific {
    versions: 42
  }
}
This option is useful for rolling back to a known good version in the event that a problem is discovered with the latest version(s).
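For context, the model_version_policy fragment sits inside a ModelConfig entry. A minimal sketch of a complete config file that pins version 42, reusing the illustrative model name and path from the basic example, might look like this:
model_config_list {
  config {
    name: 'my_first_model'
    base_path: '/tmp/my_first_model/'
    model_version_policy {
      specific {
        versions: 42  # served even if higher-numbered versions exist
      }
    }
  }
}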
To serve multiple versions of the model simultaneously, e.g. to enable canarying a tentative new version with a slice of traffic, set model_version_policy to "specific" and provide multiple version numbers. For example, to serve versions 42 and 43:
model_version_policy {
  specific {
    versions: 42
    versions: 43
  }
}
Sometimes it's helpful to add a level of indirection to model versions. Instead of letting all of your clients know that they should be querying version 42, you can assign an alias such as "stable" to whichever version is currently the one clients should query. If you want to redirect a slice of traffic to a tentative canary model version, you can use a second alias "canary".
You can configure these model version aliases, or labels, like so:
model_version_policy {
  specific {
    versions: 42
    versions: 43
  }
}
version_labels {
  key: 'stable'
  value: 42
}
version_labels {
  key: 'canary'
  value: 43
}
In the above example, you are serving versions 42 and 43, and associating the label "stable" with version 42 and the label "canary" with version 43. You can have your clients direct queries to one of "stable" or "canary" (perhaps based on hashing the user id) using the version_label field of the ModelSpec protocol buffer, and move forward the label on the server without notifying the clients. Once you are done canarying version 43 and are ready to promote it to stable, you can update the config to:
model_version_policy {
  specific {
    versions: 42
    versions: 43
  }
}
version_labels {
  key: 'stable'
  value: 43
}
version_labels {
  key: 'canary'
  value: 43
}
If you subsequently need to perform a rollback, you can revert to the old config that has version 42 as "stable". Otherwise, you can march forward by unloading version 42 and loading the new version 44 when it is ready, and then advancing the canary label to 44, and so on.
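On the client side, the label indirection could look roughly like the following in the text form of a request. This is only a sketch: the enclosing model_spec field name is assumed to match the request message you send (for example a PredictRequest), and the model name is the illustrative one from the basic example.
model_spec {
  name: 'my_first_model'
  version_label: 'canary'  # the server resolves this to whichever version carries the label
}
Because the client names only the label, promoting or rolling back a version is a server-side config change and needs no client update.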
Please note that labels can only be assigned to model versions that are loaded and available for serving. Once a model version is available, you can reload the model config on the fly (for example, via the HandleReloadConfigRequest RPC endpoint) to assign a label to it.
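Taken together with the march-forward step described earlier, one possible sequence is two config reloads. This is a sketch using the version numbers from the running example, and it assumes you wait until version 44 is reported as available before applying the second config. First, drop version 42 and add version 44 while both labels stay on the already-loaded version 43:
model_version_policy {
  specific {
    versions: 43
    versions: 44
  }
}
version_labels {
  key: 'stable'
  value: 43
}
version_labels {
  key: 'canary'
  value: 43
}
Then, once version 44 is loaded and available, reload with the canary label advanced:
model_version_policy {
  specific {
    versions: 43
    versions: 44
  }
}
version_labels {
  key: 'stable'
  value: 43
}
version_labels {
  key: 'canary'
  value: 44  # assigned only after version 44 is available
}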