diff --git a/CHANGELOG.md b/CHANGELOG.md index f1637e25b3..95e2cd6cc2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,24 +1,274 @@ # Changelog + +## [v2.10.2](https://github.com/SeldonIO/seldon-core/releases/tag/v2.10.2) - 2025-12-19 + +### Overview + +Core `2.10.2` is a patch release fixing several long-standing issues which became more visible in the `2.10.x` releases: + +- Pipelines that failed to create or delete remained in that state, even after the cluster became healthy again +- Potential gRPC stream blocking between `agent` and `scheduler`, preventing models from loading +- The `operator` being blocked from reconciling custom resources, preventing administrators from making changes +- MLServer parallel workers not being set +- MLServer access log flooding + +### Bugfix details: + +If the Kafka cluster was unhealthy and components such as `model-gateway` were unable to connect to a broker, pipelines ended up in a failed state. The `scheduler` was notified of this, but once the Kafka cluster became healthy again, it did not retry creating the pipeline on the required services. The `scheduler` now periodically retries creating/deleting failed pipelines. This is controlled by 3 environment variables on the `scheduler`: + +- `RETRY_CREATING_FAILED_PIPELINES_TICK` (default `60s`): how often the `scheduler` attempts to create pipelines which failed to create +- `RETRY_DELETING_FAILED_PIPELINES_TICK` (default `60s`): how often the `scheduler` attempts to delete pipelines which failed to delete +- `MAX_RETRY_FAILED_PIPELINES` (default `10`): the maximum number of retries the `scheduler` attempts + +gRPC streams were not handled correctly when sending data: the Go context was not checked to see whether the receiver had closed the stream. This led to blocking where the `scheduler` attempted to load a model on an `agent` that had already closed the stream, which prevented the `scheduler` from taking any further model loading actions.
We found the same pattern in several other places, and now verify that the stream is still active before attempting to send. + +The `operator` sometimes has to retry sending cluster state to the `scheduler` for a number of reasons (poor connectivity, the `scheduler` restarting after a failed liveness check, etc.). If this happens repeatedly, the `operator` can end up retrying for many hours, during which it is blocked from reconciling any resource. We have addressed this by setting a maximum retry limit on the exponential backoff, and by enforcing timeouts on all network calls made by the `operator`. + +The helm charts set the wrong environment variable when configuring the number of parallel workers on `MLServer`, so the configured value was never applied. This would have caused higher latency and reduced throughput for anyone who had set this number > 1 (the default is 1). The helm charts now use the correct variable, `MLSERVER_PARALLEL_WORKERS`, which defaults to 1. This is the correct variable for MLServer versions >= `1.1.0`. If you are using an earlier version, you should set `MLSERVER_MODEL_PARALLEL_WORKERS` manually, as the helm charts no longer support it. + +During inference, `MLServer` logged every inference request. Under high load this could increase latency and reduce throughput, as some cloud providers throttle disk IO operations. Request logging is now turned off by default and can be enabled via `MLSERVER_DEBUG` on the `ServerConfig` custom resource under `MLServer`, or via the helm value `serverConfig.mlserver.debug`. + +To aid debugging issues within the `scheduler`, we added optional `pprof` support. It is turned off by default and can be configured via environment variables: + +- `ENABLE_PPROF` (default `false`): enables `pprof` +- `PPROF_PORT` (default `6060`): the HTTP port from which the profiling data can be accessed. It listens on `localhost`, so it can only be accessed via port-forwarding.
+- `PPROF_BLOCK_RATE` (default `0`): controls how frequently blocking events (mutex contention, channel operations) are sampled; `1` captures every blocking event +- `PPROF_MUTEX_RATE` (default `0`): controls how frequently mutex contention events are sampled; `1` captures every mutex contention event + +### Upgrading from previous Core 2 versions +No CRD changes are introduced in this patch release, but if upgrading from a version earlier than 2.10.0, you should first read the [2.10.0 release notes](https://github.com/SeldonIO/seldon-core/releases/tag/v2.10.0). To set the number of MLServer parallel workers to more than 1, set `serverConfig.mlserver.parallel_workers` in your helm values (this applies only to MLServer >= `1.1.0`). + + +### Changelog + +All notable changes to this project will be documented in this file. Dates are displayed in UTC. + +Generated by [`auto-changelog`](https://github.com/CookPete/auto-changelog). + +#### [v2.10.2](https://github.com/SeldonIO/seldon-core/compare/v2.10.1...v2.10.2) + +> 18 December 2025 + +- fix(mlserver): turn off logging request logs by default [`#7042`](https://github.com/SeldonIO/seldon-core/pull/7042) +- feat(e2e-tests): added inference of pipeline [`#7029`](https://github.com/SeldonIO/seldon-core/pull/7029) +- feat(e2e-tests): model experiment test [`#7023`](https://github.com/SeldonIO/seldon-core/pull/7023) +- test model over-commit [`#7021`](https://github.com/SeldonIO/seldon-core/pull/7021) +- fix model infer [`#7018`](https://github.com/SeldonIO/seldon-core/pull/7018) +- feat(e2e-tests): pipeline tests [`#7010`](https://github.com/SeldonIO/seldon-core/pull/7010) +- feat(e2e-tests): server setup [`#7012`](https://github.com/SeldonIO/seldon-core/pull/7012) +- feat(e2e-tests): model deployment and inference of test models from python tests to bdd [`#7007`](https://github.com/SeldonIO/seldon-core/pull/7007) +- feat(e2e-test): test for model deletion steps
[`#7004`](https://github.com/SeldonIO/seldon-core/pull/7004) +- refactor(e2e-tests): names and logger [`#6999`](https://github.com/SeldonIO/seldon-core/pull/6999) +- config for tests [`#6994`](https://github.com/SeldonIO/seldon-core/pull/6994) +- feat(e2e-test): gen client and deletion of resources [`#6995`](https://github.com/SeldonIO/seldon-core/pull/6995) +- feat(all): Add release version to binaries [`#6912`](https://github.com/SeldonIO/seldon-core/pull/6912) +- fix(godog): go mod module name [`#6991`](https://github.com/SeldonIO/seldon-core/pull/6991) +- feat(e2e-test): test for custom model spec & inference via HTTP/gRPC [`#6979`](https://github.com/SeldonIO/seldon-core/pull/6979) +- feat(operator): auto generated custom k8s client [`#6984`](https://github.com/SeldonIO/seldon-core/pull/6984) +- fix(helm): mlserver env var parallel workers [`#6974`](https://github.com/SeldonIO/seldon-core/pull/6974) +- Exp bdd tests [`#6965`](https://github.com/SeldonIO/seldon-core/pull/6965) +- fix(scheduler/model-gw): failed pipelines never retried [`#6917`](https://github.com/SeldonIO/seldon-core/pull/6917) +- docs(tracing.md): Tracing Page Update [`#6956`](https://github.com/SeldonIO/seldon-core/pull/6956) +- Update observability.md [`#6958`](https://github.com/SeldonIO/seldon-core/pull/6958) +- Update README.md [`#6948`](https://github.com/SeldonIO/seldon-core/pull/6948) +- docs(multiplepages): fixed broken links [`#6914`](https://github.com/SeldonIO/seldon-core/pull/6914) +- docs (Update pandasquery): fixed the broken link [`#6945`](https://github.com/SeldonIO/seldon-core/pull/6945) +- fix prometheus installation [`#6931`](https://github.com/SeldonIO/seldon-core/pull/6931) +- Add files via upload [`#6934`](https://github.com/SeldonIO/seldon-core/pull/6934) +- Update README.md [`#6932`](https://github.com/SeldonIO/seldon-core/pull/6932) +- Update open-inference-protocol-v2.openapi.yaml [`#6925`](https://github.com/SeldonIO/seldon-core/pull/6925) +- feat(agent): improve 
error logging [`#6918`](https://github.com/SeldonIO/seldon-core/pull/6918) +- Update open-inference-protocol-v2.openapi.yaml [`#6923`](https://github.com/SeldonIO/seldon-core/pull/6923) +- Add files via upload [`#6920`](https://github.com/SeldonIO/seldon-core/pull/6920) +- docs(kubernetes examples): updated the curl commands part2 [`#6913`](https://github.com/SeldonIO/seldon-core/pull/6913) +- docs(kubernetes examples): updated the curl commands [`#6905`](https://github.com/SeldonIO/seldon-core/pull/6905) +- fix(agent/scheduler/model-gw/pipeline-gw/operator): closing gRPC stream [`#6902`](https://github.com/SeldonIO/seldon-core/pull/6902) +- fix(operator): Blocking gRPC calls [`#6898`](https://github.com/SeldonIO/seldon-core/pull/6898) +- feat(scheduler): optionally enable pprof [`#6899`](https://github.com/SeldonIO/seldon-core/pull/6899) +- Generating changelog for v2.10.2 [`70dac2b`](https://github.com/SeldonIO/seldon-core/commit/70dac2baa5eef66d3cb0fee42c426fcb61c44353) +- GitBook: No commit message [`c9bba8a`](https://github.com/SeldonIO/seldon-core/commit/c9bba8a882ff848a6ef12712813465a35efc1190) +- Setting version for helm charts [`fd3f01d`](https://github.com/SeldonIO/seldon-core/commit/fd3f01df622d11007150fde21b191e8fbc98f4a0) +- Setting version for yaml manifests [`5d98bca`](https://github.com/SeldonIO/seldon-core/commit/5d98bca33aa232aa468df42ba20d4bb497d106bd) + + +[Changes][v2.10.2] + + + +## [v2.10.1](https://github.com/SeldonIO/seldon-core/releases/tag/v2.10.1) - 2025-10-20 + +### Overview + +Core 2.10.1 is a patch release fixing a significant partial scheduling bug that had existed since 2.9.0 but only became visible to users in 2.10.0.
+ +We also eliminate a set of scenarios in which the scheduler experienced a slow start while waiting for connections from server replicas that were never created because of configuration errors (i.e. updating a `ServerConfig` after `Servers` with `.spec.replicas > 0` referencing that config had already been deployed as StatefulSets). + +### Bugfix details: + +Starting in 2.10.0, after pods of an inference server hosting a model were restarted (irrespective of the reason), the model ended up scheduled on only approximately `model.spec.minReplicas` server replicas rather than the requested (and expected) `model.spec.replicas`. The actual number of model replicas scheduled depended on the timing and order in which server replicas reconnected to the scheduler after the restart. + +This regression appeared because an existing bug in the partial scheduling logic (present since 2.9.0) started manifesting consistently once we fixed a data race (not directly related to partial scheduling) in 2.10.0. Previously, the data race was difficult to trigger in most cluster operation scenarios, so the issue was not experienced by users. + +In 2.10.1, we fix the underlying bug so that partial scheduling works as expected. + +### Upgrading from previous Core 2 versions +No CRD or configuration changes are introduced in this patch release, but if upgrading from a version earlier than 2.10.0, you should first read the [2.10.0 release notes](https://github.com/SeldonIO/seldon-core/releases/tag/v2.10.0). + +### Changelog + +Generated by [`auto-changelog`](https://github.com/CookPete/auto-changelog).
+ +#### [v2.10.1](https://github.com/SeldonIO/seldon-core/compare/v2.10.0...v2.10.1) + +> 20 October 2025 + +- fix(operator): incorrect expected replicas notification [`#6890`](https://github.com/SeldonIO/seldon-core/pull/6890) +- fix(scheduler): not all models deployed to Servers when minReplicas on Model is set [`#6885`](https://github.com/SeldonIO/seldon-core/pull/6885) + + +[Changes][v2.10.1] + + + +## [v2.10.0](https://github.com/SeldonIO/seldon-core/releases/tag/v2.10.0) - 2025-10-08 + +### Overview + +Core 2.10.0 is a release with significant new features, focused on scalability, usability and bugfixes. + +### Upgrading from previous Core 2 versions +- All CRD changes maintain backward compatibility with existing CRs +- We introduce new Core 2 scaling configuration options in SeldonConfig (`config.ScalingConfig.*`), with the wider goal of centralising Core 2 configuration and allowing configuration changes after the Core 2 cluster is deployed. To ensure a smooth transition, some of these options will only take effect in future releases, but end-users are encouraged to set them to the desired values before upgrading to the next release (2.11). + +Upgrading when using helm is seamless, with existing helm values used to fill in the new configuration options. If not using helm, previous SeldonConfig CRs remain valid, but restrictive defaults will be used for the scaling configuration. One parameter in particular, `maxShardCountMultiplier` [[docs](https://docs.seldon.ai/seldon-core-2/v2.10/user-guide/performance-tuning/pipelines/scalability-pipelines#id-1.-how-scaling-works-at-a-glance)], needs to be set to take advantage of the new pipeline scalability features. This parameter can be changed after deployment, and its new value is propagated to all components that use the config.
+ +### New features +- Pipeline scalability features, with all pipeline components (`dataflow-engine`, `pipelinegateway`, `modelgateway`) now horizontally scalable, and their replica count no longer limited to the number of kafka partitions per topic. [[docs](https://docs.seldon.ai/seldon-core-2/v2.10/user-guide/performance-tuning/pipelines/scalability-pipelines)] +- Integration with Kafka Schema Registry, providing visibility into the schema contracts for Model and Pipeline input and output topics when deploying Pipelines. This connects Core 2 pipelines to the broader Kafka ecosystem, facilitating integration with products like Kafka Connect and ksqlDB to build custom solutions for data streaming, processing, and logging tailored to your machine learning workflows. [[docs](https://docs.seldon.ai/seldon-core-2/integrations/confluent/schema-registry)] +- New translation layer converting OpenAI REST API requests and responses to and from OIP. This allows standard OpenAI libraries and clients to be used when communicating with LLM models deployed via the Seldon LLM module. + +#### Experimental features (early preview, not production-ready) +- Configuration of inference servers as k8s Deployments rather than StatefulSets + +### Usability improvements +- Pipeline control plane now more robust to disruptions, with fine-grained status updates propagated to Pipeline CR statuses. +- Faster pipeline data plane recovery after component restarts +- Eliminated sources of downtime during inference server replica restarts, together with more graceful shutdowns across all components +- All Core 2 components now have associated k8s lifecycle probes + +### Bugfixes +- Fix bug affecting availability on inference server start-up after a restart +- Fix model native autoscaling remaining active after being disabled via config (once it had been activated). Model native autoscaling (based on lag) is disabled as a whole in 2.10, until we implement wider fixes.
Until then, we strongly recommend enabling server autoscaling and controlling model autoscaling via HPA or KEDA. +- Fix issues with the rclone container becoming unresponsive after long periods of uptime. + +### Kudos: +We would like to recognise the significant contributions made by [@RobertSamoilescu](https://github.com/RobertSamoilescu) to Core 2. + +With contributions from [@RobertSamoilescu](https://github.com/RobertSamoilescu), [@domsolutions](https://github.com/domsolutions), [@MiguelAAe](https://github.com/MiguelAAe), [@lc525](https://github.com/lc525), [@cherrymu](https://github.com/cherrymu), [@paulb-seldon](https://github.com/paulb-seldon), [@Rajakavitha1](https://github.com/Rajakavitha1), [@monica-seldon](https://github.com/monica-seldon) + +------ + +### Changelog + +Dates are displayed in UTC. Generated by [`auto-changelog`](https://github.com/CookPete/auto-changelog). + +#### [v2.10.0](https://github.com/SeldonIO/seldon-core/compare/v2.9.1...v2.10.0) + +> 8 October 2025 + +- fix(probes): improve timing of k8s lifecycle probes [`#6861`](https://github.com/SeldonIO/seldon-core/pull/6861) +- fix(docs): update pipeline scalability docs with maxShardCountMultiplier info [`#6859`](https://github.com/SeldonIO/seldon-core/pull/6859) +- fix(scheduler): Add scaling config and upgrade paths [`#6833`](https://github.com/SeldonIO/seldon-core/pull/6833) +- fix(scheduler): typo when setting pipeline-gw status for a pipeline [`#6856`](https://github.com/SeldonIO/seldon-core/pull/6856) +- fix(scheduler): Allow for pipelines with some of their statuses set to PipelineStatusUnknown [`#6853`](https://github.com/SeldonIO/seldon-core/pull/6853) +- Removed redunant sorting by trigger for topology [`#6852`](https://github.com/SeldonIO/seldon-core/pull/6852) +- fix(scheduler, dataflow): pipeline loading/unloading on pipeline-gw and dataflow engine topology [`#6849`](https://github.com/SeldonIO/seldon-core/pull/6849) +- fix(pipeline-gw): temp preStop hook
[`#6841`](https://github.com/SeldonIO/seldon-core/pull/6841) +- docs(pipeline): pipeline scalability docs [`#6838`](https://github.com/SeldonIO/seldon-core/pull/6838) +- enable Lychee for docs on v2 [`#6786`](https://github.com/SeldonIO/seldon-core/pull/6786) +- fix(modelgateway): Number of partitions retrieval [`#6828`](https://github.com/SeldonIO/seldon-core/pull/6828) +- fix(agent/rclone): rclone OOM [`#6830`](https://github.com/SeldonIO/seldon-core/pull/6830) +- fix proto imports [`#6836`](https://github.com/SeldonIO/seldon-core/pull/6836) +- fix unable to set 0 replicas [`#6834`](https://github.com/SeldonIO/seldon-core/pull/6834) +- feat(dataflow): added fullJitterBackoff ack for pipeline status [`#6831`](https://github.com/SeldonIO/seldon-core/pull/6831) +- feat(modelgw): modelgw status update [`#6799`](https://github.com/SeldonIO/seldon-core/pull/6799) +- fix(agent): force disable auto-scaling of models on agent/scheduler [`#6814`](https://github.com/SeldonIO/seldon-core/pull/6814) +- fix(operator): SubscribeControlPlane failure blocking loading other CRs [`#6824`](https://github.com/SeldonIO/seldon-core/pull/6824) +- feat(pipelinegw): pipeline status in pipelinegw [`#6767`](https://github.com/SeldonIO/seldon-core/pull/6767) +- remove import of undefined func [`#6822`](https://github.com/SeldonIO/seldon-core/pull/6822) +- feat(tests): pipeline scalability tests [`#6813`](https://github.com/SeldonIO/seldon-core/pull/6813) +- fix(docs): spelling and missing namespace attribute [`#6815`](https://github.com/SeldonIO/seldon-core/pull/6815) +- chore(helm): 3GB default dataflow memory req/limit [`#6811`](https://github.com/SeldonIO/seldon-core/pull/6811) +- feat(dataflow): pipeline status update [`#6757`](https://github.com/SeldonIO/seldon-core/pull/6757) +- fix blocked draining agents when waiting for model to be loaded which can't be loaded as no replicas available [`#6794`](https://github.com/SeldonIO/seldon-core/pull/6794) +- fix: repeateded identical 
subscription reqs sent to kafka [`#6807`](https://github.com/SeldonIO/seldon-core/pull/6807) +- fix(model-gw): graceful shutdown of kafka consumers [`#6801`](https://github.com/SeldonIO/seldon-core/pull/6801) +- fix(docs): Schema Registry Environment Configuration [`#6804`](https://github.com/SeldonIO/seldon-core/pull/6804) +- docs(schema-registry): Installation guide [`#6785`](https://github.com/SeldonIO/seldon-core/pull/6785) +- feat(kafka): Schema registry [`#6689`](https://github.com/SeldonIO/seldon-core/pull/6689) +- feat: Schema Registry in Ansible configuration [`#6679`](https://github.com/SeldonIO/seldon-core/pull/6679) +- fix(dataflow): deprecated use of kafka streams Transformer classes [`#6795`](https://github.com/SeldonIO/seldon-core/pull/6795) +- fix(controller): Server scaling spec [`#6613`](https://github.com/SeldonIO/seldon-core/pull/6613) +- fix(operator): failed update status [`#6789`](https://github.com/SeldonIO/seldon-core/pull/6789) +- Added watches on models [`#6788`](https://github.com/SeldonIO/seldon-core/pull/6788) +- fix(helm): added changes required to configure annotations for controller deployment [`#6748`](https://github.com/SeldonIO/seldon-core/pull/6748) +- feat(dataflow-engine): health probes [`#6766`](https://github.com/SeldonIO/seldon-core/pull/6766) +- feat(translator): OpenAI API REST translation to OIP [`#6619`](https://github.com/SeldonIO/seldon-core/pull/6619) +- feat(scheduler): health probes [`#6756`](https://github.com/SeldonIO/seldon-core/pull/6756) +- fix(envoy): corrupt envoy yaml and no ALPN config [`#6763`](https://github.com/SeldonIO/seldon-core/pull/6763) +- gRPC graceful shutdown [`#6760`](https://github.com/SeldonIO/seldon-core/pull/6760) +- feat(model-gw): health probes [`#6745`](https://github.com/SeldonIO/seldon-core/pull/6745) +- feat(Scheduler): Deal with model replicas being set to 0 [`#6557`](https://github.com/SeldonIO/seldon-core/pull/6557) +- fix(agent): enable gRPC keep-alive 
[`#6621`](https://github.com/SeldonIO/seldon-core/pull/6621) +- feat(pipeline-gw): health probes [`#6728`](https://github.com/SeldonIO/seldon-core/pull/6728) +- fix(scheduler): race conditions [`#6747`](https://github.com/SeldonIO/seldon-core/pull/6747) +- feat(dataflow): pipeline parallel loading [`#6746`](https://github.com/SeldonIO/seldon-core/pull/6746) +- chore(tests): enable race detector [`#6614`](https://github.com/SeldonIO/seldon-core/pull/6614) +- fix(agent): model availability during inference pod deletion [`#6636`](https://github.com/SeldonIO/seldon-core/pull/6636) +- fix(pipeline-gw): re-publish in-flight reqs due to partition revoke [`#6695`](https://github.com/SeldonIO/seldon-core/pull/6695) +- fix(pipeline-gw): failed incoming reqs when partitions not available [`#6690`](https://github.com/SeldonIO/seldon-core/pull/6690) +- ci(lint): lint PR title via bash [`#6691`](https://github.com/SeldonIO/seldon-core/pull/6691) +- fix(Scheduler): No dataflow engines available for terminated pipelines [`#6519`](https://github.com/SeldonIO/seldon-core/pull/6519) +- feat: pipeline loadbalancer [`#6675`](https://github.com/SeldonIO/seldon-core/pull/6675) +- Fix resource allocation link [`#6673`](https://github.com/SeldonIO/seldon-core/pull/6673) +- fix: typo pipeline output [`#6647`](https://github.com/SeldonIO/seldon-core/pull/6647) +- feat: statefulsets to deployments for servers [`#6445`](https://github.com/SeldonIO/seldon-core/pull/6445) +- docs: Update test-installation.md [`#6629`](https://github.com/SeldonIO/seldon-core/pull/6629) +- feat(modelgw): modelgw scalability [`#6538`](https://github.com/SeldonIO/seldon-core/pull/6538) +- feat(pipelinegw): pipelinegw scalability to number of partitions [`#6600`](https://github.com/SeldonIO/seldon-core/pull/6600) +- feat(dataflow): dataflow scalability [`#6498`](https://github.com/SeldonIO/seldon-core/pull/6498) +- ci(Pipeline): Enable Go module caching [`#6618`](https://github.com/SeldonIO/seldon-core/pull/6618) +- 
Update Changelog [`#6605`](https://github.com/SeldonIO/seldon-core/pull/6605) +- Generating changelog for v2.10.0 [`ac92c10`](https://github.com/SeldonIO/seldon-core/commit/ac92c10934cff89ddce0ae3f38600c5afa6a0f62) +- GitBook: No commit message [`9fe5525`](https://github.com/SeldonIO/seldon-core/commit/9fe5525d1b1525cd5cbb2e9d7a7da9dd9fe57b19) +- Setting version for helm charts [`1745e81`](https://github.com/SeldonIO/seldon-core/commit/1745e810d0a7c0cf5ffc92d7b691f38e9fb39ed0) +- GitBook: No commit message [`a490853`](https://github.com/SeldonIO/seldon-core/commit/a49085343b1fed1793482fa6740f2da4b1d2be10) +- Setting version for yaml manifests [`17cbf19`](https://github.com/SeldonIO/seldon-core/commit/17cbf19146328c780d0fca17db939cff4a07823c) + + +[Changes][v2.10.0] + + ## [v2.9.1](https://github.com/SeldonIO/seldon-core/releases/tag/v2.9.1) - 2025-07-09 ### Overview -Core 2.9.1 is a patch release focused on bugfixes and security. Despite this focus, we also introduce a number of important features, described below: +Core 2.9.1 is a patch release focused on bugfixes and security. We also introduce a number of important features related to cyclic pipelines, usability and cost-effectiveness: ### Bug fixes * Allow Core 2 to work reliably with pod disruption budgets (PDBs) ([#6560](https://github.com/SeldonIO/seldon-core/issues/6560)). Previously, terminating/draining pods remained `Ready: True`, which meant that they were still considered non-disrupted from a PDB perspective. -* Make `model-gateway` request timeouts configurable (previously 10 minutes, new default 2 minutes) ([#6522](https://github.com/SeldonIO/seldon-core/issues/6522)). This is a fix for particular cases when inference requests overwhelm the available inference server replicas, leading to significant latency increases. 
Previously in such cases, a large backlog of unprocessed entries gathered in kafka model input topics, leading to new requests always timing out (enough requests ahead of them in the queue for timeout at LB-level). The new default is likely too high for most usecases, but is set conservatively so as to not prevent inferences for slow models. However, timeouts can be set matching your own workloads at the `model-gateway` pod level, via the `MODELGATEWAY_WORKER_TIMEOUT_MS` environment variable. -* Fix dataflow-engine edge cases for communicating to the scheduler. In the case of multiple dataflow-engine replicas, or a single restarting replica, the status of a Pipeline was set based on the last received message. Sometimes, due to the way locking was done, the status update from a terminating dataflow-engine replica was processed after updates from the replica started by k8s to replace it. This meant that pipelines would transition into `PipelineTerminated` states in a non-deterministic way. +* Make `model-gateway` request timeouts configurable (previously 10 minutes, new default 2 minutes) ([#6522](https://github.com/SeldonIO/seldon-core/issues/6522)). This fixes cases where inference requests overload the available inference server replicas, leading to significant latency increases. Previously in such cases, a large backlog of unprocessed entries built up in kafka model input topics, so new requests always timed out (enough requests were queued ahead of them to hit the timeout at the load-balancer level). The new default is likely too high for most use cases, but is set conservatively so as not to prevent inference for slow models. Timeouts can be tuned to your own workloads at the `model-gateway` pod level via the `MODELGATEWAY_WORKER_TIMEOUT_MS` environment variable. +* Fix `dataflow-engine` edge cases when communicating with the scheduler ([#6506](https://github.com/SeldonIO/seldon-core/issues/6506)).
In the case of multiple dataflow-engine replicas, or a single restarting replica, the status of a Pipeline was set based on the last received message. Sometimes, due to the way locking was done, the status update from a terminating dataflow-engine replica was processed after updates from the replica started by k8s to replace it. This meant that pipelines would transition into `PipelineTerminated` states in a non-deterministic way. ### New features * Allow pipelines to have cycles, with a bounded number of iterations ([#6413](https://github.com/SeldonIO/seldon-core/issues/6413), [#6480](https://github.com/SeldonIO/seldon-core/issues/6480), [docs](https://docs.seldon.ai/seldon-core-2/user-guide/examples/pipeline-cyclic)). This feature is enabled on a pipeline-by-pipeline basis via a newly added `spec.allowCycles` field in the CR. -* Allow the Core 2 operators to be installed within their own namespaces but manage Core 2 CRs in a list of other namespaces ([#6434](https://github.com/SeldonIO/seldon-core/issues/6434), [docs](https://docs.seldon.ai/seldon-core-2/installation/production-environment) -* Allow end-users to delete the kafka topics associated with a model or pipeline when deleting the model/pipeline ([#6353](https://github.com/SeldonIO/seldon-core/issues/6353), [#6383](https://github.com/SeldonIO/seldon-core/issues/6383)). Care needs to be taken in using this feature because it implies loosing observability with respect to historic inference requests/responses, potentially targeted at models within a pipeline. 
+* Allow the Core 2 operators to be installed within their own namespaces but manage Core 2 CRs in a list of other namespaces ([#6434](https://github.com/SeldonIO/seldon-core/issues/6434), [docs](https://docs.seldon.ai/seldon-core-2/installation/production-environment)) +* Allow end-users to delete the kafka topics associated with a model or pipeline when deleting the model/pipeline ([#6353](https://github.com/SeldonIO/seldon-core/issues/6353), [#6383](https://github.com/SeldonIO/seldon-core/issues/6383), [docs](https://docs.seldon.ai/seldon-core-2/installation/advanced-configurations/managing-kafka-topics)). Care should be taken when using this feature, because it means losing observability of historic inference requests/responses, potentially including those targeted at models within a pipeline. ### Docs improvements @@ -27,7 +277,8 @@ Core 2.9.1 is a patch release focused on bugfixes and security. Despite this foc ### Core 2 release images -* Core 2 images published on docker hub now have embedded SBOM attestations. At the moment of the release, security scans show zero CVEs within those images. +* Core 2 images published on docker hub now have embedded SBOM attestations. +* At the time of release, security scans show zero CVEs within those images. ### CRD Updates All CRD changes in this release maintain backward compatibility, so clusters with existing CRs can be migrated seamlessly. Please see specific feature docs for changes.
@@ -36,6 +287,7 @@ All CRD changes in this release maintain backward compatibility, so clusters wit We recognise the significant contributions to this release from [@RobertSamoilescu](https://github.com/RobertSamoilescu) With contributions from [@RobertSamoilescu](https://github.com/RobertSamoilescu), [@lc525](https://github.com/lc525), [@Rajakavitha1](https://github.com/Rajakavitha1), [@paulb-seldon](https://github.com/paulb-seldon) + ------ ### Changelog @@ -5461,6 +5713,9 @@ Initial release [Changes][v0.1.0] +[v2.10.2]: https://github.com/SeldonIO/seldon-core/compare/v2.10.1...v2.10.2 +[v2.10.1]: https://github.com/SeldonIO/seldon-core/compare/v2.10.0...v2.10.1 +[v2.10.0]: https://github.com/SeldonIO/seldon-core/compare/v2.9.1...v2.10.0 [v2.9.1]: https://github.com/SeldonIO/seldon-core/compare/v2.9.0...v2.9.1 [v2.9.0]: https://github.com/SeldonIO/seldon-core/compare/v2.8.5...v2.9.0 [v2.8.5]: https://github.com/SeldonIO/seldon-core/compare/v2.8.4...v2.8.5 @@ -5535,4 +5790,4 @@ Initial release [v0.1.1]: https://github.com/SeldonIO/seldon-core/compare/v0.1.0...v0.1.1 [v0.1.0]: https://github.com/SeldonIO/seldon-core/tree/v0.1.0 - +