BentoML - v1.0.8

🍱 BentoML v1.0.8 is released with a list of improvements we hope you’ll find useful.
- Introduced Bento Client for easy access to the BentoML service over HTTP. Both sync and async calls are supported. See the Bento Client Guide for more details.

  ```python
  from bentoml.client import Client

  client = Client.from_url("http://localhost:3000")

  # Sync call
  response = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))

  # Async call
  response = await client.async_classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
  ```
- Introduced custom metrics support for easy instrumentation over Prometheus. See the Metrics Guide for more details.

  ```python
  # Histogram metric
  inference_duration = bentoml.metrics.Histogram(
      name="inference_duration",
      documentation="Duration of inference",
      labelnames=["nltk_version", "sentiment_cls"],
  )

  # Counter metric
  polarity_counter = bentoml.metrics.Counter(
      name="polarity_total",
      documentation="Count total number of analysis by polarity scores",
      labelnames=["polarity"],
  )
  ```
  Full Prometheus-style syntax is supported for instrumenting custom metrics inside API and Runner definitions.

  ```python
  # Histogram
  inference_duration.labels(
      nltk_version=nltk.__version__,
      sentiment_cls=self.sia.__class__.__name__,
  ).observe(time.perf_counter() - start)

  # Counter
  polarity_counter.labels(polarity=is_positive).inc()
  ```
- Improved health checking to also cover the status of runners, to avoid returning a healthy status before runners are ready.
- Added SSL/TLS support to gRPC serving.

  ```shell
  bentoml serve-grpc --ssl-certfile=credentials/cert.pem --ssl-keyfile=credentials/key.pem --production --enable-reflection
  ```
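  For local testing, a self-signed certificate pair can be generated with OpenSSL. A minimal sketch, assuming OpenSSL is installed; the `credentials/` paths simply match the command above and are otherwise arbitrary:

  ```shell
  # Create a self-signed certificate and private key for local TLS testing
  mkdir -p credentials
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout credentials/key.pem \
    -out credentials/cert.pem \
    -days 365 -subj "/CN=localhost"
  ```

  Note that clients must either trust this certificate explicitly or disable verification; self-signed certificates are for local testing only.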
- Added channelz support for easier debugging of gRPC serving.
- Allowed nested requirements with the `-r` syntax.

  ```
  # requirements.txt
  -r nested/requirements.txt
  pydantic
  Pillow
  fastapi
  ```
- Improved the adaptive batching dispatcher’s auto-tuning to avoid sporadic request failures caused by batching at the beginning of the runner lifecycle.
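  Batching behavior can also be bounded explicitly through BentoML’s runtime configuration file rather than relying solely on auto-tuning. A sketch of the relevant keys, with illustrative values; consult the Configuration Guide for the authoritative schema:

  ```yaml
  # bentoml_configuration.yaml (illustrative values)
  runners:
    batching:
      enabled: true
      max_batch_size: 100    # cap on how many requests are merged into one batch
      max_latency_ms: 10000  # latency bound the dispatcher targets when forming batches
  ```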
- Fixed a bug where runners would raise a `TypeError` when overloaded. Now an HTTP 503 Service Unavailable will be returned when a runner is overloaded.

  ```
  File "python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 188, in async_run_method
      return tuple(AutoContainer.from_payload(payload) for payload in payloads)
  TypeError: 'Response' object is not iterable
  ```
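  On the client side, an overloaded runner now surfaces as a retryable HTTP 503 rather than an opaque `TypeError`. A minimal sketch of handling this with exponential backoff; the `call_with_backoff` helper and the stand-in callable below are hypothetical illustrations, not part of the BentoML API:

  ```python
  import time

  def call_with_backoff(call, max_retries=3, base_delay=0.1):
      """Retry a callable that returns (status_code, body) while it yields HTTP 503."""
      for attempt in range(max_retries + 1):
          status, body = call()
          if status != 503:
              return status, body
          if attempt < max_retries:
              # Exponential backoff: 0.1s, 0.2s, 0.4s, ...
              time.sleep(base_delay * (2 ** attempt))
      return status, body

  # Usage with a stand-in callable that simulates an overloaded runner
  # recovering after two rejected attempts:
  attempts = {"n": 0}

  def fake_call():
      attempts["n"] += 1
      return (503, "overloaded") if attempts["n"] <= 2 else (200, "ok")

  status, body = call_with_backoff(fake_call)  # -> (200, "ok") on the third attempt
  ```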
💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.
- Check out the updated PyTorch Framework Guide on how to use `external_modules` to save classes or utility functions required by the model.
- See the Metrics Guide on how to add custom metrics to your API and custom Runners.
- Learn more about how to use the Bento Client to call your BentoML service with Python easily.
- Check out the latest blog post on why model serving over gRPC matters to data scientists.
🥂 We’d like to thank the community for your continued support and engagement.
- Shout out to @judahrand for multiple contributions to BentoML and bentoctl.
- Shout out to @phildamore-phdata, @quandollar, @2JooYeon, and @fortunto2 for their first contribution to BentoML.