Releases: bentoml/BentoML

BentoML - v1.0.8

01 Nov 00:43
8365375

🍱 BentoML v1.0.8 is released with a list of improvements we hope you’ll find useful.

  • Introduced Bento Client for easy access to the BentoML service over HTTP. Both sync and async calls are supported. See the Bento Client Guide for more details.

    import numpy as np

    from bentoml.client import Client
    
    client = Client.from_url("http://localhost:3000")
    
    # Sync call
    response = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
    
    # Async call
    response = await client.async_classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
  • Introduced custom metrics support for easy instrumentation of custom metrics over Prometheus. See Metrics Guide for more details.

    # Histogram metric
    inference_duration = bentoml.metrics.Histogram(
        name="inference_duration",
        documentation="Duration of inference",
        labelnames=["nltk_version", "sentiment_cls"],
    )
    
    # Counter metric
    polarity_counter = bentoml.metrics.Counter(
        name="polarity_total",
        documentation="Count total number of analysis by polarity scores",
        labelnames=["polarity"],
    )

    Full Prometheus style syntax is supported for instrumenting custom metrics inside API and Runner definitions.

    # Histogram
    inference_duration.labels(
        nltk_version=nltk.__version__, sentiment_cls=self.sia.__class__.__name__
    ).observe(time.perf_counter() - start)
    
    # Counter
    polarity_counter.labels(polarity=is_positive).inc()
  • Improved health checking to also cover the status of runners to avoid returning a healthy status before runners are ready.

  • Added SSL/TLS support to gRPC serving.

    bentoml serve-grpc --ssl-certfile=credentials/cert.pem --ssl-keyfile=credentials/key.pem --production --enable-reflection
  • Added channelz support for easier debugging of gRPC serving.

  • Allowed nested requirements with the -r syntax.

    # requirements.txt
    -r nested/requirements.txt
    
    pydantic
    Pillow
    fastapi
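
    A bentofile.yaml can then reference the top-level requirements file as usual; a minimal sketch (the service name and file layout are illustrative):

    # bentofile.yaml
    service: "service:svc"
    python:
      requirements_txt: "./requirements.txt"  # may itself contain -r includes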
  • Improved the adaptive batching dispatcher’s auto-tuning to avoid sporadic request failures due to batching at the beginning of the runner lifecycle.

  • Fixed a bug where runners would raise a TypeError when overloaded. An HTTP 503 Service Unavailable is now returned when a runner is overloaded.

    File "python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 188, in async_run_method
        return tuple(AutoContainer.from_payload(payload) for payload in payloads)
    TypeError: 'Response' object is not iterable
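
    Since an overloaded runner now surfaces as an HTTP 503 from the API server, clients can retry with backoff. A minimal sketch using the requests library (endpoint path and payload are placeholders):

    import time

    import requests

    def classify_with_retry(payload, retries=3, backoff=0.5):
        """POST to a hypothetical /classify endpoint, retrying on 503."""
        for attempt in range(retries):
            resp = requests.post("http://localhost:3000/classify", json=payload)
            if resp.status_code != 503:  # not overloaded; fail fast on other errors
                resp.raise_for_status()
                return resp.json()
            time.sleep(backoff * 2**attempt)  # exponential backoff before retrying
        raise RuntimeError("service still overloaded after retries")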

💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.

🥂 We’d like to thank the community for your continued support and engagement.

BentoML - v1.0.7

03 Oct 00:59
47e884f

🍱 BentoML released v1.0.7 as a patch to quickly fix a critical module import issue introduced in v1.0.6. The import error manifests when importing any modules under io.* or models.*. The following is an example of a typical error message and traceback. Please upgrade to v1.0.7 to address this import issue.

packages/anyio/_backends/_asyncio.py", line 21, in <module>
    from io import IOBase
ImportError: cannot import name 'IOBase' from 'bentoml.io'
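
To pick up the fix, upgrade in place:

> pip install -U bentoml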

Full Changelog: v1.0.6...v1.0.7

BentoML - v1.0.6

27 Sep 23:11
b3bd5a7

🍱 BentoML has just released v1.0.6 featuring the gRPC preview! Without changing a line of code, you can now serve your Bentos as a gRPC service. Similar to serving over HTTP, BentoML gRPC supports all the ML frameworks, observability features, adaptive batching, and more out-of-the-box, simply by calling the serve-grpc CLI command.

> pip install "bentoml[grpc]"
> bentoml serve-grpc iris_classifier:latest --production
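
If server reflection is enabled (see the --enable-reflection flag), a generic client such as grpcurl can discover the exposed services; a sketch assuming the default port:

> grpcurl -plaintext localhost:3000 list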

⚠️ gRPC is currently under preview. The public APIs may undergo incompatible changes in future patch releases until the official v1.1.0 minor release.

  • Enhanced access logging format to output Trace and Span IDs in the more standard hex encoding by default.
  • Added request total, duration, and in-progress metrics to Runners, in addition to API Servers.
  • Added support for XGBoost SKLearn models.
  • Added support for restricting image mime types in the Image IO descriptor.
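
    For example, an endpoint could accept only JPEG and PNG uploads; a sketch in which the keyword allowed_mime_types is an assumption based on the feature description:

    from bentoml.io import Image

    # Restrict uploads to JPEG/PNG (parameter name assumed; see lead-in)
    input_spec = Image(allowed_mime_types=["image/jpeg", "image/png"])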

🥂 We’d like to thank our community for their contribution and support.

  • Shout out to @benjamintanweihao for fixing a BentoML CLI bug.
  • Shout out to @lsh918 for fixing a PyTorch framework issue.
  • Shout out to @jeffthebear for enhancing the Pandas DataFrame OpenAPI schema.
  • Shout out to @jiewpeng for adding the support for customizing access logs with Trace and Span ID formats.

Full Changelog: v1.0.5...v1.0.6

BentoML - v1.0.5

30 Aug 20:18
03a2b31

🍱 BentoML v1.0.5 is released as a quick fix to a Yatai incompatibility introduced in v1.0.4.

  • The incompatibility manifests in the following error message when deploying a bento on Yatai. Upgrading BentoML to v1.0.5 will resolve the issue.
    Error while finding module specification for 'bentoml._internal.server.cli.api_server' (ModuleNotFoundError: No module named 'bentoml._internal.server.cli')
  • The incompatibility resides in all Yatai versions prior to v1.0.0-alpha.*. Alternatively, upgrading Yatai to v1.0.0-alpha.* will also restore compatibility with bentos built in v1.0.4.

BentoML - v1.0.4

26 Aug 18:10
6370972

🍱 BentoML v1.0.4 is here!

  • Added support for explicit GPU mapping for runners. In addition to specifying the number of GPU devices allocated to a runner, we can map a list of device IDs directly to a runner through configuration.

    runners:
      iris_clf_1:
        resources:
          nvidia.com/gpu: [2, 4] # Map device 2 and 4 to iris_clf_1 runner
      iris_clf_2:
        resources:
          nvidia.com/gpu: [1, 3] # Map device 1 and 3 to iris_clf_2 runner
  • Added SSL support for the API server through both the CLI and configuration.

      --ssl-certfile TEXT          SSL certificate file
      --ssl-keyfile TEXT           SSL key file
      --ssl-keyfile-password TEXT  SSL keyfile password
      --ssl-version INTEGER        SSL version to use (see stdlib 'ssl' module)
      --ssl-cert-reqs INTEGER      Whether client certificate is required (see stdlib 'ssl' module)
      --ssl-ca-certs TEXT          CA certificates file
      --ssl-ciphers TEXT           Ciphers to use (see stdlib 'ssl' module)
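
    For example, serving over HTTPS from the CLI (bento tag and file paths are placeholders):

    bentoml serve iris_classifier:latest --production --ssl-certfile credentials/cert.pem --ssl-keyfile credentials/key.pem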
  • Added adaptive batch size histogram metrics, BENTOML_{runner}_{method}_adaptive_batch_size_bucket, for observability of batching mechanism details.

  • Added support for the OpenTelemetry OTLP exporter for tracing. BentoML now configures the OpenTelemetry resource automatically if the user has not explicitly configured it through environment variables. Upgraded OpenTelemetry Python packages to version 0.33b0.

  • Added support for saving external_modules alongside models in the save_model API. Saving external Python modules is useful for models with external dependencies, such as tokenizers, preprocessors, and configurations.
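
    A minimal sketch of saving a model together with a local helper module (the module, helper, and model names are hypothetical):

    import bentoml

    import my_tokenizer  # hypothetical local module the model depends on

    def model(text: str) -> float:
        return my_tokenizer.score(text)  # hypothetical helper

    bentoml.picklable_model.save_model(
        "sentiment_model",
        model,
        external_modules=[my_tokenizer],  # saved alongside the model
    )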

  • Enhanced Swagger UI to include additional documentation and helper links.

💡 We continue to update the documentation on every release to help our users unlock the full power of BentoML.

  • Check out the adaptive batching documentation on how to leverage batching to improve inference latency and efficiency.
  • Check out the runner configuration documentation on how to customize resource allocation for runners at run time.

🙌 We continue to receive great engagement and support from the BentoML community.

  • Shout out to @sptowey for their contribution on adding SSL support.
  • Shout out to @dbuades for their contribution on adding the OTLP exporter.
  • Shout out to @tweeklab for their contribution on fixing a bug on import_model in the MLflow framework.

What's Changed

New Contributors

Full Changelog: v1.0.3...v1.0.4

BentoML - v1.0.3

08 Aug 18:56
3cc662c

🍱 The BentoML v1.0.3 release brings a list of performance and feature improvements.

  • Improved Runner IO performance by enhancing the underlying serialization and deserialization, especially in models with large input and output sizes. Our image input benchmark showed a 100% throughput improvement.

  • Added support for specifying URLs to exclude from tracing.
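
    A sketch of the corresponding BentoML configuration, where the excluded_urls key follows the feature description and the exact schema may differ:

    # bentoml_configuration.yaml
    tracing:
      type: zipkin
      excluded_urls: ["/healthz", "/livez", "/readyz"]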

  • Added support for custom components in OpenAPI generation.

🙌 We continue to receive great engagement and support from the BentoML community.

  • Shout out to Ben Kessler for helping benchmark performance.
  • Shout out to Jiew Peng Lim for adding the support for configuring URLs to exclude from tracing.
  • Shout out to Susana Bouchardet for adding support for the JSON IO descriptor to return an empty response body.
  • Thanks to Keming and mplk for contributing their first PRs in BentoML.

Full Changelog: v1.0.2...v1.0.3

BentoML - v1.0.2

29 Jul 21:17
bf9d519

🍱 We have just released BentoML v1.0.2 with a number of features and bug fixes requested by the community.

  • Added support for custom model versions, e.g. bentoml.tensorflow.save_model("model_name:1.2.4", model).
  • Fixed a PyTorch Runner payload serialization issue due to a tensor not being on the CPU.
    TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first
  • Fixed Transformers GPU device assignment due to kwargs handling.
  • Fixed excessive Runner thread spawning issue under high load.
  • Fixed a PyTorch Runner inference error due to saving a tensor during inference mode.
    RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
  • Fixed Keras Runner error when the input has only a single element.
  • Deprecated the validate_json option in JSON IO descriptor and recommended specifying validation logic natively in the Pydantic model.
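
    Validation can instead live on the Pydantic model attached to the JSON descriptor; a minimal sketch (field names are placeholders):

    from pydantic import BaseModel

    from bentoml.io import JSON

    class IrisFeatures(BaseModel):
        sepal_len: float
        sepal_width: float
        petal_len: float
        petal_width: float

    # Request bodies are validated against the Pydantic model
    input_spec = JSON(pydantic_model=IrisFeatures)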

🎨 We added an examples directory; in it you will find interesting sample projects demonstrating various applications of BentoML. We welcome your contribution if you have a project idea and would like to share it with the community.

💡 We continue to update the documentation on every release to help our users unlock the full power of BentoML.

What's Changed

  • chore: remove all --pre from documentation by @aarnphm in #2738
  • chore(framework): onnx guide minor improvements by @larme in #2744
  • fix(framework): fix how pytorch DataContainer convert GPU tensor by @larme in #2739
  • doc: add missing variable by @robsonpeixoto in #2752
  • chore(deps): cattrs>=22.1.0 in setup.cfg by @sugatoray in #2758
  • fix(transformers): kwargs and migrate to framework tests by @ssheng in #2761
  • chore: add type hint for run and async_run by @aarnphm in #2760
  • docs: fix typo in SECURITY.md by @parano in #2766
  • chore: use pypa/build as PEP517 backend by @aarnphm in #2680
  • chore(e2e): capture log output by @aarnphm in #2767
  • chore: more robust prometheus directory ensuring by @bojiang in #2526
  • doc(framework): add scikit-learn section to ONNX documentation by @larme in #2764
  • chore: clean up dependencies by @sauyon in #2769
  • docs: misc docs reorganize and cleanups by @parano in #2768
  • fix(io descriptors): finish removing init_http_response by @sauyon in #2774
  • chore: fix typo by @aarnphm in #2776
  • feat(model): allow custom model versions by @sauyon in #2775
  • chore: add watchfiles as bentoml dependency by @aarnphm in #2777
  • doc(framework): keras guide by @larme in #2741
  • docs: Update service schema and validation by @ssheng in #2778
  • doc(frameworks): fix pip package syntax by @larme in #2782
  • fix(runner): thread limiter doesn't take effect by @bojiang in #2781
  • feat: add additional env var configuring num of threads in Runner by @parano in #2786
  • fix(templates): sharing variables at template level by @aarnphm in #2796
  • bug: fix JSON io_descriptor validate_json option by @parano in #2803
  • chore: improve error message when failed importing user service code by @parano in #2806
  • chore: automatic cache action version update and remove stale bot by @aarnphm in #2798
  • chore(deps): bump actions/checkout from 2 to 3 by @dependabot in #2810
  • chore(deps): bump codecov/codecov-action from 2 to 3 by @dependabot in #2811
  • chore(deps): bump github/codeql-action from 1 to 2 by @dependabot in #2813
  • chore(deps): bump actions/cache from 2 to 3 by @dependabot in #2812
  • chore(deps): bump actions/setup-python from 2 to 4 by @dependabot in #2814
  • fix(datacontainer): pytorch to_payload should disable gradient by @aarnphm in #2821
  • fix(framework): fix keras single input edge case by @larme in #2822
  • fix(framework): keras GPU handling by @larme in #2824
  • docs: update custom bentoserver guide by @parano in #2809
  • fix(runner): bind limiter to runner_ref instead by @bojiang in #2826
  • fix(pytorch): inference_mode context is thead local by @bojiang in #2828
  • fix: address multiple tags for containerize by @aarnphm in #2797
  • chore: Add gallery projects under examples by @ssheng in #2833
  • chore: running formatter on examples folder by @aarnphm in #2834
  • docs: update security auth middleware by @g0nz4rth in #2835
  • fix(io_descriptor): DataFrame columns check by @alizia in #2836
  • fix: examples directory structure by @ssheng in #2839
  • revert: "fix: address multiple tags for containerize (#2797)" by @ssheng in #2840

Full Changelog: v1.0.1...v1.0.2

BentoML - v1.0.0

13 Jul 09:05
58aa69b

🍱 The wait is over. BentoML has officially released v1.0.0. We are excited to share with you the notable feature improvements.

  • Introduced BentoML Runner, an abstraction for parallel model inference. It allows compute-intensive model inference to scale separately from the transformation and business logic. The Runner is easily instantiated and invoked, but behind the scenes, BentoML optimizes for micro-batching and fans out inference if needed. A simple example of instantiating a Runner is shown after this list. Learn more about using runners.
  • Redesigned how models are saved, moved, and loaded with BentoML. We introduced new primitives which allow users to call a save_model() method which saves the model in the optimal way based on the recommended practices of the ML framework. The model is then stored in a flexible local repository where users can use “import” and “export” functionality to push and pull “finalized” models from remote locations like S3. Bentos can be built locally or remotely with these models. Once built, Yatai or bentoctl can easily deploy to the cloud service of your choice. Learn more about preparing models and building bentos.
  • Enhanced the micro-batching capability: with the new runner abstraction, batching is even more powerful. When incoming data is spread across different transformation processes, the runner fans in inferences when inference is invoked, batching multiple inputs into a single inference call. Most ML frameworks implement some form of vectorization which improves performance for multiple inputs at once. Our adaptive batching not only batches inputs as they are received, but also regresses the time of the last several groups of inputs in order to optimize the batch size and latency windows.
  • Improved reproducibility of the model by recording and locking the dependent library versions. We use the versions to package the correct dependencies so that the environment in which the model runs in production is identical to the environment it was trained in. All direct and transitive dependencies are recorded and deployed with the model when running in production. In our 1.0 version we now support Conda as well as several different ways to customize your pip packages when “building your Bento”. Learn more about building bentos.
  • Simplified Docker image creation during containerization to generate the right image for you depending on the features that you’ve decided to implement in your service. For example, if your runner specifies that it can run on a GPU, we will automatically choose the right Nvidia docker image as a base when containerizing your service. If needed, we also provide the flexibility to customize your docker image as well. Learn more about containerization.
  • Improved input and output validation with native type validation rules. Numpy and Pandas DataFrame can specify a static shape or even dynamically infer schema by providing sample data. The Pydantic schema that is produced per endpoint also integrates with our Swagger UI so that each endpoint is better documented for sharing. Learn more about service APIs and IO Descriptors.
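
The Runner sketch promised above, in service.py form (the iris_clf model is assumed to have been saved to the local model store beforehand):

import bentoml
from bentoml.io import NumpyNdarray

# Create a Runner from a saved model and wire it into a Service
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series):
    # The call is dispatched to the runner, where micro-batching can apply
    return iris_clf_runner.predict.run(input_series)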

⚠️ BentoML v1.0.0 is backward incompatible with v0.13.1. If you wish to stay on the v0.13.1 LTS version, please lock the dependency with bentoml==0.13.1. We have also prepared a migration guide from v0.13.1 to v1.0.0 to help with your project migration. We are committed to supporting the v0.13-LTS versions with critical bug fixes and security patches.

🎉 After years of seeing hundreds of model serving use cases, we are proud to present the official release of BentoML 1.0. We could not have done it without the growth and support of our community.

BentoML - 1.0.0-rc3

01 Jul 23:30
3272fbc
Pre-release

We have just released BentoML 1.0.0rc3 with a number of highly anticipated features and improvements. Check it out with the following command!

$ pip install -U bentoml --pre

⚠️ BentoML will release the official 1.0.0 version next week, removing the need for the --pre flag to install BentoML versions after 1.0.0. If you wish to stay on the 0.13.1 LTS version, please lock the dependency with bentoml==0.13.1.

  • Added framework runner support for additional ML frameworks.
  • Added support for Huggingface Transformers custom pipelines.
  • Fixed a logging issue causing the api_server and runners to not generate error logs.
  • Optimized the TensorFlow inference procedure.
  • Improved resource request configuration for runners.
    • Resource requests can now be configured in the BentoML configuration. If unspecified, runners will be scheduled to best utilize the available system resources.

      runners:
        resources:
          cpu: 8.0
          nvidia.com/gpu: 4.0
    • Updated the API for custom runners to declare the types of supported resources.

      import bentoml

      class MyRunnable(bentoml.Runnable):
          SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")  # Deprecated SUPPORT_NVIDIA_GPU
          SUPPORTS_CPU_MULTI_THREADING = True  # Deprecated SUPPORT_CPU_MULTI_THREADING
          ...

      my_runner = bentoml.Runner(
          MyRunnable,
          runnable_init_params={"foo": foo, "bar": bar},
          name="custom_runner_name",
          ...
      )
    • Deprecated the API for specifying resources from the framework to_runner() and custom Runner APIs. For better flexibility at runtime, it is recommended to specify resources through configuration.

Full Changelog: v1.0.0-rc2...v1.0.0-rc3

BentoML - 1.0.0-rc2

22 Jun 06:30
ce7e9d2
Pre-release

We have just released BentoML 1.0.0rc2 with an exciting lineup of improvements. Check it out with the following command!

$ pip install -U bentoml --pre
  • Standardized logging configuration and improved logging performance.

    • If imported as a library, BentoML will no longer configure logging explicitly and will respect the logging configuration of the importing Python process. To customize logging when BentoML is used as a library, add configuration for the bentoml logger.
    formatters:
      ...
    handlers:
      ...
    loggers:
      ...
      bentoml:
        handlers: [...]
        level: INFO
        ...
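
      Equivalently, a host process can adjust the bentoml logger programmatically; a minimal sketch using the standard logging module:

      import logging

      # Tune BentoML's library logs from within the importing process
      logging.getLogger("bentoml").setLevel(logging.WARNING)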
    • If started as a server, BentoML will continue to configure logging format and output to stdout at INFO level. All third party libraries will be configured to log at the WARNING level.
  • Added LightGBM framework support.

  • Updated model and bento creation timestamps in the CLI display to use the local timezone for a better user experience, while timestamps in metadata remain in UTC.

  • Improved the reliability of bento build with advanced options including base_image and dockerfile_template.

Besides all the exciting product work, we also started a blog at modelserving.com sharing our learnings from building BentoML and supporting the MLOps community. Check out our latest blog, Breaking up with Flask & FastAPI: Why ML model serving requires a specialized framework, and share your thoughts with us on our LinkedIn post.

Lastly, a big shoutout to Mike Kuhlen for adding the LightGBM framework support. 🥂

Full Changelog: v1.0.0-rc1...v1.0.0-rc2