Dockerize runtime #59

alex-jw-brooks · 2024-06-24T13:14:57Z

This pull request:

- Adds a Dockerfile for a caikit computer vision runtime (based on the one in caikit-nlp)
- Adds definitions for the text-to-image task
- Adds a stub module for SDXL

Add SDXL stub

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

gabe-l-hart

A couple of thoughts/comments. Depending on the goal of this PR, I think we should at least rename SDXLStub to TTIStub or something similar.

gabe-l-hart · 2024-06-24T15:49:54Z

Dockerfile

+COPY tox.ini .
+COPY caikit_computer_vision caikit_computer_vision
+# .git is required for setuptools-scm to get the version
+RUN --mount=source=.git,target=.git,type=bind \


I think this is just for wheel building, but do we need tests in there too?

Good question! I think yes, but if it's alright with you, I'd rather not block this PR or the SDXL PR on it. I'm about to be OOO for a while, and I'd prefer to give people an image built from main if possible 😄

gabe-l-hart · 2024-06-24T15:51:48Z

Dockerfile

@@ -0,0 +1,51 @@
+FROM registry.access.redhat.com/ubi9/ubi-minimal:latest as base


NIT: It would be nice to have a top-level comment in this Dockerfile. It looks like it's aimed at running the runtime and not for testing/dev, right?

Yup! It just adds a Dockerfile for building a runtime image that software can test with 😄 It's probably a good idea to also build a release wheel and test it out in a container built from this, though!

gabe-l-hart · 2024-06-24T15:52:55Z

Dockerfile

+
+RUN microdnf update -y && \
+    microdnf install -y \
+        python3-devel gcc git python-pip && \


If this is for runtime execution, I think we should avoid having all of these dev-centric packages in the base stage (as opposed to the build stage). The only exception would be if we use torch.compile, which requires the gcc executables at runtime.

Also, do you know which version of Python 3 this gets you? I don't know off the top of my head. If I'm not mistaken, the virtualenv below will still use the system Python (as opposed to installing a specific version like you would with conda), right? Just trying to look for sources of bloat/redundancy.

Good questions! I just checked the build container and it looks like 3.9.18. Admittedly, I pretty much took this Dockerfile from caikit-nlp and did not optimize it at all; the virtualenv setup and the build stages both come from there.

We don't torch.compile anything, though. I think moving the non-Python packages into the build stage sounds good.

Actually, I have (sort of) temporarily disabled the things that need gcc and git, so I'm just going to remove them for now. In the future, I'll re-enable them in the build stage.
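A minimal sketch of a smoke check that could be run inside the built runtime image to confirm that build-only tools such as gcc and git really were left out of the deploy stage. The tool list and the function names are illustrative assumptions, not part of this PR; the check only relies on `shutil.which` from the standard library.

```python
import shutil
from typing import Dict, Iterable, Optional

# Tools the review suggests keeping out of the deploy stage (illustrative list).
BUILD_ONLY_TOOLS = ("gcc", "git")


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its path on PATH, or None if it is absent."""
    return {name: shutil.which(name) for name in names}


def leaked_build_tools(names: Iterable[str] = BUILD_ONLY_TOOLS) -> Dict[str, str]:
    """Return only the build-only tools that are present in this environment."""
    return {name: path for name, path in find_tools(names).items() if path is not None}
```

Run inside the deploy stage, `leaked_build_tools()` would be expected to return an empty dict once gcc and git are gone; a non-empty result points at packages that probably belong in the build stage instead.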

As for the Python venv vs. system Python, I checked that too! Inside the virtual environment in the container:

lrwxrwxrwx 1 root root 15 Jun 25 20:05 python -> /usr/bin/python
lrwxrwxrwx 1 root root 6 Jun 25 20:05 python3 -> python
lrwxrwxrwx 1 root root 6 Jun 25 20:05 python3.9 -> python

So yes, it looks like it points at the system Python, which in this base image is 3.9.
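Python can also report this directly instead of relying on the `ls` output. A minimal sketch, assuming it is run with the venv's interpreter (for example `/opt/caikit/bin/python` in this image): inside a venv, `sys.prefix` points at the venv while `sys.base_prefix` points at the installation it was created from, and resolving `sys.executable` follows the symlinks back to the real binary. The function name is hypothetical.

```python
import os
import sys


def describe_interpreter() -> dict:
    """Report whether we are in a virtual environment and which interpreter backs it."""
    return {
        # In a venv, sys.prefix is the venv directory; sys.base_prefix is the
        # prefix of the Python installation the venv was created from.
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "executable": sys.executable,
        # realpath follows the venv's symlinks to the actual interpreter binary.
        "real_executable": os.path.realpath(sys.executable),
    }


print(describe_interpreter())
```

In this image one would expect `in_venv` to be true and `base_prefix` to point at the system installation (likely `/usr`), matching the symlinks above.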

gabe-l-hart · 2024-06-24T15:53:56Z

Dockerfile

+
+FROM base as deploy
+
+RUN python -m venv --upgrade-deps /opt/caikit/


The use of a virtualenv is interesting here. On the one hand, it's a nice isolation mechanism, but on the other hand it will also create some duplication with the base python runtime.
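To gauge how much duplication the venv actually adds, the standard library's `venv.EnvBuilder` can recreate what `python -m venv` does. This sketch assumes a POSIX system (it looks for `bin/python`) and skips pip so it stays fast; the PR's `--upgrade-deps` flag corresponds to `upgrade_deps=True`, which is left out here because it needs pip.

```python
import os
import tempfile
import venv


def make_venv(path: str) -> dict:
    """Create a venv the way `python -m venv` does on POSIX and inspect it."""
    # symlinks=True mirrors the CLI default on POSIX; with_pip=False keeps
    # the example fast and avoids depending on ensurepip being available.
    builder = venv.EnvBuilder(symlinks=True, with_pip=False)
    builder.create(path)
    python = os.path.join(path, "bin", "python")
    return {
        "python_is_symlink": os.path.islink(python),
        "resolves_to": os.path.realpath(python),
        "has_cfg": os.path.isfile(os.path.join(path, "pyvenv.cfg")),
    }


with tempfile.TemporaryDirectory() as tmp:
    info = make_venv(os.path.join(tmp, "env"))
    print(info)
```

Because the interpreter is symlinked rather than copied, most of the extra image size from the venv likely comes from packages installed into its own site-packages, not from a second copy of Python.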

gabe-l-hart · 2024-06-24T15:55:40Z

caikit_computer_vision/data_model/__init__.py

@@ -18,3 +18,4 @@
 from .image_classification import *
 from .image_segmentation import *
 from .object_detection import *
+from .text_to_image import *


Ooh, nice! I have a few things for this that might be worth contributing

gabe-l-hart · 2024-06-24T15:58:44Z

caikit_computer_vision/data_model/text_to_image.py

+
+
+@dataobject(package="caikit_data_model.caikit_computer_vision")
+class TextToImageResult(DataObjectBase):


One thing I found with my purpose-built text-to-image module was that it was helpful to bind the input text to the image in the output object. I ended up calling it a CaptionedImage and making it inherit from Image:

@dataobject
class CaptionedImage(Image):
    """A Captioned image has a caption as well as the image itself"""

    caption: Optional[str]

    # TODO: Use type hints here once caikit supports them
    # https://github.com/caikit/caikit/issues/608
    # def __init__(self, *args, caption: Optional[str] = None, **kwargs):
    def __init__(self, *args, caption=None, **kwargs):
        """Explicitly delegate to Image's initializer so that dataobject
        does not auto-create an __init__
        """
        super().__init__(*args, **kwargs)
        self.caption = caption
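The same delegation pattern can be tried without caikit using the standard library's `dataclasses`, which, like the `dataobject` behavior described in the docstring above, only generates an `__init__` when the class body does not define one. This is an analogy, not caikit's implementation: `Image` here is a hypothetical stand-in with made-up fields (`data`, `export_format`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    """Hypothetical stand-in for caikit's Image data model."""

    data: bytes
    export_format: str = "png"


@dataclass
class CaptionedImage(Image):
    """An image plus the caption (e.g. the prompt) that produced it."""

    caption: Optional[str] = None

    # Because __init__ is defined in the class body, @dataclass keeps it
    # instead of generating one, so construction is delegated to Image.
    def __init__(self, *args, caption: Optional[str] = None, **kwargs):
        super().__init__(*args, **kwargs)
        self.caption = caption


img = CaptionedImage(b"\x89PNG", caption="a red fox in the snow")
```

Keeping the caption keyword-only means the parent's positional signature stays unchanged, so existing `Image(...)` call sites translate directly.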

I like this more than the current output type! Things are currently written this way because of how software expects the image to be formatted, which is basically a wrapper around the encoded bytes of a compressed image. I greatly prefer this too, though! For now, I'd like to keep this output format if you're alright with it, since it's what they have been testing with, but I think it would be a good idea to add this to caikit and switch the result to use it in the future.

I suspect this is the route that they will want to go too, since they had already started talking about returning a JSON object with stuff + image instead of just the encoded image 🤞

I guess I'm a little confused about how this structure accomplishes the goal of having the output be just encoded bytes. I would expect that you would still need to call some function on the output field to get those bytes encoded, right? If this is returned from a task in caikit.runtime, it would come back as a JSON blob or serialized proto with output and producer_id fields, unless there's something in the core Image that provides custom serialization?

Yup! That is handled in the Image data model: it holds an export format, which is PNG by default. When you access the image data attribute on the image backend, it creates a BytesIO object and exports the image into it with PIL here!
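To make "export to PNG bytes via a `BytesIO`" concrete without pulling in Pillow, here is a standard-library-only sketch that writes a minimal 8-bit RGB PNG into a `BytesIO` buffer. It swaps PIL for a hand-rolled encoder and is not caikit's code; with Pillow the equivalent step is roughly `img.save(buf, format="PNG")`. The `to_json_payload` helper and its field names are hypothetical; it only illustrates that a bytes field typically travels as a base64 string in JSON, as in the protobuf JSON mapping.

```python
import base64
import io
import json
import struct
import zlib
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def export_png(rows: List[List[Pixel]]) -> bytes:
    """Encode rows of 8-bit RGB pixels as PNG bytes, written into a BytesIO."""
    height, width = len(rows), len(rows[0])
    # Each scanline is prefixed with filter type 0 (None), then raw RGB bytes.
    raw = b"".join(
        b"\x00" + bytes(channel for pixel in row for channel in pixel)
        for row in rows
    )
    buf = io.BytesIO()
    buf.write(b"\x89PNG\r\n\x1a\n")  # PNG signature
    # IHDR: width, height, bit depth 8, color type 2 (RGB), then
    # compression, filter, and interlace methods, all 0.
    buf.write(_chunk(b"IHDR", struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)))
    buf.write(_chunk(b"IDAT", zlib.compress(raw)))
    buf.write(_chunk(b"IEND", b""))
    return buf.getvalue()


def to_json_payload(png: bytes, caption: str) -> str:
    """Hypothetical JSON response shape: raw bytes must be base64-encoded."""
    image_b64 = base64.b64encode(png).decode("ascii")
    return json.dumps({"output": {"image_data": image_b64, "caption": caption}})


png = export_png([[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]])
```

This also illustrates the question above: the bytes only exist after something performs the export, and a JSON transport then needs a text encoding such as base64 on top of that.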

gabe-l-hart · 2024-06-24T15:59:38Z

caikit_computer_vision/data_model/tasks.py

@@ -61,3 +62,14 @@ class ImageSegmentationTask(TaskBase):
    Note that at the moment, this task encapsulates all segmentation types,
    I.e., instance, object, semantic, etc...
    """
+
+
+@task(


Looks very similar to my personal definition of this (except that mine uses CaptionedImage as the output type, as described in my comment on text_to_image.py):

@task(
    unary_parameters={"text": dm.TextDocument},
    unary_output_type=dm.CaptionedImage,
)
class TextToImageTask(TaskBase):
    """Task of generating an image from text"""

gabe-l-hart · 2024-06-24T16:07:55Z

caikit_computer_vision/modules/text_to_image/sdxl_stub.py

+        with saver:
+            saver.update_config({"model_name": self.model_name})
+
+    def run(self, inputs: str, height: int, width: int) -> TextToImageResult:


Ah, ok, so this is just a stub and doesn't actually have anything to do with SDXL, right? I think we should change the name unless you plan to update it to actually run SDXL. As written, I think this is more of a unit test stub, right?

Ahh yes, sorry! This is a little disorganized; this PR only contains the stub and the Dockerfile. I am still splitting some things apart, but I wrote this and pushed an image with it so that software could test the wiring more easily.

There is an SDXL block too, but it's in a separate PR, stacked on top of this one, that I am rebasing at the moment (here).

Ok, that makes sense. I still think this should be named tti_stub, since we eventually expect multiple modules for TTI and this one is not SDXL-specific.

Agreed! Renamed
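For reference, a stub like this only needs to honor the module's interface: persist a small config when saved, read it back when loaded, and return deterministic output from `run` with the same `(inputs, height, width)` signature shown in the diff. The sketch below is plain Python with hypothetical names (`TTIStub`, `StubImageResult`, `config.json`); it does not use caikit's module machinery, such as the saver seen in the diff.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


@dataclass
class StubImageResult:
    """Hypothetical result type: raw RGB rows plus the prompt as a caption."""

    rows: List[List[Pixel]]
    caption: Optional[str] = None


class TTIStub:
    """Hypothetical text-to-image stub: no model, deterministic output for wiring tests."""

    CONFIG_FILE = "config.json"  # assumption; caikit uses its own saver format

    def __init__(self, model_name: str = "tti-stub"):
        self.model_name = model_name

    def save(self, model_path: str) -> None:
        # Persist only the model name, analogous to update_config({"model_name": ...}).
        os.makedirs(model_path, exist_ok=True)
        with open(os.path.join(model_path, self.CONFIG_FILE), "w", encoding="utf-8") as f:
            json.dump({"model_name": self.model_name}, f)

    @classmethod
    def load(cls, model_path: str) -> "TTIStub":
        with open(os.path.join(model_path, cls.CONFIG_FILE), encoding="utf-8") as f:
            return cls(model_name=json.load(f)["model_name"])

    def run(self, inputs: str, height: int, width: int) -> StubImageResult:
        if height <= 0 or width <= 0:
            raise ValueError("height and width must be positive")
        # A solid gray image of the requested size; the prompt is only echoed back.
        rows = [[(128, 128, 128)] * width for _ in range(height)]
        return StubImageResult(rows=rows, caption=inputs)


with tempfile.TemporaryDirectory() as model_dir:
    TTIStub(model_name="demo").save(model_dir)
    stub = TTIStub.load(model_dir)
    result = stub.run("a lighthouse at dusk", height=2, width=3)
```

Nothing in the stub depends on a particular model family, which is the point of the rename: any future TTI module can be swapped in behind the same interface.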

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

gabe-l-hart

Ship it (once you've got the rename of CaptionedImage done 😉)

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks requested review from gkumbhat and gabe-l-hart as code owners June 24, 2024 13:14

alex-jw-brooks force-pushed the dockerize_runtime branch 4 times, most recently from 8ee351d to 147c94a June 24, 2024 13:48

alex-jw-brooks added 5 commits June 24, 2024 08:59

Add text to image task and output types

5f2f53b

Add SDXL stub

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add text to image stub tests

df520ff

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add preliminary caikit runtime dockerfile

31e4e2d

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Add build to tox

ef15b7d

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Fix text to image dm module docstring

f0ae7c0

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks force-pushed the dockerize_runtime branch from 9dacb91 to a92e637 June 24, 2024 14:59

fmt sdxl tests

a1b1b80

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks force-pushed the dockerize_runtime branch from a92e637 to a1b1b80 June 24, 2024 15:08

rename sdxl stub module file

af72a26

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

gabe-l-hart reviewed Jun 24, 2024


alex-jw-brooks added 2 commits June 25, 2024 13:40

Rename sdxl stub to tti stub

f682251

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

remove gcc and git from base stage install

335eaef

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

gabe-l-hart approved these changes Jun 25, 2024


Add caption, rename output type

5aec367

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

alex-jw-brooks merged commit a329759 into main Jun 26, 2024
8 checks passed
