
Image Language Models and ImageGeneration task #1060

Merged
@plaguss merged 57 commits into develop from vision-language-models on Jan 15, 2025
Conversation

@plaguss (Contributor) commented Nov 14, 2024

Description

This PR adds a new module, models/image_generation, to store image generation models (InferenceEndpointsImageGeneration and OpenAIImageGeneration), with two new base classes, ImageGenerationModel and AsyncImageGenerationModel, and a new ImageGeneration task.

Sample pipeline and dataset below. Note the distiset.transform_columns_to_image method, which is necessary to push the dataset with the images as image objects instead of strings.

from datasets import load_dataset

from distilabel.models.image_generation import InferenceEndpointsImageGeneration
from distilabel.pipeline import Pipeline
from distilabel.steps import KeepColumns
from distilabel.steps.tasks import ImageGeneration

ds = load_dataset("dvilasuero/finepersonas-v0.1-tiny", split="train").select(range(3))

with Pipeline(name="image_generation_pipeline") as pipeline:
    igm = InferenceEndpointsImageGeneration(model_id="black-forest-labs/FLUX.1-schnell")

    img_generation = ImageGeneration(
        name="flux_schnell", image_generation_model=igm, input_mappings={"prompt": "persona"}
    )

    keep_columns = KeepColumns(columns=["persona", "model_name", "image"])

    img_generation >> keep_columns


if __name__ == "__main__":
    distiset = pipeline.run(use_cache=False, dataset=ds)
    # Save the images as `PIL.Image.Image`
    distiset = distiset.transform_columns_to_image("image")
    distiset.push_to_hub("plaguss/test-finepersonas-v0.1-tiny-flux-schnell")
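
The description also mentions OpenAIImageGeneration; swapping the backend in the snippet above might look like the following. This is only a hedged sketch: the constructor argument name and model value are assumptions for illustration, not taken from this PR.

from distilabel.models.image_generation import OpenAIImageGeneration

# Hypothetical arguments: check the class signature added in this PR for the exact names.
igm = OpenAIImageGeneration(model="dall-e-3")  # assumes an OpenAI-style `model` argument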

@plaguss added the enhancement (New feature or request) label Nov 14, 2024
@plaguss added this to the 1.5.0 milestone Nov 14, 2024
@plaguss self-assigned this Nov 14, 2024
@plaguss requested a review from gabrielmbmb November 14, 2024 11:58

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-1060/

codspeed-hq bot commented Nov 14, 2024

CodSpeed Performance Report

Merging #1060 will not alter performance

Comparing vision-language-models (7debafd) with develop (e866345)

Summary

✅ 1 untouched benchmarks

@plaguss marked this pull request as ready for review November 15, 2024 08:24
@plaguss requested a review from dvsrepo November 15, 2024 11:51
@gabrielmbmb (Member) left a comment

Very cool! I think we need to fix some issues related to inheritance, but maybe we can tackle those in separate PRs before the release.

docs/api/models/image_generation/index.md (outdated; resolved)
docs/api/task/image_task.md (outdated; resolved)
docs/sections/how_to_guides/advanced/distiset.md (outdated; resolved)
Comment on lines 156 to 195
def get_runtime_parameters_info(self) -> list["RuntimeParameterInfo"]:
    """Gets the information of the runtime parameters of the `ImageGenerationModel` such as the name
    and the description. This function is meant to include the information of the runtime
    parameters in the serialized data of the `ImageGenerationModel`.

    Returns:
        A list containing the information for each runtime parameter of the `ImageGenerationModel`.
    """
    runtime_parameters_info = super().get_runtime_parameters_info()

    generation_kwargs_info = next(
        (
            runtime_parameter_info
            for runtime_parameter_info in runtime_parameters_info
            if runtime_parameter_info["name"] == "generation_kwargs"
        ),
        None,
    )

    # If the `generation_kwargs` attribute is present, we need to include the `generate`
    # method arguments as the information for this attribute.
    if generation_kwargs_info:
        generate_docstring_args = self.generate_parsed_docstring["args"]
        generation_kwargs_info["keys"] = []
        # TODO: This doesn't happen with LLM, but with ImageGenerationModel the "optional" key
        # is not found when running in a pipeline, due to some bug. For the moment this does
        # the job. It may be related to, for example, InferenceEndpointsImageGeneration being
        # both an ImageGenerationModel and an InferenceEndpointsLLM, but I cannot find the
        # point that makes the error appear.
        if "optional" not in generation_kwargs_info:
            return runtime_parameters_info
        for key, value in generation_kwargs_info["optional"].items():
            info = {"name": key, "optional": value}
            if description := generate_docstring_args.get(key):
                info["description"] = description
            generation_kwargs_info["keys"].append(info)

        generation_kwargs_info.pop("optional")

    return runtime_parameters_info
Member
The issue here is that when calling super().get_runtime_parameters_info(), the InferenceEndpointsLLM.get_runtime_parameters_info method is being called instead of the one from the RuntimeParametersMixin class, which returns the dictionary without the optional key (already popped). I guess something similar is happening with the OpenAI class.
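
For readers following along, here is a minimal standalone sketch of the resolution-order problem described above; the class names mirror the real ones but the method bodies are simplified and hypothetical, not the actual distilabel code.

class RuntimeParametersMixin:
    def get_runtime_parameters_info(self):
        # The original info still contains the "optional" sub-dict.
        return [{"name": "generation_kwargs", "optional": {"num_images": True}}]


class InferenceEndpointsLLM(RuntimeParametersMixin):
    def get_runtime_parameters_info(self):
        info = super().get_runtime_parameters_info()
        # This override pops "optional" before returning it...
        for item in info:
            item.pop("optional", None)
        return info


class InferenceEndpointsImageGeneration(InferenceEndpointsLLM):
    def get_runtime_parameters_info(self):
        # ...so super() here resolves to InferenceEndpointsLLM.get_runtime_parameters_info,
        # not to the mixin's method, and "optional" is already gone.
        return super().get_runtime_parameters_info()


print(InferenceEndpointsImageGeneration().get_runtime_parameters_info())
# [{'name': 'generation_kwargs'}]  -> no "optional" key left to expand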

Member

I think the inheritance from InferenceEndpointsLLM is a bit messy, not only in this method but also in other parts of the class. For example:

from distilabel.models import InferenceEndpointsImageGeneration, InferenceEndpointsLLM

igm = InferenceEndpointsImageGeneration()

This should raise a ValidationError because of the validator only_one_of_model_id_endpoint_name_or_base_url_provided (no endpoint name or model id was provided), but instead it raises a TypeError:

TypeError: ValidationError.__new__() missing 1 required positional argument: 'line_errors'

which I think is caused by the multiple inheritance.

Member

I think we should:

  1. Move the extended methods from RuntimeParametersMixin (runtime_parameters_names, get_runtime_parameters_info and generate_parsed_docstring) to another class that we can then use in the base LLM, ImageGenerationModel and the future base classes to come.
  2. Just duplicate the code for classes that use a client, like Inference Endpoints or OpenAI, because inheriting from the LLM class is messy or can get messy. Maybe we can create a base class that offers the client functionality for OpenAI, Inference Endpoints, etc., and then use it in the LLMs, ImageGenerationModel, etc. (a rough sketch follows below).
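
A rough sketch of that idea; every class name and the overall structure here are hypothetical illustrations of the proposal, not what this PR or any follow-up actually implements. Shared runtime-parameter helpers live in one small mixin, client setup lives in a per-provider base class, and the image generation model composes the two instead of inheriting from the LLM.

class RuntimeParametersModelMixin:
    """Shared helpers: runtime_parameters_names, get_runtime_parameters_info,
    generate_parsed_docstring, reusable by LLM, ImageGenerationModel, etc."""


class InferenceEndpointsBaseClient:
    """Owns client creation/configuration for the Inference Endpoints API."""


class InferenceEndpointsLLM(RuntimeParametersModelMixin, InferenceEndpointsBaseClient):
    """Text generation model."""


class InferenceEndpointsImageGeneration(RuntimeParametersModelMixin, InferenceEndpointsBaseClient):
    """Image generation model; no longer inherits from InferenceEndpointsLLM."""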

ReviewNB: Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@plaguss requested a review from gabrielmbmb January 15, 2025 09:50
@gabrielmbmb (Member) left a comment

LGTM!

src/distilabel/mixins/runtime_parameters.py (outdated; resolved)
src/distilabel/models/image_generation/base.py (outdated; resolved)
@plaguss merged commit 5257600 into develop on Jan 15, 2025
8 checks passed
@plaguss deleted the vision-language-models branch January 15, 2025 11:28
Labels: enhancement (New feature or request)
3 participants