[Docs] MessagePack IDL, Pydantic Support, and Attribute Access (#6022)
* [Docs] MessagePack IDL, Pydantic Support and Attribute Access

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* support

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* lint

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* lint

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/dataclass.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* nit

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* nit

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/dataclass.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/dataclass.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* Update docs/user_guide/data_types_and_io/pydantic_basemodel.md

Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>

* format

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>
Co-authored-by: David Espejo <82604841+davidmirror-ops@users.noreply.github.com>
Future-Outlier and davidmirror-ops authored Nov 21, 2024
1 parent 09a6fb8 commit e13babb
Showing 4 changed files with 132 additions and 8 deletions.
16 changes: 10 additions & 6 deletions docs/user_guide/data_types_and_io/accessing_attributes.md
@@ -11,6 +11,10 @@ Note that while this functionality may appear to be the normal behavior of Pytho
Consequently, accessing attributes in this manner is, in fact, a specially implemented feature.
This functionality facilitates the direct passing of output attributes within workflows, enhancing the convenience of working with complex data structures.
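
One way to picture why this needs special support: inside a workflow, a task output is a placeholder rather than a real Python object, so `output.attr` cannot simply read a value. The toy sketch below is not Flytekit's implementation; `ToyPromise`, `resolve`, and the `Fruit` dataclass are hypothetical names used only for illustration. It records the access path while the value is still unknown and applies it once a value exists.

```python
from dataclasses import dataclass


class ToyPromise:
    """Hypothetical stand-in for a task output that has no value yet."""

    def __init__(self, name, path=()):
        self._name = name
        self._path = path

    def __getattr__(self, attr):
        # Called only when normal lookup fails; record the attribute step.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return ToyPromise(self._name, self._path + (("attr", attr),))

    def __getitem__(self, key):
        # Record an index or key step; slices are rejected in this sketch.
        if isinstance(key, slice):
            raise TypeError("slicing is not supported in this sketch")
        return ToyPromise(self._name, self._path + (("item", key),))

    def resolve(self, value):
        # Replay the recorded steps against the real value.
        for kind, step in self._path:
            value = getattr(value, step) if kind == "attr" else value[step]
        return value


@dataclass
class Fruit:
    name: str
    prices: dict


# The access path is built before any value exists...
deferred = ToyPromise("o0").prices["kg"]
# ...and applied later, once the upstream value is available.
result = deferred.resolve(Fruit(name="banana", prices={"kg": 3}))
```

The design point is that attribute and item access return new placeholders instead of values, which is why the behavior only looks like ordinary Python.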

```{important}
Flytekit version >= v1.14.0 supports Pydantic BaseModel V2, so you can also access attributes on Pydantic BaseModel V2 objects.
```

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```
@@ -19,7 +23,7 @@ To begin, import the required dependencies and define a common task for subseque

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 1-10
:lines: 1-9
```

## List
@@ -31,38 +35,38 @@ Flyte currently does not support output promise access through list slicing.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 14-23
:lines: 13-22
```

## Dictionary
Access the output dictionary by specifying the key.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 27-35
:lines: 26-34
```

## Data class
Directly access an attribute of a dataclass.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 39-53
:lines: 38-51
```

## Complex type
Combinations of list, dict and dataclass also work effectively.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 57-80
:lines: 55-78
```

You can run all the workflows locally as follows:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 84-88
:lines: 82-86
```

## Failure scenario
18 changes: 17 additions & 1 deletion docs/user_guide/data_types_and_io/dataclass.md
@@ -11,8 +11,24 @@ When you've multiple values that you want to send across Flyte entities, you can
Flytekit uses the [Mashumaro library](https://github.com/Fatal1ty/mashumaro)
to serialize and deserialize dataclasses.

With the 1.14 release, `flytekit` adopted `MessagePack` as the
serialization format for dataclasses. This overcomes a major limitation of previous versions, which serialized dataclasses into a JSON string within a Protobuf `struct` datatype:
to store `int` types, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
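
The effect can be illustrated without Flytekit. Protobuf's `struct` stores every number as a double, so the sketch below mimics that with the standard library by parsing all JSON integers as floats. It does not call Protobuf or Flytekit, and the fields of `Datum` and the helper `struct_like_round_trip` are assumptions made for the illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Datum:
    x: int
    y: str


def struct_like_round_trip(obj):
    """Mimic a struct round trip: every JSON integer comes back as a float."""
    payload = json.dumps(asdict(obj))
    return json.loads(payload, parse_int=float)


raw = struct_like_round_trip(Datum(x=5, y="five"))
# raw["x"] is now a float, even though the field was declared as int.

# The kind of boilerplate users had to write: cast back to the declared type.
restored = Datum(x=int(raw["x"]), y=raw["y"])
```

A format that keeps integers and floats distinct, as MessagePack does, makes the manual cast unnecessary.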

:::{important}
If you're using Flytekit version < v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
:::

:::{important}
If you're using Flytekit version below v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
Flytekit version < v1.14.0 produces a Protobuf `struct` literal for dataclasses.

Flytekit version >= v1.14.0 produces a msgpack bytes literal for dataclasses.

If you're using Flytekit version >= v1.14.0 and want to produce a Protobuf `struct` literal for dataclasses, set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.

For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
:::
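
As a minimal sketch, the variable can be set from Python before any serialization happens. This assumes it must be visible in the environment of the process that serializes the dataclass (for example, the task container). The helper `old_dc_format_requested` is hypothetical and not part of Flytekit, and its case-insensitive comparison is also an assumption.

```python
import os

# Opt back into the Protobuf `struct` format for dataclasses.
os.environ["FLYTE_USE_OLD_DC_FORMAT"] = "true"


def old_dc_format_requested() -> bool:
    """Hypothetical helper: report whether the opt-out flag is set to true."""
    return os.environ.get("FLYTE_USE_OLD_DC_FORMAT", "false").lower() == "true"


print(old_dc_format_requested())
```

Setting the variable in the shell or in the container image has the same effect as setting it in code, as long as it is in place before the data is serialized.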

```{note}
3 changes: 2 additions & 1 deletion docs/user_guide/data_types_and_io/index.md
@@ -114,7 +114,7 @@ Here's a breakdown of these mappings:
- Use ``pyspark.DataFrame`` as a type hint.
* - ``pydantic.BaseModel``
- ``Map``
- To utilize the type, install the ``flytekitplugins-pydantic`` plugin.
- To utilize the type, install the ``pydantic>2`` module.
- Use ``pydantic.BaseModel`` as a type hint.
* - ``torch.Tensor`` / ``torch.nn.Module``
- File
@@ -144,6 +144,7 @@ flytefile
flytedirectory
structureddataset
dataclass
pydantic_basemodel
accessing_attributes
pytorch_type
enum_type
103 changes: 103 additions & 0 deletions docs/user_guide/data_types_and_io/pydantic_basemodel.md
@@ -0,0 +1,103 @@
(pydantic_basemodel)=

# Pydantic BaseModel

```{eval-rst}
.. tags:: Basic
```

`flytekit` version >=1.14 natively supports the `JSON` format that Pydantic `BaseModel` produces, enhancing the
interoperability of Pydantic BaseModels with the Flyte type system.

:::{important}
Pydantic BaseModel V2 only works when you are using flytekit version >= v1.14.0.
:::

With the 1.14 release, `flytekit` adopted `MessagePack` as the serialization format for Pydantic `BaseModel`.
This overcomes a major limitation of previous versions, which serialized into a JSON string within a Protobuf `struct` datatype:
to store `int` types, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.

:::{important}
By default, `flytekit >= 1.14` produces `msgpack` bytes literals when serializing, preserving the types defined in your `BaseModel` class.
If you're serializing a `BaseModel` with `flytekit` version >= v1.14.0 and want to produce a Protobuf `struct` literal instead, set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.

For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
:::

```{note}
You can use dataclasses and Flyte types (FlyteFile, FlyteDirectory, FlyteSchema, and StructuredDataset) as fields in a Pydantic BaseModel.
```

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```

To begin, import the necessary dependencies:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 1-9
```

Build your custom image with ImageSpec:
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 11-14
```

## Python types
We define a Pydantic `BaseModel` with `int`, `str` and `dict` fields.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:pyobject: Datum
```

You can send a `pydantic basemodel` between different tasks written in various languages, and input it through the Flyte console as raw JSON.

:::{note}
All variables in a Pydantic `BaseModel` should be **annotated with their type**. Failure to do so will result in an error.
:::

Once declared, a `BaseModel` can be returned as an output or accepted as an input.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 26-41
```

## Flyte types
We also define a `BaseModel` that accepts {std:ref}`StructuredDataset <structured_dataset>`,
{std:ref}`FlyteFile <files>` and {std:ref}`FlyteDirectory <folder>`.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 45-86
```

A Pydantic `BaseModel` supports fields of Python types, data classes,
Flyte file, Flyte directory and structured dataset.

We define a workflow that calls the tasks created above.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:pyobject: basemodel_wf
```

You can run the workflow locally as follows:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 99-100
```

To run the `basemodel_wf` workflow with `pyflyte run`, pass its inputs on the command line:
```
pyflyte run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/b71e01d45037cea883883f33d8d93f258b9a5023/examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py \
basemodel_wf --x 1 --y 2
```

[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/
