Skip to content

add managed environment POC #3021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jan 6, 2025
Merged

Conversation

granthamtaylor
Copy link
Contributor

@granthamtaylor granthamtaylor commented Dec 26, 2024

Why are the changes needed?

Task environments can quickly become unwieldy for large, complex codebases. There are well over a dozen commonly used configurations, from container_image to secret_requests that will be similar among many tasks, albeit arbitrarily different for some edge cases.

It is challenging to manage such environment configurations that contain a large number of such configurations while still being able to uniquely override the configurations for individual tasks, and extend a set of configurations to define entirely new environments.

This PR accomplishes both of these critical features in an intuitive manner.

What changes were proposed in this pull request?

These PR contributes a Environment class. An Environment contains a dictionary of configurations. These configurations are applied to flytekit.task at registration time.

However, the configurations of an Environment may be defined during the creation of the Environment, or during the creation of the task to be authored with an `Environment.

As a non-limiting example:

lite = Environment(
    container_image=fl.ImageSpec(builder="union", requirements="requirements.txt"),
    requests=fl.Resources(mem="2Gi"),
    retries=3,
    cache=True,
    cache_version="v0.0.3",
    secret_requests=[fl.Secret(key="WANDB_API_KEY")],
    environment={"PYTHONUNBUFFERED": "1"},
)

@lite.task
def my_task(...):
    # this will include all of the configurations defined in `lite`
    ...

@lite.task(retries=0)
def my_other_task(...):
    # this will include all of the configurations defined in `lite`, but with `retries` overwritten to 0
    ...

Additionally, one may create deep copies of an Environment

processor = lite.extend(
    requests=fl.Resources(cpu="8", mem="16Gi"),
    cache_serialize=True,
)

@processor.task
def my_big_task(...):
    # this will include all of the configurations defined in `processor `
    ...

@processor.task(retries=0)
def my_other_big_task(...):
    # this will include all of the configurations defined in `processor `, but with `retries` overwritten to 0

This allows for an organization to modularly define reusable environments once for an entire project, or, perhaps even define reusable environments for the entire organization.

How was this patch tested?

Tests to be added. I have personally been using this pattern for the last year with Flyte, and with KFP for over three years.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This PR enhances Flytekit with multiple features: (1) introduces a new Environment class for managing task configurations with inheritance capabilities and deep copying, (2) adds VLLM model serving support and improved file handling capabilities, (3) implements dynamic workflow task support and remote path handling, while (4) improving test coverage across StructuredDataset attributes, FlyteFile handling, and configuration inheritance.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 5

Copy link

codecov bot commented Dec 26, 2024

Codecov Report

Attention: Patch coverage is 95.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 91.71%. Comparing base (bc0e8c0) to head (44a6e64).
Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
...-inference/flytekitplugins/inference/vllm/serve.py 91.17% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3021       +/-   ##
===========================================
+ Coverage   51.35%   91.71%   +40.35%     
===========================================
  Files         204      121       -83     
  Lines       21446     5368    -16078     
  Branches     2729        0     -2729     
===========================================
- Hits        11014     4923     -6091     
+ Misses       9834      445     -9389     
+ Partials      598        0      -598     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@flyte-bot
Copy link
Contributor

flyte-bot commented Dec 30, 2024

Code Review Agent Run #763fc1

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 481576f..6761d6b
    • flytekit/core/environments.py
    • tests/flytekit/unit/experimental/environments.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

@flyte-bot
Copy link
Contributor

flyte-bot commented Dec 30, 2024

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
New Feature - Task Environment Management System

environment.py - Implements new Environment class for managing task configurations

__init__.py - Adds Environment class import to main package

Testing - Environment Feature Test Coverage

test_environment.py - Adds unit tests for basic Environment functionality, inheritance, and updates

test_environment.py - Adds integration tests for Environment with Spark configurations

@flyte-bot
Copy link
Contributor

flyte-bot commented Dec 30, 2024

Code Review Agent Run #089365

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 6761d6b..543ea4f
    • flytekit/__init__.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

Copy link
Contributor

@wild-endeavor wild-endeavor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks very much for putting this in. quick question though - would you mind adding some a couple tests that cover using Environment with a config? Like could I make a SparkEnvironment with this? And flipping it, could I override an Environment with a Spark config for instance, back to a regular python task? (not saying we should, but could we document the behavior with a unit test?)

@flyte-bot
Copy link
Contributor

flyte-bot commented Jan 6, 2025

Code Review Agent Run #0f10d5

Actionable Suggestions - 2
  • plugins/flytekit-spark/tests/test_environment.py - 1
    • Consider autouse for spark session fixture · Line 9-9
  • tests/flytekit/unit/core/test_environment.py - 1
    • Consider verifying original environment state · Line 51-52
Review Details
  • Files reviewed - 3 · Commit Range: 543ea4f..71b4c66
    • flytekit/core/environment.py
    • plugins/flytekit-spark/tests/test_environment.py
    • tests/flytekit/unit/core/test_environment.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

@granthamtaylor
Copy link
Contributor Author

thanks very much for putting this in. quick question though - would you mind adding some a couple tests that cover using Environment with a config? Like could I make a SparkEnvironment with this? And flipping it, could I override an Environment with a Spark config for instance, back to a regular python task? (not saying we should, but could we document the behavior with a unit test?)

It won't support SparkEnvironment for now. I mean, I can create a clone of it for Spark / Dask / Ray plugins down the road though. I have always found these be really rough to author effectively.

For now, it just supports ordinary PythonFunctionTask configurations.

@flyte-bot
Copy link
Contributor

flyte-bot commented Jan 6, 2025

Code Review Agent Run #5d6d02

Actionable Suggestions - 1
  • plugins/flytekit-spark/tests/test_environment.py - 1
    • Consider adding Spark session cleanup · Line 6-6
Review Details
  • Files reviewed - 1 · Commit Range: 71b4c66..44c9395
    • plugins/flytekit-spark/tests/test_environment.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

from flytekit.core.environment import Environment


def test_spark_task():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding Spark session cleanup

The test may need cleanup of Spark session resources. Consider adding session cleanup using pyspark.sql.SparkSession.builder.getOrCreate().stop() before and after test execution to prevent resource leaks.

Code suggestion
Check the AI-generated fix before applying
 @@ -6,6 +6,9 @@
 +    # Clean up any existing sessions
 +    pyspark.sql.SparkSession.builder.getOrCreate().stop()
 +
      env = Environment(
          task_config=Spark(
              spark_conf={"spark": "1"},
 @@ -20,4 +23,7 @@
          return 10
 
      assert my_spark.task_config is not None
 -    assert my_spark.task_config.spark_conf == {"spark": "1"}
 +    assert my_spark.task_config.spark_conf == {"spark": "1"}
 +
 +    # Clean up after test
 +    pyspark.sql.SparkSession.builder.getOrCreate().stop()

Code Review Run #5d6d02


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged


def test_basic_environment():

env = Environment(retries=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support env.dynamic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a dynamic workflow from an Environment ? That is a great question. Not currently. I can get that added though!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah super easy to add.

@pingsutw
Copy link
Member

pingsutw commented Jan 6, 2025

FAILED tests/flytekit/unit/bin/test_python_entrypoint.py::test_get_container_error_timestamp - assert datetime.datetime(2025, 1, 6, 18, 40, 21, 647603) >= datetime.datetime(2025, 1, 6, 18, 40, 21, 647604)

The unit test failure is not related to your PR. You can fix it by rebasing the PR

granthamtaylor and others added 8 commits January 6, 2025 16:22
* Store protos in local cache (#3022)

* Store proto obj instead of model Literal in local cache

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Remove unused file

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Bump aiohttp from 3.9.5 to 3.10.11 (#3018)

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.5 to 3.10.11.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.9.5...v3.10.11)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix bug in FlyteDirectory.listdir on local files (#2926)

* Fix issue in FlyteDirectory.listdir

Fixes flyteorg/flyte#6005

Signed-off-by: Pim de Haan <pim@cusp.ai>

* Added test

Signed-off-by: Pim de Haan <pim@cusp.ai>

* Run make lint

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: Pim de Haan <pim@cusp.ai>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Fix unit tests in airflow plugin (#3024)

Signed-off-by: Kevin Su <pingsutw@apache.org>

* fix: Fix resource meta typos for async agent (#3023)

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* fix: format commands output (#3026)

* Fix pydantic basemodel default input (#3013)

* Fix pydantic default input

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add pydantic integration test

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Use duck typing by Thomas's advice

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* lint

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* [BUG] Open FlyteFile from remote path (#2991)

* fix: Open FlyteFile from remote path

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Add integration test

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Use ctx as param instead of recreation

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Clean test logic

1. Remove redundant prints
2. Use `mock.patch.dict` to setup `os.environ` for the current test fn
    * Avoid contaminating other tests running in the same process

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Setup local path and downloader in constructor

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Move SimpleFileTransfer to an utility file

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant env var setup

Please refer to #3001

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* test: Add another ff use case

Create ff in one task pod and read it in another task pod.

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

---------

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* vllm inference plugin (#2967)

* vllm inference plugin

Signed-off-by: Daniel Sola <daniel.sola@union.ai>

* fixed default value

Signed-off-by: Daniel Sola <daniel.sola@union.ai>

---------

Signed-off-by: Daniel Sola <daniel.sola@union.ai>

* Add poetry to image spec (#3025)

* Add poetry to image spec

Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>

* Add stricter check

Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>

---------

Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>

* [test] Add integration test for accessing sd sttr in dc (#2969)

* test: Add integration test for attr access of sd

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Correct file path

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* test: Support interaction with minio s3 bucket

1. Upload a local parquet file to minio s3 bucket
2. Access StructuredDataset attr from a dataclass
3. Open StructuredDataset from a remote path

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Delete an unmerged integration test

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Try imagespec with commit sha of corresponding fix

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant test

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove default_factory and create sd dc from input uri

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Clean test logic

1. Remove redundant prints
2. Use `mock.patch.dict` to setup `os.environ` for the current test fn
    * Avoid contaminating other tests running in the same process

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant minio env var setup and add test comments

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Support uploading tmp pqt file

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Udpate deprecated module

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant and unused imports

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

---------

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

---------

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Pim de Haan <pim@cusp.ai>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Eduardo Apolinario <653394+eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pim de Haan <pimdehaan@gmail.com>
Co-authored-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>
Co-authored-by: V <0426vincent@gmail.com>
Co-authored-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Daniel Sola <40698988+dansola@users.noreply.github.com>
@flyte-bot
Copy link
Contributor

flyte-bot commented Jan 6, 2025

Code Review Agent Run #7823f9

Actionable Suggestions - 0
Additional Suggestions - 10
  • tests/flytekit/integration/remote/utils.py - 1
    • Consider configurable test data path · Line 89-89
  • flytekit/types/file/file.py - 1
  • flytekit/image_spec/default_builder.py - 1
  • plugins/flytekit-inference/flytekitplugins/inference/vllm/serve.py - 1
    • Consider consolidating validation checks · Line 44-47
  • tests/flytekit/unit/types/directory/test_listdir.py - 2
    • Consider using temporary directory context manager · Line 9-10
    • Add error handling for directory operations · Line 21-22
  • tests/flytekit/unit/core/image_spec/test_default_builder.py - 2
  • flytekit/clis/sdk_in_container/run.py - 1
    • Consider consolidating JSON serialization logic · Line 478-486
  • flytekit/core/environment.py - 1
    • Consider adding type hints for overrides · Line 97-97
Review Details
  • Files reviewed - 25 · Commit Range: 44c9395..dbe0a64
    • dev-requirements.txt
    • flytekit/clis/sdk_in_container/run.py
    • flytekit/core/array_node_map_task.py
    • flytekit/core/environment.py
    • flytekit/core/local_cache.py
    • flytekit/extend/backend/agent_service.py
    • flytekit/extend/backend/base_agent.py
    • flytekit/image_spec/default_builder.py
    • flytekit/models/literals.py
    • flytekit/types/directory/types.py
    • flytekit/types/file/file.py
    • plugins/flytekit-airflow/setup.py
    • plugins/flytekit-inference/flytekitplugins/inference/__init__.py
    • plugins/flytekit-inference/flytekitplugins/inference/vllm/serve.py
    • plugins/flytekit-inference/setup.py
    • plugins/flytekit-inference/tests/test_vllm.py
    • tests/flytekit/integration/remote/test_remote.py
    • tests/flytekit/integration/remote/utils.py
    • tests/flytekit/integration/remote/workflows/basic/attr_access_sd.py
    • tests/flytekit/integration/remote/workflows/basic/flytefile.py
    • tests/flytekit/integration/remote/workflows/basic/pydantic_wf.py
    • tests/flytekit/unit/core/image_spec/test_default_builder.py
    • tests/flytekit/unit/core/test_environment.py
    • tests/flytekit/unit/core/test_local_cache.py
    • tests/flytekit/unit/types/directory/test_listdir.py
  • Files skipped - 1
    • plugins/flytekit-inference/README.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

pingsutw
pingsutw previously approved these changes Jan 6, 2025
Copy link
Member

@pingsutw pingsutw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you. Love this feature! It’s going to eliminate a lot of boilerplate code.

Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
@wild-endeavor wild-endeavor enabled auto-merge (squash) January 6, 2025 23:23
@wild-endeavor wild-endeavor merged commit f634d53 into master Jan 6, 2025
102 checks passed
@granthamtaylor granthamtaylor deleted the grantham/add-managed-environments branch January 6, 2025 23:44
@flyte-bot
Copy link
Contributor

flyte-bot commented Jan 7, 2025

Code Review Agent Run Status

  • Limitations and other issues: ❌ Failure - Bito Code Review Agent couldn't review this pull request because it is no longer valid. It may have been merged, or the source/target branch may not exist.

shuyingliang pushed a commit to shuyingliang/flytekit that referenced this pull request Jan 11, 2025
Signed-off-by: Grantham Taylor <granthamtaylor@icloud.com>
Signed-off-by: Shuying Liang <shuying.liang@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants