
Release v0.10.0 #540

Merged
sharabiani merged 25 commits into main from release-0.10
Jul 18, 2025

Conversation

@sharabiani
Contributor

Fixes / Features

  • Release v0.10.0 changes

Testing / Documentation

Testing details.

  • [ y/n ] Tests pass
  • [ y/n ] Appropriate changes to documentation are included in the PR

SujeethJinesh and others added 24 commits June 16, 2025 23:02
* use tcpx decorator in A3 High workloads

* remove dead code from storage.py

* fix linting

* change from PR comments
Reliable placement of pathways-head pods across workloads.
Provided the required permissions for JAX to list the pods
* env vars become a dictionary and values are overridden

* removed excessive arg

* imported missing modules

* added missing arg

* removed excessive imports

* fixed imports

* fixed dict merge
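
The env-var commits above (values held in a dictionary, later values overriding earlier ones, and a dict merge) can be illustrated with a minimal sketch. The function name `merge_env_vars` and the sample keys are hypothetical and are not taken from this PR's code; the sketch only shows the general merge-with-override pattern, assuming string keys and values.

```python
def merge_env_vars(defaults: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Return a new dict in which keys from `overrides` win over `defaults`."""
    merged = dict(defaults)   # copy so the caller's defaults are not mutated
    merged.update(overrides)  # later values replace earlier ones on key clashes
    return merged


# Hypothetical sample values, for illustration only.
defaults = {"LOG_LEVEL": "info", "WORKER_COUNT": "4"}
overrides = {"LOG_LEVEL": "debug"}
env = merge_env_vars(defaults, overrides)
```

Copying before updating keeps the defaults reusable across workloads; merging in place would leak one workload's overrides into the next.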
* fix issue #491

Signed-off-by: Piotr Pawłowski <ppawl@google.com>

* Update Kueue and Jobset controller default limit value

* Update cluster.py

* Split into get and update manifest

* Remove dup lines

* Organize code

* Redesign the feature

* Clean up code

* Correct wrong description

* Remove unnecessary section of yaml

* Resolve lint issue

* Reformat the change
* update ct version

Signed-off-by: Piotr Pawłowski <ppawl@google.com>

* bump ct version to 1.57.1 in docker_manager.py

Signed-off-by: Piotr Pawłowski <ppawl@google.com>
* update kueue version to 0.12.2

Signed-off-by: Piotr Pawłowski <ppawl@google.com>

* fix yaml formatting for workloads with TPU and NAP

* refactor tpu system characteristics

* fix tests

The internal representation of TPU machines has changed, so the grep that relied on the old format now fails.

---------

Co-authored-by: pawloch00 <ppawl@google.com>
* Managed Lustre storage attach support implemented
* Fix cluster creation from reservation

Signed-off-by: Piotr Pawłowski <ppawl@google.com>

* DWS flex queued support for GPUs and TPUs

Signed-off-by: Piotr Pawłowski <ppawl@google.com>

pawloch00 previously approved these changes Jul 18, 2025
@sharabiani sharabiani enabled auto-merge July 18, 2025 10:12
BluValor previously approved these changes Jul 18, 2025
@sharabiani sharabiani disabled auto-merge July 18, 2025 11:27
* fix max-nodes when creating tpu dws flex queued nodepools

Signed-off-by: Piotr Pawłowski <ppawl@google.com>

@sharabiani sharabiani dismissed stale reviews from BluValor and pawloch00 via a8b9fa0 July 18, 2025 12:46
@sharabiani sharabiani enabled auto-merge July 18, 2025 13:31
@sharabiani sharabiani merged commit 9b1f2a9 into main Jul 18, 2025
17 of 18 checks passed
@sharabiani sharabiani deleted the release-0.10 branch July 18, 2025 13:39
@sharabiani sharabiani restored the release-0.10 branch July 18, 2025 13:45
@scaliby scaliby deleted the release-0.10 branch November 4, 2025 10:33
7 participants