0.18.1
Changelog
- 9284a3a chore: bump version: 0.18.1-rc7 -> 0.18.1
- eb09c34 docs: add release notes for 0.18.1 (#4216)
- bad9a33 chore: bump version: 0.18.1-rc6 -> 0.18.1-rc7
- 6cff1b0 fix: use bigint for checkpoint size in proto_get_trials_plus (#4208)
- 60291e9 chore: bump version: 0.18.1-rc5 -> 0.18.1-rc6
- 8f4a797 perf: tweak proto_get_trials_plus plan (#4206)
- 414bcd2 chore: bump version: 0.18.1-rc4 -> 0.18.1-rc5
- 789b39c fix: allow internal: null for pre-0.15.6 experiments (#4197)
- fd27bac fix: add restarts back to get_trial_ids for sorting
- 845f2f0 chore: bump version: 0.18.1-rc3 -> 0.18.1-rc4
- eed09e9 docs: update screen shots for cluster UI (#4188)
- 88271dd style: minor theme fixes and style adjustments [DET-7349] (#4161)
- f2a4e5e feat: display trial restarts [DET-7347] (#4160)
- eaf84e6 chore: bump version: 0.18.1-rc2 -> 0.18.1-rc3
- a8ddc82 fix: sync slot usage for k8s [DET-7350] (#4172)
- 1f13710 fix: enable currently active side nav item (#4167)
- e9333d2 perf: fixup query for latest training per trial (#4166) [DET-7352]
- 764ef2d fix: include both old and new checkpoints in total checkpoint size (#4165)
- 4157c82 chore: bump version: 0.18.1-rc1 -> 0.18.1-rc2
- e2f949a chore: bump version: 0.18.1-rc0 -> 0.18.1-rc1
- 26ede20 chore: revert scheduling docs
- 8ea2a52 fix: prevent experiment name in header from flowing entire vertical space of screen during resize (#4157)
- 1e244e0 chore: bump version: 0.18.1-dev0 -> 0.18.1-rc0
- 96e0e58 chore: lock api state for backward compatibility check
- 82f0366 feat: allow NaN validation metrics [DET-7177] (#4150)
- 0bbeec1 feat: upload all tb files DET-7139 (#4155)
- 90b918a fix: adjust upscaling of column widths [DET-7220] (#4138)
- 4b66bf0 feat: rolling upgrades v0 [DET-6548] (#4031)
- beea245 ci: disable most checks on ci-only changes (#4118)
- 5f7e74a fix: upstream test failures due to config being admin protected (#4153)
- da1dcd7 fix: return user data when new user is created [DET-7255] (#4149)
- 1eba7a2 docs: a vain attempt to pass ci test on already approved pr4110 content changes (#4151)
- 7aea015 fix: No redirecting url when model name is changed (#4127)
- 00171f8 feat: Cluster UI improvement [DET-7072, DET-7073] (#4009)
- 27e04e4 feat: require admin privileges for cluster managment [DET-7186] (#4129)
- 09a8ff6 ci: update gke version. (#4147)
- fa3a959 ci: Increase package-and-push-system-local resource class (#4143)
- 0d4fe23 chore: fix boolean urlparams for grpc (#4136)
- b1829b8 feat: enable SLURM preemption (#4114) [FOUNDENG-21]
- b4ef273 chore: add a local docs server (#4117)
- 5dea211 refactor: theme architecture [DET-6211] (#4004)
- aef66b0 build: make docs build incremental and idempotent (#4116)
- ec7007c ci: persist debs and rpms in circleci for dev, rc, and release builds (#4124)
- b89b0a3 fix: user filter on dashboard [DET-7251] (#4132)
- e3e50a9 chore: add job ID and experiment labels to prometheus endpoint mappings [DET-6964] (#4119)
- b0d8a93 ci: make codecov information for sure now. (#4130)
- b313503 chore: restructure shareable webui utils and types (#4112)
- 78505d6 ci: turn off codecov bot PR comments (#4122)
- a91866f chore: fix rank determination for horovod with mpi (#4109)
- 3368a77 fix: NCCL interface in distributed tests (#4111)
- 18aadd0 chore: bump version: 0.18.0-dev0 -> 0.18.1-dev0
- 7500d6f docs: add release notes for 0.18.0 (#4102)
- 1f4a642 chore: bump version: 0.17.16-dev0 -> 0.18.0-dev0
- 59928ec chore: explicit naming of preemption and coscheduler resources [DET-7140] (#4101)
- ac2564b fix: add missing task log teardown for trials (#4107)
- c41e1dc docs: rework quickstart for ml developers (#4091)
- 2c7564d chore: use reported slots available for on prem deployments (#4095)
- 62049ae chore: change codecov to informational only (#4105)
- 5061a8f fix: mark distributed tests as parallel (#4093)
- 87b8e53 chore: enable codecov enforcement (#4084)
- 15dd7e3 fix: bindings sessions in experiment apis. (#4096)
- eb65b09 chore: cleanup and fixes for "det deploy" (#4103)
- d78a712 perf: improve plan for proto_get_trial_plus.sql (#4073)
- a980c56 build: update submodules on webui get-deps (#4082)
- 78d4e8b chore: HAL-2879 Cleanly shutdown all sshd servers on exit (#176) (#4087)
- 940f8f7 chore: Refactor JupyterLabModal pattern [DET-6276] (#4072)
- ec5553a chore: make container proxy support more flexible, for slurm (#3948)
- 3614c83 chore: wait for process substition log filters [DET-6712] (#3930)
- 474742f chore: clean up useCallback dependency (#4092)
- 39588ee feat: add det.LOG_FORMAT constant (#4090)
- bcc50f9 fix: Support rendering rank of 0 (#4083)
- c523523 fix: consistent total slot calculation for cluster overview [DET-7182] (#4080)
- 9c16349 feat: add wrap_rank helper script (#4086)
- ea4a949 fix: dont show archived in column picker [DET-7187] (#4085)
- a4e5f84 feat: authenticate task proxies (#4071)
- 5fab384 fix: wrap torch.distributed launch in pid server/client (#4077)
- e73f063 fix: use displayNames in ClusterHistoricalUsage (#4059)
- 3368dd8 chore: filter out NaN, +/- Infinity metric values for charts for now. (#4076)
- f8b5bf5 chore: add and consolidate code coverage to codecov (#4064)
- 487b04c docs: release note for core api (#4069)
- 15a668b fix: add user column back to experiment list (#4070)
- 880b769 feat: break workload info from trial endpoint into a new endpoint [DET-6729] (#3635)
- d703c96 fix: show notification when delete experiment fail [DET-6811] (#4051)
Docker images
docker pull determinedai/determined-master:0.18.1
docker pull determinedai/determined-master:9284a3aa6
docker pull determinedai/determined-master:9284a3aa6e307c61426c93b5e09730c664725604
docker pull determinedai/determined-dev:determined-master-9284a3aa6
docker pull determinedai/determined-dev:determined-master-9284a3aa6e307c61426c93b5e09730c664725604
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.18.1
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:9284a3aa6
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:9284a3aa6e307c61426c93b5e09730c664725604