Releases: determined-ai/determined
0.19.3
Changelog
- 17f6d80 chore: bump version: 0.19.3-rc4 -> 0.19.3
- ba4c8fb docs: add release notes for 0.19.3 (#4997)
- 620cf69 chore: bump version: 0.19.3-rc3 -> 0.19.3-rc4
- 1cd716a feat: allow specifying Fluent Bit container UID/GID on Kubernetes [DET-8012] (#4963)
- 041cae9 chore: bump version: 0.19.3-rc2 -> 0.19.3-rc3
- a7364af fix: correct overflow action buttons [DET-8322] (#4979)
- 7557b06 chore: bump version: 0.19.3-rc1 -> 0.19.3-rc2
- 83df8cc fix: remove duplicate Admin Guide tile (#4975)
- f7824e4 fix: WebUI config download [DET-8323] (#4974)
- 11fe52c chore: bump version: 0.19.3-rc0 -> 0.19.3-rc1
- 1af55aa chore: revert "chore: secure echo with default authentication [DET-7405] [DET-7378] (#4267)" (#4971)
- dbe7008 fix: reduce settings api calls [DET-8307] (#4970)
- fa2a825 chore: bump version: 0.19.3-dev0 -> 0.19.3-rc0
- 05d713e chore: lock api state for backward compatibility check
- 0127d7d chore: secure echo with default authentication [DET-7405] [DET-7378] (#4267)
- e2512cf feat: adjust scrollbar color by theme (#4964)
- e1971c8 fix: associate allocation sessions with users (#4949)
- 6c0ea87 chore: fix a typo in py generator (#4938)
- a5ea7e8 chore: add question issue (#4959)
- c87cc1f ci(test-unit): remove debug code (#4947)
- fedee52 test: remove ds test from p2 (#4951)
- 4b565ac ci: run deepspeed on g4dn instances (#4946)
- b2765f0 feat: WebUI 404 not found page [DET-8226] (#4937)
- f1f77c6 refactor: AuthZ for trials [DET-8211] (#4940)
- 522f9f3 fix: allow forking an archived experiment [DET-8277] (#4944)
- d346f3f chore: test apex checkpointing [DET-7886] (#4904)
- 6a3a455 chore: ensure isAuthError can see into wrapped exceptions (#4934)
- c94a91c ci(test-unit): accept only status events (#4941)
- 5363d84 docs: slurm jobs do not require gres (#4911)
- 7c12bd2 docs: update required python to 3.7 (#4939)
- acd2ba9 feat: add programatic download for the config files (#4907)
- 8f1f2f0 ci(test-unit): flail productively (#4936)
- bd2db37 chore: address low hanging security updates (#4872)
- bf61b08 fix: remove prevUser constraint (#4932)
- 948f34a feat: WebUI create user with group info [DET-8221] (#4923)
- a57c909 refactor: AuthZ for experiments [DET-8003] (#4905)
- a93903b feat: helm chart: add OIDC and SCIM options [DPS-204] (#4897)
- ab8e471 test: update yaml file names (#4924)
- 798fca6 docs: fix to hyperlink in release notes (#4895)
- 0fa875c docs: Slurm support updates for 0.19.3 (#4919)
- 99c8f3f chore: fix rebase error (#4922)
- e1632c0 chore: add stream argument to Session._do_request (#4902)
- c3b0fb6 fix: rbac-user-groups merge conflicts and lints.
- f923e79 feat: WebUI group list page [DET-7921, DET-7976] (#4724)
- 710f8f6 fix: rbac-user-groups merge conflicts.
- e9a909d feat: WebUI edit user [DET-7846] (#4680)
- e35fb59 chore: RBAC user groups crud (#4620)
- 1933ef3 feat: migrate patch user logic to grpc server [DET-7909] (#4648)
- e9ab25d feat: pluggable authorization for RBAC. (#4626)
- 12cad9f chore: User Groups SQL (#4519) [DET-7803]
- d551eb4 fix: change /var/cache permissions to mode 775 (#4920)
- 0164be0 fix: GetExperiments error on forked experiment (#4918)
- 9b23d6f ci(test-unit): limit runs to only test-e2e updates (#4915)
- 3dc8651 fix: race condition in agent container actor around missing containerInfo. (#4869)
- b2caa15 ci(test-unit): fix conditional check syntax (#4913)
- bbf27db ci(test-unit): fix debug line to print payload (#4912)
- c9fdcfa ci(link-artifacts): add initial workflow attempt (#4906)
- 0306d66 chore: resource pool support for PBS (#4884)
- 51355e4 perf: improve getWorkspaceProjects api for Quick Search (#4896)
- 4ad9c1d chore: change import path in generated bindings (#4900)
- fc1aee2 chore: proto build should fail on first error. (#4802)
- 3f68ac2 fix: re-render issue (#4898)
- 0e3c81e feat: GetExperiments to bun (#4813)
- 479beba feat: DeepSpeed CPU offloading (#4875)
- b85c1b3 chore: replace PropsWithChildren with explicit children (#4890)
- 3f9aacf chore: migrate python sdk to generated bindings [DET-8005] (#4844)
- a3ad849 chore: bump version: 0.19.2-dev0 -> 0.19.3-dev0
- c339e34 docs: add release notes for 0.19.2 (#4877)
- e066d32 chore: set torch_geometric version in example to fix e2e test. (#4889)
- f6580dd perf: set memory cap to improve memory allocation (#4840)
- 25019fa chore: fix limit 0 for /api/v1/trials/:id/workloads (#4886)
- a5c6f79 feat: experiment checkpoint list [DET-8201] [DET-8129] (#4870)
- 95c5126 feat: allow OrderBy in GetExperimentCheckpoints for SortBy SearcherMetric (#4885)
- a5278b1 feat: create quick search to jump to workspace or project (#4837)
- c0b98db build: enable storybook previews (#4874)
- 116baf9 fix: det e describe with multiple trials (#4863)
- cf31c47 ci: fix flakes in test_max_concurrent_trials (#4865)
- 9f5306d chore: test AMP autocast and gradient scaling [DET-7885] (#4702)
- 0f0f82e chore: some cli cleanup (#4859)
- 5e8d8f2 docs: remove misleading redirect (#4883)
- 07e7650 feat: add security.default_task and openshift host options to helm chart [DPS-204] (#4843)
- 30e3393 feat: add disabled prop to ActionDropdown (DET-7937) (#4867)
- 2f0464f fix: downgrade fluentbit to fix tls.vhost issues (#4871)
- 74dd27f build: avoid double testing via e2e-longrunning (#4850)
- f008dcb chore: add controllable logging support [DET-8025] (#4826)
- 4a7c03f fix: remove workloadCount from trial responses; single-trial view fix (#4857)
- 945cd6a chore: document reasons for scaler.update() (#4845)
- 70c0c66 chore: add authz on moving experiments between projects [DET-7750] (#4806)
- 9e132ed fix: remove subprocess import (#4856)
- 6491115 chore: preserve failed action's error message (#4822)
Docker images
docker pull determinedai/determined-master:0.19.3
docker pull determinedai/determined-master:17f6d80b3
docker pull determinedai/determined-master:17f6d80b349011a29f51210a7634806709f99472
docker pull determinedai/determined-dev:determined-master-17f6d80b3
docker pull determinedai/determined-dev:determined-master-17f6d80b349011a29f51210a7634806709f99472
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.19.3
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:17f6d80b3
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:17f6d80b349011a29f51210a7634806709f99472
0.19.2
Changelog
- 8abc3de chore: bump version: 0.19.2-rc2 -> 0.19.2
- 7db3572 docs: add release notes for 0.19.2 (#4877)
- ea7abb8 chore: bump version: 0.19.2-rc1 -> 0.19.2-rc2
- 02831bb fix: downgrade fluentbit to fix tls.vhost issues (#4871)
- 950f1ce chore: bump version: 0.19.2-rc0 -> 0.19.2-rc1
- 78b9c30 fix: remove workloadCount from trial responses; single-trial view fix (#4857)
- d47ee91 chore: bump version: 0.19.2-dev0 -> 0.19.2-rc0
- 584448b fix: job queue experiment restore (#4797)
- ef83374 fix: set gc policy [DET-8018] (#4812)
- 5e4f50b fix: misc view code bug fixes
- 90ca917 chore: TrialContext is not an interface (#4851)
- 7ae3a38 fix: remove ds example (#4852)
- ec7c8e3 chore: fix random/grid searcher bug with max_concurrent_trials (#4836)
- 27ba9cc fix: enable moving jobs around w/o assuming the full set [DET-8015] (#4766)
- 4ca0fd4 fix: rps should correctly ignore other rps job msgs [DET-8214] (#4848)
- 9a9e05f fix: fit long name (#4825)
- d66d235 fix: remove duplicate loading animation (#4839)
- 80a5507 fix: remove non-model-hub mmdet tests (#4846)
- 29c083e fix: pass user ids for this user ids filter (#4842)
- 5735ea3 fix: do not bypass torch.distributed.launch for single-slot trials (#4838)
- 80735bc fix: handle avgMetrics response on individual trials of multi-trial experiment (#4821)
- 83f5a1a fix: correctly display nested categorical hyperparameters [DET-8074] (#4818)
- 7ffdc3d feat: rolling upgrades support for det deploy aws [DET-7853]. (#4829)
- b161957 fix: begin standardizing API pagination behavior in CLI. (#4833)
- ab0df53 fix: make allocation saves idempotent (#4695)
- ce376e9 build: set shared web to use xlarge resource class (#4824)
- cf8ebde fix: canceling all experiment trials should cancel experiment (#4759)
- 16e0cc1 ci: fix test-cli-win. (#4834)
- 53316c6 chore: rename a file to prevent API breakages (#4831)
- 345a3b1 ci: Fix publish_helm syntax errors [INFENG-1] (#4819)
- dbccd3c ci: add a checkbox to the PR template (#2969)
- 726fe80 chore: promote Session to a first-class citizen (#4787)
- 2cb0c9b test: webui user management unit test [DET-7968] (#4809)
- 0f4bf86 chore: deprecate mmdetection example in favor of model-hub version (#4816)
- 6815ddf perf: reduce user settings api call (#4790)
- 208afc6 fix: use fluent version 1.9.3 everywhere by default. (#4814)
- 78f40f1 chore: UI fix and improvements (#4747)
- 170000d fix: get trial datapoints from trial comparison/summarization endpoint (#4796)
- 5bda3fc fix: always override protobufAny description in openapi spec (#4811)
- 4bd7c16 fix: reconcile metrics proto, move det trial describe to the new API. [DET-7617] (#4746)
- 69f5dfb chore: fix ListValue types in swagger spec (#4801)
- 3d459fc ci: fix GHA syntax better [INFENG-1]
- 399148b ci: fix GHA workflow boolean syntax [INFENG-1]
- 6e3bb53 ci: Remove unnecessary quotes [INFENG-1] (#4810)
- 17773b9 chore: share copy to clipboard btn (#4799)
- 7805d59 fix: correct job queue table bugs [DET-8069] (#4804)
- bbbde9c fix: hide Delete button if user is not a creator or admin (#4805)
- b445352 fix: speed up det deploy aws stack updates. (#4793)
- 7bcb26e style: update number input error style for dark mode (#4772)
- 7c867e7 refactor: authz interface for projects and workspaces [DET-8002] (#4721)
- e96dca9 fix: grid view on Workspaces and Projects pages show all items [DET-8031] (#4794)
- 441d1c6 fix: job queue pagination (#4756)
- 3191b4c chore: react-router-dom partial update part1 (#4788)
- cde91c8 feat: Async deleting workspaces and projects (from CLI) [DET-7821] (#4675)
- 003ddd8 fix: directly return object-not-found errors instead of rewrapping them (#4791)
- c0794f5 fix: jupyterLab modal poping issue (#4792)
- efe0fe7 fix: use jupyter icon in navigation side bar (#4786)
- 8cf1517 feat: deepspeed cpu offloading example (#4623)
- 55542b7 fix: tab routing issue in resource pool (#4789)
- 7cd0e51 perf: improve too many user api call (#4763)
- 69c2dfa test: Fix cluster utils cluster_slots() API (#4784)
- 56f6469 fix: use correct experiment list offset when deleting an experiment [DET-7880] (#4754)
- 0dd9e2a chore: Use bindings.v1File instead of ContextItem (#4779)
- e7e0ab2 fix: remove core external dependency from shared (#4782)
- ea3f257 feat: view code UI (#4473)
- 550667b chore: disable positional args in bindings.py classes (#4777)
- 0d5f3eb docs: Add release note for Slurm feature (#4778)
- 047b6ba fix: lint-python ci test (#4774)
- cbc0ba1 refactor: reduce unneeded api calls [DET-7451] (#4771)
- ccf20c5 fix: gpt_neox deepspeed example (#4622)
- f927431 chore: update shared tester git url format (#4773)
- e0ac8e8 fix: label filter in model registry (#4769)
- 99878b7 test: add tests for utils/service (#4749)
- 228744e chore: expose Avatar props through AvatarCard (#4765)
- ad7767c chore: upgrade swagger generator from 2.4.14 to 2.4.27 (#4738)
- 9113e7e Fix a couple more helm action typos [INFENG-1]
- e894408 Fix helm workflow typos / indentation [INFENG-1]
- 4e10f82 ci: add helm repo [infeng 1] (#4725)
- 57842f9 chore: bump version: 0.19.1-dev0 -> 0.19.2-dev0
- 1f5b043 docs: add release notes for 0.19.1 (#4768)
- c478c01 fix: tensorboard metrics step count [DET-8028] (#4761)
- d78c8fc ci: re-enable gke shell logs test fixed d74ef5 (#4760)
- 7d55063 chore: Remove obsolete workloads from Trials API (#4703)
- c6579ef refactor: solidify rm interface [DET-7852, DET-7984] (#4705)
- b303b16 fix: allow changing max_length units in HP Search (#4755)
- 7c98baa ci: remove trent from shared codeowners (#4757)
- d74ef5a ci: shells should generate keys, even with empty 'data' field (#4744)
Docker images
docker pull determinedai/determined-master:0.19.2
docker pull determinedai/determined-master:8abc3decd
docker pull determinedai/determined-master:8abc3decdc2c30813dcf674f19d1beb25eeb51e8
docker pull determinedai/determined-dev:determined-master-8abc3decd
docker pull determinedai/determined-dev:determined-master-8abc3decdc2c30813dcf674f19d1beb25eeb51e8
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.19.2
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:8abc3decd
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:8abc3decdc2c30813dcf674f19d1beb25eeb51e8
0.19.1
Changelog
- 7cc6107 chore: bump version: 0.19.1-rc2 -> 0.19.1
- 5feb1dc docs: add release notes for 0.19.1 (#4768)
- 60b06de chore: bump version: 0.19.1-rc1 -> 0.19.1-rc2
- 6797dba fix: tensorboard metrics step count [DET-8028] (#4761)
- b7ef726 chore: bump version: 0.19.1-rc0 -> 0.19.1-rc1
- 1706a0c fix: allow changing max_length units in HP Search (#4755)
- 086942f chore: bump version: 0.19.1-dev0 -> 0.19.1-rc0
- 14cff26 ci: unversion workflows (#4752)
- f080a63 chore: lock api state for backward compatibility check
- f264281 fix: Write change-password script in /tmp instead of CWD (#4677)
- ebc79c2 fix: python sdk can parse master output again (#4745)
- 2b41576 chore: update docker images names (#4727)
- 286194d fix: deepspeedtrial validation batch size computation (#4743)
- ce78067 chore: [Ant Design] replace old menu with new menu (#4741)
- 6de74a5 docs: add release note for fix for searcher early termination bug (#4739)
- bf6fff3 fix: url-encode description of notebook and tensorboard (#4718)
- 9334945 fix: fix an issue with forbidden api actions causing logout (#4737)
- e695763 style: fix mobile exp header [DET-7975] (#4733)
- 73fe5b6 fix: hardcode pathname instead of using paths (#4740)
- a071fc6 test: test cases for shared/utils/routes.ts [DET-7902] (#4706)
- e5fa6fd style: remove styling that forced padding to be 0 (#4734)
- d8d7e25 fix: remove the default theme from initialization (#4698)
- 5adbf42 feat: add spinner to show trial fetching (#4683)
- 464f68c test: add tests for experiment detail page [DET-7979] (#4723)
- a09e81d fix: cursor in modal text field jumps to end of input (#4691)
- 3fcb985 chore: add regex in InlineEditor [DET-7518] (#4716)
- 5ef83ba chore: share sort utilities [DET-7970] (#4711)
- feabd45 fix: breadcrumb text color (#4720)
- bd542a0 fix: resolve issues around InlineEditor [DET-7914] (#4713)
- 2ea7946 test: add tests for settings page [DET-7966] (#4717)
- 2724b70 chore: Remove unnecessary imports and fields in proto (#4710)
- eef660f fix: WebUI workspace pagination [DET-7927] (#4700)
- aad8011 refactor: authz provider implementation and authz users basic implementation (#4676)
- 2c575fd fix: record operations at the right places around shutting down (#4719)
- 9ab7dae chore: clear selected item when clear filters (#4714)
- b089329 feat: One-Click Hyperparameter Search [DET-7537] [DET-7538] (#4458)
- 02be458 test: add wait utils test coverage [DET-7959] (#4701)
- bd8664a fix: fix low contrast issue for button styles [DET-7958] (#4692)
- 05800e2 test: update path conditionals for gh workflows (#4708)
- 332270a style: fix doc tile styling (dark mode support and responsive) [DET-7955] (#4709)
- c78b152 fix: mark all 4xx api failures as auth failure (#4690)
- 972954d feat: Connect trial UI to workloads API; pass sort/filter to API (#4407)
- b97198b test: add samlauth tests (#4685)
- 2c26b46 docs: add rest api reference link and rewrite rest api doc (#4688)
- c2968d6 docs: port slurm deployment to oss docs (#4653)
- 3623b0c test: WebUI interaction test for page [DET-7894] (#4689)
- 55aa326 refactor: test cases for ActionDropdown (#4699)
- 938a486 fix: push-shared target's directory change (#4672)
- da5f7fe fix: keyboard doesnt show for inline editor in mobile [DET-7519] (#4659)
- 0016fd7 fix: word break in description (#4697)
- 6a8856c fix: move some libs in package.json (#4687)
- 20d48be chore: support enum sizes for avatar (#4686)
- 84631ab test: add test cases for string.ts (#4679)
- 8d2a821 test: create interaction tests for action dropdown [DET-7895] (#4684)
- 5c1679d test: add test coverage for shared error utilities [DET-7900] (#4666)
- ed65d20 ci(lint-python): migrate to gha workflow (#4639)
- 4c3e9f2 test: add test cases for Image.tsx (#4667)
- aedaa58 chore: bump version: 0.19.0-dev0 -> 0.19.1-dev0
- 63b2dac docs: add release notes for 0.19.0 (#4671)
- 4eeaa51 chore: update live docs script for extension change (#4678)
- 53ed638 test: add test cases for Icon.tsx (#4664)
- 2a115c4 test: add unit tests for logger class [DET-7901] (#4674)
- 20b682a feat: add new PyTorchCallbacks [DET-7760] (#4500)
- a9f4a87 refactor: remove unused code in model version detail page (#4670)
- 725c74f fix: persist task state update in interactive task view (#4662)
- 3115509 feat: Create user UI [DET-7847] (#4665)
- fac341c fix: Count only active tasks in cluster info board (#4658)
- 9651e0d chore: update codecov badge to reflect web only (#4661)
- 69a8668 fix: comment to gen swagger for model def API [DET-7926] (#4657)
- 17f3926 test: utils/set unit tests [DET-7904] (#4655)
- a42bd97 fix: description overflowing table cell (#4656)
- 65ba5d6 test: add test cases for AvatarCard (#4650)
- ad70ce8 feat: task specific actions to job overflow menu (#4638)
- dc1ecfb ci(lint-bindings): migrate to gha workflow (#4642)
- 6b341f2 ci(lint-go): migrate to gha workflow (#4636)
- cd5cbec docs: fix a typo in docs for Elasticsearch-backed logging (#4228)
Docker images
docker pull determinedai/determined-master:0.19.1
docker pull determinedai/determined-master:7cc610754
docker pull determinedai/determined-master:7cc610754b2f6828240e07cb222a31da71df4f10
docker pull determinedai/determined-dev:determined-master-7cc610754
docker pull determinedai/determined-dev:determined-master-7cc610754b2f6828240e07cb222a31da71df4f10
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.19.1
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:7cc610754
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:7cc610754b2f6828240e07cb222a31da71df4f10
0.19.0
Changelog
- 5497b51 chore: bump version: 0.19.0-rc2 -> 0.19.0
- d4a101c docs: add release notes for 0.19.0 (#4671)
- 35c926f chore: bump version: 0.19.0-rc1 -> 0.19.0-rc2
- 1f0a22e fix: Count only active tasks in cluster info board (#4658)
- 0785b69 chore: bump version: 0.19.0-rc0 -> 0.19.0-rc1
- 86ba506 fix: description overflowing table cell (#4656)
- 90f3c26 chore: bump version: 0.19.0-dev0 -> 0.19.0-rc0
- f296b9e chore: lock api state for backward compatibility check
- 7ae6674 chore: bump version: 0.18.5-dev0 -> 0.19.0-dev0
- 704c36b chore: hide user management for now (#4652)
- c1dc69b ci: long running det_deploy_local stress test to upstream (#4643)
- 74349af fix: change average_training_metrics default to true [DET-7240] (#4646)
- 8016426 fix: update resource pool job queues to paginate properly (#4579)
- bc9ec2b chore: test improvement for _pytorch_context and _pytorch_trial [DET-7761] [DET-7763] (#4494)
- ffc1b34 style: fix avatar card for long names (#4644)
- 5d57eab style: fix disabled primary button style (#4645)
- c980fea fix: model deital bugs (#4640)
- 9032f67 feat: enable open telemetry in agent [DET 7085] (#4276)
- 5625323 chore: install ptl in startup hooks (#4380)
- 69083b9 fix: minor bugs in test config creation (#4515)
- 84a63b6 feat: allow DET_PASS and DET_USER to skip need for login [DET-7025] (#4597)
- c4e32ba style: update account layout [ET-7829] (#4627)
- f8c8c99 fix: agent entrypoint should use exec. [DET-7834] (#4641)
- ead85bb feat: support pagination in bun (#4634)
- 331376e build: skip det-deploy-local for webui only changes (#4628)
- 2acd123 feat: Show active experiment and task count on resources page [DET-7680] (#4566)
- 8bd2c84 feat: User management list view [DET-7796] (#4607)
- be07453 chore: fix docs/.gitignore to use .rst (#4633)
- de18b88 chore: allow setting context type for useModal (#4612)
- 177ae96 chore: handle new type-pyopenssl pacakage (#4632)
- 2da2ed2 test: fix EditableMetadata test flake (#4610)
- 96a3a15 chore: enhance GitHub issue templates (#4630)
- b5b99f5 ci(lint-docs): migrate to gha workflow (#4621)
- a89d7b3 docs: apply formatting again
- b7d28bb docs: complete switch from txt to rst extensions
- ad0c129 chore: implement closeBar feature for omnibar (#3306)
- 3358e90 chore: remove getProjectExperiments and merge into getExperiments (#4606)
- 271f5e9 Adding issues template
- 27da5e8 ci: test downstream (#4604)
- b1d3604 fix: reject reconnecting agents with different device configuration [DET-7568] (#4381)
- 4336924 ci(circleci): remove determined-ci context ref (#4613)
- ce0f89d docs: model definition file cache (#4609)
- 6742e1a chore: add attr to track api state has been initialized (#4599)
- e409e71 chore: share useModal hook [DET-7781] (#4571)
- e161839 fix: flaky test caused by compare stats [DET-7838] (#4576)
- a17c043 fix: rocm-smi workaround without product info (#4553)
- 1d87dcf style: fix spacing for tabs and headers [DET-7823] (#4595)
- 8e52a79 docs: shorten landing page tile descriptions (#4605)
- d149a54 fix: Drop rendezvous interface warning (FOUNDENG-139) (#4594)
- 066c11e fix: helm option "defaultPassword" caused the deployment to hang [DET-7814] (#4570)
- 6362a30 ci(lint-react): migrate to gha workflow (#4592)
- 0d0f89b ci(lint-secrets): migrate to gha workflow (#4591)
- a5ca19e feat: text search for task/trial logs [DET-7446] (#4577)
- fb4dba5 chore: add cache to tools/devcluster.yaml (#4584)
- 0e0784e docs: restructure content (#4484)
- db32a64 fix: ignore order for Set isEqual [DET-7822] (#4589)
- 9c79ea8 test: switch hamid with a one line check (#4596)
- 881ef68 chore: update relative import styles to absolute (#4581)
- 10205cd docs: update description of telemetry reporting (#4175)
- 0c60209 feat: add sort and order to users api [DET-7828] (#4573)
- 6d3065d feat: open tasks in embedded task view in CLI [DET-7686] (#4563)
- 58fc5b1 fix: notebook ignores --template CLI param and notebooks still launch if --preview param is set [DET-7632] (#4476)
- d81faa2 fix: zero slot tasks k8s using wrong image and exposing all GPUs [DET-7808] (#4586)
- deef881 ci(test-cli): migrate to gha workflow (#4575)
- 05827e1 ci(test-e2e): move optional jobs into own workflow (#4587)
- 823e322 fix: allow zero slots for JupyterLab in modal (#4582)
- 52c5a00 chore: tweak ilia's CODEOWNERS. (#4580)
- 85531d1 chore: rbac query params [DET-7843] (#4583)
- 7570626 refactor: move user settings to page [DET-7795] (#4550)
- 31fff14 fix: double scroll bar (#4578)
- 3897a50 fix: storybook rendering issue (#4554)
- b1818a4 fix: missing key for rows (#4564)
- 91811cf refactor: replace antd Space with flexbox gap in SelectFilter (#4569)
- facb234 chore: add docs to codeowners (#4568)
- 46888f0 fix: expconf copy timeout too short causing errors (#4567)
- 9559b99 fix: race condition on experiment config for 'det e create --paused' [DET-7789] (#4533)
- 09dfaf5 chore: eslint keyword spacing (#4561)
- 011ca00 chore: update eslint array multiline and add eslint arrow paren (#4562)
- d5f8977 feat: Place experiment in a project using CLI [DET-7720] (#4552)
- cb92dd6 feat: add dynamic page title to embedded task page [DET-7681] (#4555)
- 48b6c32 refactor: make sso a plugin [DET-7560] (#4559)
- 2bb461e feat: remove reset column widths button [DET-7675] (#4558)
- 5a163d8 chore: bump version: 0.18.4-dev0 -> 0.18.5-dev0
- d109ef4 docs: add release notes for 0.18.4 (#4547)
- f519b39 style: add gofumpt (#4217)
- 002d950 ci(scan-docker-images): migrate to gha (#4546)
- 495f409 ci: switch to custom-built container image (#4535)
- f13c237 perf: improve deepspeed checkpointing for sharedfs (#3905)
- 92bf7c5 feat: api to expose experiment model code [DET-7465] (#4374)
- 00c2909 feat: Web UI request checkpoint deletion [DET-7113] (#4545)
- 9907bc7 fix: increase time to wait for command in priority scheduler test (#4544)
- 2d59d25 chore: remove activemetric duplication [DET-7737] (#4469)
- ff758fd fix: correct allocation end times when cluster heartbeat is before allocation start time (#4556)
- 8ae44d4 fix: hide description placeholder when archived (#4551)
- 1e6da7b fix: rename window title (#4548)
- 2760885 fix: polish cluster page UI [DET-7682] (#4508)
- 85f72ed feat: update allocation bar legend logic [DET-7683] (#4510)
- 5e1bd1a chore: update omnibar with new theme variables (#4528)
- 9045e21 fix: tensorboard redirecting due to missing rp (#4542)
- d0d23be fix: scroll bar overlap (#4538)
- 20ac538 feat: use skeleton component for rendering the table while fetching data (#4462)
- 7ddc8ad chore: agent sends log level with agent added container logs (#4532)
- 937964e chore: temporary-disable-dependabumps (#4541)
- 6004b89 fix: rename lable restarts to auto restarts (#4536)
- 633be08 fix: change short ID width (#4537)
- 1e66cab test: disable flaky test for now (#4530)
- 7a54c9a fix: persist whose workspaces/projects (#4525)
- b673e34 test: update avatar tests to remove warnings (#4523)
- bdfa9fa fix: text color in Hyperparameters page (#4529)
- e9ea762 fix: skip gzip only for proxy web assets [DET-7802] (#4517)
- 82d1144 feat: docker auth improvements [DET-7633, DET-7636] (#4513)
Docker images
docker pull determinedai/determined-master:0.19.0
docker pull determinedai/determined-master:5497b5114
docker pull determinedai/determined-master:5497b5114db5546f1ecaa2349dd6e8c4c3638fd5
docker pull determinedai/determined-dev:determined-master-5497b5114
docker pull determinedai/determined-dev:determined-master-5497b5114db5546f1ecaa2349dd6e8c4c3638fd5
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.19.0
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:5497b5114
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:5497b5114db5546f1ecaa2349dd6e8c4c3638fd5
0.18.4
Changelog
- 710d557 chore: bump version: 0.18.4-rc4 -> 0.18.4
- 990e62b docs: add release notes for 0.18.4 (#4547)
- a12fd45 chore: bump version: 0.18.4-rc3 -> 0.18.4-rc4
- 57e4e7e chore: bump version: 0.18.4-rc2 -> 0.18.4-rc3
- 06dcaac chore: bump version: 0.18.4-rc1 -> 0.18.4-rc2
- 09bf496 fix: correct allocation end times when cluster heartbeat is before allocation start time (#4556)
- 75d5237 chore: bump version: 0.18.4-rc0 -> 0.18.4-rc1
- 47f9207 fix: tensorboard redirecting due to missing rp (#4542)
- 56bf894 chore: bump version: 0.18.4-dev0 -> 0.18.4-rc0
- 900b134 chore: lock api state for backward compatibility check
- a14986b test: set_priority should respect the master url. (#4527)
- 8a3c07a test: bring back tests for generic commands on master restarts. (#4490)
- 73268ac docs: add ClusterInfo docs (#4496)
- 9a34b3d chore: bumpenvs (#4520)
- b1a2927 fix: ensure correct master url is used for priority scheduler and other managed devcluster tests. (#4512)
- 667e313 fix: deepspeed examples config (#4511)
- 2f6373c style: add uplot cursor styles for dark mode (#4502)
- 7657c15 feat: add props to style avatar differently (#4514)
- 5272c09 test: set up shared web to test with local strategy (#4504)
- 2f09503 chore: bump version: 0.18.3-dev0 -> 0.18.4-dev0
- 5f30b67 docs: add release notes for 0.18.3 (#4491)
- 758c2e9 test: fix local time tests (#4503)
- 2a968bf fix: give a reasonable name to model def download (#4493)
- 7e20b5d docs: minor improvements and Homebrew build instructions (#4489)
- 38fe302 chore: add helpers for working with shared-code (#4452)
- 28c7b89 fix: make actor system GetOrElseTimeout actually timeout. (#4499)
- 5a41986 feat: environment_variables in task_container_defaults [DET-7638] (#4485)
- 1184bb3 test: fix editable metadata test flake (#4495)
- 7898dc1 style: theme tuning [DET-7362, DET-7363, DET-7364, DET-7365, DET-7366, DET-7383, DET-7425, DET-7498, DET-7551] (#4378)
- cd28b37 chore: fix react fmt (#4487)
- 97d3e60 refactor: update modals to better support contexts [DET-6297] (#4468)
- 855e832 fix: allocation errors on finished hp search [DET-7724] (#4467)
- ce296c5 fix: correct path selector for tensorboard upload (#4474)
- c3d6509 fix: add priority scheduler e2e tests [DET-7106] (#4429)
- 7e50a04 fix: close button should not show in non-embedded log viewer [DET-7723] (#4447)
- 1d61f3a fix: adjust spinner for the InlineEditor (#4367)
- a69813b feat: persist user web setting [DET-7501] (#4394)
- be48f76 feat: support non-root init container k8s [DET-7109] (#4460)
- 2015f14 fix: disable pointer events on log viewer spinner block (#4470)
- db65bf4 test: tweak test_agent_reconnect_keep_experiment test timeouts. (#4465)
- 5f13407 chore: bump the resources for react tests (#4459)
- 60ec81f fix: sync agent actor init on master restart [DET-7746] (#4463)
- 473f7b6 fix: loadingState cleanup for WorkspaceList (#4461)
- e1fd062 feat: add spinner to tiral/experiment tabs to show data fetching (#4436)
- d123d90 ci: fix autolabeling for shared (#4456)
- 7347bb6 test: update ci job for testing shared-code (#4444)
- 6450aa1 docs: workspaces docs (#4448)
- a1a06c4 fix: trim workspace and project name whitespace [DET-7390] [DET-7747] (#4451)
- 908cdb1 docs: PyTorchTrialCallback.on_validation_end (#4457)
- e914491 ci: add codeowners for shared directory (#4455)
- d28d1c2 chore: share AvatarCard and Avatar comps [DET-7714 DET-7713] (#4430)
- efc5ffb fix: only call move API for permissioned experiments (#4445)
- 62b9a2d fix: map name to name instead of description in getProjectExperiments (#4450)
- 6475a19 fix: mac path for --agent-config-path (#4449)
- df830a1 fix: long tag wrap (#4446)
- 10022e8 fix: various workspaces fixes (#4443)
- 6790aee feat: master config option for additional fluent outputs [DET-7549] (#4415)
- c0ddb8d chore: webui changes for slurm (#4427)
- 4df29fc chore: resolve warnings from yaml and numpy, fix supposed fstrings, black formatting (#4372)
Docker images
docker pull determinedai/determined-master:0.18.4
docker pull determinedai/determined-master:710d5575b
docker pull determinedai/determined-master:710d5575b8565fc50b5e65143b0b27dd661b0d17
docker pull determinedai/determined-dev:determined-master-710d5575b
docker pull determinedai/determined-dev:determined-master-710d5575b8565fc50b5e65143b0b27dd661b0d17
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.18.4
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:710d5575b
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:710d5575b8565fc50b5e65143b0b27dd661b0d17
0.18.3
Changelog
- 5cf7e8b chore: bump version: 0.18.3-rc8 -> 0.18.3
- 6d19e17 docs: add release notes for 0.18.3 (#4491)
- a7cc461 chore: bump version: 0.18.3-rc7 -> 0.18.3-rc8
- 6dba678 fix: make actor system GetOrElseTimeout actually timeout. (#4499)
- 250aca7 chore: bump version: 0.18.3-rc6 -> 0.18.3-rc7
- f029111 chore: bump version: 0.18.3-rc5 -> 0.18.3-rc6
- 73fd9f0 fix: allocation errors on finished hp search [DET-7724] (#4467)
- c85186b fix: correct path selector for tensorboard upload (#4474)
- 2193001 feat: support non-root init container k8s [DET-7109] (#4460)
- 40d4cd1 chore: bump version: 0.18.3-rc4 -> 0.18.3-rc5
- d996a96 fix: sync agent actor init on master restart [DET-7746] (#4463)
- 97b5a0c fix: loadingState cleanup for WorkspaceList (#4461)
- e2fb0f8 docs: workspaces docs (#4448)
- d835b1e fix: trim workspace and project name whitespace [DET-7390] [DET-7747] (#4451)
- bcb8b3e chore: bump version: 0.18.3-rc3 -> 0.18.3-rc4
- e34bbe1 fix: map name to name instead of description in getProjectExperiments (#4450)
- 9b86fc3 fix: mac path for --agent-config-path (#4449)
- 0918a71 fix: only call move API for permissioned experiments (#4445)
- 5a8a4c4 chore: bump version: 0.18.3-rc2 -> 0.18.3-rc3
- 054d089 fix: long tag wrap (#4446)
- 29f224e fix: various workspaces fixes (#4443)
- c7d6b68 chore: bump version: 0.18.3-rc1 -> 0.18.3-rc2
- 650cce5 chore: webui changes for slurm (#4427)
- ca975c8 chore: bump version: 0.18.3-rc0 -> 0.18.3-rc1
- 761e6ad feat: master config option for additional fluent outputs [DET-7549] (#4415)
- 759e5d6 chore: bump version: 0.18.3-dev0 -> 0.18.3-rc0
- ca0a7c1 chore: lock api state for backward compatibility check
- 4010b33 feat: experiment comparison page (#4410)
- 030de4a fix: default max length properly to undefined (#4437)
- 6e6420e feat: task container checks for occupied gpus [DET-5091] (#4323)
- e414235 test: resolve useCustomizeColumnsModal test flakes [DET-7635] (#4433)
- bd57667 chore: update deepspeed launcher to work with newer versions (#4156)
- 4277e99 Split 'webui/react/src/shared/' into commit '3302ddf5d45c5a73cdcbbc750936bfb32e025c06'
- 9fbf9d5 fix: hide continue trial for experiments with 0 or more than 1 trial (#4434)
- 79b9d90 chore: Merge Slurm-infrastructure support changes (#4376)
- 8b799e9 ci: send Docker image scans to the infrastructure engineering channel instead (#4425)
- 7436c6c fix: properly close experiment create modal [DET-7553] (#4421)
- 169c0d8 docs: clarify count requirement for grid search (#4428)
- 47c393f test: tune web tests (#4409)
- 2e8b92b feat: install a SIGUSR1 signal handler to print stacktraces (#4281)
- bff6d1a fix: registry auth in master.yaml (#4412)
- 69d6b1e fix: Sync epoch_idx across all workers for PyTorchTrial [DET-7488] (#4303)
- 3302ddf chore: make core styles and configs shareable [DET-7316] (#4398)
- b8a7afc chore: make core styles and configs shareable [DET-7316] (#4398)
- 182af49 fix: allocation state mismatch with proto [DET-7593] (#4408)
- 2c4a77a chore: don't launch horovod for hypothetical zero slot trials (#4414)
- e1c1eac feat: shm_size can take a string with units like docker run allows (#4314)
- c05ebd7 chore: migrate shared-web to subtree (#4402)
- 0505422 chore: migrate shared-web to subtree (#4402)
- cb1fc53 fix: always flush tensorboard ready log (#4387)
- c82d64c fix: End agent stats log to debug level (#4411)
- b5e2397 fix: fix cluster resource pool selection [DET-7517 DET-7567 DET-7517] (#4391)
- 6672fd8 chore: Merge Slurm-infrastructure support changes trial (#4396)
- 6d0fe6b fix: recover from permission error when deleting preexisting file or link (#4397)
- 1e8bbac fix: move scale import (#4406)
- 624d179 chore: Add nolintlint to deadcode for unused oss method (#4404)
- 8e50499 fix: Do not fail if no checkpoints to gc [FOUNDENG-83] (#4405)
- e3ee40f fix: test agent config flake (#4386)
- b701fcd feat: Trial Metrics - Summary endpoints (#4392)
- c9d9d53 feat: add log scale to experiment viz (#4273)
- 2d82f01 chore: sync node version requirements with package-lock (#4395)
- b2acc5d chore: Update task logging setup [FOUNDENG-81] (#290) (#4389)
- 0ad82fd chore: Merge Slurm-infrastructure support changes task_trial (#4393)
- 9149c1a chore: Merge Slurm-infrastructure support changes job.go (#4388)
- 3d30637 chore: Merge Slurm-infrastructure support changes archive (#4390)
- e9f3b1a feat: rolling upgrades support for generic command types (notebooks etc.) [DET-7218] (#4371)
- b130e3f fix: Add compute driver capabilities to determined-agent Dockerfile (#4385)
- 6d8ea53 chore: Restrict display names [DET-7356] (#4266)
- 72d4bcb fix: update experiment list tag truncate rule [DET-7530] (#4373)
- c78e92e feat: stress test for agent enable/disable and reconnect spam [DET-6733] (#4269)
- e4a275b feat: add --agent-config-path to det deploy local agent-up [DET-6278] (#4366)
- 7f40e86 chore: handle setResourcePool message in commands (#3571)
- 9a1ab33 style: ban fmt.Println and fmt.Printf in go code. (#4369)
- cbc37cd chore: upgrade mockery to latest (#4363)
- 76423ba fix: don't GC registered checkpoints [DET-7418] (#4316)
- 81fed29 chore: idempotent container running messages [DET-7555] (#4364)
- 19f311e feat: workspaces and projects [DET-6461] (#4203)
- d257b95 chore: update resource pool cloud icons [DET-7160] (#4317)
- c1da1f2 chore: restrict access to master.yaml and agent.yaml to the user/owner only. (#4299)
- 8d91f0f ci: Add dependabot for all the things (#4313)
- 058cf5c ci: make test-unit-storage run on any upstream branch (#4320)
- 49966ef feat: Tensorboard profiles from all hosts (#4142)
- 762cee1 ci: merge separate codeowners (#4322)
- 5e6ba22 chore: bump version: 0.18.2-dev0 -> 0.18.3-dev0
- bb5ccc1 docs: add release notes for 0.18.2 (#4318)
- 13b6059 ci: create default codeowner file (#4321)
- 0527c48 fix: harness azure dependencies (#4319)
- 869dea1 chore: upgrade to typescript 4.7 [DET-7499] (#4285)
- 9c94d1f ci: use parallelism for test-e2e-managed-devcluster. (#4193)
- 98e4b0b chore: delete dead code (#4312)
- 3df0fb1 fix: provide unique port offsets to trials (#4315)
- c5eb030 chore: drop pbt, adaptive and adaptive_simple from docs and web (#4311)
- ca57ffc chore: cleanup container terminations resent on agent exit (#4309) [DET-7533]
- 11d17fe chore: add logging for e2e gpu test (#4232)
- b14c281 docs: more consistently use Determined AI vs. Determined (#4310)
- 0a93e76 fix: skip allocation check on checkpoint save to fix pulling preemption (#4308)
- 4610d70 ci: skip e2e_tests for webui
- 2e9bae6 fix: remove limit of length 4 for task id (#4304)
- ee6b118 chore: use vanilla markdown in examples (#4268)
- db5cd90 fix: harness dependency class fore azure (#4302)
- 19aa55c chore: stamp out unexpected messages [DET-7492] (#4284)
- 342236a chore: FOUNDENG-55 For deepspeed, set the NCCL_SOCKET_IFNAME env variable based on dtrain_network_interface (#4297)
- 69ff973 ci: fix log message grepped for by test_launch_layer_cifar (#4301)
- 1299ef3 ci: remove overzealous assert that is a race with a quick noop task (#4296)
- 5c28304 docs: move description of agent client cert fields to the right place (#4293)
Docker images
docker pull determinedai/determined-master:0.18.3
docker pull determinedai/determined-master:5cf7e8b5a
docker pull determinedai/determined-master:5cf7e8b5a6a8393b04c1d54a5363cbbe6e8792d2
docker pull determinedai/determined-dev:determined-master-5cf7e8b5a
docker pull determinedai/determined-dev:determined-master-5cf7e8b5a6a8393b04c1d54a5363cbbe6e8792d2
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.18.3
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:5cf7e8b5a
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:5cf7e8b5a6a8393b04c1d54a5363cbbe6e8792d2
0.18.2
Changelog
- d214a34 chore: bump version: 0.18.2-rc2 -> 0.18.2
- 10b172b docs: add release notes for 0.18.2 (#4318)
- 38e0efb chore: bump version: 0.18.2-rc1 -> 0.18.2-rc2
- c5cb23f fix: harness azure dependencies (#4319)
- f623c1c chore: bump version: 0.18.2-rc0 -> 0.18.2-rc1
- 350de38 chore: cleanup container terminations resent on agent exit (#4309) [DET-7533]
- f02f11b fix: skip allocation check on checkpoint save to fix pulling preemption (#4308)
- cfc0f81 fix: remove limit of length 4 for task id (#4304)
- 0645ba9 fix: harness dependency class fore azure (#4302)
- 9991528 chore: bump version: 0.18.2-dev0 -> 0.18.2-rc0
- 86b6c28 chore: lock api state for backward compatibility check
- 271ae41 chore: ensure container terminations aren't lost in the event of network failures [DET-7440, DET-7441] (#4272)
- 2d5a118 fix: check for errors from actor asks in createResourcePoolSummary [DET-7494] (#4294)
- daee036 fix: stale agent state crashes resource pool [DET-7493] (#4295)
- 4722caa fix: limit waiting for logger to 30 seconds (#4289)
- b8f9463 fix: jupyterlab modal stale values, not sending values to api
- 42615b4 feat: add cli and api for delete checkpoints [DET-7119], [DET-7120] (#4246)
- ad9bcd8 chore: restore determined_version in Trial checkpoint metadata (#4288)
- 0b7dcdb chore: widen node version detection to 16.29 (#4290)
- 118a850 fix: pass signals through wrapper processes (#4286)
- 0089ede fix: enable Full Configuration in appropriate modals [DET-7495] (#4280)
- 45c62f2 feat: show only selected trials in learning curve
- 922b113 chore: reduce use of any in InteractiveTable
- 5b03672 chore: update docs to accurate node support (#4279)
- 59f686c fix: make trials table sort properly (#4275)
- 695b301 feat: incrementally release resources (#4278)
- c0f53fa perf: avoid Seq Scanning raw steps (#4244)
- b36596f fix: backslashes in show_ssh_command for windows (#4260)
- ec503e0 fix: correct error message when command list fails (#4235)
- 15b0525 feat: add master-side verification for agent mTLS (#4220)
- c6e8249 fix: don't hardcode /bin/which in entrypoints (#4257)
- 236c241 refactor: migrating tables to use InteractiveTable component [DET-7382] (#4229)
- 156ff50 fix: reshow drop targets for customize columns modal (#4199)
- 5e99626 docs: delete mnist_tf_layers example (#4263)
- 9d82231 Revert "add autosync action"
- 8288461 add autosync action
- 70e534f fix: det task logs unable to use trial task IDs and checkpoint GC task IDs [DET-7424] (#4258)
- 12d21c2 fix: gcs storage upstream test failure (#4253)
- b09ec40 feat: improve trial logs to have some system events [DET-5885] (#4215)
- c20ad43 ci: Lints migrations to ensure new migrations have higher timestamps than old ones [DET-7146] (#4250)
- da83e65 chore: small cleanups for slurm (#4211)
- b71f179 feat: det deploy now can use --yes to skip prompts [DET-7408] (#4255)
- 7b3fee4 chore: better slurm option override support (#4254)
- 7af3cfe feat: add google cloud storage (gcs) prefix support [DET-6883] (#4238)
- b5c2b14 docs: add enhanced launcher user guide (#4248)
- eb1d9aa chore: add dev server support for embedded tasks view (#4243)
- 9d19a87 perf: dont repeatedly reprocess profiler data
- b6582c0 ci: add check to prevent ssh git url (#4240)
- 50bdbc3 chore: bumpenvs (#4239)
- 1065314 chore: add node_modules to eslintignore (#4237)
- c74dee6 feat: Add theme toggle to user settings [DET-7321] (#4204)
- 3016fc1 chore: remove legacy code/docs for NCCL/Gloo port range config (#4187)
- f1e8d3c ci: check ulimit before 4x4 distributed test on macs (#4234)
- 7d9e999 fix: adjust page to preserve props.children (#4231)
- 35258ef feat: Enable sending empty string for displayname with fallback to username [DET-7031] (#4140)
- 2e77ec5 fix: det shell start/open, in windows (#4227)
- 7217cfd feat: k8s detect non-det tasks (#4154)
- a88f5c4 chore: share webui base page (#4218)
- 15bd758 fix: use custom image for tensor board [DET-7242] (#4123)
- de926b1 chore: fix rendezvous timeout logic (#4226)
- c29e97a chore: base Dockerfile TensorFlow 2.6, 2.7, 2.8 security patches [DET-7325] (#4223)
- 7882381 fix: authenticate pprof endpoints [DET-7402]
- 2b0ccb8 chore: bump version: 0.18.1-dev0 -> 0.18.2-dev0
- bcbab4d docs: add release notes for 0.18.1 (#4216)
- 034b957 chore: revert rename of RestoreResourcesFailure -> ResourcesFailure. (#4210)
- a7c4c2a feat: enable agent-side mTLS for connection to master (#4212)
- c9e13b6 feat: save connection in context (#4213)
- 15f65ab feat: pix2pix example (#4125)
- db987de chore: delete "conditional" json-schema extension (#4177)
- 9b33f82 fix: use bigint for checkpoint size in proto_get_trials_plus (#4208)
- 2c4c847 docs: update release note instructions with important admonition (#4207)
- 138caf8 fix: pool detail page tab count when loading (#4200)
- c5f685f feat: move task logs to embedded view [DET-7169] (#4179)
- 54982c9 perf: tweak proto_get_trials_plus plan (#4206)
- e585ebb refactor: cleanup task logging shell scripts (#4113)
- 4e2913f chore: update entrypoint in expconf docs (#4198)
- fcff1c2 fix: agent panic on commands with unusal formatted environment variables [DET-6649] (#4202)
- 4172a46 refactor: pull in user service code changes from EE (#4183)
- 1c48fa6 docs: improve OpenTelemetry docs slightly (#4182)
- 7f508e4 fix: allow internal: null for pre-0.15.6 experiments (#4197)
- 55957fe fix: add restarts back to get_trial_ids for sorting
- dd8d3f3 feat: add det experiment logs <EXP_ID> [DET-7145] (#4190)
- 3ef36d5 chore: refactor action dropdown comp to be reused [DET-7171] (#4164)
- 623e60d ci: bust circleci cache (#4189)
- af56e01 docs: document using AWS Load Balancer on EKS [DET-6669] (#4174)
- 36e5667 feat: allow enabling Prometheus monitoring through helm [DET-6993] (#4158)
- 3bb7bb1 style: minor theme fixes and style adjustments [DET-7349] (#4161)
- 2166dfc docs: update screen shots for cluster UI (#4188)
- b7a3278 style: address new flake8-comprehensions, pyzmq==23.0.0. (#4185)
- 5e1a81c feat: allow setting of checkpointStorage.prefix through helm [DET-7152] (#4152)
- 526e1dc feat: display trial restarts [DET-7347] (#4160)
- 2fdc6d7 fix: agent can now be control-C while connecting to master [DET-6287] (#4178)
- d339f7b chore: migrate det a list to new api and bindings (#4186)
- ed257d3 refactor: rip out UseFluentLogging. (#4184)
- 21b8590 docs: update fluent-bit version. (#4181)
- 4b325a5 docs: document database SSL options (#4169)
- 5d228f8 chore: make core-api tutorial Windows-friendly (#4176)
- 1875051 fix: sync slot usage for k8s [DET-7350] (#4172)
- 92a944f chore: add .dccache to .gitignore (#4173)
- 5e7a30c docs: fix typo in release note (#4170)
- a52210f feat: chart sync provider [DET-7309] (#4139)
- a7fafbb fix: enable currently active side nav item (#4167)
- 0a5d54d chore: fix hardcoded url in schema logic (#4171)
- 7df89b2 chore: allow deleting delete failed experiments (#4141) [DET-7070]
- 4ef2b67 perf: fixup query for latest training per trial (#4166) [DET-7352]
- 3d3fe1c fix: include both old and new checkpoints in total checkpoint size (#4165)
- 4d98cee fix: replace carriage returns with newlines in task output [DET-5302] (#3945)
- a2f878a chore: only warn on invalid calls to daemonize resources for slurm (#4108)
- c00ce0a chore: check git state in lock-api-state.sh (#4163)
- ec743d7 ci: turn off github annotations. (#4146)
- 0efd44d doc: fix a broken file reference (#4131)
- b911d85 fix: avoid potential race between AllocationReady and Running state (#4159)
- cc59985 revert: partial revert of 96e0e58 (#4162)
- f07633b fix: port collisions for multiple shared-non distributed jobs (#4120) [HAL-2894]
- 80d1bb1 feat: Add embedded experience for JupyterLab and TensorBoard [DET-7162] (#4134)
- 83cc1ec fix: prevent experiment name in header from flowing entire vertical space of screen during resize (#4157)
Docker images
docker pull determinedai/determined-master:0.18.2
docker pull determinedai/determined-master:d214a34df
docker pull determinedai/determined-master:d214a34df0c0eb2e5e38ae63d1359862fd2af8f1
docker pull determinedai/determined-dev:determined-master-d214a34df
docker pull determinedai/determined-dev:determined-master-d214a34df0c0eb2e5e38ae63d1359862fd2af8f1
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.18.2
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:d214a34df
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:d214a34df0c0eb2e5e38ae63d1359862fd2af8f1
0.18.1
Changelog
- 9284a3a chore: bump version: 0.18.1-rc7 -> 0.18.1
- eb09c34 docs: add release notes for 0.18.1 (#4216)
- bad9a33 chore: bump version: 0.18.1-rc6 -> 0.18.1-rc7
- 6cff1b0 fix: use bigint for checkpoint size in proto_get_trials_plus (#4208)
- 60291e9 chore: bump version: 0.18.1-rc5 -> 0.18.1-rc6
- 8f4a797 perf: tweak proto_get_trials_plus plan (#4206)
- 414bcd2 chore: bump version: 0.18.1-rc4 -> 0.18.1-rc5
- 789b39c fix: allow internal: null for pre-0.15.6 experiments (#4197)
- fd27bac fix: add restarts back to get_trial_ids for sorting
- 845f2f0 chore: bump version: 0.18.1-rc3 -> 0.18.1-rc4
- eed09e9 docs: update screen shots for cluster UI (#4188)
- 88271dd style: minor theme fixes and style adjustments [DET-7349] (#4161)
- f2a4e5e feat: display trial restarts [DET-7347] (#4160)
- eaf84e6 chore: bump version: 0.18.1-rc2 -> 0.18.1-rc3
- a8ddc82 fix: sync slot usage for k8s [DET-7350] (#4172)
- 1f13710 fix: enable currently active side nav item (#4167)
- e9333d2 perf: fixup query for latest training per trial (#4166) [DET-7352]
- 764ef2d fix: include both old and new checkpoints in total checkpoint size (#4165)
- 4157c82 chore: bump version: 0.18.1-rc1 -> 0.18.1-rc2
- e2f949a chore: bump version: 0.18.1-rc0 -> 0.18.1-rc1
- 26ede20 chore: revert scheduling docs
- 8ea2a52 fix: prevent experiment name in header from flowing entire vertical space of screen during resize (#4157)
- 1e244e0 chore: bump version: 0.18.1-dev0 -> 0.18.1-rc0
- 96e0e58 chore: lock api state for backward compatibility check
- 82f0366 feat: allow NaN validation metrics [DET-7177] (#4150)
- 0bbeec1 feat: upload all tb files DET-7139 (#4155)
- 90b918a fix: adjust upscaling of column widths [DET-7220] (#4138)
- 4b66bf0 feat: rolling upgrades v0 [DET-6548] (#4031)
- beea245 ci: disable most checks on ci-only changes (#4118)
- 5f7e74a fix: upstream test failures due to config being admin protected (#4153)
- da1dcd7 fix: return user data when new user is created [DET-7255] (#4149)
- 1eba7a2 docs: a vain attempt to pass ci test on already approved pr4110 content changes (#4151)
- 7aea015 fix: No redirecting url when model name is changed (#4127)
- 00171f8 feat: Cluster UI improvement [DET-7072, DET-7073] (#4009)
- 27e04e4 feat: require admin privileges for cluster managment [DET-7186] (#4129)
- 09a8ff6 ci: update gke version. (#4147)
- fa3a959 ci: Increase package-and-push-system-local resource class (#4143)
- 0d4fe23 chore: fix boolean urlparams for grpc (#4136)
- b1829b8 feat: enable SLURM preemption (#4114) [FOUNDENG-21]
- b4ef273 chore: add a local docs server (#4117)
- 5dea211 refactor: theme architecture [DET-6211] (#4004)
- aef66b0 build: make docs build incremental and idempotent (#4116)
- ec7007c ci: persist debs and rpms in circleci for dev, rc, and release builds (#4124)
- b89b0a3 fix: user filter on dashboard [DET-7251] (#4132)
- e3e50a9 chore: add job ID and experiment labels to prometheus endpoint mappings [DET-6964] (#4119)
- b0d8a93 ci: make codecov information for sure now. (#4130)
- b313503 chore: restructure shareable webui utils and types (#4112)
- 78505d6 ci: turn off codecov bot PR comments (#4122)
- a91866f chore: fix rank determination for horovod with mpi (#4109)
- 3368a77 fix: NCCL interface in distributed tests (#4111)
- 18aadd0 chore: bump version: 0.18.0-dev0 -> 0.18.1-dev0
- 7500d6f docs: add release notes for 0.18.0 (#4102)
- 1f4a642 chore: bump version: 0.17.16-dev0 -> 0.18.0-dev0
- 59928ec chore: explicit naming of preemption and coscheduler resources [DET-7140] (#4101)
- ac2564b fix: add missing task log teardown for trials (#4107)
- c41e1dc docs: rework quickstart for ml developers (#4091)
- 2c7564d chore: use reported slots available for on prem deployments (#4095)
- 62049ae chore: change codecov to informational only (#4105)
- 5061a8f fix: mark distributed tests as parallel (#4093)
- 87b8e53 chore: enable codecov enforcement (#4084)
- 15dd7e3 fix: bindings sessions in experiment apis. (#4096)
- eb65b09 chore: cleanup and fixes for "det deploy" (#4103)
- d78a712 perf: improve plan for proto_get_trial_plus.sql (#4073)
- a980c56 build: update submodules on webui get-deps (#4082)
- 78d4e8b chore: HAL-2879 Cleanly shutdown all sshd servers on exit (#176) (#4087)
- 940f8f7 chore: Refactor JupyterLabModal pattern [DET-6276] (#4072)
- ec5553a chore: make container proxy support more flexible, for slurm (#3948)
- 3614c83 chore: wait for process substition log filters [DET-6712] (#3930)
- 474742f chore: clean up useCallback dependency (#4092)
- 39588ee feat: add det.LOG_FORMAT constant (#4090)
- bcc50f9 fix: Support rendering rank of 0 (#4083)
- c523523 fix: consistent total slot calculation for cluster overview [DET-7182] (#4080)
- 9c16349 feat: add wrap_rank helper script (#4086)
- ea4a949 fix: dont show archived in column picker [DET-7187] (#4085)
- a4e5f84 feat: authenticate task proxies (#4071)
- 5fab384 fix: wrap torch.distributed launch in pid server/client (#4077)
- e73f063 fix: use displayNames in ClusterHistoricalUsage (#4059)
- 3368dd8 chore: filter out NaN, +/- Infinity metric values for charts for now. (#4076)
- f8b5bf5 chore: add and consolidate code coverage to codecov (#4064)
- 487b04c docs: release note for core api (#4069)
- 15a668b fix: add user column back to experiment list (#4070)
- 880b769 feat: break workload info from trial endpoint into a new endpoint [DET-6729] (#3635)
- d703c96 fix: show notification when delete experiment fail [DET-6811] (#4051)
Docker images
docker pull determinedai/determined-master:0.18.1
docker pull determinedai/determined-master:9284a3aa6
docker pull determinedai/determined-master:9284a3aa6e307c61426c93b5e09730c664725604
docker pull determinedai/determined-dev:determined-master-9284a3aa6
docker pull determinedai/determined-dev:determined-master-9284a3aa6e307c61426c93b5e09730c664725604
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.18.1
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:9284a3aa6
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:9284a3aa6e307c61426c93b5e09730c664725604
0.18.0
Changelog
- 3c00fc2 chore: bump version: 0.18.0-rc3 -> 0.18.0
- 797ceec docs: add release notes for 0.18.0 (#4102)
- 5051024 chore: bump version: 0.18.0-rc2 -> 0.18.0-rc3
- bd87faa perf: improve plan for proto_get_trial_plus.sql (#4073)
- c2a3ba9 fix: Support rendering rank of 0 (#4083)
- d46995f fix: consistent total slot calculation for cluster overview [DET-7182] (#4080)
- 1b94612 feat: add det.LOG_FORMAT constant (#4090)
- aacd23b feat: add wrap_rank helper script (#4086)
- 8fbf0a7 fix: dont show archived in column picker [DET-7187] (#4085)
- a190a1f chore: bump version: 0.18.0-rc1 -> 0.18.0-rc2
- 88fc38b chore: bump version: 0.18.0-rc0 -> 0.18.0-rc1
- 4d73cea feat: authenticate task proxies (#4071)
- 3e594e4 chore: filter out NaN, +/- Infinity metric values for charts for now. (#4076)
- 1038840 fix: wrap torch.distributed launch in pid server/client (#4077)
- 7df0f00 docs: release note for core api (#4069)
- 7721553 fix: add user column back to experiment list (#4070)
- db5f434 chore: bump version: 0.17.16-dev0 -> 0.18.0-rc0
- 02114cf chore: lock api state for backward compatibility check
- 7be795a docs: Core API reference and cookbook docs (#4054)
- 5e364d1 chore: ancient checkpoints for very old pytorch (#4068)
- 51e80d8 perf: minimize create/destroy of uPlots [DET-6972] [DET-6972] [DET-6796] [DET-6853] [DET-6672] (#3935)
- 531af28 chore: update removed reducer methods (#4067)
- 6f63f13 chore: remove det.pytorch.reset_parameters() (#4066)
- 4fb1886 feat: generic checkpoints and making Core API public (#3859)
- db0f8ef fix: correct the return type for readStream (#4063)
- 33a1a6c chore: remove PBT searcher (#4058)
- a35d49a feat: support for torch native dtrain (#3807)
- 29385bc chore: update k8s scheduler to run latest image (#4061)
- e7f3289 chore: remove remainder of native api (#4055)
- 87008ac chore: deprecate data layer (#4056)
- 75a6bf5 chore: remove deprecated experimental custom reducer methods (#4060)
- 948518c chore: remove unnecessary use of username in webui (DET-6922) (#4049)
- d2cc8d4 refactor: simplify apiConfig to reduce redundancy (#4043)
- 707dd26 chore: Refactor CheckpointModal to be hook based [DET-7136] (#4034)
- ea535b0 fix: make "Show full config" modal larger (#4053)
- 625070e chore: move webui codecov upload to use env var instead of hardcoded token (#4045)
- 0c1821e fix: remove dead shell start code [DET-7131] (#4016)
- 59e5b9b chore: Recommend git clone --recurse-submodules for submodules (#4036)
- ad06cd4 fix: move allocation resources migration to the top.
- a6bf58b refactor: rewrite ndjson streamer [DET-7121] (#4014)
- 6d018f0 fix: Alter Boolean arg default handling (#4038)
- b278ef0 chore: store allocation resources and agent RM containers in DB. (#3946)
- 7afc75f ci: Add build/coverage badge for webui/react [DET-6767] (#4028)
- 628f07c move up profile-pics migration to assure it happens
- 600d77d fix: put profile pic migrations in correct directory
- 3569a6f build: set up shared-web submodule [DET-6961] (#4006)
- 0686bef test: ensure that agent disabling doesn't count for experiment restarts [DET-5916] (#4029)
- b61a83d refactor: remove NewAllocationID. (#3959)
- b605da8 fix: add another missing key fix (#4033)
- 64e6f7d feat: allow position modification in k8s [DET-6967] [DET-6968] (#3938)
- 24bbec9 feat: Creating a table for user profile pictures
- b831dbd fix: Human-readable option for empty filters in logs [DET-6781, DET-6999] (#4017)
- 5d04a25 fix: Prevent archived models from appearing in the Register Checkpoint modal [DET-7132] (#4015)
- c1523da fix: Ensure table offset does not exceed pagination total [DET-6829] (#4011)
- 84401b2 fix: change timeout in e2e_tests/tests/cluster/test_logging.py/test_trial_logs (#4019)
- ad6f3d5 fix: add key for cancel operation (#4023)
- 9ba0f04 chore: remove deprecated type provider for moment-timezone (#4018)
- be3125b docs: update copyright year (#4005)
- ea4ab88 feat: add product feedback link [DET-5811]
- 6162418 chore: fix documentation comment for /ws/data-layer
- 482ecd6 docs: fix grammar in training-run index (#3988)
- 5480903 docs: fix indent in pytorch-porting-tutorial (#3989)
- 9a6d752 chore: warn about ambiguous enum params (#3997)
- 4de96b9 chore: give the latest-master deploy job a name (#3900)
- 64b4662 fix: pass task time not from logCtx (#3993)
- 59796b0 chore: update node version to active LTS (DET-7046) (#3932)
- 07de426 chore: show appropriate severity level on job launch failures (#4002)
- c68b7a4 test: add option to disable compare_stats. (#4008)
- 915184c chore: Removing CODEOWNERS entry for release notes (#3978)
- 93b98c2 fix: handle zeros correctly in HpTrialTable metricSorter (#4003)
- cd0e531 chore: bump version: 0.17.15-dev0 -> 0.17.16-dev0
- 7d1493d docs: add release notes for 0.17.15 (#3986)
- 63eb86c feat: add modal to explain why users cannot delete items [DET-6998] (#3994)
- f6fbc05 fix: guard trial and allocation exit logic correctness (#3983)
- 08218c1 fix: do not default to noverify for bindings sessions in CLI. (#3991)
- 4c5dad8 chore: add expconf environment.slurm (#3966)
- d2610c5 fix: handle infinite metrics in searcher snapshots [DET-7122] (#3999)
- 57d1b38 chore: handle rank id in log entries (#3995)
- f915a69 fix: handle infinite validation metric values in more cases (#3992)
- 8527e60 fix: Experiment columns filter is still applied after closing [DET-6837] (#3982)
- c7f4610 fix: avoid permanent filtered state in model registry [DET-6946] (#3984)
- 35ac5ff docs: update release note process (#3990)
- d565f84 ci: bump profiling test timeout back up (#3981)
- a43fcff chore: handle errors from starting allocations [DET-5862] (#3975)
- bd54845 feat: add det support bundle [DET-5886] (#3904)
- 899db19 feat: add drag and drop functionality to experiment list column table [DET-7044] (#3956)
- cf5c94f chore: document usage of /ws/data-layer [DET-6685] (#3971)
- 6f6e4a2 feat: Add overall allocation bar to new cluster page [DET-7074] (#3955)
- ff78aa9 chore: add error type for non retry-able resource manager errors (#3947)
- e9d26af chore: apply filtering to task logs (#3963)
- 2a981a1 fix: add test warmup command e2e [DET-5803] (#3965)
- a77b8c1 docs: fix tutorial link swap (#3979)
- 5d65f64 ci: remove old semantic PR app config. (#3980)
- 896725b fix: notebook logs filter by level (#3967)
- f3660c2 chore: remove old notebook logs endpoint (#3960)
- 8a99e03 fix: task log level parsing (#3973)
- 005d144 fix: forward job.DeleteJob through agentRM (#3968)
- 8e21bf0 docs: add release notes for PR #3914 (#3962)
- 28ab2fa chore: Update list of false alarms in Docker image scanning (#3933)
- 80a1dc7 ci: new semantic pull request check. (#3958)
- 189f148 docs: update package versions, add ROCm, edit to style guide (#3950)
- 0d180fe fix: update HPE logo sizes (#3953)
- 393de71 chore: add message to cleanup external RM resources on delete, for slurm (#3902)
Docker images
docker pull determinedai/determined-master:0.18.0
docker pull determinedai/determined-master:3c00fc281
docker pull determinedai/determined-master:3c00fc281542c272c1591c7d1c86eb53db8f230c
docker pull determinedai/determined-dev:determined-master-3c00fc281
docker pull determinedai/determined-dev:determined-master-3c00fc281542c272c1591c7d1c86eb53db8f230c
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.18.0
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:3c00fc281
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:3c00fc281542c272c1591c7d1c86eb53db8f230c
0.17.15
Changelog
- 9b74e54 chore: bump version: 0.17.15-rc4 -> 0.17.15
- 931983a chore: bump version: 0.17.15-rc3 -> 0.17.15-rc4
- 3bd2b4a fix: handle infinite metrics in searcher snapshots [DET-7122] (#3999)
- 153221e chore: bump version: 0.17.15-rc2 -> 0.17.15-rc3
- 1e9f5c4 chore: handle rank id in log entries (#3995)
- d12c892 fix: handle infinite validation metric values in more cases (#3992)
- 59b180d docs: add release notes for 0.17.15 (#3986)
- 9e3e541 chore: bump version: 0.17.15-rc1 -> 0.17.15-rc2
- 1c97bc7 chore: apply filtering to task logs (#3963)
- 8dd01c8 fix: task log level parsing (#3973)
- 9c24121 docs: add release notes for PR #3914 (#3962)
- 507cfe7 fix: update HPE logo sizes (#3953)
- e1a3de6 chore: bump version: 0.17.15-rc0 -> 0.17.15-rc1
- dc3ef47 chore: bump version: 0.17.15-dev0 -> 0.17.15-rc0
- 6c38aad chore: lock api state for backward compatibility check
- f206919 fix: take out job summary caching [DET-6695] (#3849)
- c9941ac chore: add missing icons to mobile navbar [DET-7009]
- 927ef94 fix: parse different time format in compare stats script [DET-7039] (#3909)
- 55fcf65 fix: checkpoint gc job should close (#3943)
- b8f1073 feat: track task stats [DET-6872, DET-6926, DET-6927] (#3852)
- b1a470d chore: add key attribute to avatar ActionCard action (#3939)
- 7c7375c chore: give det job a default command to run (#3934)
- 9028fe0 chore: mask registry auth password in harness [DET-6279] (#3867)
- 7d0234b chore: add filtering by userIds to API endpoints (DET-7019) (#3898)
- 180a79f chore: mask registry creds in webui [DET-7013] (#3881)
- 2d44564 replace username with userId for user API (#3914)
- b6b2c71 test: add interaction tests for spinner [DET-6665] (#3826)
- d86d09b chore: prevent InteractiveTable scroll from moving pagination or other controls [DET-7037] (#3923)
- c9a2458 fix: crash on upgrade to InteractiveTable [DET-7036] (#3922)
- 4c0ef95 hide archived experiments unless using --all (#3918)
- 586bba4 docs: tweak docs for socket activation (#3926)
- 64ae7ca perf: add infiniband-related libraries to environment (#3832)
- bcc954a chore: disallow getting metadata from dummy checkpoint context (#3920)
- 575cc21 chore: update dep requirements for react (#3901)
- e359843 chore: demote _get_last_validation to internal (#3919)
- 5c93cb6 chore: log checkpoint uuids in core, not in wlsq (#3924)
- 7086df4 chore: add missing docstrings in core api (#3911)
- ea0db53 feat: add slurm rendezvous (#3777)
- d5e793b chore: make store_path auto-create the directory (#3916)
- 1f3c4b2 feat: add core.DownloadMode (#3910)
- c334a53 docs: fix install cli typo (#3917)
- 069eb6f fix: run scheduling on agent connection/enable events, reconnectBacklog replay. (#3906)
- b75afe6 chore: bump version: 0.17.15-dev0-dev0 -> 0.17.15-dev0
- d06dc94 refactor: remove dependency of settings in the updateSettings call (#3894)
- 1b4c8e9 chore: remove pr preview cluster address [DET-7040] (#3907)
- e5723c7 docs: Release notes for 0.17.14 (#3912)
- 43b9a7c chore: bump version: 0.17.14 -> 0.17.15-dev0
- 5c45544 fix: RM crashes when setting cmd priority (#3908)
- 0c52367 chore: fix deepspeed nightly tests (#3897)
- 3e6267f chore: update StorageManager and extend CheckpointContext (#3829)
- f76cc45 docs: fix description of scheduling_unit behavior (#3890)
- 1815ee3 chore: sweeping rename of Core API components (#3896)
- 0b7141b feat: deepspeed DCGAN example (#3758)
- 015a8e0 feat: use enums instead of chief_only bool in Core API (#3888)
- 129d841 fix: use otel only if enabled (#3893)
- 0ab3288 hide sizeChanger on RoutePagination (#3892)
- ae50d3c fix: match reported rp name for k8 across endpoints [DET-7006] (#3870)
- c49561c docs: various fixes for master configuration and k8s docs (#3889)
- c716759 chore: bump version: 0.17.13-dev0 -> 0.17.14-dev0
- 2fdb2ef docs: add release notes for 0.17.13 (#3879)
- a349622 chore: Clean up tests with fewer Optional types and asserts (#3872)
- b0e0c96 feat: make core.Searcher multiworker-safe (#3871)
- 50c9f66 feat: show agent version in /agents and det agent list [DET-6847] (#3873)
- 954cf97 fix: avoid double-timestamps in logs (#3876)
- b00124a feat: Drag to Reorder and Resize Experiment List columns [DET-6438] [DET-6809] (#3765)
- be58485 chore: use better NCCL SOCKET setting for gpt-neox (#3874)
- 65e1a97 chore: remove internal flag from det.launch.horovod --help (#3875)
- 9b1fa3b feat: Search models by name and description substring [DET-6939] (#3869)
- 78ba4a3 feat: add opentel to determined master [DET-6775] (#3851)
- 789f16d chore: cleanup stray changes (#3868)
Docker images
docker pull determinedai/determined-master:0.17.15
docker pull determinedai/determined-master:9b74e5444
docker pull determinedai/determined-master:9b74e54448d64009ce574e1a68b52149c1d00fe7
docker pull determinedai/determined-dev:determined-master-9b74e5444
docker pull determinedai/determined-dev:determined-master-9b74e54448d64009ce574e1a68b52149c1d00fe7
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:0.17.15
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:9b74e5444
docker pull nvcr.io/isv-ngc-partner/determined/determined-master:9b74e54448d64009ce574e1a68b52149c1d00fe7