Release 0.19.0 · dstackai/dstack

Simplified backend integration

To provide the best multi-cloud experience and GPU availability, dstack integrates with many cloud GPU providers, including AWS, Azure, GCP, RunPod, Lambda, Vultr, and others. As we'd like to see even more GPU providers supported by dstack, this release comes with a major internal refactoring aimed at simplifying the process of adding new integrations. See the Backend integration guide for more details. Join our Discord if you have any questions about the integration process.

MPI workloads and NCCL tests

dstack now configures internode SSH connectivity for distributed tasks. You can log in to any node from any node via SSH with a simple ssh <node_ip> command. The out-of-the-box SSH connectivity also allows running mpirun. See the NCCL Tests example.
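As a rough illustration of how the node list can feed mpirun, the sketch below turns a newline-separated list of node IPs into an Open MPI style hostfile. The environment variable names `DSTACK_NODES_IPS` and `DSTACK_GPUS_PER_NODE` and the fallback values are assumptions made for this example; check the dstack documentation and the NCCL Tests example for the exact names and the recommended mpirun invocation.

```python
import os


def build_hostfile(nodes_ips: str, slots_per_node: int) -> str:
    """Return hostfile text with one '<ip> slots=<n>' line per node."""
    # Ignore blank lines and surrounding whitespace in the IP list.
    ips = [ip.strip() for ip in nodes_ips.splitlines() if ip.strip()]
    return "".join(f"{ip} slots={slots_per_node}\n" for ip in ips)


if __name__ == "__main__":
    # Assumed variable names; the fallbacks are placeholder values only.
    nodes_ips = os.environ.get("DSTACK_NODES_IPS", "10.0.0.1\n10.0.0.2")
    gpus_per_node = int(os.environ.get("DSTACK_GPUS_PER_NODE", "8"))
    print(build_hostfile(nodes_ips, gpus_per_node), end="")
```

The resulting text could be written to a file and passed to mpirun with `--hostfile`, relying on the inter-node SSH connectivity described above.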

Cost and usage metrics

In addition to DCGM metrics, dstack now exports a set of Prometheus metrics for cost and usage tracking. The metrics can be charted in a Grafana dashboard.

See the documentation for a full list of metrics and labels.
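To give a feel for how such metrics could be consumed outside Grafana, here is a minimal sketch that sums a counter by label from text in the Prometheus exposition format. The metric name `example_run_cost_usd_total` and the labels `project` and `run` are placeholders invented for this example, not dstack's actual metric names. The parser is deliberately simplified and does not handle timestamps or escaped quotes.

```python
import re
from collections import defaultdict

# Hypothetical sample; real metric and label names are in the documentation.
SAMPLE = """\
# TYPE example_run_cost_usd_total counter
example_run_cost_usd_total{project="main",run="train-a"} 12.5
example_run_cost_usd_total{project="main",run="train-b"} 7.25
example_run_cost_usd_total{project="research",run="eval"} 3.0
"""

# Matches 'name{labels} value' sample lines (simplified).
LINE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text: str, metric: str, label: str) -> dict:
    """Sum the values of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = LINE_RE.match(line.strip())
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if label in labels:
            totals[labels[label]] += float(match.group("value"))
    return dict(totals)


if __name__ == "__main__":
    print(sum_by_label(SAMPLE, "example_run_cost_usd_total", "project"))
```

In practice, a Prometheus server would scrape the metrics and the aggregation would be a PromQL query; the sketch only shows the shape of the data.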

Cursor IDE support

dstack can now launch Cursor dev environments. Just specify ide: cursor in the run configuration:

type: dev-environment
ide: cursor

Deprecations

The Python API methods get_plan(), exec_plan(), and submit() are deprecated in favor of get_run_plan(), apply_plan(), and apply_configuration(). The deprecated methods had clumsy signatures with many top-level parameters. The new signatures align better with the CLI and HTTP API.
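When migrating, it may help to locate calls to the deprecated methods first. The sketch below scans Python source text for them and suggests the replacement, assuming the two lists above pair up in order. The helper name `find_deprecated_calls` is hypothetical. The match is purely textual, so a name as generic as `submit` will also flag unrelated calls (for example, on thread pool executors). Because the signatures differ, each hit needs a manual rewrite rather than a plain rename.

```python
import re

# Old method name -> replacement, assuming the lists pair up in order.
RENAMES = {
    "get_plan": "get_run_plan",
    "exec_plan": "apply_plan",
    "submit": "apply_configuration",
}

# Matches '.old_name(' with optional whitespace before the parenthesis.
CALL_RE = re.compile(r"\.(%s)\s*\(" % "|".join(map(re.escape, RENAMES)))


def find_deprecated_calls(source: str) -> list:
    """Return (line_number, old_name, new_name) for each textual match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits


if __name__ == "__main__":
    example = "plan = client.runs.get_plan(cfg)\nrun = client.runs.exec_plan(plan)\n"
    for lineno, old, new in find_deprecated_calls(example):
        print(f"line {lineno}: {old}() is deprecated, consider {new}()")
```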

Breaking changes

The 0.19.0 release drops several previously deprecated or undocumented features, listed below. There are no other significant breaking changes. The 0.19.0 server continues to support 0.18.x CLI versions. However, the 0.19.0 CLI does not work with 0.18.x servers, so update the server first, or update the server and the clients simultaneously.

Drop the dstack run CLI command.
Drop the --attach mode for the dstack logs CLI command.
Drop Pools functionality:
- The dstack pool CLI commands.
- /api/project/{project_name}/runs/get_offers, /api/project/{project_name}/runs/create_instance, /api/pools/list_instances, /api/project/{project_name}/pool/* API endpoints.
- pool_name and instance_name parameters in profiles and run configurations.
Remove retry_policy from profiles.
Remove termination_idle_time and termination_policy from profiles and fleet configurations.
Drop RUN_NAME and REPO_ID run environment variables.
Drop the /api/backends/config_values endpoint used for interactive configuration.
The API accepts and returns azure_config["regions"] instead of azure_config["locations"] (unified with server/config.yml).
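Before upgrading, existing configurations can be checked for the removed parameters. The following is a minimal sketch that works on configurations already loaded into plain dictionaries. The helper names are hypothetical. For simplicity it treats all removed keys uniformly, although the list above scopes them differently (profiles, run configurations, fleet configurations).

```python
# Top-level keys removed in 0.19.0, taken from the list above.
REMOVED_KEYS = {
    "pool_name",
    "instance_name",
    "retry_policy",
    "termination_idle_time",
    "termination_policy",
}


def find_removed_keys(config: dict) -> list:
    """Return the removed top-level keys present in a loaded configuration."""
    return sorted(key for key in config if key in REMOVED_KEYS)


def migrate_azure_config(azure_config: dict) -> dict:
    """Return a copy that uses 'regions' in place of the old 'locations' key."""
    migrated = dict(azure_config)
    if "locations" in migrated and "regions" not in migrated:
        migrated["regions"] = migrated.pop("locations")
    return migrated


if __name__ == "__main__":
    profile = {"name": "gpu", "retry_policy": {"retry": True}, "pool_name": "default"}
    print(find_removed_keys(profile))
    print(migrate_azure_config({"locations": ["westeurope"]}))
```

Only top-level keys are inspected; nested structures would need a recursive walk.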

What's Changed

Fix gateways with a previously used IP address by @jvstme in #2388
Simplify backend configurators and models by @r4victor in #2389
Store BackendType as string instead of enum in the DB by @r4victor in #2393
Introduce ComputeWith classes to detect compute features by @r4victor in #2392
Move backend/compute configs from config.py to models.py by @r4victor in #2395
Provide default run_job implementation for VM backends by @r4victor in #2396
Configure inter-node SSH on multi-node tasks by @un-def in #2394
[Blog] Using SSH fleets with TensorWave's private AMD cloud by @peterschmidt85 in #2391
Add script to generate boilerplate code for new backend by @r4victor in #2397
Add datacenter-gpu-manager-4-proprietary to CUDA images by @un-def in #2399
Drop pools by @r4victor in #2401
Transition high-level Python runs API to new methods by @r4victor in #2403
Drop dstack run by @r4victor in #2404
Drop dstack logs --attach by @r4victor in #2405
Remove retry_policy from profiles by @r4victor in #2406
Remove termination_idle_time and termination_policy by @r4victor in #2407
Clean up models backward compatibility code by @r4victor in #2408
Restore removed models fields for compatibility with 0.18 clients by @r4victor in #2414
Clean up legacy repo fields by @jvstme in #2411
Switch AWS gateways from t2.micro to t3.micro by @r4victor in #2416
Remove old client excludes by @r4victor in #2417
Use new JobTerminationReason values by @r4victor in #2418
Drop RUN_NAME and REPO_ID env vars by @r4victor in #2419
Drop irrelevant Nebius backend implementation by @jvstme in #2421
[Feature]: Support the cursor IDE #2412 by @peterschmidt85 in #2413
Simplify implementation of new backends #2372 by @olgenn in #2423
Support multiple domains with Entra login by @r4victor in #2424
Support setting project members by email by @r4victor in #2429
Fix json schema reference and invalid properties errors by @r4victor in #2433
[Blog]: DeepSeek R1 inference performance: MI300X vs. H200 by @peterschmidt85 in #2425
Add new metrics by @un-def in #2434
Add instance and job cost/usage Prometheus metrics by @un-def in #2432
[Docker] Add dstackai/efa image by @un-def in #2422
Restore fleet termination_policy for 0.18 backward compatibility by @r4victor in #2436
[Bug]: Search over users doesn't work by @olgenn in #2439
[Feature]: Support activating/deactivating users via the UI by @olgenn in #2440
[Feature]: Display Assigned Gateway Information on Run Pages by @olgenn in #2438
[Docs]: Update the Metrics guide by @peterschmidt85 in #2441
[Examples] Update nccl-tests by @un-def in #2415

Full Changelog: 0.18.44...0.19.0
