factory functions - fix handling of default device #820

Merged: 5 commits into Lightning-AI:main from default-device-handling on Jul 22, 2024

Conversation

kshitij12345 (Collaborator):

Fixes #621

Changes are similar to #775

  • Stash torch.get_default_device in cache_info. This also adds a check to the prologue trace verifying that the jitted function is called with the same default device (see the example prologue below).
  • Factory functions infer the device from the torch.get_default_device value stashed in cache_info (when device is not passed explicitly); a minimal sketch follows this list.
  • We don't support changing the default device inside the jitted function, as reordering and fusion can lead to issues; for now this is a loud error (we can revisit in a follow-up if required).
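To make the inference concrete, here is a minimal sketch (not Thunder's actual implementation; make_ones and the cache_info layout are hypothetical) of a factory function falling back to the cached default device:

import torch

# Hypothetical helper, illustrative only: a factory function consults the
# stashed cache_info when no device is passed explicitly.
def make_ones(shape, device=None, cache_info=None):
    if device is None and cache_info is not None:
        # Fall back to the default device recorded at trace time.
        device = cache_info["default_device"]
    return torch.ones(shape, device=device)

# Stands in for the cache_info dict populated during jitting.
cache_info = {"default_device": torch.get_default_device()}
print(make_ones((3,), cache_info=cache_info).device)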

Repro:

import torch
import thunder

def foo(x):
    # Factory function without an explicit device: should follow the
    # current default device.
    return torch.ones(x.shape).device

x = torch.randn(3)  # created on CPU, before the default device is changed

torch.set_default_device("cuda")
print(foo(x))   # cuda:0
jfoo = thunder.jit(foo)
print(jfoo(x))  # cuda:0 with this fix

print(thunder.last_prologue_traces(jfoo)[-1])
print(thunder.last_traces(jfoo)[-1])

Prologue Trace:

# Constructed by Transform for execution (took 1 milliseconds)
import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast
def prologue(*args, **kwargs):
  # args: "Any"
  check_len(args, 1)
    # prims.check_len(args, 1)
  # kwargs: "Any"
  check_len(kwargs, 0)
    # prims.check_len(kwargs, 0)
  t_0: "cpu f32[3]" = args[0]
  check_tensor_metadata(t_0, (3,), 'cpu', torch.float32, False)
    # prims.check_tensor_metadata(t_0, (3,), 'cpu', torch.float32, False)
  cache_info: "Any" = thunder._get_cache_info()
  cache_info_default_dtype: "<class 'torch.dtype'>" = cache_info['default_dtype']
  check_literal_like(cache_info_default_dtype, torch.float32)
    # prims.check_literal_like(cache_info_default_dtype, torch.float32)
  cache_info_default_device: "<class 'torch.device'>" = cache_info['default_device']
  check_literal_like(cache_info_default_device, torch.device("cuda:0"))
    # prims.check_literal_like(cache_info_default_device, torch.device("cuda:0"))
  cache_info_is_autocast_enabled: "bool False" = cache_info['is_autocast_enabled']
  check_number_type_and_value(cache_info_is_autocast_enabled, False)
    # prims.check_number_type_and_value(cache_info_is_autocast_enabled, False)
  cache_info_no_grad_sync: "bool False" = cache_info['no_grad_sync']
  check_number_type_and_value(cache_info_no_grad_sync, False)
    # prims.check_number_type_and_value(cache_info_no_grad_sync, False)
  return ((), ())

Computation Trace:

# Constructed by Delete Last Used (took 0 milliseconds)
import torch
from thunder.executors.torchex import no_autocast

@torch.no_grad()
@no_autocast
def computation():
  return torch.device("cuda:0")
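Note that the computation trace bakes the device in as a constant (torch.device("cuda:0")); correctness across calls relies on the prologue check shown above. A hedged illustration, assuming a failed prologue check results in a cache miss and a retrace rather than reuse of the specialized trace:

# Continuing the repro above: change the default device after jitting.
torch.set_default_device("cpu")
print(jfoo(x))  # expected: the call no longer matches the cached "cuda:0"
                # specialization, so it should retrace under the new default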

kshitij12345 (Collaborator, Author):

Current stable PyTorch is 2.3. For Mac, it looks like we install torch 2.2.2 (https://github.com/Lightning-AI/lightning-thunder/actions/runs/10040430679/job/27746469483?pr=820#step:9:311), which may have an issue with torch.get_default_device (see pytorch/pytorch#126632).
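For reference, a defensive pattern for older PyTorch builds where torch.get_default_device may be missing or unreliable; this is an assumed workaround, not something this PR adds:

import torch

if hasattr(torch, "get_default_device"):
    default_device = torch.get_default_device()
else:
    # Assumed fallback: a freshly created tensor reflects the default device.
    default_device = torch.empty(0).device
print(default_device)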

t-vi (Collaborator) commented Jul 22, 2024:

Ugh, I'll look into updating; it should not hold you back.

kshitij12345 marked this pull request as ready for review on July 22, 2024, 16:51.
t-vi (Collaborator) left a comment:

As always, a pleasure to read. Thank you @kshitij12345

t-vi merged commit 85df4f6 into Lightning-AI:main on Jul 22, 2024. 39 checks passed.
The github-actions bot deleted the default-device-handling branch on October 23, 2024, 00:46.
Merging this pull request closes: use torch.get_default_dtype and torch.get_default_device for factory method in thunder/torch/__init__.py (#621)