Unable to Train a LoRA on MPS/MLX #619

@lbryant2004-create

Description

Hello, and thanks for such an awesome project! I'm trying to train a LoRA on a dataset of 107 songs, and training fails both in the Gradio UI and in Side-Step. First, the Gradio UI.
I launch the UI with "uv run acestep", uncheck "initialize 5Hz LM", and select the SFT model. Device is set to auto with MLX DiT (Apple Silicon), and the GPU tier is set to 24G VRAM; then I initialize the service. After creating the dataset and the preprocessed tensors successfully, I open the Train LoRA tab and the tensors load. I set rank to 32, alpha to 64, LoRA dropout to 0.1, learning rate to 0.0003, max epochs to 500, batch size to 4, gradient accumulation to 1, save every N epochs to 50, shift to 1, and seed to 42. Training never gets past step 0; here is the output I get.

  • Running on local URL: http://127.0.0.1:7860
  • To create a public link, set share=True in launch().
    2026-02-16 16:36:21.623 | INFO | acestep.handler:initialize_service:731 - [initialize_service] Attempting to load model with attention implementation: sdpa
    torch_dtype is deprecated! Use dtype instead!
    2026-02-16 16:36:33.109 | INFO | acestep.handler:_init_mlx_dit:205 - [MLX-DiT] Native MLX DiT decoder initialized successfully (mx.compile=False).
    2026-02-16 16:36:34.111 | INFO | acestep.handler:_init_mlx_vae:278 - [MLX-VAE] Decode/encode compiled with mx.compile().
    2026-02-16 16:36:34.111 | INFO | acestep.handler:_init_mlx_vae:287 - [MLX-VAE] Native MLX VAE initialized (dtype=mlx.core.float32, compiled=True).
    2026-02-16 16:40:15.147 | INFO | acestep.ui.gradio.events.training.lora_training:start_training:137 - Training loader config: device=mps, workers=0, pin_memory=False, pin_memory_device=, persistent_workers=False
    2026-02-16 16:40:16.348 | INFO | acestep.training.lora_utils:inject_lora_into_dit:193 - LoRA injected into DiT decoder:
    2026-02-16 16:40:16.348 | INFO | acestep.training.lora_utils:inject_lora_into_dit:194 - Total parameters: 2,415,892,614
    2026-02-16 16:40:16.348 | INFO | acestep.training.lora_utils:inject_lora_into_dit:195 - Trainable parameters: 22,020,096 (0.91%)
    2026-02-16 16:40:16.348 | INFO | acestep.training.lora_utils:inject_lora_into_dit:196 - LoRA rank: 32, alpha: 64
    2026-02-16 16:40:16.348 | INFO | acestep.training.trainer:init:381 - LoRA injected: 22,020,096 trainable params
    2026-02-16 16:40:16.348 | WARNING | acestep.training.trainer:init:393 - torch.compile is not available on this PyTorch version.
    2026-02-16 16:40:16.350 | INFO | acestep.training.lora_utils:_safe_enable_input_require_grads:136 - Skipping enable_input_require_grads for decoder: get_input_embeddings is not implemented (expected for DiT)
    2026-02-16 16:40:16.354 | INFO | acestep.training.trainer:train_from_preprocessed:579 - Training memory features: gradient_checkpointing=True, use_cache_disabled=True, input_grads_enabled=False
    2026-02-16 16:40:16.355 | INFO | acestep.training.data_module:init:90 - PreprocessedTensorDataset: 106 samples from /Users/louisbryant/Downloads/AI/ACE-Step-1.5/gradio_outputs/LoRAs/tensors
    2026-02-16 16:40:16.355 | INFO | acestep.ui.gradio.events.training.lora_training:start_training:203 - [1s] Step 0: 📂 Loaded 106 preprocessed samples
    2026-02-16 16:40:16.377 | INFO | acestep.ui.gradio.events.training.lora_training:start_training:203 - [1s] Step 0: 🧠 Gradient checkpointing enabled for decoder
    2026-02-16 16:40:16.399 | INFO | acestep.ui.gradio.events.training.lora_training:start_training:203 - [1s] Step 0: ℹ️ Input-grad hook not available on this DiT; using explicit checkpointing fallback
    Using 16-bit Automatic Mixed Precision (AMP)
    2026-02-16 16:40:16.433 | INFO | acestep.ui.gradio.events.training.lora_training:start_training:203 - [1s] Step 0: 🚀 Starting training (device: mps, precision: 16-mixed)...
    2026-02-16 16:40:16.462 | INFO | acestep.training.trainer:_train_with_fabric:667 - Trainable tensor dtype fixup: casted 0/384 to fp32
    2026-02-16 16:40:16.464 | INFO | acestep.ui.gradio.events.training.lora_training:start_training:203 - [1s] Step 0: 🎯 Training 22,020,096 parameters
    2026-02-16 16:40:16.493 | INFO | acestep.training.trainer:_train_with_fabric:739 - Optimizer param dtype fixup: casted 0/384 to fp32
    2026-02-16 16:41:53.247 | INFO | acestep.ui.gradio.events.training.lora_training:start_training:203 - [1m 38s] Step 0: ⚠️ Non-finite gradients (384/384); skipping optimizer step

Training sits in that state indefinitely; the only thing that changes is the elapsed time. I tried varying all of the parameters mentioned above with no success.
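For what it's worth, here is the kind of quick check I'd run to confirm that the LoRA gradients really are NaN/Inf under fp16 AMP on MPS. This is only a sketch; the trainer/decoder names in the commented usage are placeholders, not the project's real API:

```python
import torch

def count_nonfinite_grads(module: torch.nn.Module) -> tuple[int, int]:
    """Return (params with NaN/Inf grads, params that received a grad)."""
    bad, total = 0, 0
    for _, p in module.named_parameters():
        if not p.requires_grad or p.grad is None:
            continue
        total += 1
        if not torch.isfinite(p.grad).all():
            bad += 1
    return bad, total

# Hypothetical usage after one forward/backward on a single preprocessed batch:
# loss = run_one_training_step(decoder, batch)  # placeholder, not the real API
# loss.backward()
# print(count_nonfinite_grads(decoder))  # should mirror the "384/384" in the log if it reproduces
```

If this really is an fp16 overflow on MPS, forcing fp32 or bf16 training precision might be worth trying.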

When I try with Side-Step, I use the same settings, but neither training mode works. Here is what I get.
First, corrected mode:
(acestep1.5) louisbryant@Louiss-Mac-mini ACE-Step-1.5 % uv run train.py
╭──────────────────────────────────────────────────────────────────────────────╮
│                         [SIDE-STEP ASCII-art logo]                            │
│ "Je suis calibré." │
│ │
│ Side-Step v2.0.0 -- Adapter Fine-Tuning CLI (LoRA + LoKR) │
│ Standalone: github.com/koda-dernet/Side-Step │
│ │
│ Mode : interactive │
│ Stack : Python 3.11.13 | PyTorch 2.10.0 │
│ GPU : Apple MPS │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯

What would you like to do?

> 1. Train a LoRA (PEFT)  (default)
  2. Train a LoKR (LyCORIS)
  3. Preprocess audio into tensors
  4. Manage presets
  5. Settings (paths, vanilla mode)
  6. Experimental (beta)
  7. Exit

Choice (1):

Which training mode?

> 1. Corrected (recommended -- continuous timesteps + CFG dropout)  (default)
  2. Vanilla (original behavior -- discrete timesteps, no CFG)
Type 'b' to go back

Choice: 1

Load a preset?

> 1. Start fresh (defaults)  (default)
  2. Louis -- Settings Louis approves of
  3. high_quality (built-in) -- High capacity, long training -- rank 128, 1000 epochs
  4. quick_test (built-in) -- Low rank, few epochs -- fast iteration for testing
  5. recommended (built-in) -- Balanced defaults for most LoRA fine-tuning tasks
  6. vram_12gb (built-in) -- Tight (10-16 GB) -- RTX 3060 12GB, 4070, 4060 Ti. 8-bit optimizer + encoder offloading.
  7. vram_16gb (built-in) -- Standard (16-24 GB) -- RTX 4080, 3080 Ti, A5000. Balanced rank and batch size.
  8. vram_24gb_plus (built-in) -- Comfortable (24 GB+) -- RTX 3090, 4090, A100, H100. High rank, batch 2, full speed.
  9. vram_8gb (built-in) -- Minimal (<10 GB) -- RTX 4060 8GB, 3050, GTX 1080. Aggressive savings, low rank.
Type 'b' to go back

Choice: 2
Loaded preset 'Louis'.

How much do you want to configure?

> 1. Basic (recommended defaults, fewer questions)  (default)
  2. Advanced (all settings exposed)

Type 'b' to go back

Choice: 1

[Step 1/5] Required Settings

--- Required Settings ---

Checkpoint directory [./checkpoints]:

Select a model to train on

> 1. acestep-v15-base  (official)  (default)
  2. acestep-v15-sft  (official)
  3. acestep-v15-turbo  (official)
  4. acestep-v15-turbo-continuous  (official)
  5. Search by name...

Type 'b' to go back

Choice: 2
Dataset directory (preprocessed .pt files): /Users/louisbryant/Downloads/AI/ACE-Step-1.5/gradio_outputs/LoRAs/tensors
Output directory for adapter weights: /Users/louisbryant/Downloads/AI/ACE-Step-1.5/gradio_outputs/LoRAs/weights

[Step 2/5] LoRA Settings

--- LoRA Settings (press Enter for defaults) ---

Rank [32]:
Alpha [64]:
Dropout [0.1]:

Which attention layers to target?

> 1. Both self-attention and cross-attention  (default)
  2. Self-attention only (audio patterns)
  3. Cross-attention only (text conditioning)

Type 'b' to go back

Choice: 1
Target projections [q_proj k_proj v_proj o_proj]:

[Step 3/5] Training Settings

--- Training Settings (press Enter for defaults) ---

Learning rate [0.0001]:
Batch size [4]:
Gradient accumulation [1]:
Max epochs [500]:
Warmup steps [100]:
Seed [42]:
Shift (turbo=3.0, base/sft=1.0) [1.0]:
Inference steps (turbo=8, base/sft=50) [50]:

[Step 4/5] Corrected Training Settings

--- Corrected Training Settings (press Enter for defaults) ---

CFG dropout ratio [0.15]:

[Step 5/5] Logging & Checkpoints

--- Logging & Checkpoints (press Enter for defaults) ---

Save checkpoint every N epochs [50]:
Log metrics every N steps [10]:
Resume from checkpoint path (leave empty to skip):

--- Save Preset ---

Save these settings as a reusable preset? (yes/no) (yes): no
2026-02-16 16:52:00 [WARNING] torchao: Skipping import of cpp extensions due to incompatible torch version 2.10.0 for torchao version 0.15.0 Please see pytorch/ao#2919 for more info
2026-02-16 16:52:00 [WARNING] torchao.kernel.intmm: Warning: Detected no triton, on systems without Triton certain kernels will not work
W0216 16:52:00.853000 15232 .venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
2026-02-16 16:52:02 [ERROR] train: Unhandled error in session loop
Traceback (most recent call last):
  File "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/train.py", line 133, in main
    last_code = _dispatch(args)
                ^^^^^^^^^^^^^^^
  File "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/train.py", line 101, in _dispatch
    from acestep.training_v2.cli.train_fixed import run_fixed
  File "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/acestep/training_v2/cli/train_fixed.py", line 26, in <module>
    from acestep.training_v2.trainer_fixed import FixedLoRATrainer
  File "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/acestep/training_v2/trainer_fixed.py", line 44, in <module>
    from acestep.training_v2.trainer_helpers import (
  File "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/acestep/training_v2/trainer_helpers.py", line 19, in <module>
    from acestep.training.lora_utils import (
ImportError: cannot import name '_unwrap_decoder' from 'acestep.training.lora_utils' (/Users/louisbryant/Downloads/AI/ACE-Step-1.5/acestep/training/lora_utils.py)
[FAIL] cannot import name '_unwrap_decoder' from 'acestep.training.lora_utils' (/Users/louisbryant/Downloads/AI/ACE-Step-1.5/acestep/training/lora_utils.py)
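To rule out a stale install on my side, a minimal check like this (run from the repo root in the same venv) should reproduce the import failure outside the CLI and show what the installed lora_utils actually exports:

```python
# Confirm whether acestep.training.lora_utils really lacks the _unwrap_decoder
# symbol that Side-Step's training_v2 helpers try to import.
import acestep.training.lora_utils as lora_utils

print(lora_utils.__file__)
print(sorted(name for name in dir(lora_utils) if "decoder" in name.lower()))

try:
    from acestep.training.lora_utils import _unwrap_decoder  # noqa: F401
    print("_unwrap_decoder is importable")
except ImportError as exc:
    print(f"ImportError reproduced: {exc}")
```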

Using the same settings but with vanilla mode, it simply freezes while showing the following output:
> 1. Train a LoRA (PEFT)  (default)
  2. Train a LoKR (LyCORIS)
  3. Preprocess audio into tensors
  4. Manage presets
  5. Settings (paths, vanilla mode)
  6. Experimental (beta)
  7. Exit

Choice (1):

Which training mode?

> 1. Corrected (recommended -- continuous timesteps + CFG dropout)  (default)
  2. Vanilla (original behavior -- discrete timesteps, no CFG)
Type 'b' to go back

Choice: 2

Load a preset?

> 1. Start fresh (defaults)  (default)
  2. Louis -- Settings Louis approves of
  3. high_quality (built-in) -- High capacity, long training -- rank 128, 1000 epochs
  4. quick_test (built-in) -- Low rank, few epochs -- fast iteration for testing
  5. recommended (built-in) -- Balanced defaults for most LoRA fine-tuning tasks
  6. vram_12gb (built-in) -- Tight (10-16 GB) -- RTX 3060 12GB, 4070, 4060 Ti. 8-bit optimizer + encoder offloading.
  7. vram_16gb (built-in) -- Standard (16-24 GB) -- RTX 4080, 3080 Ti, A5000. Balanced rank and batch size.
  8. vram_24gb_plus (built-in) -- Comfortable (24 GB+) -- RTX 3090, 4090, A100, H100. High rank, batch 2, full speed.
  9. vram_8gb (built-in) -- Minimal (<10 GB) -- RTX 4060 8GB, 3050, GTX 1080. Aggressive savings, low rank.
Type 'b' to go back

Choice: 2
Loaded preset 'Louis'.

How much do you want to configure?

> 1. Basic (recommended defaults, fewer questions)  (default)
  2. Advanced (all settings exposed)

Type 'b' to go back

Choice: 1

[Step 1/4] Required Settings

--- Required Settings ---

Checkpoint directory [./checkpoints]:

Select a model to train on

> 1. acestep-v15-base  (official)  (default)
  2. acestep-v15-sft  (official)
  3. acestep-v15-turbo  (official)
  4. acestep-v15-turbo-continuous  (official)
  5. Search by name...

Type 'b' to go back

Choice: 2
Dataset directory (preprocessed .pt files): /Users/louisbryant/Downloads/AI/ACE-Step-1.5/gradio_outputs/LoRAs/tensors
Output directory for adapter weights: /Users/louisbryant/Downloads/AI/ACE-Step-1.5/gradio_outputs/LoRAs/weights

[Step 2/4] LoRA Settings

--- LoRA Settings (press Enter for defaults) ---

Rank [32]:
Alpha [64]:
Dropout [0.1]:

Which attention layers to target?

> 1. Both self-attention and cross-attention  (default)
  2. Self-attention only (audio patterns)
  3. Cross-attention only (text conditioning)

Type 'b' to go back

Choice: 1
Target projections [q_proj k_proj v_proj o_proj]:

[Step 3/4] Training Settings

--- Training Settings (press Enter for defaults) ---

Learning rate [0.0001]:
Batch size [4]:
Gradient accumulation [1]:
Max epochs [500]:
Warmup steps [100]:
Seed [42]:
Shift (turbo=3.0, base/sft=1.0) [1.0]:
Inference steps (turbo=8, base/sft=50) [50]:

[Step 4/4] Logging & Checkpoints

--- Logging & Checkpoints (press Enter for defaults) ---

Save checkpoint every N epochs [50]:
Log metrics every N steps [10]:
Resume from checkpoint path (leave empty to skip):

--- Save Preset ---

Save these settings as a reusable preset? (yes/no) (yes): no
[WARN] vanilla mode reproduces the EXISTING training behaviour which
differs from the model's own training procedure:
- Discrete 8-step turbo timesteps (should be continuous logit-normal)
- No CFG dropout (should be cfg_ratio=0.15)
These bugs affect ALL model variants (turbo, base, sft).
Use 'fixed' for corrected training.
[INFO] Loading model (variant=acestep-v15-sft, device=mps)
2026-02-16 16:53:28 [INFO] acestep.training_v2.model_loader: [INFO] Loading model from checkpoints/acestep-v15-sft (variant=acestep-v15-sft, dtype=torch.float16)
[INFO] Loading model from checkpoints/acestep-v15-sft (variant=acestep-v15-sft, dtype=torch.float16)
[OK] Model loaded with attn_implementation=sdpa
2026-02-16 16:53:32 [INFO] acestep.training_v2.model_loader: [OK] Model on mps (torch.float16), all params frozen
╭──────────────────────── Side-Step Training Progress ─────────────────────────╮
│ Epoch 0 / 500 Step 0 | ETA -- │
│ │
│ <Bar 0 of 100> 0% │
│ │
│ Loss -- Best -- │
│ LR -- Speed -- │
│ Elapsed 0s Epoch time -- │
│ │
│ VRAM monitoring not available │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
2026-02-16 16:53:33.258 | INFO | acestep.training.lora_utils:inject_lora_into_dit:193 - LoRA injected into DiT decoder:
2026-02-16 16:53:33.258 | INFO | acestep.training.lora_utils:inject_lora_into_dit:194 - Total parameters: 2,415,892,614
2026-02-16 16:53:33.258 | INFO | acestep.training.lora_utils:inject_lora_into_dit:195 - Trainable parameters: 22,020,096 (0.91%)
2026-02-16 16:53:33.258 | INFO | acestep.training.lora_utils:inject_lora_into_dit:196 - LoRA rank: 32, alpha: 64
2026-02-16 16:53:33.258 | INFO | acestep.training.trainer:init:381 - LoRA injected: 22,020,096 trainable params
/Users/louisbryant/Downloads/AI/ACE-Step-1.5/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py:1118: UserWarning: 'pin_memory' argument is set as true but not supported on MPS now, device pinned memory won't be used.
  super().__init__(loader)
╭──────────────────────── Side-Step Training Progress ─────────────────────────╮
│ Epoch 0 / 500 Step 0 | ETA -- │
│ │
│ <Bar 0 of 100> 0% │
│ │
│ Loss -- Best -- │
│ LR -- Speed -- │
│ Elapsed 0s Epoch time -- │
│ │
│ VRAM monitoring not available │
│ │
│ 🧠 Gradient checkpointing enabled for decoder │
│ ℹ️ Input-grad hook not available on this DiT; using explicit checkpointing │
│ fallback │
│ Using 16-bit Automatic Mixed Precision (AMP) │
│ 🚀 Starting training (device: mps, precision: 16-mixed)... │
│ 🎯 Training 22,020,096 parameters │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
2026-02-16 16:53:35 [WARNING] torchao: Skipping import of cpp extensions due to incompatible torch version 2.10.0 for torchao version 0.15.0 Please see pytorch/ao#2919 for more info
2026-02-16 16:53:35 [WARNING] torchao: Skipping import of cpp extensions due to incompatible torch version 2.10.0 for torchao version 0.15.0 Please see pytorch/ao#2919 for more info
2026-02-16 16:53:35 [WARNING] torchao: Skipping import of cpp extensions due to incompatible torch version 2.10.0 for torchao version 0.15.0 Please see pytorch/ao#2919 for more info
2026-02-16 16:53:35 [WARNING] torchao: Skipping import of cpp extensions due to incompatible torch version 2.10.0 for torchao version 0.15.0 Please see pytorch/ao#2919 for more info
2026-02-16 16:53:35 [WARNING] torchao.kernel.intmm: Warning: Detected no triton, on systems without Triton certain kernels will not work
╭──────────────────────── Side-Step Training Progress ─────────────────────────╮
│ Epoch 0 / 500 Step 0 | ETA -- │
│ │
│ <Bar 0 of 100> 0% │
│ │
│ Loss -- Best -- │
│ LR -- Speed -- │
│ Elapsed 0s Epoch time -- │
│ │
│ VRAM monitoring not available │
│ │
│ 🧠 Gradient checkpointing enabled for decoder │
│ ℹ️ Input-grad hook not available on this DiT; using explicit checkpointing │
│ fallback │
│ Using 16-bit Automatic Mixed Precision (AMP) │
│ 🚀 Starting training (device: mps, precision: 16-mixed)... │
│ 🎯 Training 22,020,096 parameters │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.13/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.13/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/.venv/lib/python3.11/site-packages/torch/multiprocessing/reductions.py", line 604, in reduce_storage
    metadata = storage._share_filename_cpu_()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/.venv/lib/python3.11/site-packages/torch/storage.py", line 449, in wrapper
    return fn(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/.venv/lib/python3.11/site-packages/torch/storage.py", line 528, in _share_filename_cpu_
    return super()._share_filename_cpu_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: torch_shm_manager at "/Users/louisbryant/Downloads/AI/ACE-Step-1.5/.venv/lib/python3.11/site-packages/torch/bin/torch_shm_manager": execl failed: Permission denied
Exception raised from start_manager at /Users/runner/work/pytorch/pytorch/pytorch/torch/lib/libshm/core.cpp:67 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) + 52 (0x107c95988 in libc10.dylib)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 140 (0x107c925e4 in libc10.dylib)
frame #2: THManagedMapAllocatorInit::THManagedMapAllocatorInit(char const*, char const*) + 1760 (0x107b17f04 in libshm.dylib)
frame #3: THManagedMapAllocator::makeDataPtr(char const*, char const*, int, unsigned long) + 76 (0x107b184f8 in libshm.dylib)
frame #4: THPStorage_shareFilename(_object*, _object*) + 144 (0x109bd1a98 in libtorch_python.dylib)
frame #5: cfunction_vectorcall_NOARGS + 88 (0x102cad7b4 in Python)
frame #6: _PyEval_EvalFrameDefault + 12472 (0x102d3a5e0 in Python)
frame #7: _PyEval_Vector + 116 (0x102d44400 in Python)
frame #8: _PyEval_EvalFrameDefault + 12472 (0x102d3a5e0 in Python)
frame #9: _PyEval_Vector + 116 (0x102d44400 in Python)
frame #10: PyObject_CallOneArg + 96 (0x102c6364c in Python)
frame #11: _Pickle_FastCall + 20 (0x102659368 in _pickle.cpython-311-darwin.so)
frame #12: save + 1472 (0x10265bcb4 in _pickle.cpython-311-darwin.so)
frame #13: save + 1128 (0x10265bb5c in _pickle.cpython-311-darwin.so)
frame #14: save_reduce + 428 (0x10265f128 in _pickle.cpython-311-darwin.so)
frame #15: save + 1512 (0x10265bcdc in _pickle.cpython-311-darwin.so)
frame #16: save + 1128 (0x10265bb5c in _pickle.cpython-311-darwin.so)
frame #17: save_reduce + 428 (0x10265f128 in _pickle.cpython-311-darwin.so)
frame #18: save + 1512 (0x10265bcdc in _pickle.cpython-311-darwin.so)
frame #19: save + 5300 (0x10265cba8 in _pickle.cpython-311-darwin.so)
frame #20: save + 1128 (0x10265bb5c in _pickle.cpython-311-darwin.so)
frame #21: dump + 188 (0x10265b2a0 in _pickle.cpython-311-darwin.so)
frame #22: _pickle_Pickler_dump + 56 (0x10265b04c in _pickle.cpython-311-darwin.so)
frame #23: method_vectorcall_O + 100 (0x102c6f6e4 in Python)
frame #24: _PyEval_EvalFrameDefault + 21140 (0x102d3c7bc in Python)
frame #25: _PyEval_Vector + 116 (0x102d44400 in Python)
frame #26: _PyEval_EvalFrameDefault + 12472 (0x102d3a5e0 in Python)
frame #27: _PyEval_Vector + 116 (0x102d44400 in Python)
frame #28: method_vectorcall + 380 (0x102c65c28 in Python)
frame #29: thread_run + 168 (0x102df7188 in Python)
frame #30: pythread_wrapper + 48 (0x102d983e8 in Python)
frame #31: _pthread_start + 136 (0x19e215c08 in libsystem_pthread.dylib)
frame #32: thread_start + 8 (0x19e210ba8 in libsystem_pthread.dylib)

The same DataLoader-worker traceback and RuntimeError are printed several more times with the workers' output interleaved, so I'm only including one copy here. After that, the progress panel redraws one last time:
╭──────────────────────── Side-Step Training Progress ─────────────────────────╮
│ Epoch 0 / 500 Step 0 | ETA -- │
│ │
│ <Bar 0 of 100> 0% │
│ │
│ Loss -- Best -- │
│ LR -- Speed -- │
│ Elapsed 0s Epoch time -- │
│ │
│ VRAM monitoring not available │
│ │
│ 🧠 Gradient checkpointing enabled for decoder │
│ ℹ️ Input-grad hook not available on this DiT; using explicit checkpointing │
│ fallback │
│ Using 16-bit Automatic Mixed Precision (AMP) │
│ 🚀 Starting training (device: mps, precision: 16-mixed)... │
│ 🎯 Training 22,020,096 parameters │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
and Side-Step freezes. If I press Ctrl+C, it returns to the main menu.
I'm on macOS Tahoe 26.2, Mac mini (2024), #@gb RAM.
Changing the LoRA settings and running as superuser didn't help either. Hope these logs help. Thanks.
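Since the vanilla run dies in the DataLoader workers with "torch_shm_manager ... execl failed: Permission denied", one thing I can still check is whether that binary inside the venv has its execute bit set. This is just a sketch against my paths; the DataLoader line at the end is only a hypothetical illustration of a single-worker workaround, not something the trainer necessarily exposes:

```python
import os
import stat
import torch

# torch ships the shared-memory manager binary inside the installed wheel;
# the traceback points at <torch>/bin/torch_shm_manager in my venv.
shm_manager = os.path.join(os.path.dirname(torch.__file__), "bin", "torch_shm_manager")
mode = os.stat(shm_manager).st_mode
print(shm_manager)
print("execute bit set:", bool(mode & stat.S_IXUSR), "perms:", oct(mode & 0o777))

# If the execute bit is missing, restoring it (or reinstalling torch) might help:
# os.chmod(shm_manager, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Alternatively, single-process loading avoids torch_shm_manager entirely,
# assuming the worker count can be set (hypothetical example):
# loader = torch.utils.data.DataLoader(dataset, batch_size=4, num_workers=0)
```

If there is a supported way to force num_workers=0 in Side-Step (the Gradio path seems to already do this, per the "workers=0" line in its log), that might at least get the vanilla run past this point.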
