Training an SDXL image-to-text model, but getting the following error #8706
preethamp0197 started this conversation in General
Map: 49%|████▉ | 977000/1985039 [5:01:52<5:08:01, 54.54 examples/s]
Map: 50%|█████ | 993000/1985039 [5:01:56<4:58:49, 55.33 examples/s]
Map: 50%|█████ | 994000/1985039 [5:01:56<5:05:28, 54.07 examples/s]
Map: 51%|█████ | 1017000/1985039 [5:01:56<4:45:43, 56.47 examples/s]
Map: 49%|████▉ | 975000/1985039 [5:01:56<5:11:09, 54.10 examples/s]
Map: 50%|█████ | 1000000/1985039 [5:01:57<5:05:26, 53.75 examples/s]
Map: 50%|█████ | 994000/1985039 [5:01:58<4:56:14, 55.76 examples/s]
Map: 51%|█████ | 1009000/1985039 [5:01:58<4:57:23, 54.70 examples/s]
Map: 49%|████▉ | 976000/1985039 [5:02:01<5:11:59, 53.90 examples/s]
Map: 51%|█████ | 1010000/1985039 [5:02:01<4:55:56, 54.91 examples/s]
Map: 50%|█████ | 1001000/1985039 [5:02:04<5:01:38, 54.37 examples/s]
Map: 50%|█████ | 994000/1985039 [5:02:07<5:05:28, 54.07 examples/s]
Map: 51%|█████ | 1017000/1985039 [5:02:08<4:45:43, 56.47 examples/s]
Map: 49%|████▉ | 977000/1985039 [5:02:09<5:08:01, 54.54 examples/s]
Map: 49%|████▉ | 978000/1985039 [5:02:10<5:07:39, 54.55 examples/s]
Map: 50%|█████ | 994000/1985039 [5:02:10<4:56:14, 55.76 examples/s]
Map: 50%|█████ | 995000/1985039 [5:02:14<5:01:36, 54.71 examples/s]
Map: 51%|█████▏ | 1018000/1985039 [5:02:14<4:46:07, 56.33 examples/s]
[2024-06-25 13:00:06,795] torch.distributed.elastic.agent.server.api: [WARNING] Received Signals.SIGTERM death signal, shutting down workers
[2024-06-25 13:00:06,796] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 75 closing signal SIGTERM
[2024-06-25 13:00:06,796] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 76 closing signal SIGTERM
[2024-06-25 13:00:06,797] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 77 closing signal SIGTERM
[2024-06-25 13:00:06,797] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 78 closing signal SIGTERM
[2024-06-25 13:00:06,798] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 79 closing signal SIGTERM
[2024-06-25 13:00:06,798] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 80 closing signal SIGTERM
[2024-06-25 13:00:06,798] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 81 closing signal SIGTERM
[2024-06-25 13:00:06,799] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 82 closing signal SIGTERM
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 1073, in launch_command
    multi_gpu_launcher(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 718, in multi_gpu_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    result = agent.run()
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/metrics/api.py", line 123, in wrapper
    result = f(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/agent/server/api.py", line 727, in run
    result = self._invoke_run(role)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/agent/server/api.py", line 868, in _invoke_run
    time.sleep(monitor_interval)
  File "/usr/local/lib/python3.9/dist-packages/torch/distributed/elastic/multiprocessing/api.py", line 62, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 1 got signal: 15
Please help me solve this issue.
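
For anyone reading the log, here is a small sketch that decodes the two numbers that stand out in it. Signal 15 in the `SignalException` is SIGTERM, matching the earlier `Received Signals.SIGTERM` warning, so the launcher (process 1) appears to have been asked to stop from outside rather than hitting a Python error in the training code. The log alone does not show what sent the signal. The second helper estimates how long one full `Map` pass would take at the reported throughput. Both function names are made up for illustration and are not part of accelerate or torch.

```python
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a signal number (hypothetical helper)."""
    return signal.Signals(signum).name


def estimated_total_hours(total_examples: int, examples_per_second: float) -> float:
    """Estimate the hours needed for one full pass at a constant rate (hypothetical helper)."""
    return total_examples / examples_per_second / 3600


if __name__ == "__main__":
    # Signal number taken from "Process 1 got signal: 15" in the traceback.
    print(describe_signal(15))
    # Dataset size and throughput taken from the first Map progress line.
    print(round(estimated_total_hours(1985039, 54.54), 1))
```

At roughly 55 examples/s, each `Map` pass needs about 10 hours, and the SIGTERM arrived a little after the 5-hour mark. That makes an external time limit or job preemption one possibility worth checking, though this is only a guess from the numbers above.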