This repository has been archived by the owner on Jan 6, 2023. It is now read-only.
I have the following error when I try to run my code with torchelastic:
Creating EtcdStore as the c10d::Store implementation
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/distributed/launch.py", line 531, in main
    run_result = elastic_agent.run(spec.role)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/metrics/api.py", line 126, in wrapper
    result = f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/agent/server/api.py", line 680, in run
    result = self._invoke_run(role)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/agent/server/api.py", line 802, in _invoke_run
    self._initialize_workers(self._worker_group)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/metrics/api.py", line 126, in wrapper
    result = f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/agent/server/api.py", line 654, in _initialize_workers
    self._rendezvous(worker_group)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/metrics/api.py", line 126, in wrapper
    result = f(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/agent/server/api.py", line 518, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/rendezvous/etcd_rendezvous.py", line 157, in next_rendezvous
    store = self._rdzv_impl.setup_kv_store(rdzv_version)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/rendezvous/etcd_rendezvous.py", line 975, in setup_kv_store
    return EtcdStore(etcd_client=self.client, etcd_store_prefix=store_path)
  File "/opt/conda/lib/python3.8/site-packages/torchelastic/rendezvous/etcd_rendezvous.py", line 997, in __init__
    self.timeout = (
AttributeError: can't set attribute
Steps to reproduce:
>>> from torch.distributed import Store
>>> class A(Store):
...     def __init__(self):
...         super().__init__()
...         self.timeout = 1
...
>>> a = A()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
AttributeError: can't set attribute
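For reference, the same failure can be reproduced without torch. The sketch below assumes (this is not confirmed in this thread) that the base `Store` class exposes `timeout` as a read-only property, which is one common way to get "can't set attribute" on assignment. `Base`, `Broken`, `Workaround`, and `raises_attribute_error` are hypothetical names used only for illustration, and the `Workaround` class simply shows that a differently named attribute does not clash; it is not the library's fix.

```python
class Base:
    """Hypothetical stand-in for a base class that exposes `timeout` read-only."""

    def __init__(self):
        self._timeout = 300

    @property
    def timeout(self):
        # No setter is defined, so `obj.timeout = ...` is rejected.
        return self._timeout


class Broken(Base):
    def __init__(self):
        super().__init__()
        self.timeout = 1  # raises AttributeError: the property has no setter


class Workaround(Base):
    def __init__(self):
        super().__init__()
        # A differently named attribute does not collide with the property.
        self._local_timeout = 1


def raises_attribute_error(cls):
    """Return True if constructing `cls` raises AttributeError."""
    try:
        cls()
    except AttributeError:
        return True
    return False


print(raises_attribute_error(Broken))      # True
print(raises_attribute_error(Workaround))  # False
```

The exact wording of the AttributeError message varies between Python versions, so the sketch only checks the exception type.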
I've tried several different versions of torch and torchelastic (including the latest stable releases), but the error persists. Can you help me, please? What does this error mean, and how can I fix it?
Environment:
OS: CentOS 7
Python: 3.8.3
torch: 1.9.0
torchelastic: 0.2.2
python-etcd: 0.4.5