Open
Description
When we submit jobs through Slurm, the AMD nodes we've specified in limits.yml are not started automatically. We then followed the instructions on the elastic scaling page to start a node manually and received this error:
2019-06-10 11:29:37,108 startnode ERROR bm-standard-e2-64-ad1-0003: problem launching instance: {'opc-request-id': 'E3D3A2D1DEB14B9C84CBB7FD6F2CA7B3/90862EA821FF46290B355B89CAE3A926/B4D05FEBD75749F988B1C201434A2A1C', 'code': 'InternalError', 'message': 'Out of host capacity.', 'status': 500}
After trying to launch three node instances, we also get this error:
2019-06-10 11:32:07,976 startnode ERROR bm-standard-e2-64-ad1-0001: problem launching instance: {'opc-request-id': '07BF5FF7021E4BD5B7580DF99C44D23F/F122163E1641F3594B94D09F1EB83A9E/5077118583A2A8872A52AF2492160373', 'code': 'TooManyRequests', 'message': 'Too many requests for the user', 'status': 429}
We have had one success activating a node with this approach, but can't figure out why it worked in that particular case and not in the others. We are otherwise well below the node limit for our AD. Any ideas?
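For reference, here is a minimal sketch of how the two errors above could be told apart and retried. It assumes that the 429 is ordinary rate limiting (it appeared after several launch attempts in quick succession) and that 'Out of host capacity' means the shape is temporarily unavailable in that AD rather than a service-limit problem; neither assumption is confirmed. The helper names and the `launch` callable are hypothetical and are not part of the cluster tooling.

```python
import ast
import random
import time


def parse_launch_error(log_line):
    """Extract the error dict that follows 'problem launching instance: '."""
    _, _, payload = log_line.partition("problem launching instance: ")
    return ast.literal_eval(payload.strip())


def is_retryable(error):
    """Assumption: 429 (rate limit) and 'Out of host capacity' may clear later."""
    if error.get("status") == 429:
        return True
    return (
        error.get("status") == 500
        and "Out of host capacity" in error.get("message", "")
    )


def launch_with_backoff(launch, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Retry a hypothetical launch() callable with exponential backoff.

    launch() is assumed to return None on success or an error dict shaped
    like the ones in the log lines above.
    """
    for attempt in range(max_attempts):
        error = launch()
        if error is None:
            return True
        if not is_retryable(error):
            raise RuntimeError(f"non-retryable launch error: {error}")
        # Wait longer after each failure, with jitter to spread out requests.
        sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return False
```

Spacing out the manual start attempts like this would at least show whether the 429 disappears when requests are less frequent, which leaves the capacity error as the one to investigate.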