This repository has been archived by the owner on Oct 12, 2023. It is now read-only.
Another instance of batch pool creation unreliability, I'm afraid. Over the past 48 hours I've been unable to start any low-priority nodes using a docker image which has been fine the previous 5 days.
I'm trying to boot 8 x E64s_v3 low-priority nodes. Seven will start successfully and sit idle; one will always have the status "unusable". Every. Single. Time. I must have attempted to boot 50+ pools over the past 48 hours. I have also tried booting a few dedicated nodes (only 2 or 3) and hit the same issue. While booting I have also kept an eye on the node status graphs to see whether nodes are being pre-empted during the boot process, which could prevent a node from booting successfully, but I've seen nothing out of the ordinary that would suggest this is the problem. I have also tried creating same-size and smaller pools using different VM classes (F64s_v2, D64s_v3), with the same result. Note that I am using resource files during pool creation.
Because the node is unusable, there are no files/logs for me to view, so I can't troubleshoot the issue. If I use Batch Explorer to look at what's going on, I can locate the unusable node, but on clicking it I just get: "Node is currently 'unusable', there are no files to view now". I cannot reboot the node either: a red popup warning (top right of Batch Explorer) says "Reboot failed".
As I say, everything was working fine, and now it isn't. Nothing has changed on my end in terms of pool configuration or the docker image (arcalis/nichemapr) that I'd been using without issue until 2 days ago.
Thanks,
Simon
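Even when Batch Explorer shows no files, an unusable node usually carries an `errors` collection that can be read through the Batch REST API's list-nodes operation with an OData filter. Below is a minimal, stdlib-only sketch of building that request and summarizing any recorded node errors; the account name, pool id, and api-version are placeholders, not values from this thread:

```python
import urllib.parse

def list_unusable_nodes_url(batch_url, pool_id, api_version):
    """Build the Batch REST URL that lists only 'unusable' nodes in a pool."""
    query = urllib.parse.urlencode({
        "api-version": api_version,
        "$filter": "state eq 'unusable'",  # OData filter on node state
    })
    return f"{batch_url}/pools/{pool_id}/nodes?{query}"

def summarize_node_errors(nodes):
    """Given the parsed node list (the 'value' array), return (node_id, error_code) pairs."""
    out = []
    for node in nodes:
        for err in node.get("errors", []):
            out.append((node["id"], err.get("code")))
    return out

# Hypothetical example values:
url = list_unusable_nodes_url(
    "https://myaccount.westeurope.batch.azure.com", "mypool", "2019-08-01.10.0")
```

An authenticated GET against that URL (shared-key or AAD token in the `Authorization` header) returns a JSON `value` array of nodes; `summarize_node_errors` then surfaces any ComputeNodeError codes the service recorded, which is often the only diagnostic an unusable node exposes.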
I'll check whether there have been any changes on the service side; there could have been a new deployment.
If you are on Batch Explorer, you can upload the Batch node agent logs to your Azure storage container.
The node agent logs will contain useful information about the VM and its status with the Batch service.
Pool > Node > Upload Batch logs to Storage:
(Screenshot: the "Upload Batch logs" option for a node in Batch Explorer.)
If you can share the node agent logs through email (razurebatch@microsoft.com), that would be a great help for diagnostics on our side.
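The same upload that Batch Explorer performs is exposed by the Batch REST API as the compute node `uploadbatchservicelogs` operation. Here is a hedged, stdlib-only sketch of assembling that request; the account, pool, node, container SAS URL, and api-version are all placeholder assumptions:

```python
import json
from datetime import datetime, timezone

def upload_logs_request(batch_url, pool_id, node_id,
                        container_sas_url, start_time, api_version):
    """Return (url, json_body) for the node uploadbatchservicelogs operation."""
    url = (f"{batch_url}/pools/{pool_id}/nodes/{node_id}"
           f"/uploadbatchservicelogs?api-version={api_version}")
    body = json.dumps({
        "containerUrl": container_sas_url,  # writable SAS URL for a blob container
        "startTime": start_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    return url, body

# Hypothetical example values:
url, body = upload_logs_request(
    "https://myaccount.westeurope.batch.azure.com",
    "mypool", "tvm-123",
    "https://mystorage.blob.core.windows.net/logs?sv=...",
    datetime(2019, 8, 1, tzinfo=timezone.utc),
    "2019-08-01.10.0",
)
```

POSTing that body with an authenticated request asks the node agent itself to push its logs into the given storage container, which works even when the file-browsing view in Batch Explorer shows nothing.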
Can I get the region, pool name, and time of occurrence?
Thanks!
Brian