Scheduler experienced prolonged stall after midnight #74

@eddiewang927

Description

Hi,
I would like to ask for your insights on an intermittent issue we are seeing in our cluster.

Every day, within a very specific time window (00:05 to 00:11), client commands such as:

  • qsub
  • qrsh

randomly fail and return:


failed receiving gdi request response for mid=1 (got syncron message receive timeout error)

After 00:11, everything immediately returns to normal.
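
To pin down exactly when the stall starts and ends, a probe along these lines could be left running across midnight. This is only a minimal sketch, not something we have deployed: the helper names `probe` and `watch`, the 5-second interval, and the 30-second timeout are all hypothetical choices, and it assumes a harmless client command such as `qstat` is on the PATH of the submit host.

```python
import subprocess
import sys
import time
from datetime import datetime


def probe(cmd, timeout_s=30.0):
    """Run cmd once and return (start timestamp, elapsed seconds, status)."""
    started = datetime.now().isoformat(timespec="seconds")
    t0 = time.monotonic()
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        # A non-zero exit code is reported as-is so failures stand out in the log.
        status = "ok" if result.returncode == 0 else f"exit={result.returncode}"
    except subprocess.TimeoutExpired:
        status = "timeout"
    except FileNotFoundError:
        status = "not-found"
    return started, time.monotonic() - t0, status


def watch(cmd, interval_s=5.0, count=120, out=sys.stdout):
    """Probe cmd `count` times, one line per attempt, sleeping between attempts."""
    for _ in range(count):
        stamp, elapsed, status = probe(cmd)
        print(f"{stamp} {elapsed:6.2f}s {status}", file=out, flush=True)
        time.sleep(interval_s)
```

Starting something like `watch(["qstat"])` shortly before 00:05 would give a timestamped latency log that can be lined up against the qmaster messages file and any cron jobs scheduled around that time.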

If you have any ideas—possible internal tasks running at that time, known behaviors, or pointers to where we should investigate—your suggestions would be greatly appreciated.

Thanks!
