Background
I ran several tests on a large system (728 atoms) with the LCAO method, using the same total number of cores (24/48/72) but different numbers of nodes, giving three sets of tests.
Environment: no OpenMPI; no hyperthreading.
Queue 1: 24 cores and 64 GB of memory per node
In this case some calculations could not finish because of lack of memory (marked as NaN below).
| Total # of cores | # of cores per node | Total time (s) | Memory (MB) |
|---|---|---|---|
| 24 | 6 | 1073.3 | 3.298e+04 |
| 24 | 8 | 977.26 | 2.715e+04 |
| 24 | 12 | 1356.0 | 2.132e+04 |
| 24 | 24 | 13677 | 1.549e+04 |
| 48 | 6 | 769.83 | 3.290e+04 |
| 48 | 8 | 1230.2 | 2.707e+04 |
| 48 | 12 | 1806.7 | 2.124e+04 |
| 48 | 24 | NaN | NaN |
| 72 | 6 | 563.77 | 3.290e+04 |
| 72 | 8 | 666.44 | 2.707e+04 |
| 72 | 12 | 1086.8 | 2.124e+04 |
| 72 | 24 | NaN | NaN |
Problems and special notes
For all three sets of tests, the "MEMORY" value written in the output log does not correctly describe the actual memory requirement, although the trend should be right. The "24 cores per node" tests failed because of out-of-memory errors.
Since 2.2.3, a calculation is killed if it requires too much memory (which is good!). Fortunately, the 24-cores-on-1-node test escaped that limitation. For the stuck calculations I took a snapshot of the system load. The nodes used by a calculation can be divided into one 'main node' and several (or zero) 'calculation nodes'. The 'calculation' nodes looked healthy, without any memory problem:
Each thread here takes up a similar amount of memory (~1.2 GB) in all the tests.
However, on the 'main node' of the 24-cores/node tests and of some 12-cores/node tests there are several heavy jobs:
These fat threads cause the memory jam on the main node (a way to check this per rank is sketched below).
It is also very strange that the calculation runs faster when the cores are spread over more nodes. One possible hypothesis is that the memory allocation problem blocks the memory channels. In general, a parallel job spanning more nodes is less efficient because of the cost of inter-node communication, so this result is quite interesting.
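Below is a minimal, standalone MPI diagnostic sketch (not part of the code being tested; it assumes a Linux cluster, where ru_maxrss is reported in kB) that makes every rank send its hostname and peak resident set size to rank 0. Running something like this next to the calculation would show directly whether memory piles up on the ranks placed on the 'main node'.

```cpp
// Hypothetical diagnostic: report each MPI rank's hostname and peak RSS on rank 0.
#include <mpi.h>
#include <sys/resource.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Peak resident set size of this process so far; ru_maxrss is in kB on Linux.
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    long peak_kb = ru.ru_maxrss;

    char host[MPI_MAX_PROCESSOR_NAME] = {0};
    int host_len = 0;
    MPI_Get_processor_name(host, &host_len);

    // Rank 0 collects every rank's hostname and peak RSS.
    std::vector<char> hosts;
    std::vector<long> peaks;
    if (rank == 0) {
        hosts.resize(static_cast<size_t>(size) * MPI_MAX_PROCESSOR_NAME);
        peaks.resize(size);
    }
    MPI_Gather(host, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               hosts.data(), MPI_MAX_PROCESSOR_NAME, MPI_CHAR, 0, MPI_COMM_WORLD);
    MPI_Gather(&peak_kb, 1, MPI_LONG, peaks.data(), 1, MPI_LONG, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int r = 0; r < size; ++r) {
            std::printf("rank %3d on %-20s peak RSS %8.1f MB\n", r,
                        &hosts[static_cast<size_t>(r) * MPI_MAX_PROCESSOR_NAME],
                        peaks[r] / 1024.0);
        }
    }
    MPI_Finalize();
    return 0;
}
```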
Queue 2: 24 cores and 128 GB of memory per node
To check whether the memory limitation caused problem 3 in the previous queue, the same tests were performed on a queue with more memory per node (128 GB).
| Total # of cores | # of cores per node | Total time (s) | Memory (MB) |
|---|---|---|---|
| 24 | 6 | 1539.5 | 1.549e+04 |
| 24 | 8 | 1523.0 | 1.549e+04 |
| 24 | 12 | 1461.1 | 1.549e+04 |
| 24 | 24 | 1419.9 | 1.549e+04 |
| 48 | 6 | 818.20 | 1.541e+04 |
| 48 | 8 | 891.99 | 1.541e+04 |
| 48 | 12 | 932.74 | 1.541e+04 |
| 48 | 24 | 949.15 | 1.541e+04 |
| 72 | 8 | 745.72 | 1.541e+04 |
| 72 | 12 | 861.75 | 1.541e+04 |
| 72 | 24 | 853.25 | 1.541e+04 |
The '24 cores in total' tests seem fine, but the other two sets show a trend similar to Queue 1.
Suggestions
The memory allocation mechanism across nodes could be improved.
The memory cost (maximal and averaged over processes) should be printed; a minimal sketch of such a report is given below.
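As a rough illustration of the second suggestion, here is a minimal sketch, assuming an MPI code running on Linux (the helper name peak_rss_mb is made up for this example): each process measures its own peak RSS with getrusage, and rank 0 prints the maximum and the average over all processes.

```cpp
// Sketch of a max/average peak-memory report across MPI processes.
#include <mpi.h>
#include <sys/resource.h>
#include <cstdio>

// Peak resident set size of the calling process in MB (ru_maxrss is in kB on Linux).
static double peak_rss_mb() {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_maxrss / 1024.0;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nproc = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    // ... the actual calculation would run here ...

    double local = peak_rss_mb(), max_mb = 0.0, sum_mb = 0.0;
    MPI_Reduce(&local, &max_mb, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local, &sum_mb, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        std::printf("Peak memory per process: max %.1f MB, average %.1f MB\n",
                    max_mb, sum_mb / nproc);
    }
    MPI_Finalize();
    return 0;
}
```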
The input and log files can be found here:
queue1.zip
queue2.zip