Striping the restart.mesh.data file #6035
It is possible that one could do this more efficiently if one could tell the operating system to "stripe" the file, i.e., to store it not as one big block on one disk, but as many blocks on many disks at the same time. But, at least to my understanding, telling the operating system to do this is not part of the regular C/C++ interface used to create or open files. How would one set the striping property on a new file? Would you or Chris be able to offer a piece of code we could use to do this, in a way that is as portable as possible across systems?
@bangerth Thank you for your reply. Your input will greatly help my upcoming discussion with Chris. I must admit that I don't have extensive knowledge of this particular issue, so I will be relying on both you and Chris to help address it. Resolving this matter is currently a high priority for me, especially as we have a 20,000 SU allocation on Frontera to run our 3D models. Our ongoing simulations use over 1,000 cores, and we've thoroughly tested the memory and computational requirements to ensure they meet the demands of our 3D cases. As such, I'm counting on this issue being resolved soon. I'll reach out to Chris to arrange a follow-up discussion during our next meeting, and I will keep you updated on the progress.
I should also mention that I am currently using ASPECT version 9.4.0 on deal.II 9.4.0. If the MPI I/O functionality differs in newer versions of ASPECT or deal.II, I would greatly appreciate any clarification on that.
As Wolfgang mentioned, you are probably thinking about "striping" a file. Striping on Frontera is handled by the operating system; see the Frontera manual: https://docs.tacc.utexas.edu/hpc/frontera/#files-striping. We have in the past successfully checkpointed and restarted models on up to several thousand CPUs on Frontera, so I think this is something you will have to resolve with the Frontera staff first. Only if there is no solution that can be implemented on the cluster could we start to think about mitigation measures inside ASPECT, which would almost certainly be more complicated than a more appropriate configuration of the cluster filesystem.
As René mentioned, checkpointing doesn't seem to be a problem, and it works fine for my case with 4000 CPUs. However, Chris reached out to inform me that my checkpoint behavior is causing some issues on the Frontera system. He suspects that the problem lies with the restart.mesh.data file and mentioned a limit of 840 tasks, which is lower than the number of CPUs I typically use. He also proposed that striping the file might be a solution. At this point, I believe it would be helpful to have a brief discussion to identify the exact issue and explore potential solutions.
I wanted to provide an update on the debugging process related to file striping on Frontera's system. After some testing, I found that with 4000 MPI processes and three large restart files, each around 100 GB in size, setting the stripe count to 8 appears to be optimal. It turns out that file striping can be configured specifically for these three files. I'll let you know how my test goes.
Thank you for the update 👍 When you conclude your analysis, would you be willing to summarize your findings in a bullet point or a few sentences in our wiki section on Frontera: https://github.com/geodynamics/aspect/wiki/Installation-on-Frontera? That would help others in the future when they encounter similar issues.
Of course. I'll keep that in mind.
Are you talking about the […]? One issue I can see is that we create […]; see aspect/source/simulator/checkpoint_restart.cc, line 320 in 91264fd.
This means the actual I/O goes into different files. Maybe we should put all large restart files into a separate folder. This way one can specify the striping behavior for the whole folder (and consequently for each new file created there).
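A minimal sketch of the folder-based approach suggested above, assuming a Lustre filesystem where the `lfs setstripe -c` command is available (as on Frontera). The directory name `output/restart` and both helper functions are hypothetical: ASPECT does not currently write its restart files to a separate folder, and the stripe count of 8 is simply the value reported earlier in this thread.

```python
import shutil
import subprocess
from pathlib import Path


def build_setstripe_command(directory, stripe_count):
    """Return the `lfs setstripe` argument list for a directory.

    This sketch only accepts explicit positive stripe counts.
    """
    if stripe_count < 1:
        raise ValueError("stripe_count must be a positive integer")
    return ["lfs", "setstripe", "-c", str(stripe_count), str(directory)]


def prepare_striped_directory(directory, stripe_count=8):
    """Create `directory` and set its default stripe count if `lfs` exists.

    Returns True if the striping command was run, and False if the `lfs`
    tool is not installed (for example on a non-Lustre machine).
    """
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    if shutil.which("lfs") is None:
        return False
    # On Lustre, new files created in the directory inherit this layout.
    subprocess.run(build_setstripe_command(path, stripe_count), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical folder for the large restart files.
    applied = prepare_striped_directory("output/restart", stripe_count=8)
    print("striping applied" if applied else "lfs not found, directory created only")
```

The striping has to be set before the checkpoint files are first written, since it applies to files created afterwards. The same step could instead be done with a single `lfs setstripe` call in the job script.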
That makes a lot of sense. My current solution is to set striping for all the following files: restart.mesh_fixed.data, restart.mesh.new_fixed.data, and restart.mesh_fixed.data.old. Grouping these files in a dedicated folder would make the process much easier to manage. In practice, this would just require adding one additional line in the SLURM file like:
This should help streamline the setup. Let me know if you have any thoughts or further suggestions!
Hi Timo, As of now, my test case with 4000 nodes and a restart.mesh_fixed.data file of 100 GB is running well on Frontera after adding the restart.mesh.new_fixed.data file. Looking forward to your guidance!
Hi all.
(Chris Ramos forwarded)
I am running a large 3D model on Frontera with 4000 CPUs. A few days after starting, I was notified about one large file, namely restart.mesh.data.
The problem is that all processes try to write to this one file simultaneously. I take it that the situation is like a traffic jam.
Chris, the Frontera administrator, suggests applying striping to this large restart file. Otherwise, the total number of tasks cannot exceed 840, according to his estimate based on the system capacity.
A further point I learned from Chris is that reducing the model size, and therefore the data size, won't help much. It's mainly the number of processes checkpointing simultaneously that matters.
I'd like to know whether ASPECT already has a feature for this, or how to address this issue otherwise.