Memory errors when running ED-2.2 #327
Comments
The error message suggests a floating-point exception caused by division by zero, but it gives no clue as to where in the model the error happened. I think the only way forward is to reproduce the error (e.g. restart from the closest -S- file and run with an executable compiled in debugging mode, such as -k C). It will be much easier to identify the problem once we know which code caused the error. As for memory usage, ED2 is not really memory intensive. A single-site simulation with <1000 cohorts takes me less than 1 GB of memory. |
Thanks @xiangtaoxu. I'll talk to the team. |
Although by looking at the log file, it seems that the simulation reached the end, at least based on this message:
In any case, I agree that it would be good to run the simulation with debugging options, to see whether the error messages appearing at the end are caused by a floating-point exception in ED2 under your settings. Also, I noticed that you are using XML to set parameters, so it may be worth checking that all parameters needed to redefine the PFTs are set (other ED2 folks more familiar with the XML interface may be able to give more up-to-date insights). |
Thank you @mpaiao. Hopefully somebody familiar with the XML can shed light on this. |
@julianpistorius I saw your email about the warning messages in utils_c.c, but I don't see it on the GitHub issue. In any case, I normally wouldn't advise ignoring warnings, but utils_c.c is a legacy set of ancillary functions borrowed from BRAMS (an atmospheric model) that no one is really developing in ED2. If the model compilation doesn't give errors, then I would ignore these warnings (it would be a different story if the warning messages were showing up in the Fortran code). |
Thank you @mpaiao! Yes, I deleted the message because I realized they were just warnings and didn't prevent the binary from being created. I now have a Docker image with the debug version of ED2: https://hub.docker.com/r/jpistorius/model-ed2-2.2.0 I turned that into a Singularity image. Unfortunately, when I tried to run it on one of our HPC systems here at Arizona, it failed:
The kernel is a bit old:
Now I'm trying to run it on our other HPC system which at least has a 3.x series kernel:
If that still doesn't work I'm going to use a large OpenStack virtual machine (or bare metal node) with a recent kernel, just to get the debug output. Will update here with progress. Update: The Singularity image with the debug binary is working on the other HPC cluster. Running now. Will hopefully soon have more useful error messages. |
I'll have to try compiling ED2 again. The output I got is actually less useful than what I had before:
|
@julianpistorius Double-check that you have the traceback option enabled when compiling the code (-fbacktrace if you are using gfortran, -traceback if using ifort). If you already have this option, then I suspect that the error messages are coming from HDF5 (!), not ED2, at least based on your first post. |
@mpaiao I did not have This is my install.sh command: Here's my
|
Got an error:
|
@julianpistorius It is interesting to have an out-of-bounds error... It seems kapartial is zero, likely caused by a zero value in ncanlyr? I do not have time right now to track the calculation of ncanlyr, but my guess is that some inappropriate parameter in the XML cascades into the calculation of ncanlyr... Will check later. |
From the files shared, it doesn't look like ncanlyr was changed in the xml file, so the default value should be 100. I don't see how kapartial could be zero, other than because of some very strange rounding error that is making the Otherwise, good old print statements may help: add this temporary chunk of code right after line 1090 (which defines kzfull):
and run the code again. This should print all the information needed to understand what is happening. |
Great! Thanks @xiangtaoxu & @mpaiao. That helps a lot. I'll let you know what happens. |
@xiangtaoxu & @mpaiao - this is what the output was:
|
I'm guessing |
Htopcrown is zero because this is a strange singularity. Cohorts should never have zero height or dbh. I don't know how this is happening. My first suggestion would be to "hide" config.xml and run the model with the default parameters, to see if the problem persists. |
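To illustrate why a zero crown-top height can surface as an out-of-bounds error, here is a minimal Python sketch. It is not ED2's actual calculation: the function names, the 0.5 m layer thickness, and the "ceiling of height over thickness" rule are all assumptions made for illustration. The point is only that a 1-based layer index derived from a height of exactly zero evaluates to 0, which falls outside an array indexed 1..ncanlyr.

```python
import math


def layer_index(height_m, layer_thickness_m, n_layers):
    """Hypothetical 1-based canopy layer index for a height (NOT ED2's formula)."""
    return min(n_layers, math.ceil(height_m / layer_thickness_m))


def check_layer_index(height_m, layer_thickness_m=0.5, n_layers=100):
    """Return the layer index, raising if it falls outside 1..n_layers."""
    k = layer_index(height_m, layer_thickness_m, n_layers)
    if not 1 <= k <= n_layers:
        raise IndexError(
            f"layer index {k} is outside 1..{n_layers} for height {height_m} m"
        )
    return k
```

With these assumed settings, any positive height maps to a valid layer, very tall cohorts are clamped to the top layer, and only a zero (or negative) height triggers the bounds failure. That matches the pattern reported in this thread, where ncanlyr kept its default of 100 and the unexpected input was the height.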
@julianpistorius yes. htopcrown becomes zero, which causes the problem since <init_density>0.1000000015</init_density> I would suggest you plot your height and biomass allometry offline to see what might be wrong. |
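For checking the allometry offline, a sketch along the following lines may be enough before reaching for a plotting library. The functional form here (a log-linear height-DBH relation capped at a maximum height) and the parameter names `b1`, `b2`, and `h_max` are assumptions for illustration only; substitute the form and values that your PFT settings actually select in ED2.

```python
import math


def height_from_dbh(dbh_cm, b1, b2, h_max):
    """Hypothetical allometry: h = min(h_max, exp(b1 + b2 * ln(dbh)))."""
    if dbh_cm <= 0.0:
        return 0.0
    return min(h_max, math.exp(b1 + b2 * math.log(dbh_cm)))


def tabulate_heights(dbh_values, b1, b2, h_max):
    """Return (dbh, height) pairs over a range of DBH values."""
    return [(d, height_from_dbh(d, b1, b2, h_max)) for d in dbh_values]


def suspicious(rows):
    """Return the rows whose height is non-finite or not strictly positive."""
    return [(d, h) for d, h in rows if not math.isfinite(h) or h <= 0.0]


if __name__ == "__main__":
    # Illustrative parameter values, not ED2 defaults.
    for d, h in tabulate_heights([1, 5, 10, 50, 100], b1=0.5, b2=0.6, h_max=35.0):
        print(f"dbh = {d:6.1f} cm  ->  height = {h:6.2f} m")
```

Running `suspicious` over the tabulated rows for each PFT flags any parameter combination that yields zero or non-finite heights, which is the condition that made htopcrown zero here.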
Agree with Marcos |
Thank you for your suggestions. I'm going to ask one of my team members to try to plot the height and biomass allometry, while I try the following:
I'm not sure what's the minimum required configuration for ED2. What's the best way to do this? Some options I could think of, ranging from most extreme to least extreme: a. Rename |
You can move |
Update: I moved the Will update here if it finishes successfully, or crashes with some other interesting error. |
The run completed successfully with the default parameters. Thank you both very much. I'll work with my colleagues on figuring out what in the |
@mpaiao are the default parameter values for ED2 documented somewhere? Maybe in the repo? |
All the default parameter values in ED2 are assigned in ed_params.f90. I also included most of them in the supporting information of our GMD paper. However, as new features are added to the model, the GMD tables will become less comprehensive over time. At least in the most up-to-date version, if a variable is assigned through xml, then xml has the last word. Otherwise, ED2 will use the default values from ed_params.f90. If I remember correctly, @femeunier changed the code a few months ago to always print an xml file with all the parameters, so even if you don't provide an xml, ED2 will generate one, which may be useful for comparing your settings against the defaults. |
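A comparison between a user-supplied xml and an ED2-generated one can be scripted. The sketch below assumes that both files group parameters in `<pft>` blocks, each identified by a `<num>` child, with one element per parameter (as in the `<init_density>` example quoted earlier in this thread). That layout and the helper names are assumptions, so adjust them to what your files actually contain.

```python
import math
import xml.etree.ElementTree as ET


def pft_parameters(xml_text):
    """Map (pft number, parameter name) -> text value for every <pft> block."""
    root = ET.fromstring(xml_text)
    params = {}
    for pft in root.iter("pft"):
        num = (pft.findtext("num") or "").strip()
        for child in pft:
            if child.tag != "num":
                params[(num, child.tag)] = (child.text or "").strip()
    return params


def diff_parameters(user_xml, default_xml, rel_tol=1e-6):
    """List (key, user value, default value) where the user value differs.

    A default of None means the parameter is absent from the default file.
    """
    user = pft_parameters(user_xml)
    default = pft_parameters(default_xml)
    diffs = []
    for key in sorted(user):
        u, d = user[key], default.get(key)
        if d is None:
            diffs.append((key, u, None))
            continue
        try:
            same = math.isclose(float(u), float(d), rel_tol=rel_tol)
        except ValueError:
            same = u == d  # fall back to string comparison for non-numeric values
        if not same:
            diffs.append((key, u, d))
    return diffs
```

Numeric values are compared with a relative tolerance so that single-precision print-outs such as 0.1000000015 are not reported as different from 0.1. To use it, read both files into strings and pass them to `diff_parameters`.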
This issue can be closed |
Running it on University of Arizona HPC, using a Singularity container.
Full logfile.txt, and scripts used to run the job: https://gist.github.com/julianpistorius/e120e6d573f68f5fea0f4bf3dce2bd1b
The HPC consulting team suggested using more memory for the job (I used 12 GB most recently, and currently have a job running with 120 GB).