Tests working on gadi #36
Comments
@nichannah I get a segfault running tests with a9e2883
yields
This is the sort of message that I was getting in my ports of CM4 etc. My guess is that a temp array is being created for
Thanks - I just tried
Try just
I just tried
Seems to be working fine for me on the express queue without having to invoke
Hang on, it's just crashed at the end in the ocean stub with a heap of warnings like
1 0x0000000000051a36 ucs_fatal_error_format() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/debug/assert.c:52
2 0x00000000000562f0 ucs_mem_region_destroy_internal() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/memory/rcache.c:200
3 0x000000000005c6c6 ucs_class_call_cleanup_chain() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/type/class.c:52
4 0x0000000000056f38 ucs_rcache_destroy() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucs/../../../src/ucs/memory/rcache.c:729
5 0x00000000000030f2 uct_knem_md_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/uct/sm/knem/../../../../../src/uct/sm/knem/knem_md.c:91
7 0x000000000000f1c9 ucp_cleanup() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/ucx/1.6.1/source/ucx-1.6.1/build/src/ucp/../../../src/ucp/core/ucp_context.c:1266
8 0x0000000000005bcc mca_pml_ucx_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx.c:247
9 0x0000000000007909 mca_pml_ucx_component_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx_component.c:82
10 0x00000000000582b9 mca_base_component_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/opal/mca/base/../../../../opal/mca/base/mca_base_components_close.c:53
11 0x0000000000058345 mca_base_components_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/opal/mca/base/../../../../opal/mca/base/mca_base_components_close.c:85
12 0x0000000000058345 mca_base_components_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/opal/mca/base/../../../../opal/mca/base/mca_base_components_close.c:86
13 0x00000000000621da mca_base_framework_close() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/opal/mca/base/../../../../opal/mca/base/mca_base_framework.c:216
14 0x000000000004f479 ompi_mpi_finalize() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/gcc-opt/ompi/../../ompi/runtime/ompi_mpi_finalize.c:363
15 0x000000000004ac29 ompi_finalize_f() /home/900/z30_apps/builds/_UYaaG8i/0/nci/gadi-apps/openmpi/4.0.2/source/openmpi-4.0.2/intel-opt/ompi/mpi/fortran/mpif-h/profile/pfinalize_f.c:71
16 0x0000000000418cb0 accessom2_mod_mp_accessom2_deinit_() /scratch/p93/raf599/cosima/gaditest/libaccessom2/libcouple/src/accessom2.F90:839
17 0x000000000040ec0a MAIN__.V() /scratch/p93/raf599/cosima/gaditest/libaccessom2/ocean_stub/src/ocean.F90:114
18 0x000000000040ce22 main() ???:0

I also found this in the thousands of messages. A warning in rcache.c and a failed assertion which matches the trace.
Interesting. Do you think that's a related problem or something else?
The problem seems to be a missing remap weights file.
This file doesn't exist:
It should really give a more informative error message than that. I've tried a couple of other remapping files, but they appear to be the wrong size. Does anyone know where that file is, or how the namcouple should be altered to be consistent with the weights files that are there?
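As a rough illustration of the "more informative error message" point, here is a minimal Python sketch of a pre-run check that names the missing weights files explicitly. This is not libaccessom2 or OASIS code: the function name `check_remap_weights` and the idea of passing in a list of paths are assumptions made for the example.

```python
from pathlib import Path


def check_remap_weights(paths):
    """Raise a descriptive error if any remap weights file is missing.

    `paths` is an iterable of file names. In practice these would be the
    weights files referenced by the coupler configuration; how they are
    collected is left out of this sketch (hypothetical).
    """
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        # Name every missing file so the failure is obvious before the run.
        raise FileNotFoundError(
            "Missing remap weights file(s): " + ", ".join(missing)
        )
    return True
```

Running a check like this before launching the test would turn a crash deep in the coupler into a one-line message naming the absent file.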
Isn't the problem that the
Here?
Would it be a good idea to use the latest set of inputs from here? Some of the weights files have been renamed though.
@russfiedler the file named
I assume this is, as it says, a mismatch and that this remap file is incompatible with the
Tried
Lysdexia rules! Try
Same problem:
Different numbers though ... progress?
Looks like it's back to front?
It does, doesn't it. I removed all the (optional?) size stuff from
I was looking only at the atmosphere -> ice fields, which do require a remapping file.
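The "wrong size" versus "back to front" distinction discussed above can be sketched as a small check. This is only an illustration in Python, assuming a weights file can be summarised as a (source size, destination size) pair of grid-point counts; the function name and the numbers in the usage example are made up and are not taken from the error messages in this thread.

```python
def diagnose_weights_shape(weights_shape, src_size, dst_size):
    """Classify how a weights file's (src, dst) sizes relate to the grids.

    All sizes are numbers of grid points. Returns "ok" when the sizes
    match, "swapped" when they match only in reverse order (source and
    destination back to front), and "mismatch" otherwise.
    """
    if weights_shape == (src_size, dst_size):
        return "ok"
    if weights_shape == (dst_size, src_size):
        return "swapped"
    return "mismatch"


# Illustrative grid sizes only (hypothetical values).
atm_points = 640 * 320
ice_points = 360 * 300
print(diagnose_weights_shape((ice_points, atm_points), atm_points, ice_points))
```

A "swapped" result would point at the direction of the remapping rather than at the file itself, which is a different fix from hunting for another weights file.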
I am getting the same error with the I got
If you're using
Well I don't know if I changed something or just got it wrong, but even
@nichannah I am working here:
Can you take a look and see if you can see the issue? I was about to dive into a debugger, but thought if you could see the problem easily then it would be a more productive approach.
…oblem due to old forcing.json file. #36
I have fixed some of the tests. There were a few problems but the main one was that they did not use the new forcing.json field which I introduced to support JRA55 v1p4. The FORCING_SCALING and JRA55_v1p4_IAF tests are still not working due to missing/wrong input files. I'll fix those.
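For anyone checking their own test configs for the same problem, here is a hedged Python sketch that lists which entries in a forcing.json-style file lack a given field. The name of the new field is not stated in this thread, so it is passed in as a parameter. The top-level `"inputs"` list and the field name `new_field` used in the example are assumptions for illustration, not the confirmed libaccessom2 schema.

```python
import json


def entries_missing_field(config_text, field, list_key="inputs"):
    """Return the indices of entries that do not define `field`.

    `config_text` is the JSON text of a forcing.json-style config.
    `list_key` names the top-level list of entries (assumed layout).
    """
    config = json.loads(config_text)
    entries = config.get(list_key, [])
    return [i for i, entry in enumerate(entries) if field not in entry]


# Hypothetical old-style config: the first entry predates the new field.
old_config = '{"inputs": [{"filename": "a.nc"}, {"filename": "b.nc", "new_field": 1}]}'
print(entries_missing_field(old_config, "new_field"))
```

An empty list means every entry already carries the field. Anything else identifies the entries that still need updating.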
I have merged a branch into master that fixes all the tests. However, I have not set things up to run on Jenkins yet, so I'm keeping this issue open until we do that.
Awesome! I can do the Jenkins stuff if you don't have time, but am busy right now. Let me know if you do start working on it so we don't duplicate.
... or alternatively, libaccessom2 could be made back-compatible with the older
Non-ERA5 tests should now work with
Do COSIMA/access-om2#182 for libaccessom2 tests