RuntimeError: NetCDF: Not a valid ID #80
I think we only have 1 thread per worker anyway with this? See TopoPyScale/TopoPyScale/topo_scale.py, line 210 (commit e0a79e8).
Do I understand correctly that 1 worker = 1 core?
Changed … to … and it ran fine with no errors. I don't fully understand why, so I can't confidently claim it is a fix. I will need to run it a bunch more times to see whether it really is one.
This is on a branch, "slurm", where I am developing an embarrassingly parallelisable way of handling the time dimension, as the current method only works if the script is run on a multicore machine, NOT under a SLURM scheduler as on many HPC machines. This problem may be unique to that use case (many workers accessing the climate data netCDF files simultaneously). But I think @ArcticSnow mentioned seeing this issue, and, as the discussion above shows, it seems to happen with multithreaded access to nc files.
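The actual approach on the "slurm" branch is not shown in this thread. As a minimal sketch of the general idea, assuming a SLURM job array submitted with something like `sbatch --array=0-3`, each array task can select its own contiguous slice of the period from the `SLURM_ARRAY_TASK_ID`, `SLURM_ARRAY_TASK_MIN` and `SLURM_ARRAY_TASK_COUNT` environment variables that SLURM sets for array jobs. The helpers `split_period` and `task_period` are hypothetical, not TopoPyScale functions. Because every array task is a separate process that opens its own copy of the climate files, tasks do not share netCDF handles with each other.

```python
import os
from datetime import date, timedelta


def split_period(start: date, end: date, n_tasks: int) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into n_tasks contiguous, near-equal chunks."""
    n_days = (end - start).days + 1
    if not 1 <= n_tasks <= n_days:
        raise ValueError("n_tasks must be between 1 and the number of days")
    base, extra = divmod(n_days, n_tasks)
    chunks = []
    cursor = start
    for i in range(n_tasks):
        # The first `extra` chunks get one extra day so every day is covered exactly once.
        length = base + (1 if i < extra else 0)
        chunk_end = cursor + timedelta(days=length - 1)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks


def task_period(start: date, end: date) -> tuple[date, date]:
    """Return the sub-period this SLURM array task should process (hypothetical helper)."""
    # SLURM sets these for job-array tasks; the defaults allow a plain, non-array run.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    task_min = int(os.environ.get("SLURM_ARRAY_TASK_MIN", "0"))
    n_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    # Assumes a contiguous array with step 1, e.g. --array=0-3 or --array=1-4.
    return split_period(start, end, n_tasks)[task_id - task_min]
```

Each task would then downscale only its own sub-period and write its own output files, which can be merged once all tasks have finished.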
The multiprocessing library supports both multiple threads and multiple cores (processes). One core can handle many threads, which is very convenient, for instance, for sending and handling download requests (which require little computation). Maybe in the config file we should separate the two and have one … Also, note that v0.2.2 does not parallelise in the time dimension. Parallelisation only happens in space: the time splits are run sequentially, each one starting when the previous one is done.
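A minimal sketch of the two ideas in the comment above: separate (hypothetical) config settings for download threads and compute cores, and the v0.2.2-style scheduling where time splits run one after another while points in space are processed in parallel. The names `ParallelConfig`, `n_download_threads`, `n_compute_cores`, `download_all` and `downscale` are illustrative assumptions, not TopoPyScale's API. It uses `concurrent.futures` rather than `multiprocessing` for brevity; `multiprocessing.Pool` and `multiprocessing.pool.ThreadPool` offer the same process/thread split.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ParallelConfig:
    # Hypothetical config items, kept separate because the two kinds of work differ.
    n_download_threads: int = 8  # I/O-bound requests: many threads can share one core
    n_compute_cores: int = 4     # CPU-bound downscaling: roughly one worker per core


def download_all(requests: Sequence[str], fetch: Callable, cfg: ParallelConfig) -> list:
    """Send requests concurrently; threads mostly wait on the network."""
    with ThreadPoolExecutor(max_workers=cfg.n_download_threads) as pool:
        return list(pool.map(fetch, requests))


def downscale(time_splits: Sequence, points: Sequence, compute: Callable,
              make_executor: Callable[[int], Executor], cfg: ParallelConfig) -> dict:
    """Run time splits sequentially; within each split, process points in parallel."""
    results = {}
    with make_executor(cfg.n_compute_cores) as pool:
        for split in time_splits:  # sequential in time
            # Parallel in space: one task per point, called as compute(point, split).
            # list() blocks until every point of this split is done.
            results[split] = list(pool.map(compute, points, [split] * len(points)))
    return results
```

For the CPU-bound step you would pass `make_executor=ProcessPoolExecutor` (from `concurrent.futures`) so each worker is a separate process; `compute` must then be a module-level function so it can be pickled.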
Of course. So actually this is a more general contribution; I will write up the approach in Discussions and link back here.
…s giving random errors, as referenced here in #80. If True is needed and works OK in other circumstances, we should consider a config item.
Some more info that I think is related to this issue: basically, it seems safer to use `parallel=False` with `open_mfdataset`, otherwise there is a chance of conflict between threads doing "stuff" on the nc file at the same time. There used to be `lock` and `autoclose` arguments to the function, but no longer. Maybe these are somehow implicit in `parallel=False` (which is also the default setting).
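A lock serialises access so that only one thread at a time calls into a library that is not thread-safe, which is the kind of protection discussed above. The stdlib sketch below illustrates the concept only: `NotThreadSafeReader`, `FILE_LOCK`, `locked_read` and `read_all` are hypothetical names, the reader merely stands in for a netCDF file handle, and this is not how xarray or netcdf4-python implement their locking.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class NotThreadSafeReader:
    """Hypothetical stand-in for a file handle in a library that is not thread-safe."""

    def __init__(self, data):
        self._data = data
        self._busy = False

    def read(self, i):
        # Crude, best-effort detector: raises if another thread is already inside.
        if self._busy:
            raise RuntimeError("concurrent access detected")
        self._busy = True
        try:
            time.sleep(0.001)  # simulate I/O while holding the handle
            return self._data[i]
        finally:
            self._busy = False


FILE_LOCK = threading.Lock()  # one lock shared by every thread that touches the handle


def locked_read(reader, i):
    # Only one thread at a time may call into the reader.
    with FILE_LOCK:
        return reader.read(i)


def read_all(reader, indices, n_threads=8):
    """Read many items from several threads, serialised through FILE_LOCK."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda i: locked_read(reader, i), indices))
```

Without the lock, several threads could enter `read` at the same time and trip the detector; with it, the threads still run concurrently but take turns on the shared handle.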
This is a strange and somewhat random error, not always reproducible. From my reading, and as is often the case with strange random errors, it may be related to multiple threads accessing the same file at the same time. Here is a discussion:
https://forum.access-hive.org.au/t/netcdf-not-a-valid-id-errors/389
Nice find! To summarise in this thread, it looks like a work-around in netcdf4-python to deal with netcdf-c not being thread safe was removed in 1.6.1. The solution (for now) is to [make sure your cluster only uses 1 thread per worker](https://forum.access-hive.org.au/t/netcdf-not-a-valid-id-errors/389/14).