RuntimeError: NetCDF: Not a valid ID #80

joelfiddes · 2023-05-17T08:11:59Z

This is a strange and somewhat random error that is not always reproducible. From what I have read, and as is often the case with strange random errors, it may be related to multiple threads accessing the same file at the same time. Here is a discussion:

https://forum.access-hive.org.au/t/netcdf-not-a-valid-id-errors/389

Nice find! To summarise in this thread, it looks like a work-around in netcdf4-python to deal with netcdf-c not being thread safe was removed in 1.6.1. The solution (for now) is to [make sure your cluster only uses 1 thread per worker](https://forum.access-hive.org.au/t/netcdf-not-a-valid-id-errors/389/14).
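For reference, a minimal sketch of what "1 thread per worker" could look like when the work runs on a `dask.distributed` `LocalCluster`. This assumes `dask.distributed` is installed and that a dask cluster is actually in use; the helper names `single_thread_cluster_kwargs` and `start_single_thread_cluster` are made up for illustration and are not part of TopoPyScale.

```python
def single_thread_cluster_kwargs(n_workers):
    """Keyword arguments for a local cluster with one thread per worker."""
    return {
        "n_workers": n_workers,    # number of worker processes
        "threads_per_worker": 1,   # a single thread inside each worker
        "processes": True,         # workers are separate processes, not threads
    }


def start_single_thread_cluster(n_workers=4):
    """Start a local dask cluster and return a connected client (hypothetical helper)."""
    # Imported lazily so the kwargs helper above works without dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(**single_thread_cluster_kwargs(n_workers))
    return Client(cluster)


# Usage (requires dask.distributed):
#   client = start_single_thread_cluster(n_workers=4)
#   ... run the xarray/dask workload ...
#   client.close()
```

Whether this applies here depends on which scheduler the code ends up using, so treat it as an illustration of the forum suggestion rather than a confirmed fix.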


joelfiddes · 2023-05-17T08:55:07Z

I think we only have 1 thread per worker anyway with this?

TopoPyScale/TopoPyScale/topo_scale.py

Line 210 in e0a79e8

multithread_pooling(_subset_climate_dataset, fun_param, n_threads=n_core)

joelfiddes · 2023-05-17T08:56:09Z

As I understand it, 1 worker = 1 core?

joelfiddes · 2023-05-17T12:13:15Z

changed

ds_ = xr.open_mfdataset(flist, parallel=True)

to

ds_ = xr.open_mfdataset(flist, parallel=False)

and it ran fine with no errors. I don't fully understand it, so I can't confidently claim it is a fix. I will need to run it a bunch more times to see if it really is one.

joelfiddes · 2023-05-17T12:17:03Z

This is on a branch "slurm" where I am developing an embarrassingly parallelisable way of dealing with the time dimension, as the current method only works if the script is run on a multicore machine, NOT under a SLURM scheduler as used on many HPC machines. This problem may be unique to that use case (many workers accessing climate data netCDFs simultaneously). But I think @ArcticSnow mentioned seeing this issue and, as the discussion above shows, it seems to happen with multi-thread access to nc files.
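To make the idea concrete, here is a rough sketch of splitting the time dimension so that each SLURM array task handles its own chunk. It is not the code on the "slurm" branch: the function names `split_time_range` and `chunk_for_this_task` are hypothetical, and it assumes the job is submitted as a SLURM job array so that `SLURM_ARRAY_TASK_ID` is set.

```python
import os
from datetime import date, timedelta


def split_time_range(start, end, n_chunks):
    """Split the inclusive date range [start, end] into contiguous chunks."""
    n_days = (end - start).days + 1
    base, extra = divmod(n_days, n_chunks)
    chunks = []
    cursor = start
    for i in range(n_chunks):
        # The first `extra` chunks get one additional day each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # more chunks requested than days available
        chunk_end = cursor + timedelta(days=length - 1)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks


def chunk_for_this_task(chunks):
    """Pick the chunk for the current SLURM array task (task 0 if unset)."""
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return chunks[task_id]
```

Each array task would then process only its own `(start, end)` pair, independently of the others.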

ArcticSnow · 2023-05-22T09:07:08Z

The multiprocessing library supports both multithreading and multicore processing. One core can handle multiple threads, which is very convenient, for instance, for sending and handling download requests (which require little computation). Maybe in the config file we should separate these into n_cores and n_threads to clarify things a bit.

Also, note that v0.2.2 does not parallelise in the time dimension. Parallelisation only happens in space. The time splits are run sequentially, each one starting when the previous one is done.
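A possible shape for the n_cores / n_threads separation, as a sketch only. `ParallelConfig`, `run_io_bound` and `run_cpu_bound` are hypothetical names, not existing TopoPyScale functions, and the default values are arbitrary.

```python
from dataclasses import dataclass
from multiprocessing.pool import Pool, ThreadPool


@dataclass
class ParallelConfig:
    n_cores: int = 1    # worker processes for CPU-heavy steps
    n_threads: int = 4  # threads for I/O-bound steps such as download requests


def run_io_bound(func, items, cfg):
    """Run func over items in a thread pool (suits waiting on network or disk)."""
    with ThreadPool(processes=cfg.n_threads) as pool:
        return pool.map(func, items)


def run_cpu_bound(func, items, cfg):
    """Run func over items in separate processes, one per requested core."""
    # func must be picklable, i.e. defined at module level.
    with Pool(processes=cfg.n_cores) as pool:
        return pool.map(func, items)
```

With two separate settings, download requests could use many threads while the netCDF work stays limited to a chosen number of processes.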

joelfiddes · 2023-05-25T07:18:38Z

of course - so actually this is a more general contribution - will write up the approach in discussions and link back here

joelfiddes · 2023-05-25T07:37:07Z

#83 (comment)


joelfiddes · 2023-06-02T09:22:52Z

Some more info that I think is related to this issue:

ecmwf/cfgrib#110

Basically, it seems safer to use parallel=False with open_mfdataset, otherwise there is a chance of conflict between threads doing "stuff" on the nc file at the same time. There used to be "lock" and "autoclose" args to the function, but no longer. Maybe these are somehow implicit in parallel=False (this is also the default setting).
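To illustrate what a "lock" does in general, here is a small standard-library sketch in which several threads read through one shared lock, so only one of them is ever inside the reader at a time. This is not xarray's or netCDF4's actual locking code; `ConcurrencyProbe`, `locked_read` and `read_all` are made-up names, and the probe merely stands in for a non-thread-safe file reader.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

FILE_LOCK = threading.Lock()  # one lock shared by every thread in this process


class ConcurrencyProbe:
    """Stand-in for a non-thread-safe reader; records how many threads overlap."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the counters only
        self.active = 0
        self.max_active = 0

    def read(self, path):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # pretend to do file I/O
        with self._guard:
            self.active -= 1
        return path


def locked_read(probe, path):
    """Serialise every read through FILE_LOCK."""
    with FILE_LOCK:
        return probe.read(path)


def read_all(probe, paths, n_threads=4):
    """Read all paths from a thread pool, one locked read at a time."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda p: locked_read(probe, p), paths))
```

Note that a `threading.Lock` only coordinates threads inside one process; it does nothing for separate worker processes.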

joelfiddes added a commit that referenced this issue Jun 1, 2023

changed default behaviour of opening ncdf to False from True. True i…

e86a751

…s giving random errors as referenced here #80. If True is needed and works OK in other circumstances we should consider a config item

joelfiddes added a commit that referenced this issue Jun 2, 2023

set open_mfdataset parallel arg to False, see here for discussion #80

5c09981
