
Error in detect "Reindexing only valid with uniquely valued Index objects" #71

Open
Eisbrenner opened this issue Feb 19, 2024 · 7 comments

@Eisbrenner

Hi there,

I don't understand the error message below; could someone give me a hint as to what I am doing wrong?

The code takes strips of the globe that are 2 longitude grid points wide and calculates the climatology and MHWs. (I intend to use wider strips, but for now 2 is the smallest possible width; with a 1-grid-point width I run into an internal multi-indexing issue.)

I would like to run this for the whole globe, but at the speeds I am seeing I probably need to split it up, ultimately sending it to multiple nodes on a supercomputer to get the total time below a week. Extrapolating, it looks like I can do ~45 longitude grid points in 12 hours on a single node.

The data is the OISSTv2 dataset, so nothing out of the ordinary.

versions

Python 3.10.13
dask                      2024.2.0           Parallel PyData with Task Scheduling
h5netcdf                  1.3.0              netCDF4 via h5py
netcdf4                   1.6.5              Provides an object-oriented python interface to the netCDF version 4 library
numpy                     1.26.4             Fundamental package for array computing in Python
pandas                    2.2.0              Powerful data structures for data analysis, time series, and statistics
scipy                     1.12.0             Fundamental algorithms for scientific computing in Python
xarray                    2024.1.1           N-D labeled arrays and datasets in Python
xmhw                      0.9.4.dev5 0ab526e 'Marine heatwave detection code using xarray'

code

import logging
import time
from pathlib import Path

import xarray as xr
from xmhw.xmhw import detect, threshold

ds_sst = xr.open_mfdataset(str(paths["sst"])).load()

step = 2
n_chunks = len(ds_sst.lon) // step
if (len(ds_sst.lon) % step) == 0:
    print(f"Number of chunks: {n_chunks}")
    slices = [slice(n, n + step) for n in range(0, len(ds_sst.lon), step)]
else:
    raise ValueError("`step` does not fit into `len(lon)`")


def elapsed_time(start_time):
    return time.time() - start_time


target_dir = Path("_tmp_chunks")
target_dir.mkdir(exist_ok=True, parents=True)

logfile = target_dir / "logfile.log"
logging.basicConfig(filename=str(logfile), level=logging.INFO)

for i, slc in enumerate(slices):
    if i == 2:
        break

    start_time = time.time()
    logging.info(f"Iteration: {i+1} / {n_chunks}")

    sst_chunk = ds_sst.isel(lon=slc)["sst"]
    clim_chunk = threshold(sst_chunk)

    part_time = time.time()
    clim_chunk.to_netcdf(target_dir / f"clim_chunk_{i}.nc")
    logging.info(f"Clims took {elapsed_time(part_time)} seconds")

    part_time = time.time()
    mhw_chunk, intermediate_chunk = detect(
        temp=sst_chunk,
        th=clim_chunk["thresh"],
        se=clim_chunk["seas"],
        intermediate=True,
    )
    logging.info(f"MHWs took {elapsed_time(part_time)} seconds")

    mhw_chunk.to_netcdf(target_dir / f"mhw_chunk_{i}.nc")
    intermediate_chunk.to_netcdf(target_dir / f"intermediate_chunk_{i}.nc")

    logging.info(f"Iteration {i+1} took {elapsed_time(start_time)} seconds")

error

InvalidIndexError                         Traceback (most recent call last)
Cell In[14], line 30
     27 logging.info(f"Clims took {elapsed_time(part_time)} seconds")
     29 part_time = time.time()
---> 30 mhw_chunk, intermediate_chunk = detect(
     31     temp=sst_chunk,
     32     th=clim_chunk["thresh"],
     33     se=clim_chunk["seas"],
     34     intermediate=True,
     35 )
     36 logging.info(f"MHWs took {elapsed_time(part_time)} seconds")
     38 mhw_chunk.to_netcdf(target_dir / f"mhw_chunk_{i}.nc")

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xmhw/xmhw.py:454, in detect(***failed resolving arguments***)
    440     for c in ts.cell:
    441         mhwls.append(
    442             define_events(
    443                 ts.sel(cell=c),
   (...)
    452             )
    453         )
--> 454 results = dask.compute(mhwls)
    456 # Concatenate results and save as dataset
    457 # re-assign dimensions previously used to stack arrays
    458 if point:

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/dask/base.py:663, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    660     postcomputes.append(x.__dask_postcompute__())
    662 with shorten_traceback():
--> 663     results = schedule(dsk, keys, **kwargs)
    665 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xmhw/identify.py:368, in define_events(ts, th, se, idxarr, minDuration, joinGaps, maxGap, intermediate, tdim)
    332 """Finds all MHW events of duration >= minDuration and calculate
    333 their properties.
    334 
   (...)
    364     properties along time axis. If intermediate is False is None
    365 """
    367 # reindex thresh and seas along time index
--> 368 thresh = th.sel(doy=ts.doy)
    369 seas = se.sel(doy=ts.doy)
    371 # Find MHWs as exceedances above the threshold
    372 # Time series of "True" when threshold is exceeded

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xarray/core/dataarray.py:1615, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1499 def sel(
   1500     self,
   1501     indexers: Mapping[Any, Any] | None = None,
   (...)
   1505     **indexers_kwargs: Any,
   1506 ) -> Self:
   1507     """Return a new DataArray whose data is given by selecting index
   1508     labels along the specified dimension(s).
   1509 
   (...)
   1613     Dimensions without coordinates: points
   1614     """
-> 1615     ds = self._to_temp_dataset().sel(
   1616         indexers=indexers,
   1617         drop=drop,
   1618         method=method,
   1619         tolerance=tolerance,
   1620         **indexers_kwargs,
   1621     )
   1622     return self._from_temp_dataset(ds)

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xarray/core/dataset.py:3098, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   3030 """Returns a new dataset with each array indexed by tick labels
   3031 along the specified dimension(s).
   3032 
   (...)
   3095 
   3096 """
   3097 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 3098 query_results = map_index_queries(
   3099     self, indexers=indexers, method=method, tolerance=tolerance
   3100 )
   3102 if drop:
   3103     no_scalar_variables = {}

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xarray/core/indexing.py:193, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
    191         results.append(IndexSelResult(labels))
    192     else:
--> 193         results.append(index.sel(labels, **options))
    195 merged = merge_sel_results(results)
    197 # drop dimension coordinates found in dimension indexers
    198 # (also drop multi-index if any)
    199 # (.sel() already ensures alignment)

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xarray/core/indexes.py:782, in PandasIndex.sel(self, labels, method, tolerance)
    780     indexer = label_array
    781 else:
--> 782     indexer = get_indexer_nd(self.index, label_array, method, tolerance)
    783     if np.any(indexer < 0):
    784         raise KeyError(f"not all values found in index {coord_name!r}")

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xarray/core/indexes.py:561, in get_indexer_nd(index, labels, method, tolerance)
    559 if flat_labels.dtype == "float16":
    560     flat_labels = flat_labels.astype("float64")
--> 561 flat_indexer = index.get_indexer(flat_labels, method=method, tolerance=tolerance)
    562 indexer = flat_indexer.reshape(labels.shape)
    563 return indexer

File /proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/pandas/core/indexes/base.py:3882, in Index.get_indexer(self, target, method, limit, tolerance)
   3879 self._check_indexing_method(method, limit, tolerance)
   3881 if not self._index_as_unique:
-> 3882     raise InvalidIndexError(self._requires_unique_msg)
   3884 if len(target) == 0:
   3885     return np.array([], dtype=np.intp)

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
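The error itself comes from pandas: label lookups (as used by xarray's `.sel`) are refused when the index contains duplicate values, which suggests the `doy` coordinate of the climatology ends up with repeated entries. A minimal, self-contained reproduction with synthetic data (not the OISSTv2 chunks):

```python
import pandas as pd

# Lookups against a unique index work as expected
unique_idx = pd.Index([1, 2, 3])
print(unique_idx.get_indexer([2, 3]))  # [1 2]

# The same lookup against an index with duplicates raises InvalidIndexError
dup_idx = pd.Index([1, 2, 2, 3])
print(dup_idx.is_unique)  # False
try:
    dup_idx.get_indexer([2, 3])
except pd.errors.InvalidIndexError as err:
    print(err)  # Reindexing only valid with uniquely valued Index objects
```

A quick diagnostic on a real chunk would be to check whether `clim_chunk.indexes["doy"].is_unique` is True before calling `detect`.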
@paolap
Member

paolap commented Feb 20, 2024

This is weird. I tried to run the package tests (and incidentally found a different issue, which is fixed in master), but I couldn't reproduce your error.
Then I tried a different test and file, and the first time I ran it I could reproduce your error; after that, however, running the same test on the same data, the code completes without errors. I did this a few times, including with the latest release, so as to exclude the latest change I made for the other, unrelated issue. Again no errors; the code still runs fine. So I'm not sure why this is happening. I will keep testing to see whether I can reproduce your error again.

@paolap
Member

paolap commented Feb 20, 2024

This is a shot in the dark, but could you try updating xarray to 2024.2.0? It was only released a couple of days ago, and we noticed a similar issue with reindex in 2024.1.1 (running the xarray tutorial) which has since disappeared. I think that in between the first and only time I reproduced your error and the rerun of the test, the environment I was using got updated.

@Eisbrenner
Author

Yes, I'll try that tomorrow!

@Eisbrenner
Author

I checked now with updated versions, but I still get the error sometimes (more specifically, in 59 of 360 slices).

Versions:

xarray                    2024.2.0           N-D labeled arrays and datasets in Python
xmhw                      0.9.4.dev6 87722bc 'Marine heatwave detection code using xarray'

@paolap
Member

paolap commented Feb 21, 2024

Not sure what to say: I couldn't reproduce it anymore either, and I used exactly the same data when I briefly experienced it as in all the other attempts. I will try again, but unfortunately I no longer have any work time allocated to this.

@Eisbrenner
Author

I will check whether any of the slices that failed do so consistently and report back in a few days. Maybe I'll find something reproducible.

@Eisbrenner
Author

By iteratively re-running the failed calculations, I was able to get all slices done except one. But that final slice throws a different error, so whatever the issue is, there is no way for me to reproduce it reliably.
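The iterative re-running of failed slices can be sketched as a small retry helper (a hypothetical wrapper, not part of xmhw):

```python
def run_with_retries(fn, attempts=3):
    """Call fn(); on an exception, retry up to `attempts` times total,
    re-raising the last error if every attempt fails."""
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
    raise last_exc

# Example: a flaky callable that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky))  # ok
```

In the loop above, the per-slice `detect(...)` call could be wrapped this way so a transiently failing slice is retried before being logged as failed.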

For me this is resolved then, though I guess for documentation purposes you may want to keep this issue open?

Regarding that remaining final error message: I am not sure whether it is related to the original error at all, but I'll share it anyway :)

The error occurs for longitude grid-points 1136 to 1140 (time: 1981-09-01 to 2023-02-16, lat: all).

error message

/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xmhw/identify.py:314: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  gaps_shifted = gaps_shifted.fillna(value=True)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/calculate_mhws.py", line 112, in <module>
    calculate_mhws()
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xmhw/identify.py:314: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  gaps_shifted = gaps_shifted.fillna(value=True)
    return ctx.invoke(self.callback, **ctx.params)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/calculate_mhws.py", line 108, in calculate_mhws
    DetectMHWs().update(n_chunks, n_iter, slc).calculate()
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/calculate_mhws.py", line 83, in calculate
    mhw_chunk, intermediate_chunk = detect(
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xmhw/xmhw.py", line 454, in detect
    results = dask.compute(mhwls)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/dask/base.py", line 663, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xmhw/identify.py", line 399, in define_events
    dfmhw = mhw_features(df, len(idxarr) - 1, tdim, dims)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xmhw/features.py", line 89, in mhw_features
    df = agg_df(dftime, tdim, dims)
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/xmhw/features.py", line 157, in agg_df
    dfout.loc[:,d] = val[0] 
  File "/proj/bolinc/users/x_ezrei/Notebooks/mhws/.venv/lib/python3.10/site-packages/pandas/core/indexing.py", line 1848, in _setitem_with_indexer
    raise ValueError(
ValueError: cannot set a frame with no defined index and a scalar
