-
Sorry I'm getting to this late. Yeah, FastHerbie is an immature feature that needs more development and testing (thanks for testing it 😁). Depending on the amount of data you are requesting, I've noticed it's faster to just download the full file and subset it afterward. I'll have to think about this issue. Here's some of my thought process...

```python
# This is the new way to use FastHerbie
from herbie import FastHerbie

# Create multiple Herbie objects
FH = FastHerbie(
    DATES=["2022-11-29"],
    model="ecmwf",
    product="enfo",
    fxx=range(3, 12, 3),
)

# You can look at those objects with this
FH.objects

# Download those files (subsets)
# This took almost 2 minutes
FH.download(searchString=":tp:sfc:")

# Open the data with Xarray
ds = FH.xarray(searchString=":tp:sfc:")
```

But reading the data into xarray causes an error. I'll have to track down that bug for this case.
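For reference, here is a minimal sketch of what I mean by downloading the full file and subsetting it afterward (assuming plain Herbie objects and a local cfgrib install; the filter keys are illustrative and this isn't a tested recipe):

```python
# Sketch of the "download the whole file, subset locally" path.
# Assumes Herbie.download() with no search string grabs the full GRIB2 file
# and returns its local path; the cfgrib filter keys ("tp", "pf") are
# illustrative, not a verified recipe for this product.
from herbie import Herbie
import xarray as xr

paths = []
for fxx in range(3, 12, 3):
    H = Herbie("2022-11-29", model="ecmwf", product="enfo", fxx=fxx)
    paths.append(H.download())  # no searchString -> download the full file

# Subset after the fact: pull total precipitation for the perturbed members.
datasets = [
    xr.open_dataset(
        p,
        engine="cfgrib",
        backend_kwargs={"filter_by_keys": {"shortName": "tp", "dataType": "pf"}},
    )
    for p in paths
]
ds = xr.concat(datasets, dim="step")
```

Whether this ends up faster than FastHerbie's ranged requests depends on how much of each file you actually need.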
-
I was trying to use Herbie to easily download and process the European ensemble data. I'm not sure whether I misunderstand FastHerbie or what, but it appears to download the entire dataset, and it seems to take far longer for one timestep than my loop does for 6 days. The sample of code I am using is below. The FastHerbie call that is commented out took an hour for maybe one timestep? Not sure; when it started another loop I killed it, because my plain loop finishes in 45 minutes. However, I have not been able to download a complete dataset for any cycle in the three days I have been trying.
The most common problem is that one or more timesteps will have the number (member/perturbation) coordinate as 0 instead of an array of length 50. If I go back and assign those 50, it's clear something weird happened, as revealed by a spot check of a single point (look at the 05-03Z column).
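For reference, this is roughly the kind of check and reassignment I mean (just a sketch, assuming an xarray dataset with an ECMWF-style `number` dimension; the function name and the 1-50 numbering are illustrative, not my actual code):

```python
import numpy as np
import xarray as xr


def fix_member_coord(ds: xr.Dataset, n_members: int = 50) -> xr.Dataset:
    """Reassign the ensemble 'number' coordinate if it came back as all 0s."""
    expected = np.arange(1, n_members + 1)  # exact numbering depends on the product
    if (
        "number" in ds.dims
        and ds.sizes["number"] == n_members
        and not np.array_equal(ds["number"].values, expected)
    ):
        # Coordinate values were lost; put the member labels back.
        ds = ds.assign_coords(number=expected)
    return ds
```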
I have not yet attempted to just download those times separately, but I would think this has to be a problem with the processing rather than the data, right? This is probably way more data than a typical use case, but I like me my ensemble data.
Other than these issues, kudos on this though; it makes dealing with this big data so, so much easier, and really helps elevate some serious science game.