How should flow parameterization work? #13
I think using Prefect's Parameters is probably best, though they make things a bit more complicated to debug outside of a … I have an example at https://github.com/TomAugspurger/noaa-oisst-avhrr-feedstock/blob/5aa7b9007d1055c4e03306b87358ac916d559e59/recipe/pipeline.py. A few things to note:
```python
import pandas as pd
from prefect import Parameter

# Excerpt from a pipeline class: `name`, `download`, and the enclosing
# class are defined elsewhere in the recipe.

# Flow parameters
days = Parameter(
    "days", default=pd.date_range("1981-09-01", "1981-09-10", freq="D")
)
variables = Parameter("variables", default=["anom", "err", "ice", "sst"])
cache_location = Parameter(
    "cache_location", default=f"gs://pangeo-forge-scratch/cache/{name}.zarr"
)
target_location = Parameter(
    "target_location", default=f"gs://pangeo-forge-scratch/{name}.zarr"
)

@property
def sources(self):
    source_url_pattern = (
        "https://www.ncei.noaa.gov/data/"
        "sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/"
        "{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc"
    )
    # One source URL per day in the `days` parameter
    source_urls = [
        source_url_pattern.format(
            yyyymm=day.strftime("%Y%m"), yyyymmdd=day.strftime("%Y%m%d")
        )
        for day in self.days
    ]
    return source_urls

@property
def flow(self):
    ...
    nc_sources = [
        download(x, cache_location=self.cache_location)
        for x in self.sources  # a regular python list
    ]
```
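The `sources` property is plain string formatting over the `days` parameter. A stdlib-only sketch of the same step, assuming a hand-rolled inclusive daily range (the hypothetical `daily_range` helper) in place of `pd.date_range`, shows which URLs the default parameter values expand to:

```python
from datetime import date, timedelta
from typing import List

SOURCE_URL_PATTERN = (
    "https://www.ncei.noaa.gov/data/"
    "sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/"
    "{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc"
)

def daily_range(start: date, end: date) -> List[date]:
    """Hypothetical helper: inclusive daily range, like pd.date_range(..., freq="D")."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def source_urls(days: List[date]) -> List[str]:
    """Fill the month and day fields of the URL pattern for each day."""
    return [
        SOURCE_URL_PATTERN.format(
            yyyymm=d.strftime("%Y%m"), yyyymmdd=d.strftime("%Y%m%d")
        )
        for d in days
    ]

# Same window as the `days` default in the example recipe.
urls = source_urls(daily_range(date(1981, 9, 1), date(1981, 9, 10)))
```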
One potential downside of Prefect parameters: they must be JSON serializable (which I just ran into, since …
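The `days` default in the example above is a `pd.DatetimeIndex`, which `json.dumps` does not accept. One common workaround, sketched here as an assumption rather than anything settled in this thread, is to make the parameters plain ISO-8601 strings and rebuild the dates where they are used (`expand_days` and the `start`/`end` keys are hypothetical):

```python
import json
from datetime import date, timedelta
from typing import List

# A list of date objects (analogous to the DatetimeIndex default above)
# has no encoding in the stdlib json module, so dumps raises TypeError.
raised = False
try:
    json.dumps([date(1981, 9, 1), date(1981, 9, 10)])
except TypeError:
    raised = True

# Hypothetical workaround: parameters hold ISO date strings, which are
# plain JSON, and the daily range is rebuilt from them on demand.
params = {"start": "1981-09-01", "end": "1981-09-10"}
payload = json.dumps(params)

def expand_days(start: str, end: str) -> List[date]:
    """Turn two ISO date strings into an inclusive list of days."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]

days = expand_days(**json.loads(payload))
```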
Now that we've started creating actual Pangeo Forge datasets, we're starting to notice the need for flow parameterization. Here are a few examples of where we may want to use a parameterized variable in our flows:
- target_url

… without needing to run the flow twice.

Prefect supports parameterizing flows (https://docs.prefect.io/core/examples/parameterized_flow.html). The question is whether we want to use the Prefect functionality or move toward a pangeo-forge API for this sort of thing.
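For comparison with Prefect Parameters, a pangeo-forge API could treat parameters as ordinary constructor arguments on a recipe object. This is a hypothetical sketch only: `Recipe`, `describe`, and both target URLs are invented placeholders, not an existing pangeo-forge interface.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Recipe:
    """Hypothetical recipe: parameters are ordinary constructor arguments."""

    variables: Tuple[str, ...] = ("anom", "err", "ice", "sst")
    # Placeholder target; not a real dataset location.
    target_url: str = "gs://pangeo-forge-scratch/example.zarr"

    def describe(self) -> str:
        return f"write {', '.join(self.variables)} to {self.target_url}"

# Same recipe, two targets: copy it with only target_url overridden.
production = Recipe()
scratch = replace(production, target_url="file:///tmp/example.zarr")
```

Because the values are plain Python, they need no JSON serialization, but a Prefect runner would presumably see them baked into the flow rather than as per-run overrides.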