Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geospatial queries of datasets #55

Open
mullenkamp opened this issue Jul 22, 2022 · 2 comments
Open

Geospatial queries of datasets #55

mullenkamp opened this issue Jul 22, 2022 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@mullenkamp
Copy link
Member

Similar to the get_stations method, the get_datasets method should have functionality to geospatially filter datasets.
Just like the get_stations method, there should be a way to get all datasets within/intersect with an input polygon or when a point is provided as input get all datasets that are within that point.

@mullenkamp mullenkamp added the enhancement New feature or request label Jul 22, 2022
@mullenkamp mullenkamp self-assigned this Jul 22, 2022
@aspe98
Copy link

aspe98 commented Apr 27, 2023

Hello mullenkamp,

I am interested in obtaining station information using the get_station method with an input point.
However, I am facing some issues with the geospatial query.
May you please assist?

The script I am attempting to use is:

from tethysts import Tethys

ts = Tethys()
datasets = ts.datasets

my_dataset = [d for d in datasets if (d['parameter'] == 'precipitation') and
                                     (d['product_code'] == 'Moana backbone model')]
                                     
dataset_id = my_dataset[0]["dataset_id"]

stations = ts.get_stations(dataset_id)

geometry = {'type': 'Point', 'coordinates':[171.53229,-42.88314] #Mt Philistine Ews coordinates according to cliflo.niwa.co.nz}

#Get the data
PrecipData = ts.get_results(dataset_id, geometry=geometry, squeeze_dims=True)

#Convert to a dataframe for ease of viewing
PecipDataframe = PrecipData.to_dataframe().reset_index()

#Data are provided in UTC, so add 12 hours to convert to NZST
PecipDataframe.time = PecipDataframe.time + pd.Timedelta(hours = 12)

#Export to a csv file
PecipDataframe.to_csv("Out.csv",columns=['time','precipitation'],header=["DateTime","Rainfall"],index=False,date_format="%Y-%m-%dT%H:%M:%S",float_format='%.1f')

However, for the line:

PrecipData = ts.get_results(dataset_id, geometry=geometry, squeeze_dims=True)

I receive the following:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 16
     11 stations = ts.get_stations(dataset_id)
     14 geometry = {'type': 'Point', 'coordinates':[171.53229,-42.88314]} #Mt Philistine Ews coordinates according to cliflo.niwa.co.nz
---> 16 PrecipData = ts.get_results(dataset_id, geometry=geometry, squeeze_dims=True)

File C:\Program Files\Anaconda3\envs\tethys_new\lib\site-packages\tethysts\main.py:516, in Tethys.get_results(self, dataset_id, station_ids, geometry, lat, lon, from_date, to_date, from_mod_date, to_mod_date, version_date, heights, bands, squeeze_dims, output_path, compression, threads)
    513     stn_dict = self._stations[dataset_id][vd]
    515     # Run the spatial query
--> 516     stn_ids = spatial_query(stn_dict, geometry, lat, lon, None)
    517 else:
    518     raise ValueError('A station_id, point geometry or a combination of lat and lon must be passed.')

File C:\Program Files\Anaconda3\envs\tethys_new\lib\site-packages\tethysts\utils.py:198, in spatial_query(stns, query_geometry, lat, lon, distance)
    196 geom_query = shape(query_geometry)
    197 if isinstance(geom_query, Point):
--> 198     stn_ids = [get_nearest_station(stns, geom_query)]
    199 elif isinstance(geom_query, Polygon):
    200     stn_ids = get_intersected_stations(stns, geom_query)

File C:\Program Files\Anaconda3\envs\tethys_new\lib\site-packages\tethysts\utils.py:152, in get_nearest_station(stns, geom_query)
    150 strtree = STRtree(geom1)
    151 res = strtree.nearest(geom_query)
--> 152 res_id = res.wkb_hex
    154 stn_id_dict = {shape(s['geometry']).wkb_hex: i for i, s in stns.items()}
    156 stn_id = stn_id_dict[res_id]

AttributeError: 'numpy.int64' object has no attribute 'wkb_hex'

I have also attempted to substitute the geometry attributes for that of station id 0c58fa18828c819f551900f2:

geometry = {'type': 'Polygon', 'coordinates': [[172.0, -43.0, 172.0, -42.6, 171.6, -42.6, 171.6, -43.0, 172.0, -43.0]]}

but receive the error:

ValueError: A station_id, point geometry or a combination of lat and lon must be passed.

I suspect I must define the station_id (which returns KeyError: '0c58fa18828c819f551900f2'), pass geometry as point (see above error) or state the lat/lon.

Might you have any insight as to how I might ameliorate this issue?

Please note I am relatively new to Python so please excuse my limited understanding in the nature of the errors or the package itself.

Any help would be kindly appreciated!

@mullenkamp
Copy link
Member Author

mullenkamp commented Apr 27, 2023

Hi again,

Normally you'd want to open a new "issue" as my original "issue" is not specifically related to your issue, but don't worry about it this time.
As I mentioned in your issue #82 , make sure you've updated the tethysts package to the newest version.
Otherwise, your code seems run fine other than a typo (I assume) in the line with "geometry" defined.
It should be:
geometry = {'type': 'Point', 'coordinates':[171.53229,-42.88314]}
Yours has the inline comments before the }, which will break things.

I got this when I printed PrecipData when I ran your code up to that line:

Dimensions:        (time: 262968)
Coordinates:
    height         float32 0.0
    lat            float64 -42.9
    lon            float64 171.5
  * time           (time) datetime64[ns] 1991-01-01 ... 2020-12-31T23:00:00
Data variables:
    precipitation  (time) float32 ...
Attributes:
    institution:     MetService
    license:         https://creativecommons.org/licenses/by-nc-sa/4.0/
    result_type:     grid
    source:          simulation
    system_version:  4
    title:           cumulative precipitation in mm of the atmosphere by a si...
    version_date:    2022-01-01T00:00:00

Also for future reference, you can pass a geometry (point or polygon) to the get_station method to filter the stations that you want, but passing a point geometry to the get_results if perfectly fine too (it's probably better for your use case).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants