Associate griddata values by centroids #34
Comments
OK. How should we deal with NULL values at this point? E.g., what if the point (39.75, 34.75) only has one non-null value, 195, for "1980-01-01 00:00:00", and (39.25, 34.75) only has one non-null value as well, say 190, for "1980-12-31 00:00:00"? Should I return:
or
In an aggregation, null values should be ignored. So if you are averaging 35 grid cells in a given region for a given time and 5 of the values are null, the average should simply be calculated over the other 30 values, ignoring those 5. Only if all 35 values are null should the resulting aggregated value be null. Is that what you were asking?
Different statistical operations would require different minimum numbers of non-null values. For example, if you are calculating a standard deviation over a set of data, I think you would require at least 3 non-null values.
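As a minimal sketch of that rule, assuming plain Python lists of cell values (the function name and the thresholds, other than the standard-deviation minimum of 3, are illustrative rather than project code):

```python
import statistics

# Minimum number of non-null values required per statistic; only the
# standard-deviation minimum of 3 comes from the discussion above.
MIN_VALUES = {'mean': 1, 'stdev': 3}

def aggregate(values, stat='mean'):
    """Aggregate grid-cell values, ignoring nulls (None).

    Returns None if all values are null or if fewer non-null values
    remain than the statistic requires.
    """
    non_null = [v for v in values if v is not None]
    if len(non_null) < MIN_VALUES[stat]:
        return None
    if stat == 'mean':
        return sum(non_null) / len(non_null)
    if stat == 'stdev':
        return statistics.stdev(non_null)
    raise ValueError('unknown statistic: %s' % stat)

# 35 grid cells, 5 of them null: average over the other 30.
cells = [float(i) for i in range(30)] + [None] * 5
print(aggregate(cells, 'mean'))   # 14.5
print(aggregate([None] * 35))     # None
```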
@joshuaelliott thanks for the clarification on the stats part. I think that @legendOfZelda's question is more about whether he should treat the nulls in the database, or pass them to the middleware. Anyway, it is guidance for the stats calculations that we'll be performing. @legendOfZelda I think that passing the nulls should be interesting, especially to reflect that a certain timeframe has nulls and represent it accordingly. @njmattes what do you think?
I'd pass the null values to the middleware as JavaScript's null. On the front end, they can then be represented accordingly.
Got it, @njmattes.
In the new API responses on the newly created server, the pixels again aren't 'collapsed' so that multiple timesteps are contained in a single pixel. I.e., a single pixel appears 11k+ times in the dataset, each time with a single value. Instead they should appear only once, with an 11k+-long array of values. In the aggregated responses there appear to be only single values rather than 11k+, one for each time step. Am I seeing that correctly? Also, a new problem to tackle is the size of the time series.
Perhaps it's possible for the time series to store only the first time step and the size of the delta? Then in the middleware we can unpack that into values for the front end. Of course irregular time series, if we have any or want to support any, wouldn't work with that scheme.
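A minimal sketch of how that packing and unpacking might look, assuming a strictly uniform time axis and made-up field names (start, delta, count) rather than the actual API:

```python
from datetime import datetime, timedelta

def pack_time_axis(dates):
    """Collapse a uniform time axis to (start, delta, count).

    Raises ValueError if the steps are not uniform, since this scheme
    only works for regular time series.
    """
    deltas = {b - a for a, b in zip(dates, dates[1:])}
    if len(deltas) > 1:
        raise ValueError('time steps are not uniform')
    delta = deltas.pop() if deltas else timedelta(0)
    return {'start': dates[0], 'delta': delta, 'count': len(dates)}

def unpack_time_axis(packed):
    """Expand (start, delta, count) back into explicit dates in the middleware."""
    return [packed['start'] + i * packed['delta'] for i in range(packed['count'])]

dates = [datetime(1980, 1, 1) + timedelta(days=i) for i in range(3)]
packed = pack_time_axis(dates)
assert unpack_time_axis(packed) == dates
```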
You're right, I haven't done it yet (except for temporal aggregation, naturally)! Sorry about that; I'm fixing it today along with cleaning up the code, in particular adding try/excepts. pSIMS + AgMERRA have uniform time steps. However, GSDE (soil data) doesn't have uniform depth steps. By nesting you just mean:
I believe you can always assume uniform steps for time, yes. Depth dimensions generally don't have more than ~10 values, so probably you can just record depth explicitly, and then only treat time as this special case.
Sorry, I wasn't getting email notifications for this issue. Totally missed these comments. Yes, by 'nesting' I meant something like you've got above @legendOfZelda. I think depth and time can be treated the same way really, it's just that depth won't have multiple nested levels. Same for datasets like pAPSIM—with only 34 annual timesteps, it'd just be something like that.

Speaking of depth vs. time vs. other dimensions, we're also going to need to add a dimension name to the response metadata, so users know what they're looking at and graphing (we have to label the axes somehow).
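A sketch of what that collapsing could look like in the middleware, assuming made-up row and field names (centroid, values) rather than the real schema:

```python
from collections import OrderedDict

def collapse_pixels(rows):
    """Group per-timestep rows into one entry per pixel.

    `rows` is an iterable of (lon, lat, date, value) tuples, assumed to be
    ordered by date within each pixel. Returns one dict per pixel with all
    of its values in a single array.
    """
    points = OrderedDict()
    for lon, lat, date, value in rows:
        point = points.setdefault((lon, lat), {'centroid': [lon, lat], 'values': []})
        point['values'].append(value)
    return list(points.values())

rows = [
    (39.75, 34.75, '1980-01-01 00:00:00', 195),
    (39.75, 34.75, '1980-12-31 00:00:00', None),
    (39.25, 34.75, '1980-01-01 00:00:00', None),
    (39.25, 34.75, '1980-12-31 00:00:00', 190),
]
print(collapse_pixels(rows))
# [{'centroid': [39.75, 34.75], 'values': [195, None]},
#  {'centroid': [39.25, 34.75], 'values': [None, 190]}]
```

A depth dimension could be recorded the same way, with each entry in values itself holding a short per-depth list.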
@njmattes
Sorry, but the response from the dev server is still a bit off. Now each point has its own dates and different lengths of values. The dates should be in the

Perhaps details of the API are spread across too many threads at this point. I can start a fresh thread for an updated API discussion. Or maybe we need a better tool than GitHub issues for sorting out the changes to the API? Postman is pretty rad, but its team-based features require subscriptions.
@severin: A GROUP BY will result in the nulls being dropped unless your query accounts for them. As far as I can see, the query does not.

@nate: Do you want the exact positions of the nulls or simply a count?

Regarding the API spec, yes, we need a better way to handle these changes.

Tanu
@TanuMalik The |
Yep, I'm rewriting the query.
@njmattes Done, you can test. It takes longer now though; expect roughly 2 minutes 20 seconds.
@njmattes Can you please confirm whether the format is OK?
I can have a look later today—I'm in class until 5ish. |
Yep, this looks right. It runs too slow to actually hook up to Atlas though. The provisional Mongo backend is rounding the values to save space—that might help save time in the transfer, but not the execution itself I guess.

Unrelated to this, I notice that when I request /api/v0/griddata/dataset/1/var/1, I see "region": [[-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]]. When I request from gridmeta I get a polygon with "coordinates": [[[-179.75, -89.75], [179.75, -89.75], [179.75, 89.75], [-179.75, 89.75], [-179.75, -89.75]]]. Is this expected?
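One possible explanation, offered only as an assumption: the gridmeta polygon may run through the outermost cell centroids while the griddata region reports the full cell extents, which for a 0.5-degree grid differ by half a cell on each side. A sketch of that relationship:

```python
# Assumed 0.5-degree grid; the two bounding boxes above differ by half a cell.
RESOLUTION = 0.5

def extent_from_centroid_bounds(lon_min, lat_min, lon_max, lat_max, res=RESOLUTION):
    """Expand a bounding box through cell centroids to the full cell edges."""
    half = res / 2.0
    return (lon_min - half, lat_min - half, lon_max + half, lat_max + half)

# Centroid bounds (as in gridmeta) -> full extent (as in griddata).
print(extent_from_centroid_bounds(-179.75, -89.75, 179.75, 89.75))
# (-180.0, -90.0, 180.0, 90.0)
```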
I understand that the reason it is slow may be because we haven’t created indices. I don’t understand why that hasn’t been done. (I imagine that the database we are working with here is more complex than the MySQL on my laptop, but I create indices as a matter of course.) Is that planned?
I did create indexes, all reasonable ones, including on the PostGIS raster field.
I'm improving the query now, plus the Python code. I hope to cut the response time down considerably, certainly below a minute, and proceed from there.
Interesting, thanks!
@legendOfZelda If you need help looking at the Python, vectorizing operations, or anything like that, let me know.
@legendOfZelda I was just checking the responses from the server and I noticed it's down. I think it's just not listening on port 5000. Did you change the port—or is it just down for a bit? If there's a problem with the
@njmattes I was just experimenting with it; it's running again.
@njmattes If you meant the .57 machine, I'm cleaning it up there now and restarting. I'll let you know when it's up again.
@njmattes It's up and running on the .57 machine. Let me know if there's any issue.
Instead of returning the following for two points with two values each,
we should be returning single points, with all values for that point in a single array of values, like:
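A minimal illustration of the two shapes, using assumed field names and illustrative values:

```python
# Flat form: one record per (point, timestep) -- the pixel repeats once per value.
flat = [
    {'centroid': [39.75, 34.75], 'date': '1980-01-01 00:00:00', 'value': 195},
    {'centroid': [39.75, 34.75], 'date': '1980-12-31 00:00:00', 'value': 190},
    {'centroid': [39.25, 34.75], 'date': '1980-01-01 00:00:00', 'value': 188},
    {'centroid': [39.25, 34.75], 'date': '1980-12-31 00:00:00', 'value': 186},
]

# Desired form: one record per point, with all of its values in one array,
# ordered along the shared time axis.
collapsed = [
    {'centroid': [39.75, 34.75], 'values': [195, 190]},
    {'centroid': [39.25, 34.75], 'values': [188, 186]},
]
```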