Associate griddata values by centroids #34
Comments
OK. How should we deal with NULL values at this point? E.g., what if the point (39.75, 34.75) only has one non-null value, 195, for "1980-01-01 00:00:00", and (39.25, 34.75) only has one non-null value as well, say 190, for "1980-12-31 00:00:00"? Should I return:
or
In an aggregation, null values should be ignored. So if you are averaging 35 grid cells in a given region for a given time and 5 of the values are null, the average should simply be calculated over the other 30 values, ignoring those 5. Only if all 35 values are null should the resulting aggregated value be null. Is that what you were asking?
Different statistical operations would require different minimum numbers of non-null values. For example, if you are calculating a standard deviation over a set of data, I think you would require at least 3 non-null values.
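As a minimal sketch of that rule, assuming plain Python lists of cell values (the function name and the thresholds, other than the standard-deviation minimum of 3, are illustrative rather than project code):

```python
import statistics

# Minimum number of non-null values required per statistic; only the
# standard-deviation minimum of 3 comes from the discussion above.
MIN_VALUES = {'mean': 1, 'stdev': 3}

def aggregate(values, stat='mean'):
    """Aggregate grid-cell values, ignoring nulls (None).

    Returns None if all values are null or if fewer non-null values
    remain than the statistic requires.
    """
    non_null = [v for v in values if v is not None]
    if len(non_null) < MIN_VALUES[stat]:
        return None
    if stat == 'mean':
        return sum(non_null) / len(non_null)
    if stat == 'stdev':
        return statistics.stdev(non_null)
    raise ValueError('unknown statistic: %s' % stat)

# 35 grid cells, 5 of them null: average over the other 30.
cells = [float(i) for i in range(30)] + [None] * 5
print(aggregate(cells, 'mean'))   # 14.5
print(aggregate([None] * 35))     # None
```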
@joshuaelliott thanks for the clarification on the stats part. I think that @legendOfZelda's question is more about whether he should treat the nulls in the database, or pass them to the middleware. Anyway, it is guidance for the stats calculations that we'll be performing. @legendOfZelda I think that passing the nulls should be interesting, especially to reflect that a certain timeframe has nulls and represent it accordingly. @njmattes what do you think?
I'd pass the null values to the middleware as JavaScript's null. On the front end, they can then be represented accordingly.
Got it, @njmattes.
In the new API responses on the newly created server, the pixels again aren't 'collapsed' so that multiple timesteps are contained in a single pixel. I.e., a single pixel appears 11k+ times in the dataset, each time with a single value. Instead they should appear only once, with an 11k+-long array of values. In the aggregated responses there appear to be only single values rather than 11k+, one for each time step. Am I seeing that correctly? Also, a new problem to tackle is the size of the time series.
Perhaps it's possible for the time series to store only the first time step and the size of the delta? Then in the middleware we can unpack that into values for the front end. Of course irregular time series, if we have any or want to support any, wouldn't work with that scheme.
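A minimal sketch of how that packing and unpacking might look, assuming a strictly uniform time axis and made-up field names (start, delta, count) rather than the actual API:

```python
from datetime import datetime, timedelta

def pack_time_axis(dates):
    """Collapse a uniform time axis to (start, delta, count).

    Raises ValueError if the steps are not uniform, since this scheme
    only works for regular time series.
    """
    deltas = {b - a for a, b in zip(dates, dates[1:])}
    if len(deltas) > 1:
        raise ValueError('time steps are not uniform')
    delta = deltas.pop() if deltas else timedelta(0)
    return {'start': dates[0], 'delta': delta, 'count': len(dates)}

def unpack_time_axis(packed):
    """Expand (start, delta, count) back into explicit dates in the middleware."""
    return [packed['start'] + i * packed['delta'] for i in range(packed['count'])]

dates = [datetime(1980, 1, 1) + timedelta(days=i) for i in range(3)]
packed = pack_time_axis(dates)
assert unpack_time_axis(packed) == dates
```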
You're right, I haven't done it yet (except for temporal aggregation, naturally)! Sorry about that; I'm fixing it today along with cleaning up the code, in particular adding try/excepts. pSIMS + AgMERRA have uniform time steps. However, GSDE (soil data) doesn't have uniform depth steps. By nesting you just mean:
I believe you can always assume uniform steps for time, yes. Depth dimensions generally don't have more than ~10 values, so probably you can just record depth explicitly, and then only treat time as this special case.
Sorry, I wasn't getting email notifications for this issue. Totally missed these comments. Yes, by 'nesting' I meant something like you've got above @legendOfZelda. I think depth and time can be treated the same way really, it's just that depth won't have multiple nested levels. Same for datasets like pAPSIM—with only 34 annual timesteps, it'd just be something like that.

Speaking of depth vs. time vs. other dimensions, we're also going to need to add a dimension name to the response metadata, so users know what they're looking at and graphing (we have to label the axes somehow).
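A sketch of what that collapsing could look like in the middleware, assuming made-up row and field names (centroid, values) rather than the real schema:

```python
from collections import OrderedDict

def collapse_pixels(rows):
    """Group per-timestep rows into one entry per pixel.

    `rows` is an iterable of (lon, lat, date, value) tuples, assumed to be
    ordered by date within each pixel. Returns one dict per pixel with all
    of its values in a single array.
    """
    points = OrderedDict()
    for lon, lat, date, value in rows:
        point = points.setdefault((lon, lat), {'centroid': [lon, lat], 'values': []})
        point['values'].append(value)
    return list(points.values())

rows = [
    (39.75, 34.75, '1980-01-01 00:00:00', 195),
    (39.75, 34.75, '1980-12-31 00:00:00', None),
    (39.25, 34.75, '1980-01-01 00:00:00', None),
    (39.25, 34.75, '1980-12-31 00:00:00', 190),
]
print(collapse_pixels(rows))
# [{'centroid': [39.75, 34.75], 'values': [195, None]},
#  {'centroid': [39.25, 34.75], 'values': [None, 190]}]
```

A depth dimension could be recorded the same way, with each entry in values itself holding a short per-depth list.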
@njmattes
Sorry, but the response from the dev server is still a bit off. Now each point has its own dates and different lengths of values. The dates should be in the

Perhaps details of the API are spread across too many threads at this point. I can start a fresh thread for an updated API discussion. Or maybe we need a better tool than GitHub issues for sorting out the changes to the API? Postman is pretty rad, but its team-based features require subscriptions.
@severin: A GROUP BY will result in the nulls being dropped unless your query accounts for them. As far as I can see, the query does not.

@nate: Do you want the exact positions of the nulls or simply a count?

Regarding the API spec, yes, we need a better way to handle these changes.

Tanu
@TanuMalik The |
Yep, I'm rewriting the query.
@njmattes Done, you can test. It takes longer now though; expect roughly 2 minutes 20 seconds.
@njmattes Can you please confirm whether the format is OK?
I can have a look later today—I'm in class until 5ish. |
Yep, this looks right. It runs too slow to actually hook up to Atlas though. The provisional Mongo backend is rounding the values to save space—that might help save time in the transfer, but not the execution itself I guess.

Unrelated to this, I notice that when I request /api/v0/griddata/dataset/1/var/1, I see "region": [[-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]]. When I request from gridmeta I get a polygon with "coordinates": [[[-179.75, -89.75], [179.75, -89.75], [179.75, 89.75], [-179.75, 89.75], [-179.75, -89.75]]]. Is this expected?
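One possible explanation, offered only as an assumption: the gridmeta polygon may run through the outermost cell centroids while the griddata region reports the full cell extents, which for a 0.5-degree grid differ by half a cell on each side. A sketch of that relationship:

```python
# Assumed 0.5-degree grid; the two bounding boxes above differ by half a cell.
RESOLUTION = 0.5

def extent_from_centroid_bounds(lon_min, lat_min, lon_max, lat_max, res=RESOLUTION):
    """Expand a bounding box through cell centroids to the full cell edges."""
    half = res / 2.0
    return (lon_min - half, lat_min - half, lon_max + half, lat_max + half)

# Centroid bounds (as in gridmeta) -> full extent (as in griddata).
print(extent_from_centroid_bounds(-179.75, -89.75, 179.75, 89.75))
# (-180.0, -90.0, 180.0, 90.0)
```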
I understand that the reason it is slow may be because we haven’t created indices. I don’t understand why that hasn’t been done. (I imagine that the database we are working with here is more complex than the MySQL on my laptop, but I create indices as a matter of course.) Is that planned?
I did create indexes, all reasonable ones, including on the PostGIS raster field.
I'm improving the query now, plus the Python code. I hope to cut the response time down considerably, certainly below a minute, and proceed from there.
Interesting, thanks!
@legendOfZelda If you need help looking at the Python, vectorizing operations, or anything like that, let me know.
@legendOfZelda I was just checking the responses from the server and I noticed it's down. I think it's just not listening on port 5000. Did you change the port—or is it just down for a bit? If there's a problem with the
@njmattes I was just experimenting with it; it's running again.
@njmattes If you meant the .57 machine, I'm cleaning it up there now and restarting. I'll let you know when it's up again.
@njmattes It's up and running on the .57 machine. Let me know if there's any issue.
Instead of returning the following for two points with two values each,
we should be returning single points, with all values for that point in a single array of values, like:
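A minimal illustration of the two shapes, using assumed field names and illustrative values:

```python
# Flat form: one record per (point, timestep) -- the pixel repeats once per value.
flat = [
    {'centroid': [39.75, 34.75], 'date': '1980-01-01 00:00:00', 'value': 195},
    {'centroid': [39.75, 34.75], 'date': '1980-12-31 00:00:00', 'value': 190},
    {'centroid': [39.25, 34.75], 'date': '1980-01-01 00:00:00', 'value': 188},
    {'centroid': [39.25, 34.75], 'date': '1980-12-31 00:00:00', 'value': 186},
]

# Desired form: one record per point, with all of its values in one array,
# ordered along the shared time axis.
collapsed = [
    {'centroid': [39.75, 34.75], 'values': [195, 190]},
    {'centroid': [39.25, 34.75], 'values': [188, 186]},
]
```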