Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't use repr for numpy scalars #67

Merged
merged 2 commits into from
Dec 9, 2024
Merged

Conversation

gillins
Copy link
Member

@gillins gillins commented Dec 8, 2024

Ends up with weird things like this under numpy 2.0:

STATISTICS_MEAN=np.float(7.8117557801917)

See https://numpy.org/devdocs/release/2.0.0-notes.html#representation-of-numpy-scalars-changed.

RIOS seems unaffected - perhaps it does not end up with numpy scalars when doing the statistics.

@neilflood
Copy link
Member

Dammit!

Thanks @gillins. I agree, that is the correct fix. I had to do the same change in RIOS some months ago (ubarsc/rios#90), but I forgot that the same thing appears in pyshepseg.

I regard it as a poor choice by the numpy people in the numpy-2 update. However, it is obviously widely supported, so we have to live with it.

You might fix the comment on this PR - the thing you are fixing is repr(), not eval().

Thanks!

@neilflood
Copy link
Member

Just reflecting a little further on how I fixed it in RIOS. There, I used things like repr(float(medianval)), and similar. I think I was trying to preserve, wherever possible, something close to the datatype of the value in question. I suspect that it is up to the driver how this is really handled, so not sure if it matters. What do you think?

@gillins gillins changed the title don't use eval for numpy scalars don't use repr for numpy scalars Dec 8, 2024
@gillins
Copy link
Member Author

gillins commented Dec 8, 2024

OK just checked RIOS, you have an advantage there as you know whether it is an int type or not. repr(float(medianval)) is ok, but what should we do for min etc? Hmm yes, up to the driver as to how this is stored but I feel I should preserve the type. I think I might just copy gdalFloatTypes from rios.calcstats (RIOS is still only an optional dependency) and use that to decide whether I cast to float or int for (min/max/median/mode etc). Sound ok?
Yes poor choice by the numpy people....

@neilflood
Copy link
Member

Actually, just checked further, as more recollections made their way into consciousness. I think after I made the changes I referenced in that RIOS PR, I also then changed things further. The current code in RIOS now uses the GDAL function specific to setting the main stats values band.SetStatistics() (see calcstats.writeBasicStats()). I don't think this existed when we first set all that up.

I now remember tossing this up for ages, because it takes only float values, and only writes the four main statistics. Again, it is up to the driver what it does, but GDAL's own method seems to encourage just passing the float values, and assuming the driver does the right thing.

Writing the histogram (and mode/median) is done separately, in calcstats.writeHistogram(), but retains the contortions to preserve types.

I think it all became more complicated when I did the single-pass stuff, and wanted to capture as much common code between single-pass and non-single-pass cases, so I recall re-writing parts of that code for that.

I dunno, the whole thing seems a bit messy, and full of compromise.

How about we do much the same as RIOS (as simply as possible). And, as you say, don't assume RIOS itself is present for this, but copy over anything relevant.

Sigh...

@gillins
Copy link
Member Author

gillins commented Dec 9, 2024

Thanks Neil, how does this look now?

Copy link
Member

@neilflood neilflood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gillins. I think that looks good. Still pretty simple, but looks like to does all it needs to do.

@gillins gillins merged commit 6cead50 into ubarsc:master Dec 9, 2024
2 checks passed
@gillins gillins deleted the repr_numpy branch December 9, 2024 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants