Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAS-2161: Updates libs with no user visible changes. #16

Merged
merged 12 commits into from
Jun 10, 2024

Conversation

flamingbear
Copy link
Member

@flamingbear flamingbear commented Jun 6, 2024

Description

This ticket was to bump the library versions and discover why regression tests
stopped passing.

There was a change in some image metadata when the pillow (PIL)
library was updated from v10.0.0 to v10.x.0.

The culprit is that GDAL adds a No Data value to the PNG metadata only when
the colormap has 256 values and a single one of those is fully transparent, e.g
RGBA(0,0,0,0).

pillow 10.0.0 always returned a 256 length colormap from the quantized image
irrespective of the number of colors requested.

Pillow 10.x.0 changed this behavior to only return a palette of the unique
values in the quantized image.

The regression tests failed because the shorter colortable passed to gdal did
not meet GDAL's (presumed, not documented) feature of adding a No Data value to
the output metadata.

By beginning with a 256-element all-black color table and writing extracted,
truncated one into it, we retain the previous behavior by having a full length
256 value color table, which allows for some images to have a NoData Value in
the PNG metadata.

Jira Issue ID

DAS-2161

Local Test Steps

Pull branch
Look that the pip_requirements.txt file has updated libraries
Run build and test the image.

❯ ./bin/build-image && ./bin/build-test && ./bin/run-test

Deploy image to Harmony-In-A-Box and run the HyBIG regression tests locally.
They should pass as they did before with no changes to the regression image or
tests.

No release. I think these changes can be picked up in the next release since there are no user facing changes.

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • CHANGELOG.md updated to include high level summary of PR changes.
  • docker/service_version.txt updated if publishing a release.
  • Tests added/updated and passing.
  • Documentation updated (if needed).

GDAL adds a NoData value to the png metadata when the colormap has 256 values
and only one of those is transparent: RGBA(0,0,0,0).

pillow 10.0.0 always returned a 256 length colormap from the quantized image
irrespective of the number of colors requested.

Pillow 10.x.0 changed this behavior to only return a palete of the unique
values in the quantized image.

By beginning with a 256-element all-black colortable and writing the truncated
one over it, we retain the previous behavior which allows for some images to
have a NoData Value in the PNG metadata.
@flamingbear flamingbear marked this pull request as ready for review June 6, 2024 14:56
Copy link
Member

@owenlittlejohns owenlittlejohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code changes make sense. I have a couple of questions (that are probably are just easy things you'll tell me aren't issues).

The unit tests ran and passed locally, as did the regression test suite. Nice!

I'm probably in the opposite camp about not releasing. I'd have probably opted to do so, so we can test these changes in isolation as a patch release. I do appreciate that they shouldn't be significant, and that it leads to more deployment wrangling, but we are changing a bunch of underlying code with all those packages updates.

@@ -286,7 +287,7 @@ def get_color_map_from_image(image: Image) -> dict:

"""
color_tuples = np.array(image.getpalette(rawmode='RGBA')).reshape(-1, 4)
color_map = {}
color_map = all_black_color_map()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay then. This is me gathering my thoughts (please tell me if anything below is wonky):

  • all_black_color_map returns a dictionary of 256 key/value pairs. All have an RGBA tuple that is black.
  • Below this line you then iterate through all the values in color_tuples and assign those values to their appropriate index in the dictionary (but it essentially starts at 0 and fills the first N indices).

So, the questions I have:

  • Will you ever get a time when color_tuples is significantly less than 256, and you have multiple black values at the end of the colour map? Does that matter?
  • Will you ever get a time when colour_tuples is more than 256? Would that be a problem? (It seems like the code will all still work, but maybe nodata would not be set as expected?)

Possibly nitpicky thought (feel free to ignore), you could use enumerate below maybe:

for idx, color_tuple in enumerate(color_tuples):
    color_map[idx] = tuple(color_tuple)

(And yes, I know I used "color", but only to be consistent 😉)

Copy link
Member Author

@flamingbear flamingbear Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you ever get a time when color_tuples is significantly less than 256, and you have multiple black values at the end of the colour map? Does that matter?

Yes, we have gotten that before and will still see that. It does not matter, because there are no values in the data that will have those indexes. For example run gdalinfo on the ASTGTMV003_N00E022_dem.png from the regression tests.

Will you ever get a time when colour_tuples is more than 256? Would that be a problem? (It seems like the code will all still work, but maybe nodata would not be set as expected?)

No, because we only ask for 254 values when quantizing .

I just tried with 300 for giggles and quantize won't go above 256: ValueError: bad number of colors

Also No Data is set right before reprojection, so it would end up overwriting the last color in the table. Which is also why I only ask for 254 values, so that we have a transparent and no data index available without ruining any of the quantized values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly nitpicky thought (feel free to ignore), you could use enumerate below maybe:

for idx, color_tuple in enumerate(color_tuples):
    color_map[idx] = tuple(color_tuple)

Sure that's cleaner.

@owenlittlejohns
Copy link
Member

owenlittlejohns commented Jun 6, 2024

Oh, I looked at the Snyk issues. They are kind of tricky - looks like the packages need a version of Python that Snyk isn't running. Snyk looks like it is using Python 3.9. The Python version used to be non-configurable in the way we are using Snyk, and I think that might still be the case. It's a total pain and is a reason why some projects get false positives.

@flamingbear
Copy link
Member Author

flamingbear commented Jun 6, 2024

Oh, I looked at the Snyk issues. They are kind of tricky - looks like the packages need a version of Python that Snyk isn't running.

I got totally side tracked and didn't look into it. Thanks for the tip.

@flamingbear
Copy link
Member Author

Oh, I looked at the Snyk issues. They are kind of tricky - looks like the packages need a version of Python that Snyk isn't running. Snyk looks like it is using Python 3.9. The Python version used to be non-configurable in the way we are using Snyk, and I think that might still be the case. It's a total pain and is a reason why some projects get false positives.

So This is the page I got to, but I can't even see as much information as you said following the gear symbol or the "about this error" link.

Screenshot 2024-06-07 at 9 10 44 AM

@owenlittlejohns
Copy link
Member

With the Snyk thing, there is a little bit of this being a prior bee in my bonnet. But I looked up the "Python version not supported" issue via the page with the error codes that is linked to from "About this error". That mentioned that it was likely the version of one of the packages needing a different version of Python.

To dig out the version of Python being used, when you click on the gear icon, move from the "Settings" tab to the "Overview" one. That then lists Python 3.9. But, to my knowledge, the version of Python is limited to being the same throughout all our Snyk projects. I had raised that the Python version being old was causing false positives with the CI/CD team for EED-3 before, and (if I remember rightly) Snyk got us on to a beta version of Snyk that allowed for us to configure the global version of Python being used (I think the CI/CD team bumped us from 3.7.2 to 3.9), but we still don't have a granular per-project control that ultimately we need.

TL;DR - I'm kind of unsure how to proceed here. I don't think we can get around the Snyk issues as they stand.

@flamingbear
Copy link
Member Author

flamingbear commented Jun 7, 2024

Python version not supported" issue via the page with the error codes that is linked to from "About this error".

I just did this again because I was sure I did this. But this time I saw it. I must have been typing it wrong (again and again).

But we can fix with a .synk file?
Alternatively, add a .snyk file for Python version selection override.

@flamingbear
Copy link
Member Author

@owenlittlejohns following discussions with Jon and Venku. I'm adding this snyk file so they can bump the default version to 3.9. But there's still some wonder as to why snyk is actually failing at 3.9 and 3.11 since 3.7 isn't used anywhere in the project

Copy link
Member

@owenlittlejohns owenlittlejohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still probably push to release this. But otherwise I think this is looking good. (Or as good as Snyk will let us make it look)

# Snyk (https://snyk.io) policy file, patches or ignores known vulnerabilities.
version: v1.25.0
language-settings:
python: "3.7"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - so just vocalising what I think we've got here: the .snyk file is preserving the version of Python that Snyk was previously using prior to the bump to Python 3.9.

This seems like the way to go for now, but it is a bit iffy that we are almost certainly getting different package versions being scanned by Snyk than we are actually installing in the Docker image. (Side note - it's nice that a .snyk file is actually working here, I don't think that was always the case)

@flamingbear flamingbear changed the title DAS-2180: Updates libs with no visible changes. DAS-2161: Updates libs with no user visible changes. Jun 10, 2024
@flamingbear flamingbear merged commit 73058ae into main Jun 10, 2024
4 checks passed
@flamingbear flamingbear deleted the mhs/DAS-2180/update-libraries branch June 10, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants