Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LibGfx+LibPDF: Add support for palettized JPEG2000 files #25741

Merged
merged 6 commits into from
Mar 3, 2025

Conversation

nico
Copy link
Contributor

@nico nico commented Mar 2, 2025

This adds support for JPEG2000 files with a palette.

In PDFs, these work fairly differently from when they're standalone: In PDFs, the PDF engine loads the palette indices from the jpeg2000 data and then applies a palette that is store in the PDF data. This adds support for both modes.

(PDFs allow using the color space from the jpeg2000 file instead of using an explicit color space. PDF viewers are very inconsistent what they do when doing this with palettized JPEG2000 files. Acrobat refuses to show such images. PDFium and Preview.app then use the palette in the file. I haven't tested yet what happens when using an indexed PDF with a /DeviceGray color space – I'd expect that Acrobat might then show the indices as grayscale data. When setting the color space of the indexed jpeg2000 to /DeviceRGB, Preview.app and Acrobat don't show the image, but PDFium shows the converted data with the palette in the jpeg2000 data (! This seems kind of incorrect; I'll file a bug for this))

Also adds a quirk for a technically invalid construct that happens in one of my test PDFs.

@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label Mar 2, 2025
@nico
Copy link
Contributor Author

nico commented Mar 2, 2025

Before:

After (I feel compelled to mention that the color jumps on this slide are very exaggerated):

Before:

After:

Before:

pal-190-4-before

After (subtle, look at shadow in lower left):

pal-190-4-after2

@nico
Copy link
Contributor Author

nico commented Mar 2, 2025

And here's the file with the quirk. Before:

After:

Before:

After:

Pages 10 and 11 are also white before and have an image after.

Preview.app cannot display the images in this file either (they're technically invalid per spec, but not "very invalid"). PDFium shows them fine.

@nico
Copy link
Contributor Author

nico commented Mar 2, 2025

From 24 distinct issues, in 116 files (11.6%), 874 files without issues (87.4%) to 22 distinct issues, in 115 files (11.5%), 875 files without issues (87.5%) in my 1000 file test set.

@nico nico force-pushed the pdf-jpeg2000-palette branch from 62644cf to 0ecfd72 Compare March 2, 2025 20:05
@nico
Copy link
Contributor Author

nico commented Mar 2, 2025

Screenshot 2025-03-02 at 3 36 01 PM from mozilla/pdf.js#12213 now also works correctly.

@nico nico force-pushed the pdf-jpeg2000-palette branch from 0ecfd72 to 1321626 Compare March 2, 2025 23:54
In all PDFs I've seen so far, palettized JPEG2000 files in a PDF
use a PDF /Indexed color space, and the PDF expects to get palette
index data as grayscale from the JPEG2000 decoder.

So add an option to request that behavior.

(This means that a palettized JPEG2000 file might use a different
palette when inside a PDF than when outside it!)

We still don't support decoding standalone palettized JPEG2000 files,
only palettized JPEG2000 files embedded in PDFs.
@nico nico force-pushed the pdf-jpeg2000-palette branch from 1321626 to 5b0a43b Compare March 3, 2025 00:00
return Error::from_string_literal("JPEG2000ImageDecoderPlugin: Only full opacity channel supported yet");
has_alpha = true;
}
} else {
Copy link
Contributor Author

@nico nico Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff doesn't make this very visible, but everything in the else is everything that was on the LHS, just indented one level. (And with the // If you make this more comment added.)

nico added 5 commits March 2, 2025 20:58
The approach is to make a distinction between components and channels
(mirroring spec language). If there's no palette, each component
directly maps to a channel. (This uses move semantics, so there's
next to no data copying happening.) If there's a palette, it uses
the component mapping box to map components to channels, likely using
the palette for some channels. So a single component of palette indices
is converted to N (with N usually 3, sometimes 4) full channels.

If a palette is present, this copying has some overhead,
but it's conceptually simple. Things that should just work:

* Palette indices of any bitness

* Palette values with bpp in [1..16] (existing support handles)

* Two-component images where one component is a palette index and
  the other is an alpha channel (don't know how to make those, though)

* Palettes for CMYK files (and for other color spaces)

Note that this codepath is only used for standalone palettized JPEG2000
flies, not for JPEG2000 files in PDFs.

(To create ref-indexed.png, I opened
openjpeg-lossless-indexed-u8-rgb-u8.jp2 in Photoshop and saved it as
PNG.)
No effective behavior change, since the invalid value was multiplied
with other values that are zero when this case is hit.

Serendipitously covered by indexed-small.jp2 that's added in a later
commit in this PR. This makes the UBSan bot happy with that test.
I created indexed-small.jp2 with this script:
https://github.com/nico/hack/blob/d4772bae/make_palettized_jpeg2000.py

I created jpeg2000-indexed-small.pdf with a local script that
creates a PDF file embedding a jpeg2000 file, and then manually changed
the indexed color space in the PDF to use 0x7f bytes for all bytes
in the palette that were 0xff. Since PDF renderers are supposed to
use the palette that's in the PDF file, not the one that's in the
JPEG2000 data, this means that the image shows up darker in the
PDF than when viewing the file standalone. (It'd also be possible
to change the palette colors to something completely different.
I just went with "make it slightly darker" since that seems less
confusing.)
@nico nico force-pushed the pdf-jpeg2000-palette branch from 5b0a43b to bc12537 Compare March 3, 2025 01:58
@nico nico merged commit 05b4337 into SerenityOS:master Mar 3, 2025
12 checks passed
@nico nico deleted the pdf-jpeg2000-palette branch March 3, 2025 11:33
@github-actions github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Mar 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant