Book ePubs are not always sufficiently compressed #376

To solve https://github.com/openzim/gutenberg/issues/288, https://github.com/openzim/gutenberg/issues/235 and https://github.com/openzim/gutenberg/issues/222, we decided to stop compressing (optimizing) ePubs ourselves.

Recent runs and the analysis done in https://github.com/openzim/gutenberg/issues/374 showed that the optimization we used to do on ePubs was not useless after all.

See, for example, book ID 63630:

| | 2023-08 | 2025-10 |
|--|--|--|
| Size | 265K | 520K |
| URL | https://dev.library.kiwix.org/content/gutenberg_de_all_2023-08/Der%20Einzige%20auf%20der%20weiten%20Welt:%20Ein%20Menschenleben.63630.epub | https://browse.library.kiwix.org/content/gutenberg_de_all_2025-10/Der%20Einzige%20auf%20der%20weiten%20Welt:%20Ein%20Menschenleben.63630.epub |

Or book ID 68838:

| | 2023-08 | 2025-10 |
|--|--|--|
| Size | 438K | 4.6M |
| URL | https://dev.library.kiwix.org/content/gutenberg_de_all_2023-08/Der%20Graf%20von%20Saint-Germain:%20Das%20Leben%20eines%20Alchimisten.68838.epub | https://browse.library.kiwix.org/content/gutenberg_de_all_2025-10/Der%20Graf%20von%20Saint-Germain:%20Das%20Leben%20eines%20Alchimisten.68838.epub |

It is important to note that many ePubs in the 2023-08 run were missing all images (including the two examples above, due to https://github.com/openzim/gutenberg/issues/222), but this alone does not explain the whole size increase so far.
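To check how much of the increase images actually account for, one option is to sum entry sizes per file type inside both ePubs and compare them. A minimal sketch using only Python's standard `zipfile` module; the extension-to-category mapping (`CATEGORIES`) and the helper names are hypothetical, not part of the scraper:

```python
import sys
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical grouping of file extensions; adjust to what the ePubs really contain.
CATEGORIES = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp"},
    "text": {".xhtml", ".html", ".htm", ".css", ".opf", ".ncx", ".xml"},
}


def categorize(name: str) -> str:
    """Map a ZIP entry name to a category based on its (lowercased) extension."""
    suffix = PurePosixPath(name).suffix.lower()
    for category, suffixes in CATEGORIES.items():
        if suffix in suffixes:
            return category
    return "other"  # e.g. the extension-less `mimetype` entry


def size_breakdown(epub) -> dict:
    """Sum compressed/uncompressed sizes and entry counts per category.

    `epub` can be a path or a binary file-like object, like zipfile.ZipFile accepts.
    """
    totals = defaultdict(lambda: {"compressed": 0, "uncompressed": 0, "count": 0})
    with zipfile.ZipFile(epub) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            bucket = totals[categorize(info.filename)]
            bucket["compressed"] += info.compress_size
            bucket["uncompressed"] += info.file_size
            bucket["count"] += 1
    return dict(totals)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path)
        for category, sizes in sorted(size_breakdown(path).items()):
            print(f"  {category:6} {sizes['count']:4} files "
                  f"{sizes['compressed']:>10} B compressed "
                  f"{sizes['uncompressed']:>10} B raw")
```

Running it on the 2023-08 and 2025-10 copies of the same book would show whether the growth sits in image entries or elsewhere.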

I think the safest path is to:
- first fix https://github.com/openzim/gutenberg/issues/375
- adapt our code to optimize ePub images (we could theoretically reuse the same images, though I'm not sure that is feasible; at the very least, apply some compression settings)
- confirm the expected size difference (it should save about 3 GB on Gutenberg DE)
- if the size difference is not there, check what else could be optimized in the ePubs
- if not done yet, decide how to handle optimization cache invalidation (see https://github.com/openzim/gutenberg/issues/288)
- put the optimization cache back in place
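For the cache invalidation step, one possible approach (an assumption, not what was decided in issue 288) is to key each cached ePub on the input content, the optimization settings and an optimizer version number, so that a change to any of them produces a new key. `OPTIMIZER_VERSION` and `cache_key` are hypothetical names:

```python
import hashlib
import json

# Hypothetical: bump this whenever the optimization logic changes, so every
# previously cached ePub is treated as stale.
OPTIMIZER_VERSION = "1"


def cache_key(source: bytes, settings: dict) -> str:
    """Return a hex key that changes when the input bytes, the settings,
    or the optimizer version change."""
    digest = hashlib.sha256()
    digest.update(OPTIMIZER_VERSION.encode("utf-8"))
    digest.update(b"\0")  # separator between the hashed fields
    # sort_keys makes the key independent of dict insertion order
    digest.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    digest.update(b"\0")
    digest.update(source)
    return digest.hexdigest()
```

With this scheme, bumping `OPTIMIZER_VERSION` (for instance when switching to new ePub3 optimization logic) invalidates all old entries at once without having to purge the cache.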

Keep in mind that we've moved to the ePub3 format, so the optimization logic will probably differ from what we used to have.
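One ePub-level optimization that does not touch the images themselves is recompressing the container. The ePub OCF specification (ePub3 included) requires the `mimetype` entry to be the first file and to be stored uncompressed, so a repack has to write it separately. A minimal sketch with Python's `zipfile`; `repack_epub` is a hypothetical helper, and gains on JPEG/PNG entries will be small because those formats are already compressed:

```python
import zipfile


def repack_epub(src, dst, level: int = 9) -> None:
    """Rewrite an ePub with maximum Deflate compression.

    `src`/`dst` may be paths or binary file-like objects. The `mimetype`
    entry is written first and stored uncompressed; every other file entry
    is deflated at `level`. Directory entries are dropped. Raises KeyError
    if the source has no `mimetype` entry.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        zout.writestr(zipfile.ZipInfo("mimetype"), zin.read("mimetype"),
                      compress_type=zipfile.ZIP_STORED)
        for info in zin.infolist():
            if info.filename == "mimetype" or info.is_dir():
                continue
            zout.writestr(info.filename, zin.read(info.filename),
                          compress_type=zipfile.ZIP_DEFLATED,
                          compresslevel=level)
```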
