Add -c compression option #11

Open
ElectricRCAircraftGuy opened this issue Dec 24, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

Owner

ElectricRCAircraftGuy commented Dec 24, 2020

pdf2searchablepdf -c file.pdf

Shall produce file_searchable-comp1.pdf.

Make it a wrapper around this: https://askubuntu.com/a/243753/327339.

See if you can specify multiple resolutions or do multiple passes for further compression.

Allow -c1 (same as -c), -c2 for more compression, and -c3 for most compression.

Update my readme to explain how to manually do this compression after-the-fact too!

And update help menu with these new options.


pdf2searchablepdf -c1 file.pdf # low compression only
pdf2searchablepdf -c2 file.pdf # medium compression only
pdf2searchablepdf -c3 file.pdf # high compression only

Default is to output them all?

file_searchable_1.pdf # low compression 
file_searchable_2.pdf # medium compression 
file_searchable_3.pdf # high compression 
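
A minimal sketch of how the `-c`, `-c1`, `-c2`, and `-c3` flags could map to a level and an output name. The helper names `compression_level` and `output_name` are hypothetical, and the `file_searchable_N.pdf` pattern is taken from the list above; none of this is the tool's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper: map a -c style flag to a compression level (1..3).
# Prints the level, or returns 1 for an unrecognized flag.
compression_level() {
    case "$1" in
        -c|-c1) echo 1 ;;   # -c is shorthand for -c1 (low compression)
        -c2)    echo 2 ;;   # medium compression
        -c3)    echo 3 ;;   # high compression
        *)      return 1 ;;
    esac
}

# Hypothetical helper: build the output name from the input path and level,
# following the file_searchable_N.pdf pattern.
output_name() {
    echo "${1%.pdf}_searchable_$2.pdf"
}

# Example: print the planned output file for one flag and one input.
level=$(compression_level "-c2") && output_name "file.pdf" "$level"
```

If the default ends up being "output them all", the caller could simply loop over levels 1 to 3 and call `output_name` once per level.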

Use Ghostscript after the fact to do compression only on an already-processed PDF.
See my answer: https://askubuntu.com/questions/113544/how-can-i-reduce-the-file-size-of-a-scanned-pdf-file/1303196#1303196

pdf2searchablepdf --compress-only=low    file_searchable_1.pdf
pdf2searchablepdf --compress-only=medium file_searchable_1.pdf
pdf2searchablepdf --compress-only=high   file_searchable_1.pdf
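
A rough sketch of what `--compress-only` could do under the hood, assuming the wrapper calls Ghostscript's `pdfwrite` device. The `-dPDFSETTINGS` presets are real Ghostscript options, but which level maps to which preset, the function names, and the output file name are my assumptions. Setting `DRY_RUN=1` prints the command instead of running it, so the sketch works without Ghostscript installed.

```shell
#!/bin/sh
# Hypothetical mapping from a compression level to a Ghostscript preset.
gs_preset() {
    case "$1" in
        low)    echo "/printer" ;;  # images downsampled to about 300 dpi
        medium) echo "/ebook" ;;    # about 150 dpi
        high)   echo "/screen" ;;   # about 72 dpi
        *)      return 1 ;;
    esac
}

# Hypothetical wrapper: compress_pdf LEVEL INPUT.pdf OUTPUT.pdf
compress_pdf() {
    cp_level="$1"; cp_in="$2"; cp_out="$3"
    cp_preset=$(gs_preset "$cp_level") || {
        echo "unknown level: $cp_level" >&2
        return 1
    }
    if [ -n "$DRY_RUN" ]; then
        # Print the command that would be run.
        echo gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
            "-dPDFSETTINGS=$cp_preset" -dNOPAUSE -dQUIET -dBATCH \
            "-sOutputFile=$cp_out" "$cp_in"
    else
        gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
            "-dPDFSETTINGS=$cp_preset" -dNOPAUSE -dQUIET -dBATCH \
            "-sOutputFile=$cp_out" "$cp_in"
    fi
}

# Example (dry run only): show the command for medium compression.
DRY_RUN=1
compress_pdf medium file_searchable_1.pdf file_searchable_1_medium.pdf
```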

ElectricRCAircraftGuy added the enhancement label Dec 24, 2020
Owner Author

ElectricRCAircraftGuy commented Dec 28, 2020

Nah... use small, medium, large instead of low, medium, high.

maybe --size=small, etc.
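
A small sketch of parsing the proposed `--size=` option. The function name `size_to_compression` is hypothetical, and pairing a small output size with the strongest compression is my assumption about the intent.

```shell
#!/bin/sh
# Hypothetical parser for --size=small|medium|large.
# Prints the compression strength; the small -> high pairing is an assumption.
size_to_compression() {
    case "$1" in
        --size=*) size="${1#--size=}" ;;  # strip the "--size=" prefix
        *)        return 1 ;;             # not a --size option
    esac
    case "$size" in
        small)  echo high ;;
        medium) echo medium ;;
        large)  echo low ;;
        *)      echo "unknown size: $size" >&2; return 1 ;;
    esac
}

# Example: prints the strength chosen for the smallest output.
size_to_compression --size=small
```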

Owner Author

ElectricRCAircraftGuy commented Oct 13, 2022

TODO: The commit below partially fulfills this ticket.

  • I still need to add --size=small, --size=medium, and --size=large options.

Also:

  • Post-processing the PDF is a crude way to do it, but it's better than nothing. A better way to do it in the future would be to do OCR on the high-quality images and output the data to an intermediate format, then compress the images as desired and overlay the output OCR data onto the custom-compressed images. That will have to be future work.
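
The future-work idea above could look roughly like the per-page sequence below. This is a dry-run sketch only: it prints the commands rather than running them. The tool choices (Tesseract's hOCR output as the intermediate format, ImageMagick for image compression, `hocr2pdf` from ExactImage for the overlay), the function name `plan_page`, and the file names are all my assumptions, not something the issue specifies.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) one possible per-page command sequence.
# Tools and file names are assumptions chosen for illustration.
plan_page() {
    page="$1"      # basename of a high-quality page image, e.g. "page1"
    quality="$2"   # JPEG quality for the compressed copy, e.g. 40

    # 1. OCR the high-quality image into hOCR (the intermediate format).
    printf '%s\n' "tesseract $page.png $page hocr"
    # 2. Compress the image. Pixel dimensions are unchanged, so the
    #    coordinates stored in the hOCR file should still line up.
    printf '%s\n' "convert $page.png -quality $quality $page.jpg"
    # 3. Overlay the OCR text onto the compressed image as a PDF page.
    printf '%s\n' "hocr2pdf -i $page.jpg -o $page.pdf < $page.hocr"
}

# Example: show the plan for one page at JPEG quality 40.
plan_page page1 40
```

Running OCR before compression is the point of the ordering: recognition sees the best image, and only the embedded picture gets degraded.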

ElectricRCAircraftGuy added a commit that referenced this issue Oct 13, 2022