Open
Labels: enhancement, good first issue
Description
Hi,
If my understanding is correct, when dealing with PDFs the process is roughly this:
- each page is converted to an image at 200 dpi (the pdf2image default)
- the image is then resized to max_img_size x max_img_size
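To make the numbers concrete, here is a rough sketch of the arithmetic for a single page. The A4 page size and the `max_img_size` value of 1024 are assumptions chosen for illustration, and `page_pixels` and `square_resize_scales` are hypothetical helper names, not part of docext.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm, height_mm, dpi):
    """Pixel size of a page rasterized at the given DPI."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def square_resize_scales(width_px, height_px, max_img_size):
    """Per-axis scale factors when an image is forced to a square."""
    return (max_img_size / width_px, max_img_size / height_px)


# Assumed example: an A4 page (210 mm x 297 mm) at 200 dpi.
w, h = page_pixels(210, 297, 200)

# Assumed example value for max_img_size.
sx, sy = square_resize_scales(w, h, 1024)

print(f"rasterized size: {w} x {h}")
print(f"scale x: {sx:.3f}, scale y: {sy:.3f}")
```

When the two scale factors differ, the page is squeezed more along one axis than the other, which is where the aspect ratio change comes from.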
Wouldn't this process significantly reduce the quality, even if max_img_size is set very large?
The resize also alters the aspect ratio of the images. Is this done on purpose?
Thanks
Here are the relevant code bits:
Lines 51 to 52 in 668eee6

    img = Image.open(file_path)
    img = img.resize((max_img_size, max_img_size))

docext/docext/core/pdf2md/pdf2md.py
Lines 86 to 87 in 668eee6

    file_paths = convert_files_to_images(file_paths)
    resize_images(file_paths, max_img_size)

    return convert_from_path(file_path)
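For illustration, one possible aspect-ratio-preserving variant might look like the sketch below. It is not the project's implementation: `fit_within` and `resize_keep_aspect` are hypothetical names, and the sketch assumes the image should be overwritten in place and never upscaled. Pillow is imported lazily so the size calculation can be checked on its own.

```python
def fit_within(width, height, max_side):
    """Return (new_w, new_h) so the longer side is at most max_side.

    The aspect ratio is kept and images are never upscaled.
    """
    scale = min(1.0, max_side / max(width, height))
    return (max(1, round(width * scale)), max(1, round(height * scale)))


def resize_keep_aspect(file_path, max_img_size):
    """Shrink the image at file_path in place, keeping its aspect ratio."""
    from PIL import Image  # Pillow; imported here so fit_within has no dependency

    resized = None
    with Image.open(file_path) as img:
        new_size = fit_within(img.width, img.height, max_img_size)
        if new_size != img.size:
            resized = img.resize(new_size)
    # Save after the source file is closed; skip if no resize was needed.
    if resized is not None:
        resized.save(file_path)


# A 1654 x 2339 page keeps its proportions instead of becoming a square.
print(fit_within(1654, 2339, 1024))
```

Separately, pdf2image's `convert_from_path` accepts a `dpi` argument, so the rasterization resolution could be raised above the 200 dpi default if more detail is needed before resizing.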