
Image extraction from PDFs can reduce resolution and alter aspect ratio #62

@apbard

Description


Hi,
If my understanding is correct, I have noticed that when dealing with PDFs the process is roughly this:

  • each page is converted to an image at 200 dpi (the pdf2image default)
  • the image is then resized to max_img_size x max_img_size
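To make the two steps concrete, here is a minimal sketch of the pixel arithmetic involved. The page size (US Letter, 8.5 x 11 in) and max_img_size = 1024 are illustrative assumptions, and `rasterized_size` / `square_resize_scales` are hypothetical helpers, not functions from the project.

```python
# Sketch of the pixel arithmetic behind the two steps described above.
# Assumptions (illustrative only): a US Letter page and max_img_size = 1024.


def rasterized_size(width_in: float, height_in: float, dpi: int = 200) -> tuple[int, int]:
    """Pixel size of a page rasterized at `dpi` (200 is pdf2image's default)."""
    return round(width_in * dpi), round(height_in * dpi)


def square_resize_scales(size: tuple[int, int], max_img_size: int) -> tuple[float, float]:
    """Horizontal and vertical scale factors applied by resize((max_img_size, max_img_size))."""
    width, height = size
    return max_img_size / width, max_img_size / height


if __name__ == "__main__":
    page = rasterized_size(8.5, 11)              # US Letter at 200 dpi
    sx, sy = square_resize_scales(page, 1024)
    # Different x and y factors mean the page is squashed, not just scaled.
    print(page, round(sx, 3), round(sy, 3))      # prints: (1700, 2200) 0.602 0.465
```

With a larger target such as max_img_size = 4096, both factors exceed 1: the 200 dpi raster is only upscaled, which adds pixels but no detail.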

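Wouldn't this process greatly reduce quality even when max_img_size is set very large? Since each page is rasterized at only 200 dpi, a larger max_img_size can only upscale that raster; it cannot recover detail that was never rendered.
Also, the resize alters the aspect ratio of the images. Is this intentional?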
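One possible direction for both points, sketched under assumptions: the page size is known in PDF points (1 pt = 1/72 in), the rendering dpi is chosen so the longer side comes out at or just under max_img_size, and any remaining resize uses a single scale factor for both sides. `dpi_for_target` and `fit_within` are hypothetical helpers, not part of the project or of pdf2image.

```python
import math

POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 in


def dpi_for_target(width_pt: float, height_pt: float, max_img_size: int) -> int:
    """Hypothetical helper: largest integer dpi at which the page's longer
    side renders to at most max_img_size pixels."""
    longer_in = max(width_pt, height_pt) / POINTS_PER_INCH
    return max(1, math.floor(max_img_size / longer_in))


def fit_within(width: int, height: int, max_img_size: int) -> tuple[int, int]:
    """Scale (width, height) by one common factor so both sides fit in a
    max_img_size box; images that already fit are left unchanged."""
    scale = min(max_img_size / width, max_img_size / height, 1.0)  # never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # US Letter is 612 x 792 pt; render it so the long side lands near 2048 px.
    print(dpi_for_target(612, 792, 2048))   # prints: 186
    # Shrink a 200 dpi Letter raster into a 1024 box, keeping its proportions.
    print(fit_within(1700, 2200, 1024))     # prints: (791, 1024)
```

Choosing the dpi up front means a large max_img_size yields a genuinely higher-resolution render, while `fit_within` only ever downsizes, so the aspect ratio is preserved in both cases.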

Thanks

Here are the relevant code bits:

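from PIL import Image
from pdf2image import convert_from_path

# Resize step: forces every image to max_img_size x max_img_size,
# regardless of its original aspect ratio
img = Image.open(file_path)
img = img.resize((max_img_size, max_img_size))

# Pipeline: PDFs are rasterized first, then every image is resized
file_paths = convert_files_to_images(file_paths)
resize_images(file_paths, max_img_size)

# PDF rasterization: no dpi argument is passed, so pdf2image's default (200 dpi) applies
return convert_from_path(file_path)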
