image convert #1110

Open
Zhengyu-Ju opened this issue Mar 4, 2025 · 1 comment
Labels
question Further information is requested

Comments


Zhengyu-Ju commented Mar 4, 2025

Question

How can I convert an image using ImageFormatOption? I don't know how to write pipeline_options that use artifacts_path, the way I do for PDF.

My pipeline_options are:

from docling.datamodel.pipeline_options import PdfPipelineOptions

pipeline_options = PdfPipelineOptions()
pipeline_options.images_scale = IMAGE_RESOLUTION_SCALE
pipeline_options.generate_page_images = True
pipeline_options.generate_picture_images = True
pipeline_options.do_ocr = False
pipeline_options.artifacts_path = doclingModelPath  # local directory with the downloaded models

The converter is:

from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, ImageFormatOption, PdfFormatOption

self.doc_converter = DocumentConverter(
    allowed_formats=[
        InputFormat.PDF,
        InputFormat.IMAGE,
        InputFormat.DOCX,
        InputFormat.HTML,
        InputFormat.PPTX,
        InputFormat.ASCIIDOC,
        InputFormat.CSV,
        InputFormat.MD,
    ],
    format_options={
        # PDF and image inputs share the same pipeline options object
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        InputFormat.IMAGE: ImageFormatOption(pipeline_options=pipeline_options),
        # InputFormat.DOCX: WordFormatOption(
        #     pipeline_cls=SimplePipeline
        # )
    },
)

How can I write image pipeline_options that use artifacts_path?
...

Zhengyu-Ju added the question label Mar 4, 2025
Contributor

dolfim-ibm commented Mar 6, 2025

@Zhengyu-Ju can you please clarify what is not working? Image conversion uses the same steps as PDF conversion, so I think your snippet should be correct.

For images, I would recommend setting do_ocr = True in any case; otherwise you might get no output.


2 participants