Replies: 1 comment 1 reply
- You could run the VLM pipeline, which uses only a vision model (e.g. SmolDocling). A new version of the layout model will be released soon, which might improve results for your use case.
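For reference, a rough sketch of routing image inputs through the VLM pipeline, assuming a recent docling release (class names such as ImageFormatOption and VlmPipelineOptions, and the SmolDocling default, may differ between versions; scan.jpg is a placeholder path):

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import DocumentConverter, ImageFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

# Use the VLM pipeline's default vision model (SmolDocling at the time of writing)
# instead of the standard layout-model pipeline.
pipeline_options = VlmPipelineOptions()

converter = DocumentConverter(
    format_options={
        # Route JPG/PNG inputs through the VLM pipeline.
        InputFormat.IMAGE: ImageFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        ),
    }
)

result = converter.convert("scan.jpg")  # placeholder path to one of your JPGs
print(result.document.export_to_markdown())
```

This sidesteps the layout/object-detection step entirely, since the vision model produces the document structure directly.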
1 reply
- My understanding of the Docling architecture is that it begins with object detection. Testing on my confidential documents suggests that the object detection model often fails on them. Is it possible to have Docling skip this step and instead treat everything in the image as a single object? Or are there alternative object detection models I could try? My goal is to use Docling to process JPGs with OCR and get the text out as markdown, so I really only need Docling to organize the OCR text into markdown. Maybe this is a bad idea: if each line of the markdown is supposed to correspond to a detected object, then skipping object detection means I'll just get one line of markdown.