Replies: 1 comment 1 reply
- You could run the VLM pipeline, which uses only a vision model (e.g. SmolDocling). A new version of the layout model will be released soon, which might improve results for your use case.
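For reference, a rough sketch of routing image inputs through the VLM pipeline, assuming a recent docling release (class names such as ImageFormatOption and VlmPipelineOptions, and the SmolDocling default, may differ between versions; scan.jpg is a placeholder path):

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmPipelineOptions
from docling.document_converter import DocumentConverter, ImageFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

# Use the VLM pipeline's default vision model (SmolDocling at the time of writing)
# instead of the standard layout-model pipeline.
pipeline_options = VlmPipelineOptions()

converter = DocumentConverter(
    format_options={
        # Route JPG/PNG inputs through the VLM pipeline.
        InputFormat.IMAGE: ImageFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        ),
    }
)

result = converter.convert("scan.jpg")  # placeholder path to one of your JPGs
print(result.document.export_to_markdown())
```

This sidesteps the layout/object-detection step entirely, since the vision model produces the document structure directly.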
1 reply
- My understanding of the Docling architecture is that it begins with object detection. Testing on my confidential documents suggests that the object detection model often fails on them. Is it possible to have Docling skip this step and instead treat everything in the image as a single object? Or are there alternative object detection models I could try? My goal is to use Docling to process JPGs with OCR and get the text out as markdown, so I really only need Docling to organize the OCR text into markdown. Maybe this is a bad idea: if each line of the markdown is supposed to correspond to a detected object, then skipping object detection means I'll just get one line of markdown.