ER: image text is present in the chunks
AR: no image text appears in the chunks
The root cause of this issue is in docling_core.transforms.chunker.hierarchical_chunker.HierarchicalChunker.chunk:
for item, level in dl_doc.iterate_items():
By default, iterate_items() does not traverse images; traversal has to be enabled explicitly, but the chunker offers no way to pass that option through to iterate_items().
Changing this line as follows:
for item, level in dl_doc.iterate_items(traverse_pictures=True):
solves the issue, and the OCR text from images is present in the chunks.
Please either provide chunker options that are passed through to iterate_items() or enable image traversal by default.
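To illustrate the request, here is a minimal, self-contained sketch of the behaviour. It does not use docling_core: `Node`, the toy `iterate_items()`, and `ToyChunker` are hypothetical stand-ins written for this example, and only the `traverse_pictures` flag name is taken from the real method. It shows how a chunker option forwarded to the iterator would let OCR text nested under a picture reach the chunks.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Toy stand-in for a document item (NOT a docling_core class)."""
    label: str  # "body", "text" or "picture"
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def iterate_items(
    root: Node, traverse_pictures: bool = False, level: int = 0
) -> Iterator[Tuple[Node, int]]:
    """Depth-first walk that skips children of pictures unless asked to."""
    for child in root.children:
        yield child, level
        if child.label == "picture" and not traverse_pictures:
            continue  # text nested under the picture is never visited
        yield from iterate_items(child, traverse_pictures, level + 1)


@dataclass
class ToyChunker:
    """Hypothetical chunker that exposes the traversal flag as an option."""
    traverse_pictures: bool = False

    def chunk(self, doc: Node) -> List[str]:
        # Forward the option to the iterator instead of hard-coding the default.
        return [
            item.text
            for item, _level in iterate_items(
                doc, traverse_pictures=self.traverse_pictures
            )
            if item.label == "text"
        ]


doc = Node("body", children=[
    Node("text", "Intro paragraph"),
    Node("picture", children=[Node("text", "OCR text inside image")]),
])

default_chunks = ToyChunker().chunk(doc)                        # picture text missing
picture_chunks = ToyChunker(traverse_pictures=True).chunk(doc)  # picture text included
```

With the default setting the picture's nested text is dropped, which mirrors the AR above; with the flag forwarded it is included, which mirrors the ER.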
Bug
Hierarchical Chunker does not provide any option to enable image traversal for the document iterator
Docling version
Python version