Detecting page breaks in markdown output #142
-
Is there any way to detect page breaks in the markdown output of a PDF?
Replies: 6 comments 11 replies
-
```python
from itertools import accumulate
```
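Only the import line of the custom solution above survives, so the original code is not shown. As a hypothetical sketch of how `accumulate` might fit such a solution, assume you already have the markdown of each page as a separate string (how you obtain those strings is outside this sketch). Running totals of the page lengths give the offset where each page starts in the joined output, and a binary search maps any character position back to its page. The helper names `page_start_offsets` and `page_of_offset` are invented for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def page_start_offsets(pages: list[str], separator: str = "\n\n") -> list[int]:
    """Return the offset at which each page starts in separator.join(pages)."""
    if not pages:
        return []
    # Each page except the last is followed by one separator in the joined text.
    lengths = [len(page) + len(separator) for page in pages[:-1]]
    # Running totals, starting from 0, give the start offset of every page.
    return list(accumulate(lengths, initial=0))


def page_of_offset(offsets: list[int], position: int) -> int:
    """Map a character position in the joined markdown to a 1-based page number."""
    # Count how many page starts lie at or before the position.
    return bisect_right(offsets, position)
```

For example, with `pages = ["# Title\nIntro", "Body text", "End"]` the offsets are `[0, 15, 26]`, so any position from 15 to 25 falls on page 2.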
-
Is this implemented in the markdown exporter, or is a custom-made solution like the one above still needed?
-
@dolfim-ibm, please take a look at #762.
-
I refuse to believe that this awesome library cannot properly detect page break tags in docx input files. I've spent endless hours trying to figure out a way, but nothing works. This is an absolute necessity for so many use cases, and in desperation I'm begging this great team to implement pagination as soon as possible! Ideally, a custom-defined "page break" tag should be inserted by all exporters, not just the markdown one. Warmest regards to all!
-
To add page breaks to the markdown, you can simply pass the `page_break_placeholder` argument when exporting to markdown. Version used: docling==2.28.2
Documentation link: https://docling-project.github.io/docling/reference/docling_document/#docling_core.types.doc.DoclingDocument.export_to_markdown
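A minimal sketch of consuming that option, assuming a placeholder string of your own choosing (`<!-- page-break -->` below is an arbitrary example, not a docling default): export once with the placeholder, then split on it to recover per-page chunks. Only `page_break_placeholder` and `export_to_markdown` come from the docling documentation linked above; the helper `split_markdown_pages` is hypothetical.

```python
PLACEHOLDER = "<!-- page-break -->"  # illustrative choice, not a docling default


def split_markdown_pages(markdown: str, placeholder: str = PLACEHOLDER) -> list[str]:
    """Split markdown exported with a page-break placeholder into per-page chunks."""
    # Each occurrence of the placeholder marks a boundary between two pages.
    chunks = markdown.split(placeholder)
    # Trim only the blank lines around the placeholder; other whitespace is kept.
    return [chunk.strip("\n") for chunk in chunks]


# Usage with docling (not executed here):
#   from docling.document_converter import DocumentConverter
#   doc = DocumentConverter().convert("input.pdf").document
#   md = doc.export_to_markdown(page_break_placeholder=PLACEHOLDER)
#   pages = split_markdown_pages(md)
```

Pick a placeholder that cannot occur in the document text itself; otherwise the split produces spurious pages.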
-
This is an interesting feature request. Let us put it in the pipeline after the upcoming release.
We are already using comments (`<!-- -->`) for tagging images. I think we could easily do the same to signal page breaks, potentially letting the user specify the preferred placeholder.
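The proposal amounts to emitting a configurable placeholder wherever the page changes. A rough sketch of that idea, assuming each exported item carries the number of the page it came from (modeled here as hypothetical `(page_no, markdown)` tuples rather than docling's actual item types), with an HTML comment as an illustrative default placeholder:

```python
from typing import Iterable, Optional


def join_with_page_breaks(
    items: Iterable[tuple[int, str]],
    placeholder: str = "<!-- page break -->",  # hypothetical default
) -> str:
    """Join (page_no, markdown) items, emitting `placeholder` whenever the page changes."""
    parts: list[str] = []
    previous_page: Optional[int] = None
    for page_no, text in items:
        # A change in page number means a page boundary lies before this item.
        if previous_page is not None and page_no != previous_page:
            parts.append(placeholder)
        parts.append(text)
        previous_page = page_no
    # Blank lines put the placeholder on its own line, easy to find or split on.
    return "\n\n".join(parts)
```

Because the break is inferred from changes in page number, a page with no items produces no extra placeholder; an exporter that must mark empty pages would also need the full list of pages.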