Footnotes extraction for IEEE journals #1151

Open
q0oz opened this issue Mar 12, 2025 · 2 comments
Labels: bug, layout

Comments


q0oz commented Mar 12, 2025

Requested feature

...
Hello,

It looks like Docling doesn't recognize footnotes properly for some IEEE manuscripts. For example,
https://arxiv.org/pdf/2503.08661
https://arxiv.org/pdf/2503.08027
https://arxiv.org/pdf/2503.08609

Currently, Docling usually classifies these footnotes as main text. Please see the attached images.

Such manuscript templates are quite popular within the IEEE organization, and these footnotes contain important metadata about the authors.
Would it be possible to retrain or fine-tune the model so that Docling recognizes them as footnotes? Thank you!

Image

Image

Image

Alternatives

I tried a simple rule-based approach that checks the font size, but unfortunately Docling doesn't extract the font size, and it probably wouldn't be that reliable anyway.
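A text-pattern heuristic is one possible stopgap that does not need font sizes. The following is only a minimal sketch, not Docling's API: `TextBlock`, `relabel_footnotes`, the plain-string labels, and the phrase list are all hypothetical names and assumptions chosen for illustration. It relabels first-page text blocks whose opening words resemble typical IEEE author footnotes.

```python
import re
from dataclasses import dataclass

# Assumed opening phrases of IEEE-style author footnotes (illustrative only).
FOOTNOTE_PATTERNS = [
    r"^manuscript received",
    r"^this work was supported",
    r"^corresponding author",
    r"^(the )?authors? (is|are) with",
]
_FOOTNOTE_RE = re.compile("|".join(FOOTNOTE_PATTERNS), re.IGNORECASE)


@dataclass
class TextBlock:
    """Hypothetical stand-in for one extracted text item."""
    text: str
    page: int            # 1-based page number
    label: str = "text"  # plain-string label, assumed for this sketch


def relabel_footnotes(blocks, first_page_only=True):
    """Return a new list in which matching 'text' blocks are labeled 'footnote'."""
    result = []
    for block in blocks:
        on_target_page = block.page == 1 or not first_page_only
        matches = _FOOTNOTE_RE.search(block.text.strip()) is not None
        if block.label == "text" and on_target_page and matches:
            result.append(TextBlock(block.text, block.page, "footnote"))
        else:
            result.append(block)
    return result
```

Restricting the check to the first page reflects where these author footnotes appear in the linked samples. The phrase list would need tuning against real manuscripts, and a heuristic like this is likely to miss footnotes with unusual wording.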

q0oz added the enhancement label on Mar 12, 2025
PeterStaar-IBM added the bug and layout labels and removed the enhancement label on Mar 14, 2025
PeterStaar-IBM (Contributor) commented

@q0oz Yes, we plan to retrain the layout segmentation. Thank you for providing this sample. It was not on our radar, and we definitely need to fix this!

q0oz (Author) commented Mar 14, 2025

Thank you!

Just a quick note: you can find more samples like the ones provided in any category at https://arxiv.org/archive/eess.

I have also seen many footnotes like these in other IEEE manuscripts that are not publicly available.
