HybridChunker not available with just docling as dependency #994

Open
sanmai-NL opened this issue Feb 17, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@sanmai-NL
Contributor

Bug

Contrary to the docs, HybridChunker cannot be imported when docling is the only declared dependency. The docs state:

"If you are using the docling package, you can import as follows:"

from docling.chunking import HybridChunker

Steps to reproduce

Add docling as the only dependency, then import HybridChunker. The import fails:

    from docling.chunking import HybridChunker
.venv/lib/python3.12/site-packages/docling/chunking/__init__.py:12: in <module>
    from docling_core.transforms.chunker.hybrid_chunker import HybridChunker
.venv/lib/python3.12/site-packages/docling_core/transforms/chunker/hybrid_chunker.py:18: in <module>
    raise RuntimeError(
E   RuntimeError: Module requires 'chunking' extra; to install, run: `pip install 'docling-core[chunking]'`
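
To check whether the installed docling metadata actually requests the chunking extra of docling-core, the declared requirements can be listed with the standard library. This is only a diagnostic sketch: the helper name `declared_requirements` is made up for illustration, and the output depends entirely on the environment it runs in.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(dist_name: str, needle: str) -> list[str]:
    """Return the requirement strings of `dist_name` that mention `needle`.

    Hypothetical helper, not part of docling. Returns an empty list when the
    distribution is not installed or declares no requirements.
    """
    try:
        reqs = requires(dist_name) or []
    except PackageNotFoundError:
        return []
    # Compare in a normalized form so "docling_core" and "docling-core" match.
    norm = needle.lower().replace("_", "-")
    return [r for r in reqs if norm in r.lower().replace("_", "-")]


if __name__ == "__main__":
    # If the extra is requested, an entry such as "docling-core[chunking]..."
    # should show up here; the exact string depends on the installed version.
    for req in declared_requirements("docling", "docling-core"):
        print(req)
```

If the printed requirement names the extra but the import still fails, the package manager is likely not installing the dependencies of that extra.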

Docling version

Docling version: 2.22.0
Docling Core version: 2.18.1
Docling IBM Models version: 3.3.2
Docling Parse version: 3.3.1
Python: cpython-312 (3.12.6)
Platform: macOS-15.3-arm64-arm-64bit

Python version

This section is unnecessary: see previous listing.

@sanmai-NL sanmai-NL added the bug Something isn't working label Feb 17, 2025
@dolfim-ibm
Contributor

@sanmai-NL I'm not able to reproduce it. For me it works.

$ python -m venv venv
$ source ./venv/bin/activate
$ pip install docling
$ python
Python 3.12.7 (main, Oct  1 2024, 02:05:46) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from docling.chunking import HybridChunker
$ docling --version
Docling version: 2.22.0
Docling Core version: 2.18.1
Docling IBM Models version: 3.3.2
Docling Parse version: 3.3.1
Python: cpython-312 (3.12.7)
Platform: macOS-15.3.1-arm64-arm-64bit

@sanmai-NL
Contributor Author

What package manager do you use? We use PDM.

@sanmai-NL
Contributor Author

We don't use pip.

@Fogapod

Fogapod commented Feb 18, 2025

Facing the same issue with bazel/rules_python. I had to manually specify these dependencies alongside docling:

        requirement("tokenizers"),
        requirement("semchunk"),

It works with pip, though.
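
A quick way to confirm which of these modules the resolver left out is to ask the import system directly, without importing docling itself. The sketch below assumes the two module names from the workaround above (`tokenizers`, `semchunk`) are the ones that matter; `missing_modules` is a hypothetical helper, not a docling API.

```python
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found.

    Hypothetical helper. find_spec returns None for a top-level module
    that is not installed, so nothing is imported here.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the workaround above; adjust if the extra
    # pulls in other packages in your version.
    missing = missing_modules(["tokenizers", "semchunk"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed modules are installed.")
```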


3 participants