raise FileNotFoundError("Missing ONNX file: {}".format(self._onnx_fn)) #1

Open
Analect opened this issue Dec 3, 2024 · 0 comments
@santoshborse, I was working my way through your PyData NYC tutorial, but I'm hitting a problem with cell 3.2 of dpk_intro_1_python.ipynb. With dpk 0.2.1 installed, I get the error below. With dpk 0.2.2 installed, running the same cell fails instead with: No module named 'docling.backend.docling_parse_v2_backend'. Any thoughts on how I might fix this?

16:05:12 INFO - pdf2parquet parameters are : {'artifacts_path': None, 'contents_type': <pdf2parquet_contents_types.JSON: 'application/json'>, 'do_table_structure': True, 'do_ocr': True, 'double_precision': 8}
INFO:pdf2parquet_transform:pdf2parquet parameters are : {'artifacts_path': None, 'contents_type': <pdf2parquet_contents_types.JSON: 'application/json'>, 'do_table_structure': True, 'do_ocr': True, 'double_precision': 8}
16:05:12 INFO - pipeline id pipeline_id
INFO:data_processing.runtime.execution_configuration:pipeline id pipeline_id
16:05:12 INFO - code location None
INFO:data_processing.runtime.execution_configuration:code location None
16:05:12 INFO - data factory data_ is using local data access: input_folder - input/solar-system output_folder - output/01_parquet_out
INFO:data_processing.data_access.data_access_factory_base07e751be-a466-43c5-8b1c-c20fe1535242:data factory data_ is using local data access: input_folder - input/solar-system output_folder - output/01_parquet_out
16:05:12 INFO - data factory data_ max_files -1, n_sample -1
INFO:data_processing.data_access.data_access_factory_base07e751be-a466-43c5-8b1c-c20fe1535242:data factory data_ max_files -1, n_sample -1
16:05:12 INFO - data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.pdf'], files to checkpoint ['.parquet']
INFO:data_processing.data_access.data_access_factory_base07e751be-a466-43c5-8b1c-c20fe1535242:data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.pdf'], files to checkpoint ['.parquet']
16:05:12 INFO - orchestrator pdf2parquet started at 2024-12-03 16:05:12
INFO:data_processing.runtime.pure_python.transform_orchestrator:orchestrator pdf2parquet started at 2024-12-03 16:05:12
16:05:12 INFO - Number of files is 2, source profile {'max_file_size': 0.055823326110839844, 'min_file_size': 0.0551910400390625, 'total_file_size': 0.11101436614990234}
INFO:data_processing.runtime.pure_python.transform_orchestrator:Number of files is 2, source profile {'max_file_size': 0.055823326110839844, 'min_file_size': 0.0551910400390625, 'total_file_size': 0.11101436614990234}
16:05:12 INFO - Initializing models
INFO:pdf2parquet_transform:Initializing models
Fetching 9 files: 100% 9/9 [00:00<00:00, 426.19it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/data_processing/runtime/pure_python/transform_orchestrator.py", line 84, in orchestrate
    _process_transforms(
  File "/usr/local/lib/python3.10/dist-packages/data_processing/runtime/pure_python/transform_orchestrator.py", line 153, in _process_transforms
    executor = PythonTransformFileProcessor(
  File "/usr/local/lib/python3.10/dist-packages/data_processing/runtime/pure_python/transform_file_processor.py", line 46, in __init__
    self.transform = transform_class(self.transform_params)
  File "/usr/local/lib/python3.10/dist-packages/pdf2parquet_transform.py", line 105, in __init__
    self._converter = DocumentConverter(
  File "/usr/local/lib/python3.10/dist-packages/docling/document_converter.py", line 54, in __init__
    self.model_pipeline = pipeline_cls(
  File "/usr/local/lib/python3.10/dist-packages/docling/pipeline/standard_model_pipeline.py", line 24, in __init__
    LayoutModel(
  File "/usr/local/lib/python3.10/dist-packages/docling/models/layout_model.py", line 46, in __init__
    self.layout_predictor = LayoutPredictor(
  File "/usr/local/lib/python3.10/dist-packages/docling_ibm_models/layoutmodel/layout_predictor.py", line 96, in __init__
    raise FileNotFoundError("Missing ONNX file: {}".format(self._onnx_fn))
FileNotFoundError: Missing ONNX file: /root/.cache/huggingface/hub/models--ds4sd--docling-models/snapshots/a8a57426c20d9f7bc0343cfd84e8b439425e5561/model_artifacts/layout/beehive_v0.0.5/model.pt
16:05:17 ERROR - Exception during execution Missing ONNX file: /root/.cache/huggingface/hub/models--ds4sd--docling-models/snapshots/a8a57426c20d9f7bc0343cfd84e8b439425e5561/model_artifacts/layout/beehive_v0.0.5/model.pt: None
ERROR:data_processing.runtime.pure_python.transform_orchestrator:Exception during execution Missing ONNX file: /root/.cache/huggingface/hub/models--ds4sd--docling-models/snapshots/a8a57426c20d9f7bc0343cfd84e8b439425e5561/model_artifacts/layout/beehive_v0.0.5/model.pt: None
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/data_processing/runtime/pure_python/transform_orchestrator.py", line 104, in orchestrate
    stats["processing_time"] = round(stats["processing_time"], 3)
KeyError: 'processing_time'
16:05:17 ERROR - Exception during execution 'processing_time': None
ERROR:data_processing.runtime.pure_python.transform_orchestrator:Exception during execution 'processing_time': None
16:05:17 INFO - Completed execution in 0.083 min, execution result 1
INFO:data_processing.runtime.pure_python.transform_launcher:Completed execution in 0.083 min, execution result 1
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<timed exec> in <module>

Exception: ❌ Job failed
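
A minimal diagnostic sketch that may help narrow this down. It reports which versions of the relevant packages are installed and which files the cached docling-models snapshots contain under model_artifacts/layout/, the directory named in the traceback. The distribution names in PACKAGES and the helper names installed_versions and list_layout_artifacts are assumptions made for illustration; the cache location assumes the default Hugging Face cache directory (HF_HOME not overridden).

```python
from importlib import metadata
from pathlib import Path

# Assumed PyPI distribution names; adjust to whatever `pip list` shows.
PACKAGES = ("data-prep-toolkit", "docling", "docling-ibm-models", "docling-parse")

# Default Hugging Face cache location, matching the path in the traceback
# when the home directory is /root. Assumes HF_HOME is not overridden.
DEFAULT_SNAPSHOTS = (
    Path.home() / ".cache" / "huggingface" / "hub"
    / "models--ds4sd--docling-models" / "snapshots"
)


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def list_layout_artifacts(snapshots_dir=DEFAULT_SNAPSHOTS):
    """List files under <snapshot>/model_artifacts/layout/ for every cached snapshot."""
    root = Path(snapshots_dir)
    if not root.is_dir():
        return []
    pattern = "*/model_artifacts/layout/**/*"
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    for rel_path in list_layout_artifacts():
        print(rel_path)
```

If the listing shows no model.pt under beehive_v0.0.5, that would suggest the installed docling-ibm-models and the downloaded snapshot expect different file layouts. This is only a guess at the cause, not a confirmed diagnosis.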