Replies: 8 comments 6 replies
-
Hey @katie312! Just tested, and it works just fine! The problem may be that your input is a full image. Please look at this test; it works just fine, and I think the problem really is that it is an image, so please try with PDF text content. Also, try another DocumentLoader, or just activate vision:
This will add each page as an image.
-
Thank you so much for your reply! Yes, I did input a PDF made of images. Actually, I edited the code of llm.py because I have to use an API on the company LAN (not deployed by Ollama), and I don't know if I edited it correctly. However, when I changed the OCR function from DocumentLoaderPypdf() to Tesseract, it worked successfully, but once I input a multi-page PDF document, the model responded "input too long". Should I try the splitting-files example? Thanks for the advice; I will try the vision way.
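The "input too long" error above is a context-window limit, which the splitting-files approach addresses. A minimal sketch of the idea in plain Python (the page list, token heuristic, and function names here are hypothetical, for illustration only; ExtractThinker's completion strategies handle this for you):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def split_pages(pages: list[str], max_tokens: int) -> list[list[str]]:
    """Group consecutive pages into chunks that each fit the token budget."""
    chunks, current, used = [], [], 0
    for page in pages:
        cost = estimate_tokens(page)
        if current and used + cost > max_tokens:
            chunks.append(current)  # current chunk is full, start a new one
            current, used = [], 0
        current.append(page)
        used += cost
    if current:
        chunks.append(current)
    return chunks

# Three pages of ~500 tokens each, with a 1000-token budget -> 2 chunks.
pages = ["word " * 400] * 3
print(len(split_pages(pages, max_tokens=1000)))  # -> 2
```

Each chunk can then be sent to the model in a separate call and the partial results merged afterwards.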
-
You are welcome @katie312! So, for on-premise solutions, do the following: use a proper OCR and DocumentLoader. If you are using a non-vision model, I would combine Docling or MarkItDown (Docling is better) with Tesseract or EasyOCR. If it is a vision model, use Docling without any configs, just with vision=true. If your model has a context window of 8k or similarly small, just use:
You can read more about strategies here: https://enoch3712.github.io/ExtractThinker/core-concepts/completion-strategies/
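The linked page describes completion strategies such as FORBIDDEN and PAGINATE. As an illustration of when each applies, here is a hypothetical selection rule in plain Python (the strategy names come from the docs, but this decision logic is only a sketch, not ExtractThinker's actual code):

```python
def pick_strategy(doc_tokens: int, context_window: int, reserve: int = 1024) -> str:
    """Illustrative choice of completion strategy by document size.

    FORBIDDEN: the document must fit in a single call.
    PAGINATE: process the document page by page, then merge the results.
    (Strategy names from the ExtractThinker docs; the rule itself is a sketch.)
    """
    budget = context_window - reserve  # leave room for the prompt and the reply
    return "FORBIDDEN" if doc_tokens <= budget else "PAGINATE"

print(pick_strategy(doc_tokens=2000, context_window=8192))   # fits in one call
print(pick_strategy(doc_tokens=20000, context_window=8192))  # too large, paginate
```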
-
Thanks for the advice! My model is a non-vision model, but I will try using a vision model later on. I ran into another question when I use
What is the ResponseModel in this case? I tried with:
but it showed: Also, is there a code demo for combining Docling with Tesseract for the extractor? Thank you so much!
-
Update: I tried using a vision model (GPT-4-Vision), but the output shows that. I think maybe it's because the input and output format of my LLM API is different from the project's?
-
https://docs.litellm.ai/docs/providers/openai Yes, the right way is gpt-4o.
-
Thank you so much! I will try using gpt-4o. By the way, any suggestions on CompletionStrategy.FORBIDDEN with AttributeError: 'dict' object has no attribute 'model_dump'?
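That AttributeError means a plain dict reached code that expected a Pydantic v2 model (model_dump is Pydantic v2's serialization method). A minimal stdlib reproduction of the mismatch, plus a defensive helper (FakeModel and to_dict are hypothetical names for illustration; they are not part of ExtractThinker):

```python
class FakeModel:
    """Stand-in for a Pydantic v2 model, which exposes .model_dump()."""
    def __init__(self, **fields):
        self._fields = fields

    def model_dump(self) -> dict:
        return dict(self._fields)

def to_dict(obj) -> dict:
    """Accept either a model-like object or an already-plain dict."""
    if isinstance(obj, dict):
        return obj  # calling obj.model_dump() here would raise AttributeError
    return obj.model_dump()

print(to_dict(FakeModel(name="invoice", total=42)))   # works: model path
print(to_dict({"name": "invoice", "total": 42}))      # works: dict path
```

In practice the error usually means the response model class was not passed where the library expected it, so the result stayed a raw dict.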
-
@katie312 Right, your model has a small context window! Strategies are always set to the default, FORBIDDEN. You have two options, and you want to use PAGINATE: it goes through all the content and then merges the results.
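A paginate-style strategy extracts from each page separately and then merges the partial results. A sketch of that merge step in plain Python (the merge rule below is a naive assumption for illustration, not ExtractThinker's actual implementation):

```python
def merge_partials(partials: list[dict]) -> dict:
    """Naively merge per-page extraction results: later non-empty scalar
    values win, and list fields are concatenated. (Illustrative rule only.)"""
    merged: dict = {}
    for part in partials:
        for key, value in part.items():
            if isinstance(value, list):
                merged.setdefault(key, []).extend(value)
            elif value not in (None, ""):
                merged[key] = value
    return merged

# Two per-page partial results for a hypothetical invoice schema.
pages = [
    {"invoice_number": "INV-7", "lines": [{"item": "A"}], "total": None},
    {"invoice_number": "", "lines": [{"item": "B"}], "total": 99.0},
]
print(merge_partials(pages))
```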
-
I tried using DocumentLoaderPyPdf() as the PDF extractor, but saw nothing in the output JSON. So I tried to print the extracted content of the PDF, and it also showed nothing. However, the program still goes into the "content not empty" branch.
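This is consistent with a scanned (image-only) PDF: pypdf-based loaders return an empty or whitespace-only string for pages with no text layer, and a loose check like `if content:` still passes when the content is a whitespace string or a list of empty page strings. A stricter check you could add (has_real_text is a hypothetical helper, not an ExtractThinker function):

```python
def has_real_text(extracted) -> bool:
    """True only if extraction produced non-whitespace characters.

    Loaders built on pypdf typically return '' (not None) for scanned pages
    without a text layer, so `if extracted:` can pass with effectively empty
    content, e.g. for ' \n' or for a list of empty per-page strings.
    """
    if extracted is None:
        return False
    if isinstance(extracted, list):  # e.g. one string per page
        return any(isinstance(p, str) and p.strip() for p in extracted)
    return bool(str(extracted).strip())

print(has_real_text(" \n\t"))            # whitespace only -> False
print(has_real_text(["", "  "]))         # truthy list, empty pages -> False
print(has_real_text(["", "Invoice 42"])) # real text on page 2 -> True
```

If this check returns False, the PDF has no text layer and you need an OCR-based loader (as discussed above with Tesseract) rather than a text extractor.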