-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggestion of flexibility #6
Comments
Haven't made Jina local yet, but I have now made
checkout here: https://github.com/benhaotang/OpenDeepResearcher-via-searxng I would guess making Jina also local can be challenging due to so many websites have DDoS protection and rate limits, maybe need to set some cool down interval. |
Note, with playwright, reader-lm and docling, web parsing is also completely local, you can definitely have a try if you have a good enough rig. If Matt is interested I can definitely make a PR back |
Whats the difference between using Jina and a traditional scraper-as-a-service?I'm using a scraper called webunlocker in one of my projects, its pay as you go with 1.5 or 3$ per 1k successful scrapes. Then you can just do BS4 on it for free and you got the website content? I mean Jina got pretty low RPM limits (40rpm) and I dont know what the scraper-aas has but I cant imagine it would be as low as Jina and there are lots of identical scraper services like this Cant be bothered to calculate the price of Jina but as I can tell they charge per scraped tokens so probably it will cost more than just scraping traditionally even with a service? Antything I'm missing? |
I have one suggestion. If we can use bowseruse tool that is opensource library to make any decision based task. Let me know if that fits in or not. |
I think the reason for using multimodal compibitabilty, parsing not only text, rather including the photo OCR, pdf document analysis etc... |
I think it would be good to use LMStudio API for LLM request, so we can use local models.
SerpAPI migrate to Google Search API, so it will be limited to 100 search quires per day not for 100 / months on Serp.
If anyone have Jira alternatives or localhosted write here please.
I would do it myself and commit if find any time, but leave for now my ideas here. Great project anyway, thanks!
The text was updated successfully, but these errors were encountered: