a few questions about Cloudflare check timed out and Link extraction timed out #441
Comments
Please share some logs (redacting the hostname if needed) so that we can get a better grasp of what you're referring to.
I'm sorry for the late reply; I was crawling some things that forced me to wait. Here is a log file for testing. I have noticed that these messages show up when the crawler reaches a book with many pages (around 2500-3000 and above); it does not happen with books that have few pages. Here is another one: log-information.txt — the same messages, but this crawl fails. I also tried to crawl https://al-maktaba.org/book/31617 for testing, but the crawler can't get the URL as expected except if I use no workers at all. When I crawl with just 2 workers I get a message like "Unable to get new page, browser likely crashed" #400, yet I have archived some successful crawls of the same domain with 4 workers, with the same messages as above.
Thanks. All this tends to show that browsertrix crawler is a bit unstable in your scenarios. We will have to dig into it.
Have you found any solutions? Do I have to download the new version of zimit? I've tried to figure out the reason: when I browse these big books online, the pages load a bit slowly, so the crawler behaves the same way and shows these messages. I also get these messages on some websites with heavy content when crawling with 4 workers. For now I'm just trying to lower the number of workers for them, that's all.
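For reference, a minimal sketch of how a crawl with fewer workers and a longer page timeout might be launched; the image name and the --workers / --timeout flags are assumptions based on common zimit / browsertrix-crawler options and may differ in your version:

  # flag names, values and image are assumptions; check zimit --help for your version
  docker run -v $PWD/output:/output ghcr.io/openzim/zimit zimit \
    --url https://al-maktaba.org/book/31617 \
    --name al-maktaba-test \
    --workers 2 \
    --timeout 180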
When I crawl some websites I see those two messages throughout the crawl; is this something to worry about, given that "failed" stays at 0?
I am crawling with 4 workers. To give more detail, I have tested two sites that show these issues: on one of them "resolve redirect" was OK, while the other displayed errors, with some links (pages) missing from the ZIM file path. If this is not normal, how should I deal with these issues?
Is this related to the "Resume failed browsertrix crawls" enhancement?
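One hedged way to see how often these warnings occur and which pages trigger them is to filter the crawler log for the two messages; the log file name below is an assumption and may differ in your setup:

  # log path is an assumption; point these at your actual crawl log
  grep -c "Cloudflare check timed out" crawl.log
  grep -c "Link extraction timed out" crawl.log
  # list the matching log lines to see which pages are affected
  grep -E "Cloudflare check timed out|Link extraction timed out" crawl.log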