Add pause / wait to load before scrape #111

JReming85 · 2018-09-06T20:30:10Z

Expected Behavior

I am rewriting certain URLs to go to outline.com/https://website.com.

However, outline.com takes a few moments to clean up the page and display the results. Is there any way to hold off the scrape until it has finished loading, bypassing paywalls, etc.?

Current Behavior

Scrapes the loading page

Steps to Reproduce

URL - https://www.wsj.com/articles/the-nfls-best-players-are-getting-richer-than-ever-1536163544

{
    "type": "xpath",
    "xpath": [
        "div[@Class='article-wrapper']"
    ],
    "reformat": [
        {
            "type": "regex",
            "pattern": "/.+.com/",
            "replace": "https://outline.com/https://wsj.com"
        }
    ]
}
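
For reference, here is a rough sketch of what the reformat rule above appears to do to the example URL. It uses Python's `re` module as a stand-in for the PHP regex engine, which is an assumption; the pattern and replacement are copied from the config with the `/` delimiters removed, and the `rewrite` helper is hypothetical.

```python
import re

# Pattern and replacement copied from the config above (PHP-style "/" delimiters removed).
PATTERN = r".+.com"
REPLACEMENT = "https://outline.com/https://wsj.com"


def rewrite(url: str) -> str:
    """Apply the reformat rule once to the URL (hypothetical helper)."""
    return re.sub(PATTERN, REPLACEMENT, url, count=1)


url = "https://www.wsj.com/articles/the-nfls-best-players-are-getting-richer-than-ever-1536163544"
print(rewrite(url))
```

Because the dots in the pattern are unescaped and `.+` is greedy, the match runs up to the last "com" in the URL. For this URL that is the host, so only the scheme and host are replaced and the article path is kept.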

dugite-code · 2018-09-07T01:17:51Z

Currently there is no way to add a delay to the HTML body fetch. I have hacked php-curl into Feediron in the past by adding it to the function at line 271. That said, I'm not 100% sure you could get the desired result from curl.

The other idea I had been working on, but have put on hold for the moment, is the one I mentioned in #38: adding the ability to call PhantomJS or Selenium. But these are potentially complex and will require significant reworks of the codebase to integrate. I might revisit them when I can break configs in version 2.
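
As an illustration of the "delay in the fetch" idea discussed above, here is a minimal sketch of a poll-and-retry loop. It is written in Python rather than PHP and is not Feediron code; `fetch_html`, `fetch_when_ready`, and all their parameters are hypothetical names. It assumes the finished page can eventually be obtained with a plain HTTP request, which may not hold if outline.com fills in the page with JavaScript (the same caveat as with curl).

```python
import time
import urllib.request
from typing import Callable, Optional


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Plain HTTP GET; this does not execute any JavaScript."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_when_ready(
    url: str,
    is_ready: Callable[[str], bool],
    fetch: Callable[[str], str] = fetch_html,
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Re-fetch url until is_ready(html) is true; return None if it never is."""
    for attempt in range(attempts):
        html = fetch(url)
        if is_ready(html):
            return html
        if attempt < attempts - 1:
            sleep(delay)  # pause before the next try
    return None
```

A caller might pass something like `lambda html: "article-wrapper" in html` as `is_ready`, so that the loading page is skipped and only a page containing the wanted element is handed on to the XPath step.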

dugite-code added the enhancement label Sep 7, 2018

dugite-code closed this as completed Sep 11, 2020
