Currently, there is no way to add a delay to the HTML body fetch. I have hacked php-curl into Feed Iron in the past by adding it to the function at line 271. That said, I'm not 100% sure you could get the desired result from curl.
The other idea I had been working on, but have put on hold for the moment, is the one I mentioned in #38: adding the ability to call PhantomJS or Selenium. But these are potentially complex and would require significant reworks of the codebase to integrate. I might revisit them when I can break configs in version 2.
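The closest thing to a delay without a headless browser is to re-fetch the page after a pause until the response no longer looks like the loading page. The sketch below shows that retry loop in Python rather than Feed Iron's PHP; `fetch_when_ready`, `http_get`, the `is_ready` check and the default timings are hypothetical names chosen for illustration. It only helps if the server eventually returns the finished HTML. If outline.com builds the cleaned article with client-side JavaScript, a plain HTTP fetch (curl or otherwise) will keep receiving the loading page, which is exactly the doubt about curl raised above.

```python
import time
import urllib.request
from typing import Callable, Optional


def http_get(url: str, timeout: float = 10.0) -> str:
    """Plain HTTP GET (no JavaScript execution) returning the body as text."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_when_ready(
    url: str,
    is_ready: Callable[[str], bool],
    fetch: Callable[[str], str] = http_get,
    delay: float = 3.0,
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Hypothetical helper: fetch `url` up to `attempts` times, pausing
    `delay` seconds between tries, until `is_ready(body)` is true.
    Returns None if the page never looks ready."""
    for attempt in range(attempts):
        body = fetch(url)
        if is_ready(body):
            return body
        if attempt < attempts - 1:
            sleep(delay)  # give the remote side time to finish processing
    return None
```

For example, `is_ready=lambda html: "article-wrapper" in html` would wait until the container targeted by the XPath in the config below appears. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access.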
Expected Behavior
I am rewriting certain URLs to go to outline.com/https://website.com.
However, outline.com takes a few moments to clean up the page and display the results. Is there any way to halt the scrape until it finishes loading (bypassing paywalls, etc.)?
Current Behavior
Feed Iron scrapes the outline.com loading page instead of the cleaned-up article.
Steps to Reproduce
URL - https://www.wsj.com/articles/the-nfls-best-players-are-getting-richer-than-ever-1536163544
```json
{
  "type": "xpath",
  "xpath": [
    "div[@class='article-wrapper']"
  ],
  "reformat": [
    {
      "type": "regex",
      "pattern": "/.+.com/",
      "replace": "https://outline.com/https://wsj.com"
    }
  ]
}
```
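To check what the `reformat` rule does to the issue's URL, the sketch below replays it with Python's `re.sub`, assuming Feed Iron applies the pattern the way PHP's `preg_replace` does (strip the `/` delimiters, replace every match). Because the `.` before `com` is unescaped, `.+.com` greedily matches everything up to the last "any character followed by `com`", so it can also fire on words like "welcome"; escaping it as `\.com` is a slightly stricter alternative. The function name `rewrite_to_outline` is hypothetical.

```python
import re

# Pattern and replacement copied from the "reformat" rule in the config,
# minus PHP's "/" delimiters.
PATTERN = r".+.com"
REPLACEMENT = "https://outline.com/https://wsj.com"


def rewrite_to_outline(url: str, pattern: str = PATTERN) -> str:
    """Apply the reformat rule; like preg_replace, re.sub replaces all matches."""
    return re.sub(pattern, REPLACEMENT, url)


original = "https://www.wsj.com/articles/the-nfls-best-players-are-getting-richer-than-ever-1536163544"
print(rewrite_to_outline(original))
# https://outline.com/https://wsj.com/articles/the-nfls-best-players-are-getting-richer-than-ever-1536163544
```

The rewrite itself works as intended here, which suggests the problem lies in fetching outline.com's loading page rather than in the URL.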