Releases: webrecorder/browsertrix-crawler

Browsertrix Crawler 0.8.1

25 Feb 02:34

Full Changelog: 0.8.0...0.8.1

Browsertrix Crawler 0.8.0

05 Feb 00:50

What's Changed

  • Switch to Chrome/Chromium 109 in #184
  • Convert to ESM module in #184
  • Add ad blocking via request interception (#173)
  • new setting: add support for specifying language via the --lang flag by @ikreymer in #186
  • Add screenshot functionality by @tw4l in #188
  • Remove dead pywb configuration by @edsu in #198
  • Use VNC for headful profile creation by @ikreymer in #197
  • arg parsing fix by @ikreymer in #200
  • Improve crawler logging by @tw4l in #195
  • Add requests[socks] python dependency by @kuechensofa in #201
  • Add RedisCrawlState test by @tw4l in #208
  • crawl state: add getPendingList() to return pending state from either… by @ikreymer in #205
  • Serialize Redis pending pages as JSON objects by @tw4l in #212
  • behaviors: don't run behaviors in iframes that are about:blank or are… by @ikreymer in #211
  • Bump to Chrome 109, Beta 0.8.0-beta.1 Release by @ikreymer in #215
  • Fix --overwrite CLI flag by @tw4l in #220
  • deps: bump pywb to 2.7.3 by @ikreymer in #222
  • update behaviors to 0.4.1, rename 'Behavior line' -> 'Behavior log' by @ikreymer in #223

Full Changelog: 0.7.1...0.8.0

Browsertrix Crawler 0.8.0 Beta 1

31 Jan 03:03
10e61d4
Pre-release

What's Changed

  • Improve crawler logging by @tw4l in #195
  • Add requests[socks] python dependency by @kuechensofa in #201
  • Add RedisCrawlState test by @tw4l in #208
  • crawl state: add getPendingList() to return pending state from either… by @ikreymer in #205
  • Serialize Redis pending pages as JSON objects by @tw4l in #212
  • behaviors: don't run behaviors in iframes that are about:blank or are… by @ikreymer in #211
  • Bump to Chrome 109, Beta 0.8.0-beta.1 Release by @ikreymer in #215

Full Changelog: 0.8.0-beta.0...0.8.0-beta.1

Browsertrix Crawler 0.8.0 Beta 0

13 Jan 04:03
2b03e23
Pre-release

Key Features

  • Switch to Chrome/Chromium 105
  • Convert to ESM module
  • Add ad blocking via request interception (#173)
  • Support for setting browser language (#186)
  • Screenshot functionality with different options: current view, full page, and thumbnail (#188)
  • Switch to VNC for interactive profile creation, which is now the default; automated creation is available via --automated

Full Changelog: 0.7.1...0.8.0-beta.0

Browsertrix Crawler 0.7.1

16 Nov 01:01
5b738bd

Full Changelog: 0.7.0...0.7.1

Browsertrix Crawler 0.7.0

12 Oct 01:07

New Contributors

  • @edsu made their first contribution in #171

Full Changelog: 0.6.0...0.7.0

Browsertrix Crawler 0.7.0 Beta 5

21 Sep 01:30
65933c6
Pre-release

What's Changed

  • Interrupt Handling Fixes by @ikreymer in #167
  • Update to Browsertrix Behaviors 0.3.4 - Fix for lazy-loaded images #165

Full Changelog: 0.7.0-beta.4...0.7.0-beta.5

Browsertrix Crawler 0.7.0 Beta 4

09 Sep 06:57
314ee3f
Pre-release

Fixes related to wait times, including:

  • better netIdleWait defaults: if not set, use 15 seconds for page/page-spa scope, otherwise 2 seconds
  • default behaviors: include autoscroll in the default behaviors as well
  • restart: if the crawl is already done, don't attempt to crawl further; if 'waitOnDone' is set, wait for a signal before exiting
  • bump to puppeteer-core 17.1.2
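
The netIdleWait defaulting rule in the list above can be sketched as a small function. This is only an illustration of the rule as stated in these notes: the function name `resolveNetIdleWait` and its parameters are hypothetical and are not taken from the crawler's source.

```javascript
// Hypothetical helper illustrating the defaulting rule from the release notes.
// Not the crawler's actual code; names and signature are assumptions.
function resolveNetIdleWait(netIdleWait, scopeType) {
  // An explicitly configured value always wins (including 0).
  if (netIdleWait !== undefined && netIdleWait !== null) {
    return netIdleWait;
  }
  // Unset: single-page scopes get a longer wait, everything else a short one.
  const singlePageScopes = ["page", "page-spa"];
  return singlePageScopes.includes(scopeType) ? 15 : 2;
}
```

Under this sketch, an unset value resolves to 15 seconds for a page or page-spa scope and to 2 seconds for any other scope, while a user-supplied value is returned unchanged.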

Full Changelog: 0.7.0-beta.3...0.7.0-beta.4

Browsertrix Crawler 0.7.0 Beta 3

03 Sep 01:06
Pre-release

What's Changed

  • Overhaul of page concurrency system: better detection of windows that are stuck, only reuse same window for every 25 pages, #157
  • Logging improvements: pywb.log written with --logging pywb, JS errors logged with --logging jserrors #158
  • Avoid getting stuck on pending requests at end of crawl: #161
  • Update to Browsertrix Behaviors 0.3.3: better crawling of Twitter and autoplay of videos
  • Update to pywb 2.6.8: includes better rewriting of embedded Twitter videos

Full Changelog: 0.7.0-beta.2...0.7.0-beta.3

Browsertrix Crawler 0.7.0 Beta 2

18 Aug 05:23
Pre-release

Fixes include:

  • Default --waitUntil set to load instead of load,networkidle2, because waiting for both occasionally hung
  • Add --netIdleWait to specify wait for network idle after load (defaults to 10 seconds)
  • Update to puppeteer 16.1.0
  • Logging: if pywb logging is enabled, write logs to collection dir ./logs/pywb.log and ./logs/redis.log
  • Logging: reduce logging by not printing duplicate behavior status logs
  • pywb/openssl: allow 'unsafe legacy renegotiation' to avoid errors capturing sites that use older SSL
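
The --waitUntil and --netIdleWait changes in the list above amount to a two-step page load: wait for the load event, then give the network a bounded amount of time to go idle. A minimal sketch of that pattern follows, assuming `page` is a Puppeteer Page object. The function name `loadThenWaitForIdle` and the decision to continue when the idle wait times out are illustrative assumptions, not the crawler's actual implementation.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page.
// The function name and timeout handling are assumptions for illustration.
async function loadThenWaitForIdle(page, url, netIdleWaitSecs = 10) {
  // Step 1: resolve navigation on the load event alone,
  // rather than waiting for both load and networkidle2.
  await page.goto(url, { waitUntil: "load" });

  try {
    // Step 2: allow up to netIdleWaitSecs for the network to become idle.
    await page.waitForNetworkIdle({ timeout: netIdleWaitSecs * 1000 });
  } catch (e) {
    // The network never went idle within the limit; continue anyway.
    return false;
  }
  return true;
}
```

Separating the two waits means a page with long-running background requests delays the crawl by at most the idle timeout instead of blocking navigation indefinitely.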

Full Changelog: 0.7.0-beta.1...0.7.0-beta.2