feat: add tg.no crawling config (#2)
1 parent 7aeba7f, commit 29206a7
Showing 3 changed files with 36 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Config intended to be used on new tg.no once launched. This page differs from | ||
# previous iterations (in practice, even if not in theory) by being a single | ||
# site gradually updated with new content and styling, rather than a new site | ||
# each year. | ||
seeds: | ||
# Crawl content available via navigation and frontpage | ||
- url: https://www.tg.no | ||
include: | ||
# Basic pages | ||
- www.tg.no | ||
|
||
# Block calls to our tracking service | ||
blockRules: | ||
- url: matomo.gathering.org | ||
|
||
collection: tgno | ||
|
||
behaviors: autoscroll,autoplay,autofetch,siteSpecific | ||
waitUntil: load,networkidle0 | ||
generateCDX: true | ||
combineWARCs: true | ||
saveState: always | ||
workers: 4 | ||
# TODO: Remove if not needed; hopefully we won't need a consent flow on the new site | ||
# Minimal profile that includes consent answers | ||
# profile: /crawls/profiles/tg24.tar.gz | ||
|
||
# Make "live" crawling view available at 9037 | ||
newContext: window | ||
screencastPort: 9037 | ||
|
||
warcinfo: | ||
operator: The Gathering | ||
hostname: www.tg.no |
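
The `include` and `blockRules` entries in the config above are plain strings. Assuming the crawler treats them as unanchored regular expressions (this is how the Browsertrix Crawler documentation describes these options, but it is not stated in this commit), an unescaped `.` matches any character. The sketch below illustrates that matching behaviour with Python's `re` module; the helper names `in_scope` and `is_blocked` are hypothetical and are not part of the crawler.

```python
import re

# Patterns copied verbatim from the config. Treating them as unanchored
# regexes is an assumption about the crawler, not something the commit states.
INCLUDE_PATTERNS = [r"www.tg.no"]
BLOCK_PATTERNS = [r"matomo.gathering.org"]


def in_scope(url: str) -> bool:
    """Return True if any include pattern is found anywhere in the URL."""
    return any(re.search(pattern, url) for pattern in INCLUDE_PATTERNS)


def is_blocked(url: str) -> bool:
    """Return True if any block pattern is found anywhere in the URL."""
    return any(re.search(pattern, url) for pattern in BLOCK_PATTERNS)


if __name__ == "__main__":
    for candidate in (
        "https://www.tg.no/info",
        "https://wwwXtgYno.example/",  # matches because "." is a wildcard
        "https://matomo.gathering.org/matomo.js",
    ):
        print(candidate, in_scope(candidate), is_blocked(candidate))
```

If stricter matching is wanted, escaping the dots (for example `www\.tg\.no`) would rule out the wildcard case, under the same regex assumption.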