If you run `bin/crawler validate <file>` for a configuration like the one below, the response reports it as valid. That's because the domain URL is valid, even though the seed URLs are invalid.
```yaml
domains:
  - url: https://example.com
    seed_urls:
      - https://example2.com
```
The above configuration will still crawl, but only because the main domain URL is used as a fallback seed URL. The crawl emits no warnings or errors about this misconfiguration.
Invalid seed URLs are simply discarded when the initial seed URL array is built at the start of the crawl. The discarding is silent (nothing is logged), so as a user, figuring out what is wrong with your seed URLs can be very confusing.
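To illustrate, here is a minimal Python sketch (not the crawler's actual code; the function name and domain-matching rule are assumptions) of how seeds outside the configured domain can be silently dropped, with the domain URL used as a fallback:

```python
from urllib.parse import urlparse

def build_seed_urls(domain_url, seed_urls):
    """Hypothetical seed-list builder: keep only seeds on the configured domain."""
    domain_host = urlparse(domain_url).netloc
    valid = [u for u in seed_urls if urlparse(u).netloc == domain_host]
    # Silent discard: invalid seeds vanish here with no log line.
    # A warning at this point would surface the misconfiguration, e.g.:
    # for u in seed_urls:
    #     if u not in valid:
    #         logger.warning("Discarding seed URL outside domain: %s", u)
    return valid or [domain_url]  # fall back to the domain itself

print(build_seed_urls("https://example.com", ["https://example2.com"]))
# -> ['https://example.com']
```

With the configuration above, every seed is filtered out and the crawl quietly proceeds from `https://example.com` alone, which matches the behavior described.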
So potential improvements: