Validate command should output a warning if seed URL domains don't match main domain

If you run `bin/crawler validate <file>` against a configuration like the one below, the response says it's a valid URL. That's because the domain is valid; the seed URLs, however, are invalid.

```yaml
domains:
  - url: https://example.com
    seed_urls:
      - https://example2.com
```

The above configuration _will_ crawl, but only because the main domain is used as a fallback seed URL. There are no warnings or errors about this misconfiguration during the crawl.

Invalid seed URLs are simply discarded when building the initial seed URL array to begin the crawl. They are discarded silently (no logs), so as a user it can be very confusing to figure out what is wrong with my seed URLs.
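
To make the behaviour concrete, here is a rough sketch in Python of the filtering described above, with the missing log line added. It is not the crawler's actual code: `build_seed_urls` and the logger name are hypothetical, and treating a seed URL as invalid when its host differs from the domain's host is my assumption about what "invalid" means here.

```python
import logging
from urllib.parse import urlsplit

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("crawler.seed_urls")


def host_of(url):
    """Return the lowercased host of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def build_seed_urls(domain_url, seed_urls):
    """Keep seed URLs on the same host as the domain and log the rest.

    Assumption: a seed URL counts as invalid when its host differs from
    the host of the domain URL.
    """
    kept = []
    for seed in seed_urls:
        if host_of(seed) == host_of(domain_url):
            kept.append(seed)
        else:
            # This warning is what is missing today: the discard is silent.
            logger.warning(
                "Discarding seed URL %s: it does not match domain %s",
                seed,
                domain_url,
            )
    # Fall back to the domain itself when no seed URL survives.
    return kept or [domain_url]
```

With the configuration from this issue, `build_seed_urls("https://example.com", ["https://example2.com"])` would log one warning and fall back to the main domain, so the crawl still runs but the user can see why.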

So potential improvements:

- [ ] `bin/crawler validate` should check seed URL validity
- [ ] Invalid seed URLs should be logged
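
For the first item, the check could look roughly like the Python sketch below. This only illustrates the idea and is not how `bin/crawler validate` is implemented: `seed_url_warnings` is a hypothetical helper, the configuration is passed as an already-parsed dictionary, and comparing hosts only (ignoring scheme and port) is an assumption.

```python
from urllib.parse import urlsplit


def seed_url_warnings(config):
    """Return one warning per seed URL whose host differs from its domain's host.

    Assumption: `config` is the parsed configuration, shaped like the YAML
    in this issue (a "domains" list with "url" and optional "seed_urls").
    """
    warnings = []
    for domain in config.get("domains", []):
        domain_url = domain["url"]
        domain_host = (urlsplit(domain_url).hostname or "").lower()
        for seed in domain.get("seed_urls", []):
            seed_host = (urlsplit(seed).hostname or "").lower()
            if seed_host != domain_host:
                warnings.append(
                    f"seed URL {seed} does not match domain {domain_url}"
                )
    return warnings


# The configuration from this issue, as a parsed dictionary.
config = {
    "domains": [
        {"url": "https://example.com", "seed_urls": ["https://example2.com"]}
    ]
}

for warning in seed_url_warnings(config):
    print(f"WARNING: {warning}")
```

Returning warnings rather than raising an error keeps the configuration usable, which matches the current behaviour where the crawl still runs from the main domain.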

Validate command should output a warning if seed URL domains don't match main domain #342
