Validate domains before running crawl #345

@navarone-feekery

Description

Currently we don't validate domains before crawling. Validation only happens when a user explicitly runs the validate command.
It is this way because we previously had a UI that validated domain inputs, so it was (theoretically) impossible to configure an invalid domain. That is no longer the case: users can now attempt to crawl invalid domains, which causes all sorts of weird errors.

Quick example

  • https://elastic.co is invalid (it redirects)
  • https://www.elastic.co is the correct domain

Based on our current error logging, a user won't necessarily understand that the first URL is invalid.
If we validate the domain at crawl time, before any crawling starts, we can avoid this problem.
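
To make the idea concrete, here is a rough Python sketch of a pre-crawl check that flags a domain when it redirects to a different host, as in the example above. It is illustrative only and is not the crawler's actual validation logic: the names `validate_domain` and `fetch_head` are hypothetical, and the only check shown is the redirect case.

```python
import http.client
from urllib.parse import urlsplit, urljoin


def fetch_head(url, timeout=10):
    """Send a single HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def validate_domain(url, fetch=fetch_head):
    """Hypothetical pre-crawl check; returns (ok, message).

    `fetch` is injectable so the check can be exercised without network access.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False, f"{url} is not a valid http(s) domain URL"
    try:
        status, location = fetch(url)
    except OSError as exc:
        return False, f"{url} could not be reached: {exc}"
    if 300 <= status < 400 and location:
        # Resolve relative Location values against the original URL.
        target = urlsplit(urljoin(url, location))
        if target.hostname != parts.hostname:
            return False, (
                f"{url} redirects to {target.scheme}://{target.netloc}; "
                "configure that domain instead"
            )
    return True, f"{url} passed the redirect check"
```

Running a check like this at the start of a crawl and aborting with the returned message would give the user an actionable error up front instead of failures partway through the crawl.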
