A dynamic "Getting Started" Configuration file #269

@strawgate

Problem Description

If #268 merges, we could add a new getting-started.yml.example that the documentation can reference for the getting-started experience.

Proposed Solution

Introduce a configuration file, getting-started.yml.example (passed via --config), with environment variables embedded. The file would:

  1. Act as an example for customers who want to use dynamic configuration for other purposes
  2. Allow configuring many common Crawler settings via environment variables set on the command line (VAR=value bin/crawler ...):
    a. URL to be crawled: CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html"
    b. Output to a local directory: OUTPUT_DIR="/test"
    c. Output to Elasticsearch: ES_HOST="https://192.168.0.1"
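
For readers unfamiliar with the VAR=value prefix form, a short shell sketch shows that the assignment is visible only to the one command it precedes. The child `sh -c` process stands in for bin/crawler here; that substitution is an assumption made so the snippet runs anywhere.

```shell
# Start from a clean slate so the demonstration is predictable.
unset CRAWL_URL

# The prefix assignment is exported to this one child process only.
# `sh -c` is a stand-in for bin/crawler (assumption for illustration).
CRAWL_URL="https://example.com" sh -c 'echo "child sees: ${CRAWL_URL:-<not set>}"'

# Back in the calling shell, the variable was never set.
echo "parent sees: ${CRAWL_URL:-<not set>}"
```

This is why the quick-start commands can be copied and run without leaving settings behind in the shell session.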

This would enable several new quick-start scenarios:

Crawl example.com and print the results to the console

bin/crawler --config getting-started.yml.example

Crawl your company's website and print the results to the console

CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" bin/crawler --config getting-started.yml.example

Crawl your company's website and save it to a local directory

CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" \
  OUTPUT_DIR=./local/dir \
  bin/crawler --config getting-started.yml.example

Crawl your company's website and save it to Elasticsearch

CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" \
  ES_HOST="https://localhost:9200" \
  bin/crawler --config getting-started.yml.example

Crawl your company's website and save it to Elasticsearch with a custom ingest pipeline and a custom index

CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" \
  ES_HOST="https://localhost:9200" \
  ES_INDEX="crawler-workplace-search" \
  ES_PIPELINE="my-ent-search-pipeline" \
  bin/crawler --config getting-started.yml.example
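
The scenarios above imply a simple selection rule: print to the console by default, write to a directory when OUTPUT_DIR is set, and index into Elasticsearch when ES_HOST is set. A minimal shell sketch of that rule follows. The function names, the sink labels, the https://example.com default, and the choice to let ES_HOST win when both variables are set are all assumptions for illustration, not the Crawler's actual behavior.

```shell
# Hypothetical helper: pick an output sink from the proposed variables.
# Precedence (ES_HOST before OUTPUT_DIR) is an assumption.
resolve_sink() {
  if [ -n "${ES_HOST:-}" ]; then
    echo "elasticsearch"
  elif [ -n "${OUTPUT_DIR:-}" ]; then
    echo "file"
  else
    echo "console"
  fi
}

# Hypothetical helper: fall back to example.com when CRAWL_URL is unset,
# matching the first scenario in the issue.
resolve_url() {
  echo "${CRAWL_URL:-https://example.com}"
}

echo "url=$(resolve_url) sink=$(resolve_sink)"
```

Keeping the fallback logic in one place like this would let a single example file cover every scenario without separate configuration files per output.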

Alternatives

Additional Context

Labels: enhancement (New feature or request)