[FEAT] - Create mapping for Elasticsearch index during crawl #19

Open
wambozi opened this issue Jan 17, 2020 · 0 comments
wambozi commented Jan 17, 2020

Is your feature request related to a problem? Please describe.
Currently, if the index specified in an Elasticsearch crawl request doesn't exist, no mapping is applied to it when the crawl starts. Instead, Elasticsearch creates the index with dynamic mapping when the first document is indexed.

Describe the solution you'd like
A POST like:

curl -XPOST http://localhost:8081/crawl -H 'Content-Type: application/json' -d '
{
  "index": "starwars-english",
  "url": "https://starwars.fandom.com",
  "type": "elasticsearch",
  "mapping": {
    "settings": {
      "analysis": {
        "filter": {
          "autocomplete_filter": {
            "type": "edge_ngram",
            "min_gram": 1,
            "max_gram": 20
          }
        },
        "analyzer": {
          "autocomplete": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "autocomplete_filter"
            ]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "uri": {
          "type": "keyword"
        },
        "meta": {
          "properties": {
            "ogimage": {
              "type": "text"
            },
            "title": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            },
            "description": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            },
            "keywords": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        },
        "source": {
          "properties": {
            "h1": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            },
            "h2": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            },
            "h3": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            },
            "h4": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            },
            "p": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}'

The crawler should detect whether the index named in the request exists. If it does not, the crawler should create the index with the provided mapping.

@wambozi wambozi added the enhancement New feature or request label Jan 17, 2020
@wambozi wambozi self-assigned this Jan 17, 2020