edwardsmit/csv_to_es

Latest commit: Edward Smit, Jun 17, 2019 (3af32f5)

CsvToEs

This is a proof of concept (POC) that parses a CSV file in a streaming fashion and stores each line in Elasticsearch as a JSON object, using the Elasticsearch Bulk API.

Build

mix escript.build

Load a CSV

./csv_to_es <FILENAME>

Limitations

  • Currently only ;-separated files are supported, and the file must have a header line, which is used to name the fields of the ES documents. This project has only been tested with a bagadres-full.csv file downloaded from NLExtract.nl
  • Elasticsearch is expected to run at localhost
  • As no _id field is set explicitly, Elasticsearch generates a new one for every document, so running the tool more than once creates duplicates
  • The batch size is fixed at 1_000; this figure was picked arbitrarily, without any testing or tuning
  • The timeout of 60s was chosen as "large enough" to avoid timeouts
  • No error-handling is implemented
  • The target index is hardcoded to elixir-csv
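
To make the limitations above concrete, here is a minimal shell sketch of the kind of Bulk API body such a tool has to produce from a ;-separated file with a header line. It is not the project's Elixir code. The function name csv_to_bulk and the optional ID_COLUMN argument are hypothetical: the tool itself never sets an _id, and the sketch only shows how taking _id from a column would make reruns overwrite documents instead of duplicating them. It splits naively on ; (no quoted-field handling), emits every value as a JSON string, and escapes only quotes and backslashes.

```shell
# csv_to_bulk INDEX [ID_COLUMN] < input.csv
# Prints one action line and one document line per CSV row (NDJSON).
# ID_COLUMN is a hypothetical option; without it Elasticsearch assigns the _id.
csv_to_bulk() {
  awk -F';' -v index_name="$1" -v id_col="${2:-}" '
    # Escape backslashes and double quotes so values stay valid JSON strings.
    function esc(s,    out, i, c) {
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\" || c == "\"") out = out "\\" c
        else out = out c
      }
      return out
    }
    # Drop a trailing carriage return in case the file has CRLF line endings.
    { sub(/\r$/, "") }
    # Header line: remember the field names and the position of the id column.
    NR == 1 {
      n = NF
      for (i = 1; i <= n; i++) {
        name[i] = $i
        if (id_col != "" && $i == id_col) id_idx = i
      }
      next
    }
    # Data line: action metadata first, then the document itself.
    {
      action = "{\"index\":{\"_index\":\"" esc(index_name) "\""
      if (id_idx) action = action ",\"_id\":\"" esc($id_idx) "\""
      print action "}}"
      doc = "{"
      for (i = 1; i <= n; i++)
        doc = doc (i > 1 ? "," : "") "\"" esc(name[i]) "\":\"" esc($i) "\""
      print doc "}"
    }'
}
```

The output could be piped to curl with --data-binary @- and a Content-Type of application/x-ndjson, posting to the _bulk endpoint. The port (9200 is the Elasticsearch default) is an assumption, since this README only says localhost. Unlike the tool, that would send the whole file in one request rather than in batches of 1_000.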

Tip

Before running this tool, it is best to set number_of_replicas to 0 and refresh_interval to -1 on the target Elasticsearch index, elixir-csv. This avoids replica and refresh overhead during the load and speeds up bulk indexing.
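
One possible way to apply this tip is through the Elasticsearch update index settings API. The sketch below assumes Elasticsearch listens on port 9200 (the default; this README only says localhost) and that the elixir-csv index already exists. The helper names settings_body and put_settings are made up for illustration.

```shell
# Assumption: default port 9200; override ES_URL if your setup differs.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="elixir-csv"

# Build the JSON settings body from a replica count and a refresh interval.
settings_body() {
  printf '{"index":{"number_of_replicas":%s,"refresh_interval":"%s"}}' "$1" "$2"
}

# Send the settings to the index (the index must already exist).
put_settings() {
  curl -sS -X PUT "$ES_URL/$INDEX/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(settings_body "$1" "$2")"
}

# Before the load:  put_settings 0 -1
# After the load:   put_settings 1 1s   (assumed values; restore what you had before)
```

Remember to restore both settings after the load; with refresh_interval at -1, newly indexed documents do not become searchable until a refresh happens.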

Can I use this in production?

Probably not as is.
