Are you running Elasticsearch? Want to take your data and get the heck outta Dodge? Blaze provides everything you need in a neat, blazing fast package!
- Uses the Elasticsearch sliced scroll API to get your data hella fast.
- Written in modern C++ using libcurl and RapidJSON.
- Distributed as a single, tiny binary.
Blaze compared to other Elasticsearch dump tools. The index has ~3.5M rows and is ~5GB in size. Each tool is run under `time`, and the figure reported is how long it takes to write a simple JSON dump file.
Tool | Time |
---|---|
Blaze | 00m40s |
elasticdump | 04m38s |
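To put the table in perspective, the figures can be turned into a speedup and a rough throughput. This is only arithmetic on the numbers quoted above; the helper name `to_seconds` is made up for this sketch, and the row count is the approximate ~3.5M stated for the index.

```python
def to_seconds(stamp: str) -> int:
    """Convert a 'MMmSSs' string such as '04m38s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


blaze = to_seconds("00m40s")        # 40 seconds
elasticdump = to_seconds("04m38s")  # 278 seconds

# How many times faster Blaze finished the same dump.
speedup = elasticdump / blaze

# Approximate rows per second, using the ~3.5M row count from the text.
rows_per_second = 3_500_000 / blaze

print(f"speedup: {speedup:.2f}x, ~{rows_per_second:,.0f} rows/s")
```

On this one benchmark that works out to roughly a 7x difference; other indices and clusters may behave differently.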
Get the binary for your platform from the Releases page or compile it yourself. If you use it often, it might make sense to put it somewhere in your `PATH`.
$ blaze --host=http://localhost:9200 --index=massive_1 > dump.ndjson
This will connect to Elasticsearch on the specified host and start downloading the `massive_1` index to stdout. Make sure to redirect this somewhere, such as a JSON file.
Blaze will dump everything to stdout in a format compatible with the Elasticsearch Bulk API, meaning you can use `curl` to put the data back.
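If you want to inspect a dump before loading it, the Bulk API layout (one action line followed by one source line) is easy to walk through. The sketch below assumes every action in the file is followed by a source line, which holds for index-style actions; the exact fields Blaze writes in the action line are not shown here, and the sample data is hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_bulk_pairs(lines: Iterable[str]) -> Iterator[Tuple[dict, dict]]:
    """Yield (action, source) pairs from Bulk API formatted NDJSON lines."""
    pending = None  # action line waiting for its source line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        if pending is None:
            pending = obj
        else:
            yield pending, obj
            pending = None
    if pending is not None:
        raise ValueError("dump ends with an action line that has no source line")


# Hypothetical two-document dump, for illustration only.
sample = [
    '{"index": {"_id": "1"}}',
    '{"name": "a"}',
    '{"index": {"_id": "2"}}',
    '{"name": "b"}',
]
for action, source in iter_bulk_pairs(sample):
    print(action, source)
```

For a real file, pass an open file object instead of the list: `iter_bulk_pairs(open("dump.ndjson"))`.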
curl -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/other_data/_bulk --data-binary "@dump.ndjson"
One issue when working with large datasets is that Elasticsearch has an upper limit on the size of HTTP requests (2GB). The solution is to split the file with something like `parallel`. The split must happen after an even number of lines, since each command in the file is actually two lines (the action and its document) and a pair must not be separated.
cat dump.ndjson | parallel --pipe -l 50000 curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/other_data/_bulk --data-binary "@-"
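The same even-line splitting can be sketched in Python, for cases where `parallel` is not available. This is an illustration of the idea rather than anything Blaze ships; the function name `even_chunks` is made up here, and the default of 50000 simply mirrors the command above.

```python
from typing import Iterable, Iterator, List


def even_chunks(lines: Iterable[str], max_lines: int = 50_000) -> Iterator[List[str]]:
    """Group lines into chunks of max_lines, keeping action/source pairs together."""
    if max_lines < 2 or max_lines % 2 != 0:
        raise ValueError("max_lines must be an even number >= 2")
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == max_lines:
            yield chunk
            chunk = []
    if chunk:
        # The final chunk may be shorter, but stays even if the input is even.
        yield chunk


if __name__ == "__main__":
    with open("dump.ndjson") as dump:
        for number, chunk in enumerate(even_chunks(dump)):
            with open(f"dump_{number:05d}.ndjson", "w") as out:
                out.writelines(chunk)
```

Each resulting file can then be posted to `_bulk` with the `curl` command shown earlier. Because the chunk size is even and every command is two lines, no pair is ever split across files.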
- `--host=<value>` - the host where Elasticsearch is running.
- `--index=<value>` - the index to dump.
- `--slices=<value>` - (optional) the number of slices to split the scroll into. Should be set to the number of shards for the index (as seen on `/_cat/indices`). Defaults to 5.
- `--size=<value>` - (optional) the size of the response (i.e., the length of the `hits` array). Defaults to 5000.
- `--dump-mappings` - specify this flag to dump the index mappings instead of the source.
- `--dump-index-info` - specify this flag to dump the full index information (settings and mappings) instead of the source.
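To make `--slices` and `--size` more concrete, here is a sketch of the search bodies a sliced scroll uses in Elasticsearch: one request per slice, each sent to `POST /<index>/_search?scroll=<keep-alive>`. The `slice`, `size` and `sort` fields are part of the Elasticsearch search API, but whether Blaze builds its requests exactly like this (including the `_doc` sort) is an assumption, and the function name is made up.

```python
import json
from typing import List


def sliced_scroll_bodies(slices: int = 5, size: int = 5000) -> List[dict]:
    """Build one initial scroll request body per slice."""
    if slices < 1:
        raise ValueError("slices must be at least 1")
    bodies = []
    for slice_id in range(slices):
        body = {
            "size": size,       # number of hits returned per scroll page
            "sort": ["_doc"],   # cheapest ordering for a full scan
        }
        if slices > 1:
            # Elasticsearch only accepts a slice clause when max is greater than 1.
            body["slice"] = {"id": slice_id, "max": slices}
        bodies.append(body)
    return bodies


for body in sliced_scroll_bodies(slices=2, size=1000):
    print(json.dumps(body))
```

Each slice can be scrolled independently and in parallel, which is where the speed comes from. Matching the slice count to the shard count, as recommended above, lets each slice map onto one shard.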
To use HTTP Basic authentication you need to pass the following options. Note that passing a password on the command line will put it in your terminal history, so please use with care.
- `--auth=basic` - enable HTTP Basic authentication.
- `--basic-username=foo` - the username.
- `--basic-password=bar` - the password.
- `--insecure` - for HTTPS connections, specify this flag to skip server certificate validation.
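For reference, HTTP Basic authentication amounts to a single request header containing the base64-encoded `username:password` pair. The sketch below shows how that header is formed in general; it is not taken from the Blaze source, and the function name is made up.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of the Authorization header for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


# Using the placeholder credentials from the option list above.
print("Authorization:", basic_auth_header("foo", "bar"))
```

Base64 is an encoding, not encryption, so the credentials are only protected when the connection itself uses HTTPS.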
Building Blaze is easy. It requires `libcurl`.
$ git submodule update --init
$ make
docker build -t blaze .
docker run -it blaze blaze
Copyright © Viktor Elofsson and contributors.
Blaze is provided as-is under the MIT license. For more information see LICENSE.
- For libcurl, see https://curl.haxx.se/docs/copyright.html
- For RapidJSON, see https://github.com/Tencent/rapidjson/blob/master/license.txt