
A small cli tool to transform large ndjson files (http://ndjson.org/) to CSV. Ideal for transforming MongoDB exports to CSV without knowing the CSV headers.


uzh/ndjson2csv


ndjson2csv

A small CLI tool that preprocesses large ndjson (http://ndjson.org/) files and transforms them to CSV without setting your hair on fire. It is especially useful for quickly processing large JSON exports from MongoDB.

The CSV fields/headers are computed automatically by flattening the documents.
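
To illustrate what flattening means here, the following Python sketch joins nested keys with a separator and uses the union of all flattened keys as the header row. It is not the tool's own (Clojure) implementation: the names `flatten` and `headers`, the treatment of lists as plain values, and the first-seen ordering of headers are assumptions made for illustration.

```python
import json

def flatten(doc, sep=".", prefix=""):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested documents, carrying the key path along.
            flat.update(flatten(value, sep, path))
        else:
            # Scalars (and, in this sketch, lists) become a single field.
            flat[path] = value
    return flat

def headers(flat_docs):
    """Union of all keys, in the order they are first seen."""
    seen = {}
    for doc in flat_docs:
        for key in doc:
            seen.setdefault(key, None)
    return list(seen)

# Two hypothetical ndjson lines with different shapes.
lines = [
    '{"_id": 1, "user": {"name": "ada", "address": {"city": "Zurich"}}}',
    '{"_id": 2, "user": {"name": "bob"}, "score": 7}',
]
flat_docs = [flatten(json.loads(line)) for line in lines]
print(headers(flat_docs))
# ['_id', 'user.name', 'user.address.city', 'score']
```

Documents that lack a given header would simply leave that CSV cell empty.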

Usage

Download the binary from the release page, then run it with --help to list the available options:

ndjson2csv --help

Compiling from source

Requirements: JDK 8+ and Leiningen

Run lein bin in the project root; the resulting binary is target/default/ndjson2csv.

Options

  -l, --lines NUMBER_LINES      0            Number of lines to read from ndjson, can be used for testing
  -i, --input FILE                           Input file containing ndjson.
  -o, --output FILE             results.csv  File to write csv in, will be created if it doesn't exist.
  -m, --merge-with DOCUMENT_ID               Preprocesses the ndjson by deep-merging the documents, using the provided id as unique identifier. Useful when multiple documents should make up a line in the CSV file.
  -s, --separator CHARACTER     .            The separating character for the CSV fields that are generated by deep-merging the documents.
  -p, --pre--processor FILE                  Clojure file containing a function named `process` that takes a document (a map) as input and returns a processed version. The function is applied to each document right after it is loaded and parsed.
  -h, --help
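
The deep-merge step behind --merge-with can be pictured with a short Python sketch. It is an illustration under assumptions, not the tool's Clojure code: the helper names `deep_merge` and `merge_by_id` are hypothetical, later documents are assumed to win on conflicting scalar values, and every document is assumed to carry the id key.

```python
def deep_merge(a, b):
    """Recursively merge b into a copy of a; on conflicts, b wins (assumed)."""
    merged = dict(a)
    for key, value in b.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            # Both sides are nested documents: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def merge_by_id(docs, id_key):
    """Group documents by docs[id_key] and deep-merge each group."""
    groups = {}
    for doc in docs:
        doc_id = doc[id_key]  # raises KeyError if the id is missing
        groups[doc_id] = deep_merge(groups.get(doc_id, {}), doc)
    return list(groups.values())

# Three hypothetical documents; two share the same subjectId.
docs = [
    {"subjectId": "s1", "answers": {"q1": "yes"}},
    {"subjectId": "s2", "answers": {"q1": "no"}},
    {"subjectId": "s1", "answers": {"q2": 3}},
]
merged = merge_by_id(docs, "subjectId")
print(merged)
# [{'subjectId': 's1', 'answers': {'q1': 'yes', 'q2': 3}},
#  {'subjectId': 's2', 'answers': {'q1': 'no'}}]
```

Each merged document then becomes one line in the CSV, which is why the option helps when several documents describe the same subject.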

Examples

Read the first 1000 lines of input.json, preprocess each parsed document with the function process from the file examples/processor.clj, merge the documents using the top-level key subjectId as the id, and write the result to output.csv:

ndjson2csv -i input.json -l 1000 -p examples/processor.clj -m subjectId -o output.csv

License

Copyright © 2018 University of Zurich Department of Economics
