d3-dsv

This module provides a parser and formatter for delimiter-separated values, most commonly comma- (CSV) or tab-separated values (TSV). These tabular formats are popular with spreadsheet programs such as Microsoft Excel, and are often more space-efficient than JSON. This implementation is based on RFC 4180.

Comma (CSV) and tab (TSV) delimiters are built-in. For example, to parse:

d3.csvParse("foo,bar\n1,2"); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
d3.tsvParse("foo\tbar\n1\t2"); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]

Or to format:

d3.csvFormat([{foo: "1", bar: "2"}]); // "foo,bar\n1,2"
d3.tsvFormat([{foo: "1", bar: "2"}]); // "foo\tbar\n1\t2"

To use a different delimiter, such as “|” for pipe-separated values, use d3.dsvFormat:

var psv = d3.dsvFormat("|");

console.log(psv.parse("foo|bar\n1|2")); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]

For easy loading of DSV files in a browser, see d3-request’s d3.csv and d3.tsv methods.

Installing

If you use NPM, npm install d3-dsv. Otherwise, download the latest release. You can also load directly from d3js.org, either as a standalone library or as part of D3 4.0. AMD, CommonJS, and vanilla environments are supported. In vanilla, a d3 global is exported:

<script src="https://d3js.org/d3-dsv.v1.min.js"></script>
<script>

var data = d3.csvParse(string);

</script>

Try d3-dsv in your browser.

API Reference

# d3.csvParse(string[, row]) <>

Equivalent to dsvFormat(",").parse.

# d3.csvParseRows(string[, row]) <>

Equivalent to dsvFormat(",").parseRows.

# d3.csvFormat(rows[, columns]) <>

Equivalent to dsvFormat(",").format.

# d3.csvFormatRows(rows) <>

Equivalent to dsvFormat(",").formatRows.

# d3.tsvParse(string[, row]) <>

Equivalent to dsvFormat("\t").parse.

# d3.tsvParseRows(string[, row]) <>

Equivalent to dsvFormat("\t").parseRows.

# d3.tsvFormat(rows[, columns]) <>

Equivalent to dsvFormat("\t").format.

# d3.tsvFormatRows(rows) <>

Equivalent to dsvFormat("\t").formatRows.

# d3.dsvFormat(delimiter) <>

Constructs a new DSV parser and formatter for the specified delimiter. The delimiter must be a single character (i.e., a single 16-bit code unit); so, ASCII delimiters are fine, but emoji delimiters are not.
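The single-code-unit rule can be checked with a string's length, which counts UTF-16 code units. The following is a minimal sketch; isSingleCodeUnit is a hypothetical helper for illustration and is not part of d3-dsv:

```javascript
// Hypothetical helper (not part of d3-dsv): true if the string is exactly
// one UTF-16 code unit, as d3.dsvFormat requires of its delimiter.
function isSingleCodeUnit(delimiter) {
  return typeof delimiter === "string" && delimiter.length === 1;
}

console.log(isSingleCodeUnit("|")); // true
console.log(isSingleCodeUnit("\t")); // true
// U+1F600 (an emoji) is encoded as a surrogate pair: two code units.
console.log(isSingleCodeUnit("\uD83D\uDE00")); // false
```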

# dsv.parse(string[, row]) <>

Parses the specified string, which must be in the delimiter-separated values format with the appropriate delimiter, returning an array of objects representing the parsed rows.

Unlike dsv.parseRows, this method requires that the first line of the DSV content contains a delimiter-separated list of column names; these column names become the attributes on the returned objects. For example, consider the following CSV file:

Year,Make,Model,Length
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38

The resulting JavaScript array is:

[
  {"Year": "1997", "Make": "Ford", "Model": "E350", "Length": "2.34"},
  {"Year": "2000", "Make": "Mercury", "Model": "Cougar", "Length": "2.38"}
]

The returned array also exposes a columns property containing the column names in input order (in contrast to Object.keys, whose iteration order is arbitrary). For example:

data.columns; // ["Year", "Make", "Model", "Length"]

If a row conversion function is not specified, field values are strings. For safety, there is no automatic conversion to numbers, dates, or other types. In some cases, JavaScript may coerce strings to numbers for you automatically (for example, using the + operator), but it is better to specify a row conversion function.

If a row conversion function is specified, the specified function is invoked for each row, being passed an object representing the current row (d), the index (i) starting at zero for the first non-header row, and the array of column names. If the returned value is null or undefined, the row is skipped and will be omitted from the array returned by dsv.parse; otherwise, the returned value defines the corresponding row object. For example:

var data = d3.csvParse(string, function(d) {
  return {
    year: new Date(+d.Year, 0, 1), // lowercase and convert "Year" to Date
    make: d.Make, // lowercase
    model: d.Model, // lowercase
    length: +d.Length // lowercase and convert "Length" to number
  };
});

Note: using + rather than parseInt or parseFloat is typically faster, though more restrictive. For example, "30px" when coerced using + returns NaN, while parseInt and parseFloat return 30.
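The difference in strictness, and the way a row conversion function can use it to skip malformed rows, can be sketched as follows. Here convertRow is a hypothetical row function written for illustration; it runs on its own and could be passed as the second argument to d3.csvParse:

```javascript
// "+" is strict: trailing non-numeric text yields NaN.
console.log(isNaN(+"30px")); // true
console.log(parseInt("30px", 10)); // 30
console.log(parseFloat("30px")); // 30

// Hypothetical row conversion function: returns null when Length does not
// coerce cleanly to a number, so that dsv.parse would skip the row.
function convertRow(d) {
  var length = +d.Length;
  if (isNaN(length)) return null;
  return {make: d.Make, model: d.Model, length: length};
}

var good = convertRow({Make: "Ford", Model: "E350", Length: "2.34"});
var bad = convertRow({Make: "Ford", Model: "E350", Length: "n/a"});

console.log(good.length); // 2.34
console.log(bad); // null
```

Note that + coerces the empty string to 0 rather than NaN, so empty fields may need a separate check.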

# dsv.parseRows(string[, row]) <>

Parses the specified string, which must be in the delimiter-separated values format with the appropriate delimiter, returning an array of arrays representing the parsed rows.

Unlike dsv.parse, this method treats the header line as a standard row, and should be used whenever DSV content does not contain a header. Each row is represented as an array rather than an object. Rows may have variable length. For example, consider the following CSV file, which notably lacks a header line:

1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38

The resulting JavaScript array is:

[
  ["1997", "Ford", "E350", "2.34"],
  ["2000", "Mercury", "Cougar", "2.38"]
]

If a row conversion function is not specified, field values are strings. For safety, there is no automatic conversion to numbers, dates, or other types. In some cases, JavaScript may coerce strings to numbers for you automatically (for example, using the + operator), but it is better to specify a row conversion function.

If a row conversion function is specified, the specified function is invoked for each row, being passed an array representing the current row (d) and the index (i) starting at zero for the first row. If the returned value is null or undefined, the row is skipped and will be omitted from the array returned by dsv.parseRows; otherwise, the returned value defines the corresponding row object. For example:

var data = d3.csvParseRows(string, function(d, i) {
  return {
    year: new Date(+d[0], 0, 1), // convert first column to Date
    make: d[1],
    model: d[2],
    length: +d[3] // convert fourth column to number
  };
});

In effect, row is similar to applying a map and filter operator to the returned rows.
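The map-and-filter analogy can be made concrete with a small sketch that operates on rows that have already been parsed. applyRow is a hypothetical helper, not a d3-dsv function, and it only approximates what the parser does internally:

```javascript
// Hypothetical helper: apply a row function to already-parsed rows, then
// drop any row for which it returned null or undefined.
function applyRow(rows, row) {
  return rows
      .map(function(d, i) { return row(d, i); })
      .filter(function(d) { return d != null; });
}

var rows = [["1997", "Ford"], ["", "Unknown"], ["2000", "Mercury"]];

var result = applyRow(rows, function(d) {
  // Skip rows without a year; convert the rest to objects.
  return d[0] ? {year: +d[0], make: d[1]} : null;
});

console.log(result.length); // 2
console.log(result[1].make); // Mercury
```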

# dsv.format(rows[, columns]) <>

Formats the specified array of object rows as delimiter-separated values, returning a string. This operation is the inverse of dsv.parse. Each row will be separated by a newline (\n), and each column within each row will be separated by the delimiter (such as a comma, ,). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
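The quoting rule can be illustrated with a standalone sketch. escapeValue is a hypothetical function, not the library's source; doubling embedded double-quotes follows RFC 4180 and is an assumption about the exact output here:

```javascript
// Illustrative sketch of the quoting rule; not the library's implementation.
function escapeValue(value, delimiter) {
  var text = String(value);
  var needsQuotes = text.indexOf(delimiter) >= 0 ||
      text.indexOf("\"") >= 0 ||
      text.indexOf("\n") >= 0;
  // Per RFC 4180, a double-quote inside a quoted field is written twice.
  return needsQuotes ? "\"" + text.replace(/"/g, "\"\"") + "\"" : text;
}

console.log(escapeValue("plain", ",")); // plain
console.log(escapeValue("a,b", ",")); // "a,b"
console.log(escapeValue("say \"hi\"", ",")); // "say ""hi"""
```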

If columns is not specified, the list of column names that forms the header row is determined by the union of all properties on all objects in rows; the order of columns is nondeterministic. If columns is specified, it is an array of strings representing the column names. For example:

var string = d3.csvFormat(data, ["year", "make", "model", "length"]);

All fields on each row object will be coerced to strings. For more control over which and how fields are formatted, first map rows to an array of arrays of strings, and then use dsv.formatRows.
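When rows do not all share the same properties, it can help to compute the column list yourself and pass it as columns, so the header order is under your control. The following sketch collects the union of property names in first-seen order; inferColumns is a hypothetical helper and makes no claim about how d3-dsv orders columns internally:

```javascript
// Hypothetical helper: union of property names across all rows, in the
// order each name is first encountered.
function inferColumns(rows) {
  var seen = Object.create(null), columns = [];
  rows.forEach(function(row) {
    Object.keys(row).forEach(function(key) {
      if (!(key in seen)) {
        seen[key] = true;
        columns.push(key);
      }
    });
  });
  return columns;
}

var data = [
  {year: "1997", make: "Ford"},
  {year: "2000", model: "Cougar"}
];

console.log(inferColumns(data).join(",")); // year,make,model
```

With d3-dsv loaded, the result could then be passed along as d3.csvFormat(data, inferColumns(data)).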

# dsv.formatRows(rows) <>

Formats the specified array of arrays of strings as delimiter-separated values, returning a string. This operation is the inverse of dsv.parseRows. Each row will be separated by a newline (\n), and each column within each row will be separated by the delimiter (such as a comma, ,). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.

To convert an array of objects to an array of arrays while explicitly specifying the columns, use array.map. For example:

var string = d3.csvFormatRows(data.map(function(d, i) {
  return [
    d.year.getFullYear(), // Assuming d.year is a Date object.
    d.make,
    d.model,
    d.length
  ];
}));

If you like, you can also array.concat this result with an array of column names to generate the first row:

var string = d3.csvFormatRows([[
    "year",
    "make",
    "model",
    "length"
  ]].concat(data.map(function(d, i) {
  return [
    d.year.getFullYear(), // Assuming d.year is a Date object.
    d.make,
    d.model,
    d.length
  ];
})));

Content Security Policy

If a content security policy is in place, note that dsv.parse requires unsafe-eval in the script-src directive, due to the (safe) use of dynamic code generation for fast parsing. (See source.) Alternatively, use dsv.parseRows.
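If you take the dsv.parseRows route but still want objects keyed by column name, the header row can be applied by hand. This is a minimal sketch under that assumption; rowsToObjects is a hypothetical helper, and its handling of short rows may differ from dsv.parse:

```javascript
// Hypothetical helper: convert the array of arrays returned by
// dsv.parseRows into objects keyed by the first (header) row.
function rowsToObjects(rows) {
  var columns = rows[0] || [];
  var objects = rows.slice(1).map(function(row) {
    var object = {};
    columns.forEach(function(name, j) {
      object[name] = row[j]; // undefined if the row is shorter than the header
    });
    return object;
  });
  objects.columns = columns; // mirror the columns property of dsv.parse
  return objects;
}

// With d3-dsv loaded, the input would be d3.csvParseRows(string).
var parsed = rowsToObjects([["foo", "bar"], ["1", "2"]]);

console.log(parsed[0].foo); // 1
console.log(parsed.columns.join(",")); // foo,bar
```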

Byte-Order Marks

DSV files sometimes begin with a byte order mark (BOM); saving a spreadsheet in CSV UTF-8 format from Microsoft Excel, for example, will include a BOM. On the web this is not usually a problem because the UTF-8 decode algorithm specified in the Encoding standard removes the BOM. Node.js, on the other hand, does not remove the BOM when decoding UTF-8.

If the BOM is not removed, the first character of the text is a zero-width non-breaking space. So if a CSV file with a BOM is parsed by d3.csvParse, the first column’s name will begin with a zero-width non-breaking space. This can be hard to spot since this character is usually invisible when printed.

To remove the BOM before parsing, consider using strip-bom.
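If you would rather not add a dependency, the same effect can be had with a one-line check for a leading U+FEFF. This is a minimal sketch; stripBom here is a hypothetical local function, not the strip-bom package:

```javascript
// Hypothetical helper: drop a leading U+FEFF (the decoded BOM), if present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

var withBom = "\uFEFFfoo,bar\n1,2";
var header = stripBom(withBom).split("\n")[0];

console.log(header); // foo,bar
console.log(header.length); // 7
```

With d3-dsv loaded, the cleaned text would then be passed on as d3.csvParse(stripBom(text)).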

Command Line Reference

dsv2dsv

# dsv2dsv [options…] [file]

Converts the specified DSV input file to DSV (typically with a different delimiter or encoding). If file is not specified, defaults to reading from stdin. For example, to convert CSV to TSV:

csv2tsv < example.csv > example.tsv

To convert windows-1252 CSV to utf-8 CSV:

dsv2dsv --input-encoding windows-1252 < latin1.csv > utf8.csv

# dsv2dsv -h
# dsv2dsv --help

Output usage information.

# dsv2dsv -V
# dsv2dsv --version

Output the version number.

# dsv2dsv -o file
# dsv2dsv --out file

Specify the output file name. Defaults to “-” for stdout.

# dsv2dsv -r delimiter
# dsv2dsv --input-delimiter delimiter

Specify the input delimiter character. Defaults to “,” for reading CSV. (You can enter a tab on the command line by typing ⌃V.)

# dsv2dsv --input-encoding encoding

Specify the input character encoding. Defaults to “utf8”.

# dsv2dsv -w delimiter
# dsv2dsv --output-delimiter delimiter

Specify the output delimiter character. Defaults to “,” for writing CSV. (You can enter a tab on the command line by typing ⌃V.)

# dsv2dsv --output-encoding encoding

Specify the output character encoding. Defaults to “utf8”.

# csv2tsv [options…] [file]

Equivalent to dsv2dsv, but the output delimiter defaults to the tab character (\t).

# tsv2csv [options…] [file]

Equivalent to dsv2dsv, but the input delimiter defaults to the tab character (\t).

dsv2json

# dsv2json [options…] [file]

Converts the specified DSV input file to JSON. If file is not specified, defaults to reading from stdin. For example, to convert CSV to JSON:

csv2json < example.csv > example.json

Or to convert CSV to a newline-delimited JSON stream:

csv2json -n < example.csv > example.ndjson

# dsv2json -h
# dsv2json --help

Output usage information.

# dsv2json -V
# dsv2json --version

Output the version number.

# dsv2json -o file
# dsv2json --out file

Specify the output file name. Defaults to “-” for stdout.

# dsv2json -r delimiter
# dsv2json --input-delimiter delimiter

Specify the input delimiter character. Defaults to “,” for reading CSV. (You can enter a tab on the command line by typing ⌃V.)

# dsv2json --input-encoding encoding

Specify the input character encoding. Defaults to “utf8”.

# dsv2json --output-encoding encoding

Specify the output character encoding. Defaults to “utf8”.

# dsv2json -n
# dsv2json --newline-delimited

Output newline-delimited JSON instead of a single JSON array.
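To show how the two output shapes differ, here is a small sketch of the newline-delimited form: one JSON value per line, with no enclosing array. toNdjson is a hypothetical function for illustration and is not how the command-line tool is implemented:

```javascript
// Hypothetical helper: serialize each row as one line of JSON.
function toNdjson(rows) {
  return rows.map(function(d) { return JSON.stringify(d) + "\n"; }).join("");
}

var rows = [{foo: "1", bar: "2"}, {foo: "3", bar: "4"}];

console.log(JSON.stringify(rows)); // a single JSON array on one line
console.log(toNdjson(rows)); // one object per line
```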

# csv2json [options…] [file]

Equivalent to dsv2json.

# tsv2json [options…] [file]

Equivalent to dsv2json, but the input delimiter defaults to the tab character (\t).

json2dsv

# json2dsv [options…] [file]

Converts the specified JSON input file to DSV. If file is not specified, defaults to reading from stdin. For example, to convert JSON to CSV:

json2csv < example.json > example.csv

Or to convert a newline-delimited JSON stream to CSV:

json2csv -n < example.ndjson > example.csv

# json2dsv -h
# json2dsv --help

Output usage information.

# json2dsv -V
# json2dsv --version

Output the version number.

# json2dsv -o file
# json2dsv --out file

Specify the output file name. Defaults to “-” for stdout.

# json2dsv --input-encoding encoding

Specify the input character encoding. Defaults to “utf8”.

# json2dsv -w delimiter
# json2dsv --output-delimiter delimiter

Specify the output delimiter character. Defaults to “,” for writing CSV. (You can enter a tab on the command line by typing ⌃V.)

# json2dsv --output-encoding encoding

Specify the output character encoding. Defaults to “utf8”.

# json2dsv -n
# json2dsv --newline-delimited

Read newline-delimited JSON instead of a single JSON array.
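Reading the newline-delimited form amounts to parsing each non-empty line as its own JSON value. The following is a minimal sketch of that idea; fromNdjson is a hypothetical function and not the command-line tool's implementation:

```javascript
// Hypothetical helper: parse newline-delimited JSON into an array,
// ignoring blank lines (such as the one after a trailing newline).
function fromNdjson(text) {
  return text.split("\n")
      .filter(function(line) { return line.trim().length > 0; })
      .map(function(line) { return JSON.parse(line); });
}

var parsedRows = fromNdjson("{\"foo\":\"1\"}\n{\"foo\":\"3\"}\n");

console.log(parsedRows.length); // 2
console.log(parsedRows[1].foo); // 3
```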

# json2csv [options…] [file]

Equivalent to json2dsv.

# json2tsv [options…] [file]

Equivalent to json2dsv, but the output delimiter defaults to the tab character (\t).