From 2ebf751149a92c755059ca0ba596a6c0fcb6581d Mon Sep 17 00:00:00 2001
From: Micha Niskin
Date: Sun, 21 Feb 2016 14:32:40 -0500
Subject: [PATCH] 1.1.0 -- json unescaping, csv output mode

---
 Makefile | 2 +-
 README.md | 4 ++--
 jt.1 | 37 +++++++++++++++++++++++++++++++++++++
 jt.1.html | 43 +++++++++++++++++++++++++++++++++++++++++++
 jt.1.ronn | 40 ++++++++++++++++++++++++++++++++++++++++
 jt.c | 4 ++--
 6 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 95f2f09..2617a06 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 .PHONY: all clean docs install dist
 
-VERSION = 1.0.1
+VERSION = 1.1.0
 CFLAGS = -O3
 LDFLAGS = -static
 PREFIX = /usr/local
diff --git a/README.md b/README.md
index e646876..ef81e30 100644
--- a/README.md
+++ b/README.md
@@ -34,13 +34,13 @@ elb-2 i-b910a256
 Linux users can install prebuilt binaries from the release tarball:
 
 ```
-sudo bash -c "cd /usr/local && wget -O - https://github.com/micha/json-table/releases/download/1.0.1/jt-1.0.1.tar.gz | tar xzvf -"
+sudo bash -c "cd /usr/local && wget -O - https://github.com/micha/json-table/releases/download/1.1.0/jt-1.1.0.tar.gz | tar xzvf -"
 ```
 
 Otherwise, to build from source:
 
 ```
-git checkout 1.0.1 && make && sudo make install
+git checkout 1.1.0 && make && sudo make install
 ```
 
 ## Documentation
diff --git a/jt.1 b/jt.1
index aa2bcff..9be2569 100644
--- a/jt.1
+++ b/jt.1
@@ -115,6 +115,43 @@ If the item at the top of the data stack is not an object or if the object has n
 .IP
 If the \fIKEY\fR property of the object is an array subsequent commands will operate on one of the items in the array, chosen automatically by \fBjt\fR\. The array index will be available to subsequent commands via the index stack\.
 .
+.SH "JSON UNESCAPING AND CSV OUTPUT"
+Strings in JSON data must not contain literal control characters such as \fBtab\fR and \fBnewline\fR; these characters \fImust\fR be escaped with a backslash\. Additionally, any character \fImay\fR be escaped with a backslash\. The JSON specification also allows escaping any Unicode character with the \fB\eu\fR escape: for example, the copyright symbol © can be encoded as \fB\eu00A9\fR, and the G\-clef character 𝄞 as the surrogate pair \fB\euD834\euDD1E\fR\.
+.
+.P
+Numbers may be written in several ways in JSON data\. There is a single \fBNumber\fR type that encompasses both integer and floating point values, and both decimal and exponential notation are valid\.
+.
+.SS "Strings"
+\fBJt\fR does not unescape string values by default, because they may contain tab or newline characters that would break the tabular output format\. If unescaped values are needed, they can be obtained in a post\-processing step by invoking \fBjt\fR with the \fB\-u\fR option\. For example:
+.
+.IP "" 4
+.
+.nf
+
+$ jt \-u \'i love music \eu266A\'
+i love music ♪
+.
+.fi
+.
+.IP "" 0
+.
+.SS "Numbers"
+\fBJt\fR does not process numbers in any way; they are printed in the output verbatim, exactly as they appear in the JSON input\. If special processing is required, the \fBprintf\fR program from coreutils is your friend:
+.
+.IP "" 4
+.
+.nf
+
+$ printf %\.0f 2\.99792458e9
+2997924580
+.
+.fi
+.
+.IP "" 0
+.
+.SS "CSV Output"
+The CSV format uses quoted values, which avoids the problems associated with values that contain tab and newline characters\. The \fB\-c\fR option puts \fBjt\fR into CSV output mode\. In this mode JSON strings are unescaped by default\. The \fBcsvtool\fR program and the \fBcsvkit\fR suite of tools facilitate processing of CSV data in the shell\.
+.
 .SH "EXAMPLES"
 We will use the following JSON input for the examples:
 .
diff --git a/jt.1.html b/jt.1.html
index 8cb2549..0ca2ab1 100644
--- a/jt.1.html
+++ b/jt.1.html
@@ -58,6 +58,7 @@
 DESCRIPTION
 OPTIONS
 COMMANDS
+ JSON UNESCAPING AND CSV OUTPUT
 EXAMPLES
 COPYRIGHT
 SEE ALSO
@@ -167,6 +168,48 @@

COMMANDS

+

JSON UNESCAPING AND CSV OUTPUT

+ +

Strings in JSON data must not contain literal control characters such as tab
+and newline; these characters must be escaped with a backslash.
+Additionally, any character may be escaped with a backslash. The JSON
+specification also allows escaping any Unicode character with the \u escape:
+for example, the copyright symbol © can be encoded as \u00A9, and the G-clef
+character 𝄞 as the surrogate pair \uD834\uDD1E.

+ +

Numbers may be written in several ways in JSON data. There is a single
+Number type that encompasses both integer and floating point values, and
+both decimal and exponential notation are valid.

+ +

Strings

+ +

Jt does not unescape string values by default, because they may contain
+tab or newline characters that would break the tabular output format. If
+unescaped values are needed, they can be obtained in a post-processing step
+by invoking jt with the -u option. For example:

+ +
$ jt -u 'i love music \u266A'
+i love music ♪
+
+ +

Numbers

+ +

Jt does not process numbers in any way; they are printed in the
+output verbatim, exactly as they appear in the JSON input. If special
+processing is required, the printf program from coreutils is your friend:

+ +
$ printf %.0f 2.99792458e9
+2997924580
+
+ +

CSV Output

+ +

The CSV format uses quoted values, which avoids the problems associated with
+values that contain tab and newline characters. The -c option puts jt
+into CSV output mode. In this mode JSON strings are unescaped by default. The
+csvtool program and the csvkit suite of tools facilitate processing of CSV
+data in the shell.

+

EXAMPLES

We will use the following JSON input for the examples:

diff --git a/jt.1.ronn b/jt.1.ronn
index 9216bbb..de074c7 100644
--- a/jt.1.ronn
+++ b/jt.1.ronn
@@ -115,6 +115,46 @@ The following commands are available:
 The array index will be available to subsequent commands via the index stack.
 
+## JSON UNESCAPING AND CSV OUTPUT
+
+Strings in JSON data must not contain literal control characters such as
+`tab` and `newline`; these characters _must_ be escaped with a backslash.
+Additionally, any character _may_ be escaped with a backslash. The JSON
+specification also allows escaping any Unicode character with the `\u` escape:
+for example, the copyright symbol © can be encoded as `\u00A9`, and the G-clef
+character 𝄞 as the surrogate pair `\uD834\uDD1E`.
+
+Numbers may be written in several ways in JSON data. There is a single
+`Number` type that encompasses both integer and floating point values, and
+both decimal and exponential notation are valid.
+
+### Strings
+
+**Jt** does not unescape string values by default, because they may contain
+tab or newline characters that would break the tabular output format. If
+unescaped values are needed, they can be obtained in a post-processing step
+by invoking **jt** with the `-u` option. For example:
+
+    $ jt -u 'i love music \u266A'
+    i love music ♪
+
+### Numbers
+
+**Jt** does not process numbers in any way; they are printed in the
+output verbatim, exactly as they appear in the JSON input. If special
+processing is required, the `printf` program from coreutils is your friend:
+
+    $ printf %.0f 2.99792458e9
+    2997924580
+
+### CSV Output
+
+The CSV format uses quoted values, which avoids the problems associated with
+values that contain tab and newline characters. The `-c` option puts **jt**
+into CSV output mode. In this mode JSON strings are unescaped by default. The
+`csvtool` program and the `csvkit` suite of tools facilitate processing of CSV
+data in the shell.
+
 ## EXAMPLES
 
 We will use the following JSON input for the examples:
diff --git a/jt.c b/jt.c
index 5db2cf7..16b922d 100644
--- a/jt.c
+++ b/jt.c
@@ -1,7 +1,7 @@
 #define _GNU_SOURCE
 #define JSMN_STRICT
 #define JSMN_PARENT_LINKS
-#define JT_VERSION "1.0.1"
+#define JT_VERSION "1.1.0"
 
 #include
 #include
@@ -219,7 +219,7 @@ unsigned long utf_tag[4] = { 0x00, 0xc0, 0xe0, 0xf0 };
 
 void encode_u_escaped(char **in, char **out) {
   unsigned long p = read_code_point(in);
-  int len = (p < 0x80) ? 1 : ((p < 0x800) ? 2 : ((p < 0x10000) ? 3 : 4));
+  int len = (p < 0x80) ? 1 : (p < 0x800) ? 2 : (p < 0x10000) ? 3 : 4;
   *out += len;
   switch (len) {
     case 4: *--(*out) = ((p | 0x80) & 0xbf); p >>= 6;