electblake edited this page Jun 4, 2013 · 13 revisions

Node.io can be used to interact with files, databases or streams. By default, node.io reads from stdin (elements are separated by \n or \r\n) and writes to stdout.
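
As a rough picture of the default element splitting described above, the sketch below breaks a block of text into elements on either \n or \r\n. The name splitLines is hypothetical and this is not node.io's own implementation; dropping the empty element after a trailing line break is also an assumption.

```javascript
// Hypothetical helper (not part of node.io): split text into elements
// on either \n or \r\n, matching the default stdin behaviour described.
function splitLines(text) {
    var lines = text.split(/\r?\n/);
    // Assumption: drop the empty element left by a trailing line break.
    if (lines.length > 0 && lines[lines.length - 1] === '') {
        lines.pop();
    }
    return lines;
}

console.log(splitLines('a,b\r\nc,d\ne,f\n'));
```

Each resulting element is what a job's run method would receive as a single row.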

Example 1: Stdin / stdout

csv_to_tsv.coffee

nodeio = require 'node.io'
class CsvToTsv extends nodeio.JobClass
    run: (row) -> @emit row.replace /,/g, '\t'
    
@class = CsvToTsv
@job = new CsvToTsv()

Try it out

$ node.io csv_to_tsv.coffee < input.csv > output.tsv

Example 2: Files

Files can be read/written through stdin / stdout (see above), or specified inside the job.

csv_to_tsv.js

var nodeio = require('node.io');
exports.job = new nodeio.Job({
    input: 'input.csv',
    run: function (row) {
        this.emit(row.replace( /,/g, '\t'));
    },
    output: 'output.tsv'
});

The input or output files can be overridden at the command line:

$ node.io -i new_input.csv -o new_output.tsv csv_to_tsv

Example 2b: Files containing separated values

node.io provides helper methods for interacting with separated values, such as CSV:

run: function (row) {
    var values = this.parseValues(row);
    this.emit(values.join('\t'));
}
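
A helper like this matters because a plain replace(/,/g, '\t') also rewrites commas that sit inside quoted fields. The sketch below is a hypothetical stand-in, not node.io's parseValues; it assumes fields may be wrapped in double quotes and that a doubled quote ("") inside a quoted field is a literal quote.

```javascript
// Hypothetical stand-in for a separated-values parser (not node.io's
// parseValues). Assumes optional double-quoted fields, with "" as the
// escape for a literal quote inside a quoted field.
function parseCsvLine(line) {
    var values = [], current = '', inQuotes = false;
    for (var i = 0; i < line.length; i++) {
        var ch = line[i];
        if (inQuotes) {
            if (ch === '"' && line[i + 1] === '"') {
                current += '"';   // escaped quote
                i++;
            } else if (ch === '"') {
                inQuotes = false; // closing quote
            } else {
                current += ch;
            }
        } else if (ch === '"') {
            inQuotes = true;      // opening quote
        } else if (ch === ',') {
            values.push(current); // field boundary
            current = '';
        } else {
            current += ch;
        }
    }
    values.push(current);
    return values;
}

var row = 'widget,"Smith, John",42';
console.log(parseCsvLine(row).join('\t'));
```

Here the comma in "Smith, John" stays inside its field, whereas a global comma replace would split that field in two.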

Example 3: Databases & custom IO

To read rows from a database, use the following template. start begins at 0 and num is the number of rows to return. When there are no more rows, return false.

database_template.js

var nodeio = require('node.io');
exports.job = new nodeio.Job({
    input: function (start, num, callback) {
        //
    },
    run: function (row) {
        this.emit(row);
    },
    output: function (rows) {
        // Note: this method always receives multiple rows as an array
        //
    }
});
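
To make the start / num contract concrete, here is a minimal sketch that pages through an in-memory array standing in for a database table. The table, the readAll driver and the batch size are all hypothetical, and the sketch assumes that "return false" means passing false to the callback; the driver loop is only an illustration, not node.io's scheduler.

```javascript
// Hypothetical in-memory "table" standing in for a real database.
var table = ['alice', 'bob', 'carol', 'dave', 'erin'];

// Same shape as the input function in the template: hand back up to
// `num` rows beginning at offset `start`, or false once rows run out.
// Assumption: "return false" means passing false to the callback.
function input(start, num, callback) {
    var rows = table.slice(start, start + num);
    callback(rows.length > 0 ? rows : false);
}

// Illustrative driver that requests batches until it receives false.
// It relies on the callback being invoked synchronously, which holds
// for this in-memory example but not for a real database client.
function readAll(inputFn, batchSize) {
    var all = [], start = 0, done = false;
    while (!done) {
        inputFn(start, batchSize, function (rows) {
            if (rows === false) {
                done = true;
            } else {
                all = all.concat(rows);
                start += rows.length;
            }
        });
    }
    return all;
}

console.log(readAll(input, 2));
```

With a real database, the body of input would typically run a query using start as the offset and num as the limit, then invoke callback from the query's completion handler.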

Example 4: Streams

To read from read_stream and write to write_stream, use the following example:

stream_template.js

var nodeio = require('node.io');
exports.job = new nodeio.Job({
    input: function () {
        this.inputStream(read_stream);
        this.input.apply(this, arguments);
    },
    run: function (line) {
        this.emit(line);
    },
    output: function (lines) {
        // End each batch with a newline so consecutive batches do not run together
        write_stream.write(lines.join('\n') + '\n');
    }
});
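
Because output receives lines in batches, each write should end with a line break so that the last line of one batch is not glued to the first line of the next. The sketch below shows this with a hypothetical stand-in object that only records what is written; makeOutput and fake_stream are illustrative names, and a real job would pass an actual writable stream instead.

```javascript
// Build an output function bound to a writable stream. Each batch is
// terminated with a newline so batches stay separated in the output.
function makeOutput(write_stream) {
    return function (lines) {
        write_stream.write(lines.join('\n') + '\n');
    };
}

// Hypothetical stand-in for a writable stream: records each chunk.
var chunks = [];
var fake_stream = { write: function (chunk) { chunks.push(chunk); } };

var output = makeOutput(fake_stream);
output(['a', 'b']); // first batch
output(['c']);      // second batch
console.log(JSON.stringify(chunks.join('')));
```

Any object with a write method, such as the result of fs.createWriteStream, could be passed to makeOutput in place of the stand-in.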

Example 5: Reading files in a directory

node.io can be used to walk through all files in a directory, and optionally recurse through subdirectories.

walk_template.js

var nodeio = require('node.io');
exports.job = new nodeio.Job({
    input: '/path/to/dir',
    run: function (full_path) {
        console.log(full_path);
        this.emit();
    }
});

recurse_template.js

var nodeio = require('node.io');
exports.job = new nodeio.Job({recurse: true}, {
    input: '/path/to/dir',
    run: function (full_path) {
        console.log(full_path);
        this.emit();
    }
});

The input path can be overridden at the command line:

$ node.io -i "/new/path" recurse_template 
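
To illustrate the difference between a plain walk and a recursive one, the following sketch lists files with Node's fs module directly. listFiles is a hypothetical helper, not node.io's implementation, and it assumes that subdirectories themselves are skipped rather than reported as paths.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper (not node.io's implementation): collect the full
// paths of files in `dir`, descending into subdirectories only when
// `recurse` is true. Directories themselves are not reported.
function listFiles(dir, recurse) {
    var found = [];
    fs.readdirSync(dir).sort().forEach(function (name) {
        var full_path = path.join(dir, name);
        if (fs.statSync(full_path).isDirectory()) {
            if (recurse) {
                found = found.concat(listFiles(full_path, true));
            }
        } else {
            found.push(full_path);
        }
    });
    return found;
}

// Build a small throwaway tree to try it on.
var root = fs.mkdtempSync(path.join(os.tmpdir(), 'walk-'));
fs.mkdirSync(path.join(root, 'sub'));
fs.writeFileSync(path.join(root, 'a.txt'), '');
fs.writeFileSync(path.join(root, 'sub', 'b.txt'), '');

console.log(listFiles(root, false).length); // top-level files only
console.log(listFiles(root, true).length);  // also counts sub/b.txt
```

Each path returned here corresponds to one full_path value that a job's run method would receive.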

Example 6: Running a job once, or indefinitely

A node.io job completes when all input has been consumed. In some cases, however, you may want a job to run without any input.

To run a job once without any input, set input: false.

To run a job indefinitely, set input: true.

Go to part 3: Scraping data from the web

Go to part 4: Data validation and sanitization
