API Job Options
max (default: 1)
The maximum number of threads (calls to run()) allowed to run concurrently, per process. When scraping, this can be used to limit the number of concurrent requests.
take (default: 1)
How many elements of input to send to each thread. If this is greater than 1, run() will receive an array.
Example when take: 2
    input: [0,1,2,3,4],
    run: function(input) {
        console.log(input); // Outputs [0,1], then [2,3], then [4]
    }
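To make the batching rule concrete, the sketch below models how input might be split for a given take value. The chunkInput helper is hypothetical and not part of node.io; it only mirrors the behaviour described above.

```javascript
// Hypothetical helper, not part of node.io: splits input into the
// batches that each run() call would receive for a given `take`.
function chunkInput(input, take) {
    const batches = [];
    for (let i = 0; i < input.length; i += take) {
        const batch = input.slice(i, i + take);
        // With take = 1, run() receives the element itself rather than an array.
        batches.push(take > 1 ? batch : batch[0]);
    }
    return batches;
}

// Logs the batches [0,1], [2,3] and [4] in turn.
chunkInput([0, 1, 2, 3, 4], 2).forEach(function (batch) {
    console.log(batch);
});
```

The final batch is simply shorter when the input length is not a multiple of take.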
retries (default: 2)
The maximum number of times an element (or elements) of input can be retried using retry() before the thread fails and fail() is called.
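The retry budget can be pictured with a small simulation. This is a sketch under one assumption: the first attempt does not count as a retry, so retries: 2 allows up to three attempts in total. The runWithRetries function is hypothetical and is not node.io's scheduler.

```javascript
// Hypothetical simulation, not node.io's actual scheduler.
// `attempt` returns true on success; false stands in for a call to retry().
function runWithRetries(attempt, retries) {
    let retried = 0;
    while (true) {
        if (attempt()) {
            return { status: 'done', retried: retried };
        }
        if (retried >= retries) {
            // Retry budget exhausted: this is where fail() would be called.
            return { status: 'failed', retried: retried };
        }
        retried++;
    }
}

// An attempt that always fails uses up the default budget of 2 retries.
let calls = 0;
const result = runWithRetries(function () { calls++; return false; }, 2);
console.log(result.status, calls);
```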
wait (default: undefined)
Specifies the amount of time, in seconds, to wait between threads. This is useful if the API or server you are scraping limits how many requests you can make in a given period. You can combine this option with max to limit the number of concurrent requests. Note that the wait also applies when you call skip().
auto_retry (default: false)
When this is set to true, failed requests and threads that throw an exception will automatically call this.retry().
timeout (default: false)
The maximum amount of time (in seconds) each thread can run for before fail() is called.
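The request-related options described so far might be combined as in the following sketch. The values are purely illustrative and the jobOptions variable name is an assumption; in a real job this object would be supplied as the job's options.

```javascript
// Illustrative values only; the option names are the ones documented above.
const jobOptions = {
    max: 5,           // at most 5 concurrent run() calls per process
    wait: 1,          // wait 1 second between threads
    retries: 3,       // allow up to 3 retries before fail() is called
    auto_retry: true, // call retry() automatically on failed requests or exceptions
    timeout: 10       // call fail() on any thread running longer than 10 seconds
};

console.log(jobOptions);
```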
global_timeout (default: false)
The maximum amount of time (in seconds) the entire job has to complete before it exits with an error. This option can also be set from the command line using the -t or --timeout switch.
flatten (default: true)
When calling emit() with an array argument, this option determines whether the array is flattened before being output.
Example when max: 3
    run: function() {
        this.emit([1,2,3]);
    },
    output: function(output) {
        console.log(output);
        // When flatten is true (default) this outputs [1,2,3,1,2,3,1,2,3]
        // When flatten is false this outputs [ [1,2,3],[1,2,3],[1,2,3] ]
    }
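The difference between the two settings can be reproduced outside node.io with a small model. The collectOutput function below is hypothetical; it only mirrors the behaviour described above and is not how node.io is implemented.

```javascript
// Hypothetical model of how emitted values are collected before output().
function collectOutput(emitted, flatten) {
    const output = [];
    emitted.forEach(function (value) {
        if (flatten && Array.isArray(value)) {
            // Flattening appends the array's elements individually.
            output.push(...value);
        } else {
            // Otherwise the emitted value is kept as a single entry.
            output.push(value);
        }
    });
    return output;
}

// Three run() calls, each emitting [1,2,3].
const emitted = [[1, 2, 3], [1, 2, 3], [1, 2, 3]];
console.log(collectOutput(emitted, true));
console.log(collectOutput(emitted, false));
```

Setting flatten to false is the variant to reach for when each emitted array represents one logical row that should stay grouped.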
benchmark (default: false)
If this is true, node.io outputs benchmark information on a job's completion: 1) completion time, 2) bytes read + speed, 3) bytes written + speed. This can also be enabled from the command line using the -b or --benchmark switch.
fork (default: false)
EDIT: Currently broken - fix coming soon.
Whether to use child processes to distribute processing. Set this to the number of desired workers. This can also be enabled from the command line using the -f or --fork switch. Run node.io --help for details.
input (default: false)
This option is used to set a limit on how many lines / rows / elements are input before forcing a job to complete
Example when input: 100 and var i = 0;
    input: function () {
        return i++;
    },
    run: function(num) {
        console.log(num); // Outputs the numbers 0 to 99 (100 elements)
    }
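As a rough model of the limit, the sketch below keeps calling an input function until the requested number of elements has been read. The readInput helper is hypothetical, and it assumes the limit counts every element returned by the input function.

```javascript
// Hypothetical model: keep calling an input function until `limit` elements are read.
function readInput(inputFn, limit) {
    const elements = [];
    while (elements.length < limit) {
        elements.push(inputFn());
    }
    return elements;
}

let i = 0;
const elements = readInput(function () { return i++; }, 100);
console.log(elements.length);
```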
recurse (default: false)
If input is a directory, this option is used to recurse through each subdirectory.
read_buffer (default: 8096)
The size of the read buffer, in bytes, to use when reading files.
newline (default: \n)
The character(s) to use as the newline when outputting data. Note that input newlines are automatically detected as \n or \r\n.
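One common way to accept both newline styles on input while writing a single configurable style on output is sketched below. The splitLines and joinLines helpers are hypothetical and are not node.io's implementation.

```javascript
// Sketch only: a regular expression that accepts both \n and \r\n.
function splitLines(text) {
    return text.split(/\r?\n/);
}

// Joins output rows with a configurable newline, mirroring the `newline` option.
function joinLines(lines, newline) {
    return lines.join(newline);
}

const lines = splitLines('a\r\nb\nc');
console.log(lines);
console.log(JSON.stringify(joinLines(lines, '\r\n')));
```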
encoding (default: 'utf8')
The encoding to use when reading and writing data
jsdom (default: false)
Whether to use JSDOM to parse HTML (the default is to use node-htmlparser). If JSDOM is used, jQuery is used as the default $ object.
external_resources (default: false)
If you set jsdom to true and want to fetch and process external JavaScript files, set external_resources to ['script']. Other values will not work.
proxy (default: false)
All requests will be made through this proxy. Alternatively, you can specify a function that returns a proxy (e.g. to cycle proxies).
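A function that cycles through proxies could look like the sketch below. The proxy addresses are placeholders, the nextProxy and proxyOptions names are assumptions, and the exact proxy string format node.io expects is not specified here.

```javascript
// Hypothetical proxy addresses, for illustration only.
const proxies = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080'
];

let next = 0;

// Returns the next proxy in round-robin order.
function nextProxy() {
    const proxy = proxies[next];
    next = (next + 1) % proxies.length;
    return proxy;
}

// The function itself, not its result, is assigned to the proxy option.
const proxyOptions = { proxy: nextProxy };
```

Round-robin is the simplest rotation; a random pick or a health-checked pool would slot into the same function.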
redirects (default: 3)
The maximum number of redirects to follow before calling fail().
args (default: [])
This option is automatically filled with any extra arguments passed to the command line.
Example
$ node.io myjob arg1 arg2
=> this.options.args = ['arg1','arg2']
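Inside a job, these arguments might be read with a small helper that supplies a fallback when an argument is missing. The getArg function and the cliOptions object are hypothetical; cliOptions stands in for this.options after the command above.

```javascript
// Hypothetical stand-in for the job's `this.options` after running
// `node.io myjob arg1 arg2`.
const cliOptions = { args: ['arg1', 'arg2'] };

// Reads a positional argument, returning `fallback` when it is absent.
function getArg(options, index, fallback) {
    const args = options.args || [];
    return index < args.length ? args[index] : fallback;
}

console.log(getArg(cliOptions, 0, 'none'));
console.log(getArg(cliOptions, 2, 'none'));
```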