-
Notifications
You must be signed in to change notification settings - Fork 4
/
notes.txt
19 lines (17 loc) · 1.67 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
How Does Streaming Work
In the above example, both the mapper and the reducer are executables that read the input from
stdin (line by line) and emit the output to stdout. The utility will create a Map/Reduce job, submit the job to an
appropriate cluster, and monitor the progress of the job until it completes.
When an executable is specified for mappers, each mapper task will launch the executable as a separate process when the
mapper is initialized. As the mapper task runs, it converts its inputs into lines and feed the lines to the stdin of the
process. In the meantime, the mapper collects the line oriented outputs from the stdout of the process and converts each
line into a key/value pair, which is collected as the output of the mapper. By default, the prefix of a line up to the
first tab character is the key and the rest of the line (excluding the tab character) will be the value. If there is no
tab character in the line, then entire line is considered as key and the value is null. However, this can be customized,
as discussed later.
When an executable is specified for reducers, each reducer task will launch the executable as a separate process then
the reducer is initialized. As the reducer task runs, it converts its input key/values pairs into lines and feeds the
lines to the stdin of the process. In the meantime, the reducer collects the line oriented outputs from the stdout of
the process, converts each line into a key/value pair, which is collected as the output of the reducer. By default, the
prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the
value. However, this can be customized, as discussed later.