This is an example program that calculates the average of a list of numbers using MapReduce on the Hadoop framework.
The program is meant to run on a Hadoop cluster. The pom file is configured for Hadoop 2.10; change it to match your Hadoop version if needed.
Package the program as a runnable jar and run it with the following arguments:
- Input File Location
- Output Folder Location
- Maximum number of classes the Mapper splits the problem into (optional, default = 10)
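
As a rough illustration of the argument list above, the following sketch reads the two required arguments and applies the default of 10 when the third is omitted. The class name `JobArgs`, its field names, and the sample values in `main` are hypothetical and are not taken from this project's code; the actual driver may validate its arguments differently.

```java
public class JobArgs {
    // Default taken from the argument list above.
    static final int DEFAULT_MAX_CLASSES = 10;

    final String inputFile;     // Input File Location
    final String outputFolder;  // Output Folder Location
    final int maxClasses;       // Maximum number of classes (optional)

    JobArgs(String inputFile, String outputFolder, int maxClasses) {
        this.inputFile = inputFile;
        this.outputFolder = outputFolder;
        this.maxClasses = maxClasses;
    }

    // Parses 2 or 3 positional arguments; the third falls back to the default.
    static JobArgs parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                "Usage: <input file> <output folder> [max classes]");
        }
        int maxClasses = DEFAULT_MAX_CLASSES;
        if (args.length == 3) {
            maxClasses = Integer.parseInt(args[2]);
            if (maxClasses < 1) {
                throw new IllegalArgumentException("max classes must be >= 1");
            }
        }
        return new JobArgs(args[0], args[1], maxClasses);
    }

    public static void main(String[] args) {
        // Hypothetical sample values, used only to demonstrate the default.
        JobArgs parsed = parse(new String[] {"numbers.txt", "output"});
        System.out.println(parsed.inputFile + " " + parsed.outputFolder
            + " " + parsed.maxClasses);
    }
}
```
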
The input is a UTF-8 text file containing one number per line. Each number can be either an int or a double.
The code includes a Mapper, a Combiner and a Reducer. The Mapper splits the list of numbers into at most the given number of classes (default 10) and hands them over to the Combiner. The Combiner collapses the classes it receives into a single key called 'Average'. These 'Average' entries are then reduced by the Reducer to produce the final output.
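
The Mapper, Combiner and Reducer flow described above can be simulated in plain Java without any Hadoop dependency. This is only a sketch under stated assumptions, not the project's actual code: the class `AverageSketch`, the round-robin assignment of numbers to classes, and the use of a (sum, count) pair as the intermediate value are all assumptions made for illustration. Carrying the count alongside the sum matters because an average of partial averages is generally not the overall average.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AverageSketch {
    // Intermediate value: a partial sum together with how many numbers it covers.
    static final class Partial {
        final double sum;
        final long count;
        Partial(double sum, long count) { this.sum = sum; this.count = count; }
        Partial plus(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Map step: parse each line and assign it to one of at most maxClasses keys.
    // Round-robin assignment is an assumption made for this sketch.
    static Map<Integer, List<Partial>> map(List<String> lines, int maxClasses) {
        Map<Integer, List<Partial>> classes = new HashMap<>();
        int index = 0;
        for (String line : lines) {
            String text = line.trim();
            if (text.isEmpty()) continue;            // skip blank lines
            double value = Double.parseDouble(text); // accepts ints and doubles
            int key = index++ % maxClasses;
            classes.computeIfAbsent(key, k -> new ArrayList<>())
                   .add(new Partial(value, 1));
        }
        return classes;
    }

    // Combine step: collapse each class into one partial under the 'Average' key.
    static List<Partial> combine(Map<Integer, List<Partial>> classes) {
        List<Partial> averageKeyValues = new ArrayList<>();
        for (List<Partial> values : classes.values()) {
            Partial acc = new Partial(0, 0);
            for (Partial p : values) acc = acc.plus(p);
            averageKeyValues.add(acc);
        }
        return averageKeyValues;
    }

    // Reduce step: merge all partials and divide once at the very end.
    // Returns NaN when there are no numbers at all.
    static double reduce(List<Partial> partials) {
        Partial total = new Partial(0, 0);
        for (Partial p : partials) total = total.plus(p);
        return total.sum / total.count;
    }

    static double average(List<String> lines, int maxClasses) {
        return reduce(combine(map(lines, maxClasses)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1", "2", "3.5", "10");
        System.out.println(average(lines, 10));
    }
}
```

In a real Hadoop job the same logic would be spread across separate Mapper, Combiner and Reducer classes, and Hadoop may run the Combiner zero or more times, which is another reason to keep the intermediate value mergeable.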