The following instructions allow you to write, test, and run a Hadoop program locally in IntelliJ, without configuring the Hadoop environment on your own machine or using a cluster.
This tutorial is based on "Hadoop: running and debugging MapReduce programs locally with IntelliJ and Maven (no Hadoop or HDFS setup required)" (originally in Chinese), "How-to: Create an IntelliJ IDEA Project for Apache Hadoop", and "Developing Hadoop Mapreduce Application within IntelliJ IDEA on Windows 10".
Requirements:
- IntelliJ IDEA
- JDK
- Linux or macOS
Warning: some steps and some interface details may differ slightly in your version of IntelliJ, as the IDE evolves over time. The main ideas presented below should still be valid, though.
In IntelliJ, go to File, New, Project, then select Maven on the left of the pop-up window, select your JDK, and hit Next.
Set the Project name and Project location. In this tutorial, we will be "creating" the popular Hadoop example of the WordCount application from the original Hadoop MapReduce Tutorial, so use WordCount as the project name. If required, fill in the GroupId (e.g., with your name) and the ArtifactId (e.g., with the name of your project, i.e., WordCount in our case), then hit Finish.
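IntelliJ generates a standard Maven project skeleton. It should look roughly like this (details vary slightly across IntelliJ versions):

WordCount/
├── pom.xml
└── src/
    ├── main/java/
    └── test/java/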
A file called pom.xml should open automatically in the IntelliJ editor. If it does not, find it in the Project browser on the left and double-click on it to open it.

Paste the following two blocks just before the closing </project> tag.
<repositories>
    <repository>
        <id>apache</id>
        <url>http://maven.apache.org</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-minicluster</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.0</version>
    </dependency>
</dependencies>
If IntelliJ does not pick up the new dependencies automatically, reload the Maven project (recent versions of IntelliJ show a Maven reload icon in the editor when pom.xml changes). Note also that a new version of Hadoop may have come out by the time you read these instructions: check the latest versions available in the Maven repository for hadoop-minicluster, hadoop-mapreduce-client-core, and hadoop-common, and update the version numbers above accordingly.
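To keep the three version numbers in sync, you can optionally declare the version once as a Maven property (inside the <properties> block) and reference it from each dependency. A minimal sketch; the property name hadoop.version is our own choice, not something Maven prescribes:

<properties>
    <hadoop.version>3.3.0</hadoop.version>
</properties>

and then, in each of the three Hadoop dependencies:

<version>${hadoop.version}</version>

A single edit to the property then updates all three dependencies at once. The full pom.xml below keeps the explicit version numbers.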
The full pom.xml is the following:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>yourname</groupId>
    <artifactId>WordCount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>14</maven.compiler.source>
        <maven.compiler.target>14</maven.compiler.target>
    </properties>

    <repositories>
        <repository>
            <id>apache</id>
            <url>http://maven.apache.org</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-minicluster</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.0</version>
        </dependency>
    </dependencies>
</project>
Select the Project → src → main → java folder in the left pane, then do File, New, Java Class and use WordCount as the name of the class.
Paste the Java code below into WordCount.java (this code is taken from the original Hadoop MapReduce Tutorial).
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input folder
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output folder
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The WordCount program scans all text files in the folder specified by the first command line argument, and outputs the number of occurrences of each word into a folder specified by the second command line argument.
Create a folder named input under the project's root folder (so, at the same level as the src folder), and drag/copy some text files into this folder.
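For example, from a terminal in the project's root folder (assuming a Unix shell, consistent with the Linux or macOS requirement above; the file name file01.txt is arbitrary):

mkdir input
echo "Hello World Bye World" > input/file01.txt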
Then set the two command line arguments. Select Run → Edit Configurations. Add a new Application configuration, set the Name to WordCount, set the Main class to WordCount, and set Program arguments to input output. This way, the program will read the input from the input folder and save the results to the output folder. Do not create the output folder yourself: Hadoop creates it automatically and raises an exception if it already exists (thus, you have to delete the output folder manually before every run).
Select Run → Run 'WordCount' to run the Hadoop program. If you re-run the program, delete the output folder before each run.
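If deleting the output folder by hand becomes tedious, you can optionally make main() remove it before submitting the job. Below is a minimal sketch using Hadoop's FileSystem API; it is our own addition, not part of the original tutorial code:

// Optional: add to main(), after creating `conf` and before configuring the job.
// Deletes a pre-existing output directory so that re-runs do not fail.
org.apache.hadoop.fs.FileSystem fs = org.apache.hadoop.fs.FileSystem.get(conf);
org.apache.hadoop.fs.Path outputDir = new org.apache.hadoop.fs.Path(args[1]);
if (fs.exists(outputDir)) {
    fs.delete(outputDir, true);  // true = delete recursively
}

Be aware that this silently erases whatever the second command line argument points to.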
Results are saved in the file output/part-r-00000.
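You can inspect the results from a terminal. With the single sample file created above, the output contains one tab-separated word/count pair per line, sorted by word:

cat output/part-r-00000
Bye	1
Hello	1
World	2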
You can build a single jar file containing your program and all necessary dependencies (e.g., the Hadoop libraries), so that you can transfer the jar file to another machine and run it there.
Add the following build block to pom.xml, at the same level as the repositories block and the dependencies block.
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.1</version>
            <configuration>
                <source>14</source>
                <target>14</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <!-- Exclude jar signature files; leftover signatures break the shaded jar. -->
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <!-- Path to your main class, include package path if needed -->
                                <mainClass>WordCount</mainClass>
                            </transformer>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Then, in a terminal, cd to the directory containing the pom.xml file and run the following command:

mvn package
This command builds WordCount-1.0-SNAPSHOT.jar (the shade plugin replaces the plain jar with one that bundles the dependencies) and saves it in the target directory. To run your program, execute the following command:

java -jar target/WordCount-1.0-SNAPSHOT.jar input output
For the complete project, see WordCount.