JVM-level performance profiling for the Spark Cyclone plugin can be achieved with Java Flight Recorder (JFR), which comes with the JDK.
The JFR files generated during a profiling run are stored in the Hadoop appcache directory corresponding to the Spark job. By default, this directory is cleared immediately after the job completes. The following must be added to the Hadoop configuration (/opt/hadoop/etc/hadoop/yarn-site.xml) so that the appcache data is retained for a set amount of time (in seconds) after job completion:
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
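Optionally, the property can be verified in place before restarting; this simply greps the configuration file referenced above:
# Confirm that the retention property has been added
$ grep -A1 'yarn.nodemanager.delete.debug-delay-sec' /opt/hadoop/etc/hadoop/yarn-site.xml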
The Hadoop cluster needs to be restarted (as user hadoop) after the configuration is set:
# Shut down the cluster
$ su hadoop /opt/hadoop/sbin/stop-yarn.sh
$ su hadoop /opt/hadoop/sbin/stop-dfs.sh
# Restart the cluster
$ su hadoop /opt/hadoop/sbin/start-yarn.sh
$ su hadoop /opt/hadoop/sbin/start-dfs.sh
Note that the shutdown scripts may run successfully without actually shutting down the cluster, so it may be useful to verify with the jps command and kill any remaining processes directly with kill -9:
$ jps
31489 ResourceManager
34161 SecondaryNameNode
33378 NameNode
31818 NodeManager
33663 DataNode
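If any of these daemons are still running after the stop scripts have been executed, they can be force-killed by PID; the PIDs below are only illustrative and correspond to the example jps output above:
# Force-kill lingering Hadoop daemons (example PIDs from the jps output above)
$ kill -9 31489 31818 33378 33663 34161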
After the restart, HDFS will need to be switched back out of safe mode:
$ su hadoop -- /opt/hadoop/bin/hdfs dfsadmin -safemode leave
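To confirm that HDFS is back in normal mode, the current safe mode status can be queried with the same dfsadmin tool:
$ su hadoop -- /opt/hadoop/bin/hdfs dfsadmin -safemode get
Safe mode is OFF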
Copy the JFR settings file into a location that can be referenced by the Spark job. In the Spark job configuration, define the path to the settings file and add the following two configuration lines:
JFC_SETTINGS=/path/to/settings.jfc
--conf spark.driver.extraJavaOptions="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=duration=600s,settings=$JFC_SETTINGS,filename=driver_events.jfr"
--conf spark.executor.extraJavaOptions="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=duration=600s,settings=$JFC_SETTINGS,filename=executor.jfr"
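For illustration, the options above could be passed to a spark-submit invocation as follows; the application JAR, main class, and settings path here are placeholders rather than part of the Spark Cyclone setup:
JFC_SETTINGS=/path/to/settings.jfc
$ spark-submit --master yarn \
    --conf spark.driver.extraJavaOptions="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=duration=600s,settings=$JFC_SETTINGS,filename=driver_events.jfr" \
    --conf spark.executor.extraJavaOptions="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=duration=600s,settings=$JFC_SETTINGS,filename=executor.jfr" \
    --class com.example.ProfiledJob your-spark-job.jar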
After the job completes, the JFR output files will be found in the appcache
directory corresponding to the Spark job. They can be copied over to the
current working directory as follows:
USER= # The user that kicked off the Spark job
JOB_ID= # The Spark job (application) ID
# Copy each recording over, naming it after the container directory that produced it
# (iterating over the files themselves also picks up the driver's driver_events.jfr)
for f in $(find /home/hadoop/nm-local-dir/usercache/$USER/appcache/$JOB_ID -iname '*.jfr' | grep -v tmp); do cp $f $(basename $(dirname $f)).jfr ; done
There will be one JFR file corresponding to each container that ran the job (including the driver).
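The recordings can then be inspected locally, for example by opening them in JDK Mission Control, or with the jfr command-line tool that ships with newer JDKs (JDK 12 and later); container_XXXX.jfr below stands in for one of the copied recordings:
# Print a high-level summary of the recorded events
$ jfr summary container_XXXX.jfr
# Dump a specific event type, e.g. CPU load samples
$ jfr print --events jdk.CPULoad container_XXXX.jfr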