Persisting Events using EventLoggingListener

EventLoggingListener is a SparkListener that logs JSON-encoded events to a file.

When enabled, it writes events to a log file in the spark.eventLog.dir directory. All Spark events are logged.

Events can optionally be compressed.

In-flight log files carry the .inprogress extension.
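
As a quick illustration, event logging can be enabled through SparkConf before the SparkContext is created. This is a minimal sketch; the application name, master, and log directory are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch: enable event logging before creating the SparkContext.
// The log directory (/tmp/spark-events here) must already exist.
val conf = new SparkConf()
  .setAppName("event-logging-demo")
  .setMaster("local[*]")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "/tmp/spark-events")

val sc = new SparkContext(conf)
sc.parallelize(1 to 100).count() // run a job so some events get logged
sc.stop() // finalizes the log file, i.e. drops the .inprogress extension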

Tip
You can use History Server to view the logs using a web interface.
Note
It is a private[spark] class in org.apache.spark.scheduler package.
Tip

Enable INFO logging level for org.apache.spark.scheduler.EventLoggingListener logger to see what happens inside EventLoggingListener.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.scheduler.EventLoggingListener=INFO

Refer to Logging.

Creating EventLoggingListener Instance

EventLoggingListener requires an application id (appId), the application’s optional attempt id (appAttemptId), a logBaseDir, a SparkConf (as sparkConf), and a Hadoop Configuration (as hadoopConf).

Note
When initialized without a Hadoop Configuration, it creates one using SparkHadoopUtil.get.newConfiguration(sparkConf).
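
For orientation, the constructor looks roughly as follows (a sketch based on the Spark 2.0 sources; not a stable API, since the class is private[spark]):

private[spark] class EventLoggingListener(
    appId: String,
    appAttemptId: Option[String],
    logBaseDir: URI,
    sparkConf: SparkConf,
    hadoopConf: Configuration)
  extends SparkListener with Logging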

Starting EventLoggingListener (start method)

start checks whether logBaseDir is really a directory, and if it is not, it throws an IllegalArgumentException with the following message:

Log directory [logBaseDir] does not exist.

The log file’s working name is built from appId, the short name of the compression codec (if compression is enabled), and appAttemptId (if defined), e.g. local-1461696754069. While in use, the file also carries the .inprogress extension.
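
For illustration, the working file name for an application with id local-1461696754069 could look like one of the following (the attempt id 1 and the lz4 codec are made-up examples):

local-1461696754069.inprogress (no compression, no attempt id)
local-1461696754069_1.inprogress (attempt id 1)
local-1461696754069.lz4.inprogress (lz4-compressed events)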

If spark.eventLog.overwrite is enabled and the working log file already exists, you should see the following WARN message in the logs:

WARN EventLoggingListener: Event log [path] already exists. Overwriting...

The existing .inprogress log file is then attempted to be deleted. If it cannot be deleted, the following WARN message is printed out to the logs:

WARN EventLoggingListener: Error deleting [path]

A buffered output stream is then created, and metadata with Spark’s version and the name of the SparkListenerLogStart class is written out as the first line:

{"Event":"SparkListenerLogStart","Spark Version":"2.0.0-SNAPSHOT"}

At this point, EventLoggingListener is ready for event logging and you should see the following INFO message in the logs:

INFO EventLoggingListener: Logging events to [logPath]

Logging Event (logEvent method)

logEvent(event: SparkListenerEvent, flushLogger: Boolean = false)

logEvent logs event as JSON using the org.apache.spark.util.JsonProtocol object.
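
A simplified sketch of what logEvent does (writer stands for the listener's internal Option[PrintWriter] around the output stream; error handling omitted):

import org.apache.spark.scheduler.SparkListenerEvent
import org.apache.spark.util.JsonProtocol
import org.json4s.jackson.JsonMethods.{compact, render}

// Serialize the event to JSON and append it to the log as a single line.
def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false): Unit = {
  val eventJson = JsonProtocol.sparkEventToJson(event)
  writer.foreach(_.println(compact(render(eventJson))))
  if (flushLogger) {
    writer.foreach(_.flush())
  }
}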

Stopping EventLoggingListener (stop method)

stop closes the PrintWriter for the log file and renames the file to drop the .inprogress extension.

If the target log file (the one without the .inprogress extension) already exists and spark.eventLog.overwrite is enabled, the file is overwritten, and you should see the following WARN message in the logs:

WARN EventLoggingListener: Event log [target] already exists. Overwriting...

If the target log file exists and overwrite is disabled, a java.io.IOException is thrown with the following message:

Target log file already exists ([logPath])
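
A hedged sketch of this finalization step (logPath, fileSystem, and shouldOverwrite stand for the listener's internal state; simplified):

import org.apache.hadoop.fs.Path

// Drop the .inprogress extension once the application finishes.
val target = new Path(logPath)
if (fileSystem.exists(target)) {
  if (shouldOverwrite) {
    logWarning(s"Event log $target already exists. Overwriting...")
    fileSystem.delete(target, true)
  } else {
    throw new java.io.IOException(s"Target log file already exists ($logPath)")
  }
}
fileSystem.rename(new Path(logPath + ".inprogress"), target)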

Compressing Logged Events

If event compression is enabled (see spark.eventLog.compress), CompressionCodec.createCodec(sparkConf) is called to set up the compression codec.

Caution
FIXME What compression codecs are supported?
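
A sketch of the internal call (note that CompressionCodec is private[spark], so this snippet only compiles inside the org.apache.spark namespace; the actual codec is selected with spark.io.compression.codec):

import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

val sparkConf = new SparkConf().set("spark.eventLog.compress", "true")

// createCodec picks the codec class based on spark.io.compression.codec
// (lz4 by default in recent Spark versions).
val codec = CompressionCodec.createCodec(sparkConf)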

Settings

spark.eventLog.enabled

spark.eventLog.enabled (default: false) - whether to log Spark events that encode the information displayed in the web UI to persistent storage. It is useful for reconstructing the web UI after a Spark application has finished.

spark.eventLog.dir

spark.eventLog.dir (default: /tmp/spark-events) - path to the directory in which events are logged, e.g. hdfs://namenode:8021/directory. The directory must exist before Spark starts up. See Creating a SparkContext.

spark.eventLog.buffer.kb

spark.eventLog.buffer.kb (default: 100) - buffer size to use when writing to output streams.

spark.eventLog.overwrite

spark.eventLog.overwrite (default: false) - whether to overwrite (i.e. delete) an existing event log file, including an in-flight .inprogress file.

spark.eventLog.compress

spark.eventLog.compress (default: false) controls whether to compress events (true) or not (false).

spark.eventLog.testing

spark.eventLog.testing (default: false) - an internal flag for testing that makes EventLoggingListener additionally keep JSON events in its internal loggedEvents array.
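
With the default settings (no compression), a finished event log is a plain text file with one JSON document per line, so it is easy to inspect directly (the path below is a made-up example):

import scala.io.Source

// Print the first three events of a finished, uncompressed event log.
Source.fromFile("/tmp/spark-events/local-1461696754069")
  .getLines()
  .take(3)
  .foreach(println)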