JobProgressListener

JobProgressListener is the SparkListener for the web UI.

As a SparkListener, it intercepts Spark events and collects information about jobs, stages, and tasks that the web UI uses to present the status of a Spark application.

JobProgressListener is interested in the following events:

A job starts.

Caution

FIXME What information does JobProgressListener track?

poolToActiveStages

poolToActiveStages = HashMap[PoolName, HashMap[StageId, StageInfo]]()

poolToActiveStages…

Caution

FIXME
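The nested-map shape of poolToActiveStages can be illustrated with a minimal sketch. StageInfoLike, addActiveStage and removeActiveStage are hypothetical names introduced for illustration only, and the text above does not say when JobProgressListener adds or removes entries, so the update points are an assumption.

```scala
import scala.collection.mutable.HashMap

object PoolToActiveStagesSketch {
  type PoolName = String
  type StageId = Int

  // Hypothetical stand-in for Spark's StageInfo (illustration only).
  final case class StageInfoLike(stageId: StageId, name: String)

  // Same shape as the registry above: pool name -> (stage id -> stage info).
  val poolToActiveStages = HashMap[PoolName, HashMap[StageId, StageInfoLike]]()

  // Register a stage as active in a pool, creating the inner map on first use.
  def addActiveStage(pool: PoolName, stage: StageInfoLike): Unit = {
    val stages = poolToActiveStages.getOrElseUpdate(pool, new HashMap[StageId, StageInfoLike])
    stages(stage.stageId) = stage
  }

  // Drop a stage from a pool; does nothing when the pool or the stage is unknown.
  def removeActiveStage(pool: PoolName, stageId: StageId): Unit =
    poolToActiveStages.get(pool).foreach(_.remove(stageId))
}
```

Keeping one inner map per pool makes "all active stages in pool X" a single lookup.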

Handling SparkListenerJobStart Events (onJobStart method)

onJobStart(jobStart: SparkListenerJobStart): Unit

When called, onJobStart reads the optional Spark Job group id (using SparkListenerJobStart.properties and SparkContext.SPARK_JOB_GROUP_ID key).
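A minimal sketch of that lookup follows. It assumes the properties object may be null and that the key behind SparkContext.SPARK_JOB_GROUP_ID is the string "spark.jobGroup.id". JobGroupLookup and jobGroup are hypothetical names, not part of Spark.

```scala
import java.util.Properties

object JobGroupLookup {
  // Assumed to match the value of SparkContext.SPARK_JOB_GROUP_ID.
  val SparkJobGroupId = "spark.jobGroup.id"

  // Both the properties object and the key may be missing,
  // so each step is wrapped in Option instead of risking a null.
  def jobGroup(properties: Properties): Option[String] =
    for {
      props <- Option(properties)
      group <- Option(props.getProperty(SparkJobGroupId))
    } yield group
}
```

The result is None when no job group was set, which is why the group id is described as optional.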

It then creates a JobUIData (as jobData) based on the input jobStart. Its status attribute is set to JobExecutionStatus.RUNNING.

The internal jobGroupToJobIds is updated with the job group and job ids.

The internal pendingStages is updated with StageInfo for the stage id (for every StageInfo in SparkListenerJobStart.stageInfos collection).

The numTasks attribute of jobData (the JobUIData instance created above) is set to the sum of the tasks of every stage in jobStart.stageInfos whose completionTime attribute is not set.

The internal jobIdToData and activeJobs are updated with jobData for the current job.

The internal stageIdToActiveJobIds is updated with the stage id and job id (for every stage in the input jobStart).

The internal stageIdToInfo is updated with the stage id and StageInfo (for every StageInfo in jobStart.stageInfos).

A StageUIData is added to the internal stageIdToData for every StageInfo (in jobStart.stageInfos).

Note	onJobStart is part of the SparkListener contract to handle…FIXME
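The bookkeeping described in this section can be sketched as one simplified, self-contained model. Every type ending in Like is a hypothetical, trimmed-down stand-in for the Spark class of the same name. Keying jobGroupToJobIds by Option[String] and using getOrElseUpdate for the per-stage registries are choices made for the sketch, not confirmed details of Spark.

```scala
import scala.collection.mutable.{HashMap, HashSet}

object OnJobStartSketch {
  type JobId = Int
  type StageId = Int
  type StageAttemptId = Int

  // Hypothetical stand-ins for StageInfo, SparkListenerJobStart, JobUIData, StageUIData.
  final case class StageInfoLike(stageId: StageId, attemptId: StageAttemptId,
                                 numTasks: Int, completionTime: Option[Long] = None)
  final case class JobStartLike(jobId: JobId, stageInfos: Seq[StageInfoLike],
                                jobGroup: Option[String] = None)
  final class JobUIDataLike(val jobId: JobId, val jobGroup: Option[String],
                            var status: String = "RUNNING", var numTasks: Int = 0)
  final class StageUIDataLike

  final class Listener {
    val jobGroupToJobIds = new HashMap[Option[String], HashSet[JobId]]
    val pendingStages = new HashMap[StageId, StageInfoLike]
    val jobIdToData = new HashMap[JobId, JobUIDataLike]
    val activeJobs = new HashMap[JobId, JobUIDataLike]
    val stageIdToActiveJobIds = new HashMap[StageId, HashSet[JobId]]
    val stageIdToInfo = new HashMap[StageId, StageInfoLike]
    val stageIdToData = new HashMap[(StageId, StageAttemptId), StageUIDataLike]

    def onJobStart(jobStart: JobStartLike): Unit = synchronized {
      // A new job starts out as RUNNING (the default status above).
      val jobData = new JobUIDataLike(jobStart.jobId, jobStart.jobGroup)
      jobGroupToJobIds.getOrElseUpdate(jobStart.jobGroup, new HashSet[JobId]) += jobStart.jobId
      for (info <- jobStart.stageInfos) pendingStages.update(info.stageId, info)
      // Only stages without a completion time count towards numTasks.
      jobData.numTasks =
        jobStart.stageInfos.filter(_.completionTime.isEmpty).map(_.numTasks).sum
      jobIdToData.update(jobStart.jobId, jobData)
      activeJobs.update(jobStart.jobId, jobData)
      for (info <- jobStart.stageInfos) {
        stageIdToActiveJobIds.getOrElseUpdate(info.stageId, new HashSet[JobId]) += jobStart.jobId
        stageIdToInfo.getOrElseUpdate(info.stageId, info)
        stageIdToData.getOrElseUpdate((info.stageId, info.attemptId), new StageUIDataLike)
      }
    }
  }
}
```

For example, starting a job with one unfinished 4-task stage and one already completed 3-task stage leaves numTasks at 4 and both stages in pendingStages.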

stageIdToInfo Registry

stageIdToInfo = new HashMap[StageId, StageInfo]

stageIdToActiveJobIds Registry

stageIdToActiveJobIds = new HashMap[StageId, HashSet[JobId]]

jobIdToData Registry

jobIdToData = new HashMap[JobId, JobUIData]

activeJobs Registry

activeJobs = new HashMap[JobId, JobUIData]

pendingStages Registry

pendingStages = new HashMap[StageId, StageInfo]

Caution

FIXME

JobUIData

Caution

FIXME

blockManagerIds method

blockManagerIds: Seq[BlockManagerId]

Caution

FIXME

Registries

stageIdToData Registry

stageIdToData = new HashMap[(StageId, StageAttemptId), StageUIData]

stageIdToData holds StageUIData per stage (given the stage and attempt ids).
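As a sketch of why the key is a pair, the following keeps separate data for each attempt of the same stage. StageUIDataLike, its numCompleteTasks field and dataFor are hypothetical names used only for illustration.

```scala
import scala.collection.mutable.HashMap

object StageIdToDataSketch {
  type StageId = Int
  type StageAttemptId = Int

  // Hypothetical stand-in for Spark's StageUIData.
  final class StageUIDataLike { var numCompleteTasks: Int = 0 }

  val stageIdToData = new HashMap[(StageId, StageAttemptId), StageUIDataLike]

  // Look up the entry for a stage attempt, creating it on first access.
  def dataFor(stageId: StageId, attemptId: StageAttemptId): StageUIDataLike =
    stageIdToData.getOrElseUpdate((stageId, attemptId), new StageUIDataLike)
}
```

With this shape, a retried stage (same stage id, new attempt id) gets a fresh entry instead of overwriting the counters of the earlier attempt.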

StageUIData

Caution

FIXME

schedulingMode Attribute

schedulingMode attribute is used to show the scheduling mode for the Spark application in Spark UI.

Note	It corresponds to spark.scheduler.mode setting.

When a SparkListenerEnvironmentUpdate event is received, JobProgressListener looks up the spark.scheduler.mode key in the Spark Properties map and uses it to set the internal schedulingMode field.
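That lookup can be sketched as follows, assuming the environment details arrive as a map from a section name to key/value pairs and that a missing key leaves the mode undefined. The SchedulingMode enumeration here is a local stand-in, and the function name is hypothetical.

```scala
object SchedulingModeSketch {
  // Local stand-in for Spark's SchedulingMode enumeration, with explicit names.
  object SchedulingMode extends Enumeration {
    val FAIR = Value("FAIR")
    val FIFO = Value("FIFO")
    val NONE = Value("NONE")
  }

  // environmentDetails is assumed to map a section name (such as "Spark Properties")
  // to a sequence of key/value pairs.
  def schedulingMode(
      environmentDetails: Map[String, Seq[(String, String)]]): Option[SchedulingMode.Value] =
    environmentDetails
      .getOrElse("Spark Properties", Seq.empty)
      .toMap
      .get("spark.scheduler.mode")
      .map(SchedulingMode.withName)
}
```

Returning an Option keeps "mode not reported yet" distinct from any concrete mode, so the UI can decide what to show in that case.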

Note	It is used in Jobs and Stages tabs.
