JobProgressListener

JobProgressListener is the SparkListener for the web UI.

As a SparkListener, it intercepts Spark events and collects information about jobs, stages, and tasks that the web UI uses to present the status of a Spark application.

JobProgressListener is interested in the following events:

Caution
FIXME What information does JobProgressListener track?

poolToActiveStages

poolToActiveStages = HashMap[PoolName, HashMap[StageId, StageInfo]]()

poolToActiveStages…​

Caution
FIXME

Handling SparkListenerJobStart Events (onJobStart method)

onJobStart(jobStart: SparkListenerJobStart): Unit

When called, onJobStart reads the optional Spark job group id (from SparkListenerJobStart.properties, using the SparkContext.SPARK_JOB_GROUP_ID key).

It then creates a JobUIData (as jobData) based on the input jobStart. Its status attribute is set to JobExecutionStatus.RUNNING.

The internal jobGroupToJobIds is updated with the job group and job ids.

The internal pendingStages is updated with StageInfo for the stage id (for every StageInfo in SparkListenerJobStart.stageInfos collection).

The numTasks attribute of jobData (the JobUIData instance created above) is set to the sum of the tasks of every stage (from jobStart.stageInfos) whose completionTime attribute is not set.

The internal jobIdToData and activeJobs are updated with jobData for the current job.

The internal stageIdToActiveJobIds is updated with the stage id and job id (for every stage in the input jobStart).

The internal stageIdToInfo is updated with the stage id and StageInfo (for every StageInfo in jobStart.stageInfos).

A StageUIData is added to the internal stageIdToData for every StageInfo (in jobStart.stageInfos).
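The steps above can be sketched as follows. This is a minimal illustration, not Spark's source: StageInfo, JobStart, JobUIData and StageUIData are simplified stand-ins defined here, the Tracker class name is hypothetical, and keying jobGroupToJobIds by Option[String] (so that jobs without a group share the None key) is an assumption made for the sketch. The string "spark.jobGroup.id" is the value of SparkContext.SPARK_JOB_GROUP_ID.

```scala
import scala.collection.mutable

object OnJobStartSketch {
  // Simplified stand-ins for illustration only; these are not Spark's classes.
  case class StageInfo(stageId: Int, attemptId: Int, numTasks: Int,
                       completionTime: Option[Long] = None)
  case class JobStart(jobId: Int, stageInfos: Seq[StageInfo],
                      properties: Map[String, String])
  class StageUIData
  class JobUIData(val jobId: Int, val jobGroup: Option[String]) {
    var status: String = "RUNNING" // stands in for JobExecutionStatus.RUNNING
    var numTasks: Int = 0
  }

  class Tracker { // hypothetical name
    val jobGroupToJobIds = mutable.HashMap.empty[Option[String], mutable.HashSet[Int]]
    val pendingStages = mutable.HashMap.empty[Int, StageInfo]
    val jobIdToData = mutable.HashMap.empty[Int, JobUIData]
    val activeJobs = mutable.HashMap.empty[Int, JobUIData]
    val stageIdToActiveJobIds = mutable.HashMap.empty[Int, mutable.HashSet[Int]]
    val stageIdToInfo = mutable.HashMap.empty[Int, StageInfo]
    val stageIdToData = mutable.HashMap.empty[(Int, Int), StageUIData]

    def onJobStart(jobStart: JobStart): Unit = synchronized {
      // 1. Read the optional job group id from the job properties.
      val jobGroup = jobStart.properties.get("spark.jobGroup.id")
      // 2. Create the per-job UI data; its status starts as RUNNING.
      val jobData = new JobUIData(jobStart.jobId, jobGroup)
      // 3. Record the job id under its job group.
      jobGroupToJobIds.getOrElseUpdate(jobGroup, mutable.HashSet.empty[Int]) += jobStart.jobId
      // 4. Every stage of the job starts out as pending.
      for (stage <- jobStart.stageInfos) pendingStages(stage.stageId) = stage
      // 5. Count tasks only for stages that have not completed yet.
      jobData.numTasks =
        jobStart.stageInfos.filter(_.completionTime.isEmpty).map(_.numTasks).sum
      // 6. Register the job as known and active.
      jobIdToData(jobStart.jobId) = jobData
      activeJobs(jobStart.jobId) = jobData
      // 7. Link stages to the job and create per-stage entries.
      for (stage <- jobStart.stageInfos) {
        stageIdToActiveJobIds
          .getOrElseUpdate(stage.stageId, mutable.HashSet.empty[Int]) += jobStart.jobId
        stageIdToInfo.getOrElseUpdate(stage.stageId, stage)
        stageIdToData.getOrElseUpdate((stage.stageId, stage.attemptId), new StageUIData)
      }
    }
  }
}
```

For a job with one unfinished stage of 4 tasks and one already completed stage of 2 tasks, this sketch sets numTasks to 4, since completed stages are excluded from the sum.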

Note
onJobStart is a part of SparkListener contract to handle…​FIXME

stageIdToInfo Registry

stageIdToInfo = new HashMap[StageId, StageInfo]

stageIdToActiveJobIds Registry

stageIdToActiveJobIds = new HashMap[StageId, HashSet[JobId]]

jobIdToData Registry

jobIdToData = new HashMap[JobId, JobUIData]

activeJobs Registry

activeJobs = new HashMap[JobId, JobUIData]

pendingStages Registry

pendingStages = new HashMap[StageId, StageInfo]

Caution
FIXME

JobUIData

Caution
FIXME

blockManagerIds method

blockManagerIds: Seq[BlockManagerId]

Caution
FIXME

Registries

stageIdToData Registry

stageIdToData = new HashMap[(StageId, StageAttemptId), StageUIData]

stageIdToData holds StageUIData per stage (given the stage and attempt ids).
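A short sketch of what keying by the (stage id, attempt id) pair means in practice: two attempts of the same stage get separate entries. The StageUIData class below is a hypothetical stand-in reduced to a single counter, and the dataFor helper is an illustrative name, not part of Spark.

```scala
import scala.collection.mutable

object StageIdToDataSketch {
  type StageId = Int
  type StageAttemptId = Int

  // Hypothetical stand-in for Spark's StageUIData, reduced to one counter.
  class StageUIData { var numCompleteTasks: Int = 0 }

  val stageIdToData =
    mutable.HashMap.empty[(StageId, StageAttemptId), StageUIData]

  // Illustrative helper: fetch the entry for a stage attempt, creating it on first use.
  def dataFor(stageId: StageId, attemptId: StageAttemptId): StageUIData =
    stageIdToData.getOrElseUpdate((stageId, attemptId), new StageUIData)
}
```

With this layout, updating dataFor(3, 0) leaves dataFor(3, 1) untouched, so a retried stage does not overwrite the data of its earlier attempt.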

StageUIData

Caution
FIXME

schedulingMode Attribute

The schedulingMode attribute is used to show the scheduling mode of the Spark application in the web UI.

Note
It corresponds to the spark.scheduler.mode setting.

When a SparkListenerEnvironmentUpdate event is received, JobProgressListener looks up the spark.scheduler.mode key in the Spark Properties map and sets the internal schedulingMode field accordingly.
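The lookup can be sketched as below. This is an illustration under stated assumptions: EnvironmentUpdate and the SchedulingMode enumeration are local stand-ins rather than Spark's own types, the Tracker class name is hypothetical, and the environment details are assumed to be a map from section name to key/value pairs with the properties stored under "Spark Properties".

```scala
object SchedulingModeSketch {
  // Local stand-in for Spark's SchedulingMode enumeration.
  object SchedulingMode extends Enumeration {
    val FAIR, FIFO, NONE = Value
  }

  // Simplified stand-in for the environment update event.
  case class EnvironmentUpdate(
      environmentDetails: Map[String, Seq[(String, String)]])

  class Tracker { // hypothetical name
    // Stays None until an update carrying spark.scheduler.mode arrives.
    var schedulingMode: Option[SchedulingMode.Value] = None

    def onEnvironmentUpdate(update: EnvironmentUpdate): Unit = synchronized {
      schedulingMode = update.environmentDetails
        .getOrElse("Spark Properties", Seq.empty[(String, String)])
        .toMap
        .get("spark.scheduler.mode")
        .map(name => SchedulingMode.withName(name))
    }
  }
}
```

Keeping the field as an Option lets the UI tell "not reported" apart from an explicit mode; in this sketch an unknown mode name would make withName throw NoSuchElementException.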

Note
It is used in the Jobs and Stages tabs.