@jianlinliu
Contributor

@jianlinliu jianlinliu commented Nov 3, 2025

Extract operator Progressing / Degraded counts and timing from intervals, collect them, and save them into an auto data loader JSON file for historical analysis.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 3, 2025
@openshift-ci-robot

openshift-ci-robot commented Nov 3, 2025

@jianlinliu: This pull request references TRT-2254 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-trt

openshift-trt bot commented Nov 3, 2025

Job Failure Risk Analysis for sha: f01fda0

Job Name | Failure Risk | Tests in this run | Historical average
pull-ci-openshift-origin-main-e2e-gcp-csi | IncompleteTests | 106 | 1798
pull-ci-openshift-origin-main-e2e-gcp-ovn | IncompleteTests | 105 | 3244
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade | IncompleteTests | 106 | 1801
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 | IncompleteTests | 101 | 3006
pull-ci-openshift-origin-main-e2e-vsphere-ovn | IncompleteTests | 103 | 3313
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi | IncompleteTests | 103 | 3351

IncompleteTests: tests for the run are below the historical average, so not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems.

@openshift-trt

openshift-trt bot commented Nov 3, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: 11e2b1c

  • "[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer cleanup" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer collection" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer interval construction" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer preparation" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer setup" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer test evaluation" [Total: 12, Pass: 12, Fail: 0, Flake: 0]
  • "[Monitor:operator-state-metrics-analyzer][Jira:"Test Framework"] monitor test operator-state-metrics-analyzer writing to storage" [Total: 12, Pass: 12, Fail: 0, Flake: 0]

@jianlinliu
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Nov 4, 2025

@jianlinliu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fe5001a0-b94a-11f0-82fa-e1d6c7ee712e-0

@jianlinliu
Contributor Author

/test unit

@jianlinliu
Contributor Author

/test e2e-aws-ovn-microshift-serial

@jianlinliu
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Nov 5, 2025

@jianlinliu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3a563760-b9ea-11f0-8835-258702823538-0

@jianlinliu
Contributor Author

/test e2e-gcp-ovn

@jianlinliu
Contributor Author

jianlinliu commented Nov 5, 2025

The metrics autodl JSON file was generated as expected.

if len(metrics) > 0 {
	rows := generateRowsFromMetrics(metrics)
	dataFile := dataloader.DataFile{
		TableName: "operator_state_metrics",

Contributor

I'm wondering if instead of the generic "Metric" we should have defined "Count", "TotalSeconds" and maybe "MinSeconds" and "MaxSeconds" instead of "IndividualDurationSeconds". Will see if others have thoughts on this.

Contributor

I think the consensus was to make this a single row per operator/condition, tracking "Count", "TotalSeconds", and "MaxIndividualDurationSeconds".
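
For readers following the thread, here is a minimal sketch of what a single row per operator/condition could look like when computed from intervals. It is not the code from this PR: the `interval` and `stateRow` types and the `aggregate` and `sampleIntervals` functions are hypothetical names used only for illustration. Only the column names come from the discussion above.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// interval is a hypothetical, simplified stand-in for a monitor interval.
type interval struct {
	Operator  string
	Condition string // e.g. "Progressing" or "Degraded"
	From, To  time.Time
}

// stateRow is one aggregated row per operator/condition pair.
// The field names mirror the columns named in the review thread.
type stateRow struct {
	Operator                     string
	State                        string
	Count                        int
	TotalSeconds                 float64
	MaxIndividualDurationSeconds float64
}

// aggregate folds intervals into one row per operator/condition,
// sorted by operator and then state so the output is deterministic.
func aggregate(intervals []interval) []stateRow {
	type key struct{ operator, condition string }
	byKey := map[key]*stateRow{}
	for _, iv := range intervals {
		k := key{iv.Operator, iv.Condition}
		row, ok := byKey[k]
		if !ok {
			row = &stateRow{Operator: iv.Operator, State: iv.Condition}
			byKey[k] = row
		}
		secs := iv.To.Sub(iv.From).Seconds()
		row.Count++
		row.TotalSeconds += secs
		if secs > row.MaxIndividualDurationSeconds {
			row.MaxIndividualDurationSeconds = secs
		}
	}
	rows := make([]stateRow, 0, len(byKey))
	for _, r := range byKey {
		rows = append(rows, *r)
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Operator != rows[j].Operator {
			return rows[i].Operator < rows[j].Operator
		}
		return rows[i].State < rows[j].State
	})
	return rows
}

// sampleIntervals returns made-up data purely for demonstration.
func sampleIntervals() []interval {
	start := time.Date(2025, 11, 5, 0, 0, 0, 0, time.UTC)
	return []interval{
		{"etcd", "Progressing", start, start.Add(30 * time.Second)},
		{"etcd", "Progressing", start.Add(time.Minute), start.Add(150 * time.Second)},
		{"network", "Degraded", start, start.Add(5 * time.Second)},
	}
}

func main() {
	for _, r := range aggregate(sampleIntervals()) {
		fmt.Printf("%+v\n", r)
	}
}
```

With the sample data, the two etcd Progressing intervals (30s and 90s) collapse into one row with a count of 2, a total of 120 seconds, and a maximum individual duration of 90 seconds.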

if err := dataloader.WriteDataFile(fileName, dataFile); err != nil {
	return fmt.Errorf("failed to write operator state metrics: %w", err)
}
fmt.Printf("--->Write operator state metrics to %s successfully.\n", fileName)

Contributor

I like to encourage the use of logrus.Infof for this: the syntax is clean, and it ensures we get timestamps for debugging purposes.

Contributor Author

Ah, that line was actually added for debugging. Sure, I will update it to use logrus.Infof.

@jianlinliu
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Nov 6, 2025

@jianlinliu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/08e8c8c0-baf6-11f0-985c-0d8c3a5eab3d-0

@petr-muller
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller November 6, 2025 11:09
@openshift-trt

openshift-trt bot commented Nov 6, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: b7e34ea

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] server supports sending resources in Table format" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by client-go's List method when WatchListClient is enabled" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by metadata client's List method when WatchListClient is enabled" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-node] NoExecuteTaintManager Multiple Pods [Serial] only evicts pods without tolerations from tainted nodes" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-node] NoExecuteTaintManager Single Pod [Serial] pods evicted from tainted nodes have pod disruption condition" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, 1 container with resources" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Serial]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-storage] CSI Mock selinux on mount metrics and SELinuxWarningController SELinuxMount metrics [LinuxOnly] [Feature:SELinux] [Serial] warning is bumped on two Pods with a different context on RWO volume [FeatureGate:SELinuxMountReadWriteOncePod] [Beta] [FeatureGate:SELinuxChangePolicy] [Beta] [Feature:SELinuxMountReadWriteOncePodOnly]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-storage] CSI Mock selinux on mount metrics and SELinuxWarningController SELinuxMount metrics [LinuxOnly] [Feature:SELinux] [Serial] warning is bumped on two Pods with different policies on RWO volume [FeatureGate:SELinuxMountReadWriteOncePod] [Beta] [FeatureGate:SELinuxChangePolicy] [Beta] [Feature:SELinuxMountReadWriteOncePodOnly]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-storage] [Serial] Volume metrics Ephemeral should create volume metrics with the correct BlockMode PVC ref" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-storage] [Serial] Volume metrics PVC should create volume metrics in Volume Manager" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-storage] [Serial] Volume metrics PVC should create volume metrics with the correct FilesystemMode PVC ref" is a new test, and was only seen in one job.

New tests seen in this PR at sha: b7e34ea

  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] server supports sending resources in Table format" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by client-go's List method when WatchListClient is enabled" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] API Streaming (aka. WatchList) [FeatureGate:WatchList] [Beta] [Serial] should NOT be requested by metadata client's List method when WatchListClient is enabled" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-api-machinery] CBOR [Feature:CBOR] clients remain compatible with the 1.17 sample-apiserver [Serial]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-api-machinery] Namespaces [Serial] should always delete fast (ALL of 100 namespaces in 150 seconds) [Feature:ComprehensiveNamespaceDraining]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-apps] Daemon set [Serial] should not update pod when spec was updated and update strategy is OnDelete" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-apps] Daemon set [Serial] should surge pods onto nodes when spec was updated and update strategy is RollingUpdate" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-apps] DisruptionController evictions: maxUnavailable deny evictions, integer => should not allow an eviction [Serial]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-apps] Job should run a job to completion with CPU requests [Serial]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-network] IngressClass [Feature:Ingress] should choose the one with the later CreationTimestamp, if equal the one with the lower name when two ingressClasses are marked as default [Serial]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-network] IngressClass [Feature:Ingress] should set default value on new IngressClass [Serial]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-network] Networking should allow creating a Pod with an SCTP HostPort [LinuxOnly] [Serial]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-network] Services should allow creating a basic SCTP service with pod and endpoints [LinuxOnly] [Serial]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-node] NoExecuteTaintManager Multiple Pods [Serial] only evicts pods without tolerations from tainted nodes" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-node] NoExecuteTaintManager Single Pod [Serial] pods evicted from tainted nodes have pod disruption condition" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-node] Pod Level Resources [Serial] [Feature:PodLevelResources] [FeatureGate:PodLevelResources] [Beta] Burstable QoS pod, 1 container with resources" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Serial]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-scheduling] SchedulerPredicates [Serial] validates local ephemeral storage resource limits of pods that are allowed to run" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[sig-scheduling] SchedulerPredicates [Serial] validates that taints-tolerations is respected if matching" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • (...showing 20 of 31 tests)

Schema: map[string]dataloader.DataType{
	"Operator": dataloader.DataTypeString,
	"State":    dataloader.DataTypeString,
	"Count":    dataloader.DataTypeFloat64,

Contributor

What do you think about DataTypeInteger for Count, TotalSeconds and MaxIndividualDurationSeconds? I don't think we are worried about precision beyond the second.

Contributor

Rethinking this actually. Count would make sense as an integer. Our intervals show the time as whole seconds but it makes sense for the data we collect to be more precise so leaving the time values as Float64 is probably good.

Contributor Author

Sure. Makes sense.

@jianlinliu
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

@openshift-ci
Contributor

openshift-ci bot commented Nov 6, 2025

@jianlinliu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ea266120-bb10-11f0-9b5f-81cec0dd33d4-0

@openshift-trt

openshift-trt bot commented Nov 6, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 6c5eecb

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Serial]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-storage] [Serial] Volume metrics Ephemeral should create volume metrics with the correct BlockMode PVC ref" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-storage] [Serial] Volume metrics PVC should create volume metrics in Volume Manager" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2 Medium - "[sig-storage] [Serial] Volume metrics PVC should create volume metrics with the correct FilesystemMode PVC ref" is a new test, and was only seen in one job.

New tests seen in this PR at sha: 6c5eecb

  • "[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Serial]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-storage] [Serial] Volume metrics Ephemeral should create volume metrics with the correct BlockMode PVC ref" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-storage] [Serial] Volume metrics PVC should create volume metrics in Volume Manager" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-storage] [Serial] Volume metrics PVC should create volume metrics with the correct FilesystemMode PVC ref" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

@neisw
Contributor

neisw commented Nov 6, 2025

/retest-required

@neisw
Contributor

neisw commented Nov 7, 2025

/test e2e-aws-ovn-fips

@petr-muller
Member

/test e2e-aws-ovn-microshift

@neisw
Contributor

neisw commented Nov 7, 2025

/retest-required
/lgtm
/verified by autodl

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jianlinliu, neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 7, 2025

@jianlinliu: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

5 participants