Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid recent MTL regression #9314

Closed
wants to merge 4 commits into from
Closed

Avoid recent MTL regression #9314

wants to merge 4 commits into from

Conversation

lyakh
Copy link
Collaborator

@lyakh lyakh commented Jul 19, 2024

Yesterday a recent regression on MTL, documented in #9308 was addressed in #9313 by disabling CONFIG_MODULES on MTL. That approach has multiple drawbacks: (1) it's too broad, even if one wanted to disable using LLEXT modules on MTL, it would have been enough to make the only module component DRC built-in again. (2) PR #9116 that was a suspect in this regression passed daily tests after merging and PR testing on the next day after its merging were passing too, (3) LLEXT has a high cost of maintenance without CI testing, so removing it from the CI returns it to the state where it has to be checked manually for regressions. This PR presents an alternative "fix" for the problem - reverting latest commits which triggered the breakage as identified by bisection.

@lyakh
Copy link
Collaborator Author

lyakh commented Jul 19, 2024

turns out zephyrproject-rtos/zephyr#76046 has already been negatively tested by #9309

This reverts commit d629e52.
First of the 3 commits, that need to be reverted.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
This reverts commit ffce2cb.
Second of the 3 commits, that need to be reverted.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
…rams"

This reverts commit eda6029.
Third of the 3 commits, that need to be reverted.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
This reverts commit 8847de0.
CONFIG_MODULES should work again on MTL.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@lyakh lyakh marked this pull request as ready for review July 19, 2024 08:51
@lyakh lyakh changed the title Check if Zephyr PR 76046 fixes a recent MTL regression Fix a recent MTL regression Jul 19, 2024
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 19, 2024

As long as MTL still boots then fine by me!

Note it must be made clear that these reverted commits PASSED CI in #9238! They passed in isolation though, now they are combined with other changes including a big Zephyr update.

This complexity and the very complex issue #9268 are the reasons why I didn't try bisecting: I didn't expect it would be that "easy"

Not testing LLEXT on MTL is bad but not testing MTL at all (because... it does not boot!) is much worse.

There's also a very unusual failure on ADL but it does not seem related https://sof-ci.01.org/sofpr/PR9314/build6597/devicetest/index.html

EDIT: I used ssh to check the logs and that ADL crashed silently or was power-cycled. The logs stop abruptly. In the same run, a TGL system was rebooted remotely for no obvious reason. Weird.

@marc-hb marc-hb changed the title Fix a recent MTL regression Avoid recent MTL regression Jul 19, 2024
@marc-hb marc-hb requested a review from tmleman July 19, 2024 17:03
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 19, 2024

Actually both this PR and disabling LLEXT feel like a game of whack-a-mole. Who knows what other, also totally unrelated feature will trigger this crash again.

I just verified that disabling IMR resume also avoids this crash:

Unlike disabling LLEXT or "device posture", that would seem like an actually reliable way to temporarily avoid this crash: something that would let it randomly re-appear again.

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 29, 2024

@marc-hb marc-hb marked this pull request as draft July 29, 2024 23:21
@marc-hb
Copy link
Collaborator

marc-hb commented Aug 5, 2024

@marc-hb marc-hb closed this Aug 5, 2024
@lyakh lyakh deleted the bug9308 branch August 13, 2024 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants