
Fix RB8 overlay audio rerun failures by making PipeWire overlay setup idempotent #295

Open
smuppand wants to merge 3 commits into qualcomm-linux:main from smuppand:audio


@smuppand
Contributor

Problem

On overlay builds (audioreach modules present), repeated runs of AudioRecord can FAIL on RB8 because the overlay setup path restarts PipeWire on every run. After the first successful setup, and until the next reboot, subsequent systemctl restart pipewire attempts can fail or hang, causing the testcase to report FAIL even though the audio stack is otherwise usable.

What this PR changes

This PR fixes issue #291, reported on RB8.

Runner/utils/audio_common.sh

  • Make setup_overlay_audio_environment() idempotent for overlay builds:
    • Avoid unconditional PipeWire restart on every invocation (prevents RB8 “frozen” rerun behavior).
    • Keep guarded systemctl/wpctl calls using existing timeout wrappers to avoid control-plane hangs.
    • Preserve overlay requirements (DMA heap permissions) while failing only on real errors.
    • Readiness polling remains to confirm PipeWire is usable when a restart is actually needed.

Runner/suites/Multimedia/Audio/AudioRecord/run.sh

  • Keep the existing run.sh structure/behavior, but align with shared helpers:
    • Use helpers from audio_common.sh (e.g., the PipeWire default-source helper where applicable).
    • Remove/avoid any duplicate helper definitions (run.sh should not redefine helpers already in audio_common.sh).
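The "idempotent overlay setup" idea above can be sketched as a per-boot stamp check: restart PipeWire only on the first run after boot, or when the unit is not active. This is a hypothetical illustration, not the PR's code; ensure_overlay_pipewire, pipewire_active, OVERLAY_STAMP and SYSTEMCTL_TIMEOUT are invented names, and it assumes PipeWire runs as the system-level pipewire unit, as the systemctl restart pipewire call in the description implies. The stamp lives under /run, which is a tmpfs on systemd systems and is therefore cleared on reboot.

```shell
# Hypothetical sketch: make overlay PipeWire setup idempotent per boot.
# Names and the stamp path are illustrative, not taken from audio_common.sh.
OVERLAY_STAMP="${OVERLAY_STAMP:-/run/audio_overlay_setup.done}"
SYSTEMCTL_TIMEOUT="${SYSTEMCTL_TIMEOUT:-10}"

# Succeeds if the pipewire unit reports active within the deadline.
pipewire_active() {
    timeout "$SYSTEMCTL_TIMEOUT" systemctl is-active --quiet pipewire
}

ensure_overlay_pipewire() {
    # Already set up this boot and PipeWire is running: skip the restart.
    if [ -f "$OVERLAY_STAMP" ] && pipewire_active; then
        echo "overlay: already configured this boot, not restarting pipewire"
        return 0
    fi

    # First run this boot (or PipeWire is down): restart under a deadline.
    if ! timeout "$SYSTEMCTL_TIMEOUT" systemctl restart pipewire; then
        echo "overlay: pipewire restart failed or timed out" >&2
        rm -f "$OVERLAY_STAMP"
        return 1
    fi

    # Record success so later runs in the same boot skip the restart.
    : > "$OVERLAY_STAMP"
    echo "overlay: pipewire restarted"
}
```

Note that this only skips the repeated restart; the restart path itself still runs on the first invocation of each boot, so any underlying restart failure is avoided rather than exercised on reruns.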

@bhargav0610 left a comment

LGTM

@lumag left a comment

Restarting PipeWire is a valid code path which must work. Please add an explicit test that restarts PipeWire and verifies that it works. Skipping the PipeWire restart is not a valid way to solve the issue.
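One possible shape for the requested test is sketched below: restart PipeWire unconditionally under a deadline, then poll until wpctl can reach the session again. This is a hypothetical sketch; test_pipewire_restart, pw_wait_ready, PW_RESTART_TIMEOUT and PW_READY_TIMEOUT are invented names, and it assumes a system-level pipewire unit plus a wpctl that can connect from the test's environment (runtime-directory variables may need to be set on some images).

```shell
# Hypothetical explicit PipeWire restart test (names are illustrative).
PW_RESTART_TIMEOUT="${PW_RESTART_TIMEOUT:-15}"
PW_READY_TIMEOUT="${PW_READY_TIMEOUT:-10}"

# Poll once per second until wpctl can talk to the PipeWire/WirePlumber
# session, giving up after PW_READY_TIMEOUT attempts.
pw_wait_ready() {
    i=0
    while [ "$i" -lt "$PW_READY_TIMEOUT" ]; do
        if timeout 3 wpctl status >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Restart PipeWire unconditionally and report PASS/FAIL.
test_pipewire_restart() {
    if ! timeout "$PW_RESTART_TIMEOUT" systemctl restart pipewire; then
        echo "FAIL: systemctl restart pipewire failed or timed out"
        return 1
    fi
    if ! pw_wait_ready; then
        echo "FAIL: PipeWire not responsive after restart"
        return 1
    fi
    echo "PASS: PipeWire restarted and is responsive"
}
```

Running such a test on every run (rather than skipping the restart) keeps the restart path covered, so a regression there shows up as a FAIL instead of being hidden.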

fmt="$1"; dur="$2"
base="${AUDIO_CLIPS_BASE_DIR:-AudioClips}"


Too many unrelated changes. Please clean up your commit.

else
log_error "No downloader (wget/curl) available to fetch $url"
return 1
fi

Don't mix style cleanups and actual changes. It makes it much harder to review your PR.

Update SPDX-License-Identifier tags across the repository from BSD-3-Clause-Clear
to BSD-3-Clause as requested by legal.

No functional changes.

Signed-off-by: Srikanth Muppandam <smuppand@qti.qualcomm.com>
… freezes

On overlay builds (audioreach modules present), setup_overlay_audio_environment()
was restarting pipewire every run, which can fail/hang on RB8 after the first
successful setup until reboot.

Make overlay setup idempotent:
- avoid unconditional pipewire restart on subsequent runs
- guard systemctl/wpctl calls with timeouts to prevent freezes
- keep DMA heap permission setup but fail only on real errors
- add readiness polling to confirm PipeWire is usable

This removes flaky FAILs on repeated AudioRecord runs on RB8 overlay images.

Signed-off-by: Srikanth Muppandam <smuppand@qti.qualcomm.com>
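The commit above guards systemctl/wpctl calls with timeouts. A minimal sketch of such a guard is shown here; run_guarded is a hypothetical name and is not the repository's existing timeout wrapper. It relies on GNU coreutils timeout exiting with status 124 when the deadline expires, which lets a hung control-plane call be reported separately from an ordinary failure.

```shell
# Hypothetical guard: run a command under a deadline and report timeouts
# (GNU timeout exits 124) separately from ordinary non-zero exits.
# Usage: run_guarded SECONDS COMMAND [ARGS...]
run_guarded() {
    secs="$1"
    shift
    timeout "$secs" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "guarded: '$*' timed out after ${secs}s" >&2
    elif [ "$rc" -ne 0 ]; then
        echo "guarded: '$*' failed (rc=$rc)" >&2
    fi
    return "$rc"
}
```

For example, run_guarded 10 systemctl restart pipewire or run_guarded 5 wpctl status; the caller still sees the original exit status and can decide whether a timeout is fatal.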
…runtime

Align AudioRecord with shared audio_common/functestlib helpers and reduce
local logic that can drift.

- use pw_set_default_source helper instead of raw wpctl set-default
- ensure alsa_pick_virtual_pcm comes from audio_common.sh (no local copy)
- replace expr-based counters with POSIX arithmetic expansion
- keep existing CLI/behavior and result/log layout unchanged

No functional change to the recording matrix/config logic beyond robustness.

Signed-off-by: Srikanth Muppandam <smuppand@qti.qualcomm.com>
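The "replace expr-based counters with POSIX arithmetic expansion" item in the commit above can be illustrated with a small counter loop. count_results is a hypothetical helper, not code from run.sh. Besides avoiding a subprocess per increment, $(( )) sidesteps a known expr pitfall: expr exits with status 1 when the result is 0 or null, which can abort a script running under set -e.

```shell
# Sketch of the expr -> $(( )) change (function name is hypothetical).
# Counts PASS/FAIL results; anything else is counted as skip.
count_results() {
    pass=0
    fail=0
    skip=0
    for r in "$@"; do
        case "$r" in
            PASS) pass=$((pass + 1)) ;;   # was: pass=$(expr "$pass" + 1)
            FAIL) fail=$((fail + 1)) ;;
            *)    skip=$((skip + 1)) ;;
        esac
    done
    echo "pass=$pass fail=$fail skip=$skip"
}
```

For example, count_results PASS FAIL PASS SKIP prints pass=2 fail=1 skip=1.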
@smuppand
Contributor Author

  1. /proc/asound/pcm is hanging in-kernel
    cat /proc/asound/pcm timed out (rc=124). That’s not PipeWire anymore — it’s ALSA/ASoC stuck inside the kernel.
  2. The wedge is happening during APR/PDR/remoteproc teardown
    D-state stacks show a very specific chain:
    PID 11 (kworker) stuck in: snd_pcm_dev_disconnect → snd_card_disconnect_sync → soc_cleanup_card_resources → ... → apr_remove_device → apr_pd_status → pdr_notifier_work
    That is the kernel disconnecting the sound card / components as part of an APR/PDR event (typically triggered by ADSP crash/SSR or remoteproc stop/start).
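The first observation above (cat /proc/asound/pcm timing out with rc=124) can be turned into a quick probe that separates a kernel-side wedge from a PipeWire problem. probe_asound_pcm, ASOUND_PROBE_TIMEOUT and ASOUND_PCM_PATH are hypothetical names. One caveat: a reader blocked in uninterruptible (D) sleep may not exit on the signal timeout sends, so a stuck cat can linger after the probe reports.

```shell
# Hypothetical probe: does reading /proc/asound/pcm complete within a deadline?
# rc=124 (GNU timeout) points at a reader blocked inside the kernel rather
# than a userspace (PipeWire) problem. Names are illustrative.
ASOUND_PROBE_TIMEOUT="${ASOUND_PROBE_TIMEOUT:-3}"
ASOUND_PCM_PATH="${ASOUND_PCM_PATH:-/proc/asound/pcm}"

probe_asound_pcm() {
    timeout "$ASOUND_PROBE_TIMEOUT" cat "$ASOUND_PCM_PATH" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0)   echo "asound: ok" ;;
        124) echo "asound: read timed out (rc=124); reader may be stuck in the kernel" ;;
        *)   echo "asound: read failed (rc=$rc)" ;;
    esac
    return "$rc"
}
```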

Adjusting the order of the tests may temporarily resolve the freezing issue. qualcomm-linux/lava-test-plans#23

[<0>] snd_pcm_dev_disconnect+0x44/0x1e0 [snd_pcm]
[<0>] snd_device_disconnect_all+0x5c/0xb0 [snd]
[<0>] snd_card_disconnect.part.0+0x13c/0x2b8 [snd]
[<0>] snd_card_disconnect_sync+0x34/0x110 [snd]
[<0>] soc_cleanup_card_resources+0x28/0x2a0 [snd_soc_core]
[<0>] snd_soc_del_component_unlocked+0xc0/0x128 [snd_soc_core]
[<0>] snd_soc_unregister_component_by_driver+0x3c/0x68 [snd_soc_core]
[<0>] devm_component_release+0x14/0x20 [snd_soc_core]
[<0>] devres_release_all+0xa0/0x120
[<0>] device_unbind_cleanup+0x18/0x70
[<0>] device_release_driver_internal+0x1e4/0x21c
[<0>] device_release_driver+0x18/0x24
[<0>] bus_remove_device+0xc4/0x104
[<0>] device_del+0x148/0x40c
[<0>] device_unregister+0x14/0x34
[<0>] apr_remove_device+0x44/0x60 [apr]
[<0>] device_for_each_child+0x64/0xc0
[<0>] apr_pd_status+0x58/0x70 [apr]
[<0>] pdr_notifier_work+0x90/0xdc [pdr_interface]
[<0>] process_one_work+0x150/0x290
[<0>] worker_thread+0x2d0/0x3ec
[<0>] kthread+0x12c/0x204
[<0>] ret_from_fork+0x10/0x20
[rc=0]

--- /proc/1268/stack ---
$ timeout 1 sh -c cat /proc/1268/stack 2>/dev/null || echo 'NO STACK'
[<0>] pdr_handle_release+0x30/0xf0 [pdr_interface]
[<0>] apr_remove+0x20/0x58 [apr]
[<0>] rpmsg_dev_remove+0x38/0x60
[<0>] device_remove+0x4c/0x80
[<0>] device_release_driver_internal+0x1c4/0x21c
[<0>] device_release_driver+0x18/0x24
[<0>] bus_remove_device+0xc4/0x104
[<0>] device_del+0x148/0x40c
[<0>] device_unregister+0x14/0x34
[<0>] qcom_glink_remove_device+0x10/0x20
[<0>] device_for_each_child+0x64/0xc0
[<0>] qcom_glink_native_remove+0x104/0x270
[<0>] qcom_glink_smem_unregister+0x28/0x54 [qcom_glink_smem]
[<0>] glink_subdev_stop+0x1c/0x3c [qcom_common]
[<0>] rproc_stop_subdevices+0x3c/0x60
[<0>] rproc_stop+0x34/0x11c
[<0>] rproc_shutdown+0x58/0x140
[<0>] state_store+0xb4/0xfc
[<0>] dev_attr_store+0x18/0x2c
[<0>] sysfs_kf_write+0x7c/0x94
[<0>] kernfs_fop_write_iter+0x12c/0x200
[<0>] vfs_write+0x240/0x380
[<0>] ksys_write+0x64/0x100
[<0>] __arm64_sys_write+0x18/0x24
[<0>] invoke_syscall.constprop.0+0x40/0xf0
[<0>] el0_svc_common.constprop.0+0xb8/0xd8
[<0>] do_el0_svc+0x1c/0x28
[<0>] el0_svc+0x34/0xe8
[<0>] el0t_64_sync_handler+0xa0/0xe4
[<0>] el0t_64_sync+0x19c/0x1a0
[rc=0]

--- /proc/1283/stack ---
$ timeout 1 sh -c cat /proc/1283/stack 2>/dev/null || echo 'NO STACK'
[<0>] snd_pcm_substream_proc_status_read+0x58/0x1e8 [snd_pcm]
[<0>] snd_info_seq_show+0x34/0x4c [snd]
[<0>] seq_read_iter+0x100/0x478
[<0>] seq_read+0xec/0x12c
[<0>] proc_reg_read+0x74/0xe0
[<0>] vfs_read+0xc4/0x33c
[<0>] ksys_read+0x64/0x100
[<0>] __arm64_sys_read+0x18/0x24
[<0>] invoke_syscall.constprop.0+0x40/0xf0
[<0>] el0_svc_common.constprop.0+0xb8/0xd8
[<0>] do_el0_svc+0x1c/0x28
[<0>] el0_svc+0x34/0xe8
[<0>] el0t_64_sync_handler+0xa0/0xe4
[<0>] el0t_64_sync+0x19c/0x1a0
[rc=0]

--- /proc/2080/stack ---
$ timeout 1 sh -c cat /proc/2080/stack 2>/dev/null || echo 'NO STACK'
[<0>] snd_pcm_proc_read+0x30/0x104 [snd_pcm]
[<0>] snd_info_seq_show+0x34/0x4c [snd]
[<0>] seq_read_iter+0x100/0x478
[<0>] seq_read+0xec/0x12c
[<0>] proc_reg_read+0x74/0xe0
[<0>] vfs_read+0xc4/0x33c
[<0>] ksys_read+0x64/0x100
[<0>] __arm64_sys_read+0x18/0x24
[<0>] invoke_syscall.constprop.0+0x40/0xf0
[<0>] el0_svc_common.constprop.0+0xb8/0xd8
[<0>] do_el0_svc+0x1c/0x28
[<0>] el0_svc+0x34/0xe8
[<0>] el0t_64_sync_handler+0xa0/0xe4
[<0>] el0t_64_sync+0x19c/0x1a0
[rc=0]

--- /proc/2179/stack ---
$ timeout 1 sh -c cat /proc/2179/stack 2>/dev/null || echo 'NO STACK'
[<0>] snd_pcm_proc_read+0x30/0x104 [snd_pcm]
[<0>] snd_info_seq_show+0x34/0x4c [snd]
[<0>] seq_read_iter+0x100/0x478
[<0>] seq_read+0xec/0x12c
[<0>] proc_reg_read+0x74/0xe0
[<0>] vfs_read+0xc4/0x33c
[<0>] ksys_read+0x64/0x100
[<0>] __arm64_sys_read+0x18/0x24
[<0>] invoke_syscall.constprop.0+0x40/0xf0
[<0>] el0_svc_common.constprop.0+0xb8/0xd8
[<0>] do_el0_svc+0x1c/0x28
[<0>] el0_svc+0x34/0xe8
[<0>] el0t_64_sync_handler+0xa0/0xe4
[<0>] el0t_64_sync+0x19c/0x1a0
[rc=0]

--- /proc/2311/stack ---
$ timeout 1 sh -c cat /proc/2311/stack 2>/dev/null || echo 'NO STACK'
[<0>] snd_pcm_proc_read+0x30/0x104 [snd_pcm]
[<0>] snd_info_seq_show+0x34/0x4c [snd]
[<0>] seq_read_iter+0x100/0x478
[<0>] seq_read+0xec/0x12c
[<0>] proc_reg_read+0x74/0xe0
[<0>] vfs_read+0xc4/0x33c
[<0>] ksys_read+0x64/0x100
[<0>] __arm64_sys_read+0x18/0x24
[<0>] invoke_syscall.constprop.0+0x40/0xf0
[<0>] el0_svc_common.constprop.0+0xb8/0xd8
[<0>] do_el0_svc+0x1c/0x28
[<0>] el0_svc+0x34/0xe8
[<0>] el0t_64_sync_handler+0xa0/0xe4
[<0>] el0t_64_sync+0x19c/0x1a0
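Stack dumps like the ones above can be collected by scanning /proc for tasks in uninterruptible sleep (state D) and printing each task's kernel stack. This is a hypothetical sketch, not the script that produced these logs; list_dstate_pids, dump_dstate_stacks and PROC_ROOT are invented names. It parses the state from /proc/<pid>/stat after the last ") " because the comm field can contain spaces, and reading /proc/<pid>/stack normally requires root.

```shell
# Hypothetical sketch: list D-state tasks and dump their kernel stacks.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Print the PID of every task whose state (field 3 of stat) is D.
list_dstate_pids() {
    for st in "$PROC_ROOT"/[0-9]*/stat; do
        [ -r "$st" ] || continue
        line=$(cat "$st" 2>/dev/null) || continue
        # comm (field 2) may contain spaces; skip past the last ") ".
        rest=${line##*") "}
        state=${rest%% *}
        if [ "$state" = "D" ]; then
            pid=${st#"$PROC_ROOT"/}
            echo "${pid%/stat}"
        fi
    done
}

# Dump each D-state task's stack, bounded so the dump itself cannot hang.
dump_dstate_stacks() {
    for pid in $(list_dstate_pids); do
        printf '%s\n' "--- $PROC_ROOT/$pid/stack ---"
        timeout 1 cat "$PROC_ROOT/$pid/stack" 2>/dev/null || echo "NO STACK"
    done
}
```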

@lumag

lumag commented Feb 16, 2026

> 1. /proc/asound/pcm is hanging in-kernel
>    cat /proc/asound/pcm timed out (rc=124). That’s not PipeWire anymore — it’s ALSA/ASoC stuck inside the kernel.
> 2. The wedge is happening during APR/PDR/remoteproc teardown
>    D-state stacks show a very specific chain:
>    PID 11 (kworker) stuck in: snd_pcm_dev_disconnect → snd_card_disconnect_sync → soc_cleanup_card_resources → ... → apr_remove_device → apr_pd_status → pdr_notifier_work
>    That is the kernel disconnecting the sound card / components as part of an APR/PDR event (typically triggered by ADSP crash/SSR or remoteproc stop/start).

So, is it an issue in the kernel itself or in the AudioReach drivers?

> Adjusting the order of the tests may temporarily resolve the freezing issue. qualcomm-linux/lava-test-plans#23

Working around the issue would mean that we would not be able to test whether the issue is actually fixed or not.

3 participants