drm/amdkfd: restore userptr ignore bad address error
A userptr can be unmapped by the application while it is still
registered with the driver. The restore userptr work then gets an
-EFAULT (bad address) error when it tries to get the user pages. Treat
this error as success. Any later GPU access to this userptr will cause
a VM fault, which is better than the application soft hanging with
stalled user mode queues.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
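
The policy described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: the enum and both function names are hypothetical and do not exist in the kernel, and the integer argument stands in for the return value of the real page lookup. It shows that 0 and -EFAULT both let the restore proceed, while any other error (for example -EBUSY or -ENOMEM) is returned so the restore can be retried.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of one page lookup; not a kernel type. */
enum restore_action {
	RESTORE_CONTINUE,	/* pages are valid, carry on */
	RESTORE_SKIP,		/* range is unmapped, let the GPU fault later */
	RESTORE_RETRY,		/* other failure, reschedule the restore */
};

/*
 * Classify a kernel-style return code (0 or a negative errno)
 * according to the policy in the commit message.
 */
enum restore_action classify_get_pages_result(int ret)
{
	assert(ret <= 0);

	if (ret == 0)
		return RESTORE_CONTINUE;
	if (ret == -EFAULT)
		return RESTORE_SKIP;
	return RESTORE_RETRY;
}

/*
 * Mirror the patched control flow: only errors other than -EFAULT
 * are propagated to the caller.
 */
int restore_one_range(int get_pages_ret)
{
	switch (classify_get_pages_result(get_pages_ret)) {
	case RESTORE_RETRY:
		return get_pages_ret;
	case RESTORE_SKIP:
	case RESTORE_CONTINUE:
	default:
		return 0;
	}
}
```

Keeping the classification separate from the control flow makes the one special case, -EFAULT, easy to spot; the actual patch expresses the same decision inline with a single `if (ret != -EFAULT)` check.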
PhilipYangA committed Oct 22, 2021
1 parent 57ca141 commit 2a417b7
Showing 2 changed files with 20 additions and 10 deletions.
27 changes: 17 additions & 10 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2456,18 +2456,25 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
 		/* Get updated user pages */
 		ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
 		if (ret) {
-			pr_debug("%s: Failed to get user pages: %d\n",
-				__func__, ret);
+			pr_debug("Failed %d to get user pages\n", ret);
+
+			/* Return -EFAULT bad address error as success. It will
+			 * fail later with a VM fault if the GPU tries to access
+			 * it. Better than hanging indefinitely with stalled
+			 * user mode queues.
+			 *
+			 * Return other error -EBUSY or -ENOMEM to retry restore
+			 */
+			if (ret != -EFAULT)
+				return ret;
+		} else {
 
-			/* Return error -EBUSY or -ENOMEM, retry restore */
-			return ret;
+			/*
+			 * FIXME: Cannot ignore the return code, must hold
+			 * notifier_lock
+			 */
+			amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
 		}
-
-		/*
-		 * FIXME: Cannot ignore the return code, must hold
-		 * notifier_lock
-		 */
-		amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm);
 #else
 		if (!mem->user_pages) {
 			mem->user_pages =
3 changes: 3 additions & 0 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -768,6 +768,9 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 	r = amdgpu_hmm_range_get_pages(&bo->notifier, mm, pages, start,
 				       ttm->num_pages, &gtt->range, readonly,
 				       false, NULL);
+	if (r)
+		pr_debug("failed %d to get user pages 0x%llx\n", r, start);
+
 out_putmm:
 	mmput(mm);
