Surface failures when installing App Store apps #25514

Open
1 of 21 tasks
eugkuo opened this issue Jan 16, 2025 · 13 comments
Assignees
Labels
#g-software Software product group :product Product Design department (shows up on 🦢 Drafting board) story A user story defining an entire feature

Comments

@eugkuo
Contributor

eugkuo commented Jan 16, 2025

Goal

User story
As an IT admin,
I want to see failures for App Store app installs
so that I can be aware of the issues and work to resolve them.

Key result

Auto-update (patch) any software without writing custom policies

Original requests

Context

Changes

Product

Engineering

  • Test plan is finalized
  • Feature guide changes: TODO
  • Database schema migrations: TODO
  • Load testing: TODO

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

  • Requires load testing: TODO
  • Risk level: Low / High TODO
  • Risk description: TODO

Test plan

MDM

  1. Turn off MDM on a host
  2. Add an app store app with automatic install turned on
  3. Check activity feed on host to see that the policy failed
  4. Click 'view details' to see that the correct modal appears
  5. Click the "Learn more" redirects to make sure the appropriate pages show up

Lack of licenses

  1. In ABM purchase a license for a host
  2. Add the app and select auto-install for hosts
  3. Check host activity feed for failure
  4. Click the "Learn more" redirects to make sure the appropriate pages show up

MDM off and lack of licenses

  1. Turn off MDM and add an app with a single license from ABM with automatic install on.
  2. Check activity feed on host to see that the policy failed because MDM is off.
  3. Click 'view details' to see that the correct modal appears
  4. Turn on MDM for host and then go to the host detail page to install the app
  5. On the host detail page check the activity feed to make sure that the install failed due to lack of licenses.

Self-service MDM off

  1. Ensure MDM is off for my host
  2. Navigate to My device/Self-service
  3. See that "install" is disabled
  4. Activity feed TO DO (Is this relevant and testable?)

Self-service lack of licenses

  1. Ensure a self-service app is out of licenses
  2. Navigate to My device/Self-service
  3. Click "install" on said application
  4. Ensure error message appears
  5. Activity feed TO DO

Manual Install or API call MDM off
TO DO

Manual Install or API call lack of licenses
TO DO

Host counts (when are we showing Installed, Pending, Failed)

  • Try to install an App Store app when there are no licenses. Check to make sure the "Failed" count on Software title page is incremented. At no point is the "Pending" count incremented.

Host details activity feed

  • Try to install an App Store app when there are no licenses. Check to make sure the "failed to install" activity shows up under "Past." At no point is there an install activity under "Upcoming"

Testing notes

Confirmation

  1. Engineer: Added comment to user story confirming successful completion of test plan.
  2. QA: Added comment to user story confirming successful completion of test plan.
@eugkuo eugkuo added :product Product Design department (shows up on 🦢 Drafting board) story A user story defining an entire feature labels Jan 16, 2025
@eugkuo
Contributor Author

eugkuo commented Jan 16, 2025

From @iansltx

Potential fixes:

  1. Add a host activity when VPP automations fail due to lack of MDM enrollment (and potentially due to lack of VPP app license)
    1. Requires design decisions + FE
      1. Activity format
      2. Which events do we capture of:
        1. MDM not enrolled
        2. VPP licenses exhausted
      3. Level of effort: low-medium
      4. Level of tech debt: low
      5. => Noah: Let’s follow-up in 4.64.
  2. Mark the policy as "---" if it fails but we can’t run the automation (similar to what we do for out-of-scope package install automations)
    1. Can probably be eng-spec’d with design sign-off
    2. Level of effort: low-medium
    3. Level of tech debt: medium (continues a confusing pattern that affects how a primitive behaves)
  3. Clear failed host policy statuses when an action is taken that would remove a barrier to automation running successfully
    1. Per-host for any VPP-automated policy: host enrolls in MDM
    2. Per-automated-app when we see available licenses go from zero to nonzero
      1. Would require a cron to ping ABM
      2. We’ll be building a VPP-adjacent cron anyway for #24222
    3. Precedent already exists for this behavior
      1. Existing behaviors clear all policy statuses, not just failed
      2. Adding or revising a software install or script automation (removing doesn’t affect)
      3. Bringing hosts into label scope for an installer
    4. Can probably be eng-spec’d with design sign-off
    5. Level of effort: medium
      1. Can split MDM enrollment and license exhaustion scenarios
      2. Level of tech debt: low
        Can do (i) && ( (ii) xor (iii) )
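
A minimal sketch of how that last line reads, assuming (i), (ii) and (iii) refer to fixes 1, 2 and 3 in the list above: fix 1 is always done, together with exactly one of fixes 2 and 3. The function name is made up for illustration.

```python
def is_valid_fix_combination(fix_1: bool, fix_2: bool, fix_3: bool) -> bool:
    """Return True when fix 1 is chosen together with exactly one of fixes 2 and 3."""
    # `!=` on two booleans behaves as exclusive-or.
    return fix_1 and (fix_2 != fix_3)
```

So "activities plus clearing statuses" and "activities plus marking the policy as ---" both qualify, while doing all three, or skipping the activity, does not.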

@eugkuo eugkuo changed the title Surface and address automatic poilicy failures when installing App Store apps Surface and address automatic policy failures when installing App Store apps Jan 16, 2025
@noahtalerman noahtalerman added Epic DO NOT USE. Auto-created by ZenHub, cannot be disabled. and removed Epic DO NOT USE. Auto-created by ZenHub, cannot be disabled. labels Jan 17, 2025
@iansltx
Member

iansltx commented Jan 18, 2025

FWIW if we decide (likely) the scope of this ticket is solution (1) above (activities for failures), we should split the additional recovery items (again, (3) is my preference here as the admin experience is much better) into (an)other issue(s) so they don't get lost.

On the topic of activities, we have two categories of failures when installing apps: validation steps prior to queueing the install request and the install process itself. The former returns a 4xx/5xx error when calling the API on a one-off install, while the latter shows up in the activity feed. The catch with VPP installs is there are a lot more reasons for a validation error than there are for FMA/custom packages, and since policy installs are automated actions we don't have the ability to "just tell the API client their action has failed."

This means that adding activities for validation failures only really needs to happen for failures triggered as part of a policy automation; we don't generate an activity when someone gets a 4xx back when calling the host software install endpoint, and we should keep on not doing that; we've already notified the client of the error. Additionally, the only "validation failure" for custom package/FMA installs that's relevant is when an install is already pending, and silently dropping a duplicate install request is already the correct behavior, so we can limit the scope of this to VPP installs in the context of policy automations.
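
A rough sketch of that scoping rule, for illustration only. The enum and function names are hypothetical, not anything in the codebase; the point is just that a validation failure becomes an activity only for a VPP install triggered by a policy automation.

```python
from enum import Enum


class InstallKind(Enum):
    # Hypothetical labels for the three install sources discussed here.
    VPP = "vpp"
    CUSTOM_PACKAGE = "custom_package"
    FMA = "fma"


def should_record_validation_failure(kind: InstallKind, triggered_by_policy: bool) -> bool:
    """Decide whether a pre-queue validation failure should create a host activity.

    Manual/API installs already get a 4xx/5xx response, so they create no activity.
    Custom package/FMA duplicates are silently dropped, so they create none either.
    """
    return kind is InstallKind.VPP and triggered_by_policy
```

For example, `should_record_validation_failure(InstallKind.VPP, True)` is the only combination that returns `True`.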

Other note from today's call: we'll want a different activity type under the hood for this than for standard install failures, as these failures are early enough that we don't have a command ID to associate the install with yet. We can use the same "failed to install" copy in the UI, but instead of an install ID (which gets used to fetch install details in the UI on an endpoint unrelated to activities) we can include the reason the install queue attempt failed, and can surface which policy triggered the install not only in the activity API response (which we do for software installs and script runs already) but also the UI, along with remediation instructions ("enroll the host in MDM" or "construct more pylons buy more VPP licenses.")
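
To make the shape of that activity concrete, here is a sketch under stated assumptions: the class name, field names, and reason keys are all hypothetical, and the remediation strings just paraphrase the two examples above. What it tries to show is that the details carry a failure reason and the triggering policy, and no install or command ID.

```python
from dataclasses import asdict, dataclass

# Hypothetical reason keys mapped to the remediation hints mentioned above.
REMEDIATION = {
    "mdm_not_enrolled": "Enroll the host in MDM.",
    "vpp_licenses_exhausted": "Buy more VPP licenses.",
}


@dataclass(frozen=True)
class FailedVPPInstallQueueActivity:
    """Activity for a VPP install that failed validation before it was queued."""

    host_id: int
    software_title: str
    policy_id: int
    policy_name: str
    reason: str  # one of the keys in REMEDIATION

    def to_details(self) -> dict:
        # There is deliberately no install ID or command ID here: the failure
        # happens before any MDM command exists to associate with the install.
        details = asdict(self)
        details["remediation"] = REMEDIATION[self.reason]
        return details


activity = FailedVPPInstallQueueActivity(
    host_id=1,
    software_title="Example App",
    policy_id=7,
    policy_name="Example App is installed",
    reason="mdm_not_enrolled",
)
details = activity.to_details()
```

The UI could keep the same "failed to install" copy and read `reason`, `policy_name`, and `remediation` from the details instead of fetching install results from a separate endpoint.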

@eugkuo
Contributor Author

eugkuo commented Jan 20, 2025

I've added the following to this ticket:

The copy in the modals is assuming we're going to do 3. I do have some questions on the copy, however. If we're going to do that, how often would the cron job run? Like what would an admin's expectation be? Once they've made an update to the host or purchased new licenses when would the install then happen?

@iansltx
Member

iansltx commented Jan 20, 2025

For MDM enrollment, with some effort we can hook the enrollment process to clear failed VPP policy automations, so that part wouldn't be on a cron.
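
Roughly what that hook could do, as a sketch only. The record type and function name are hypothetical, and this version clears only failed results for VPP-automated policies on the enrolling host, which is an assumption about the eventual behavior rather than a spec.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HostPolicyStatus:
    # Hypothetical in-memory stand-in for a stored per-host policy result.
    host_id: int
    policy_id: int
    passing: bool
    has_vpp_automation: bool


def on_mdm_enrolled(statuses: List[HostPolicyStatus], host_id: int) -> List[HostPolicyStatus]:
    """Return statuses with this host's failed VPP-automated policy results removed.

    With the failed result cleared, the next policy run sees a fresh failure
    and can trigger the VPP install automation again.
    """
    return [
        s
        for s in statuses
        if not (s.host_id == host_id and not s.passing and s.has_vpp_automation)
    ]
```

Passing results, policies without a VPP automation, and other hosts' results are left untouched.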

For VPP license counts, if we're going to manage those we're going to need some more UI/API changes, as we should show available license count and last-updated-at on that license count on the title page. Could make an argument for needing a resync button here but my guess is that the cron running in the background with the ability to manually trigger via the API is enough for now. Thinking we handle this hourly, which is the same frequency as policy updates normally. We would potentially make this cron interval configurable; I'll have to check what we do for other crons.
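
A small sketch of the license-count side, assuming we store the available count alongside a last-updated timestamp. All names are hypothetical and the hourly interval is just the default floated above; the ABM call itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed default; the thread suggests this could become configurable.
SYNC_INTERVAL = timedelta(hours=1)


@dataclass(frozen=True)
class LicenseCount:
    available: int
    updated_at: datetime  # would back "last updated at" on the title page


def is_sync_due(count: LicenseCount, now: datetime, interval: timedelta = SYNC_INTERVAL) -> bool:
    """True when the stored count is at least one interval old."""
    return now - count.updated_at >= interval


def licenses_became_available(previous: LicenseCount, current: LicenseCount) -> bool:
    """True only on the zero-to-nonzero transition that should clear failed statuses."""
    return previous.available == 0 and current.available > 0


t0 = datetime(2025, 1, 20, 12, 0, tzinfo=timezone.utc)
before = LicenseCount(available=0, updated_at=t0)
after = LicenseCount(available=5, updated_at=t0 + timedelta(hours=1))
```

Here `is_sync_due(before, t0 + timedelta(hours=1))` and `licenses_became_available(before, after)` are both true, while a change from 3 to 5 available licenses would not trigger any clearing.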

FWIW this probably makes sense to split into three tickets:

  1. Initial activity log changes, with remediation linking to this area of the docs (screenshot as it's still in a PR):
Image
  2. MDM enrollment policy clearing (and update remediation copy for that details page)
  3. VPP license acquisition policy clearing (and update remediation copy for that details page)

as items 2 and 3 are a little bit of a heavier lift to get right, and if item 1 was the only one that made 4.64 for some reason that would still be an improvement and we could do items 2/3 in any order, potentially in parallel.

Since this is already a story, we can promote this to an Epic and add those as subtasks so no big deal, but noting so we're clear that the full fix is (a much better admin experience but) nontrivial.

@iansltx
Member

iansltx commented Jan 20, 2025

Per today's design review:

  1. Action item 1 from the above is the scope of this ticket; the other two items will get their own FRs and be prioritized independently (EDIT: FRs filed).
  2. Add links from modals to places useful for resolving the problem: ABM's apps page (via a learn-more-about redirect) for license exhaustion, host page from the host name, team-specific software title page from the software title.
  3. Mention in the modals that once an issue is remediated the user can install the VPP app manually, and that if they have to do this remediation for a bunch of hosts, once the hosts/apps are in the right spot they should see the docs (link to the above subheading) for a larger workaround.
  4. We should link to ABM when showing the "not enough licenses" flash error on a VPP install.
  5. We are not doing in-modal retry actions.
  6. We are not adding additional activities for manually-initiated VPP installs.
  7. We are not bringing install status tweaks on the host software inventory into scope here.

@eugkuo
Contributor Author

eugkuo commented Jan 21, 2025

I've updated the following modals with copy and links:

Added updated error messages

And this pull request for the redirect:

@eugkuo
Contributor Author

eugkuo commented Jan 21, 2025

Updated ticket to include a link to the PR for the reinstalling apps redirect. Also updated the dev notes in Figma to match.

@eugkuo eugkuo added the #g-software Software product group label Jan 23, 2025
@eugkuo
Contributor Author

eugkuo commented Jan 23, 2025

Pulled this back into "In progress" in order to review error messages for self-service install.

To review:
- Self-service error messages

FYI @noahtalerman

@noahtalerman noahtalerman assigned noahtalerman and unassigned eugkuo Jan 29, 2025
@noahtalerman noahtalerman changed the title Surface and address automatic policy failures when installing App Store apps Surface and address failures when installing App Store apps Jan 30, 2025
@noahtalerman noahtalerman changed the title Surface and address failures when installing App Store apps Surface failures when installing App Store apps Jan 30, 2025
@noahtalerman
Member

noahtalerman commented Jan 30, 2025

  • REST API changes: See PR here
  • Activity changes: See PR here

FYI @mostlikelee I ended up opening a PR for the API and activity changes. It was easier to show everything changing in one place (one PR).

Moved this story to "Ready to spec"

@noahtalerman
Member

@mostlikelee just a reminder that this user story is ready to spec. Can you please work with @jmwatts to complete the TODOs in the "Engineering" and "Test plan" sections? Thanks!

@noahtalerman
Member

Hey @mostlikelee just a reminder that this user story is ready to spec and estimation is tomorrow! Can you please complete the TODOs in the "Engineering" and "Test plan" sections?

@jmwatts
Member

jmwatts commented Feb 4, 2025

@noahtalerman Just to clarify, would you like us to define expected behavior for the "Self service activity feed", "Manual Install or API call MDM off" and "Manual Install or API call lack of licenses" scenarios?

@noahtalerman
Member

Just to clarify, would you like us to define expected behavior for the "Self service activity feed", "Manual Install or API call MDM off" and "Manual Install or API call lack of licenses" scenarios?

@jmwatts I think up to you!


5 participants