Replies: 4 comments 1 reply
-
Love this! Will keep the feedback short because I'm sure we'll discuss in more depth soon.
-
Recap of call with Jason:
-
This is awesome, great work on this, @jaril! Nothing to add that wasn't already mentioned here and in today's meeting.
So exciting! It will be amazing to easily snag a reference to the last passing replay. I can't wait to have this set up for us 😍
-
Late to the party, but here are my thoughts. I think that the flow of:
Is an awesome start to the "Debugging" workflow. It provides the minimum data needed to start debugging why the failure occurred. I also like that we use the sidebar for the Test File view, because it allows the user to stay within the context of the run in the main view while digging in deeper.
This is interesting. Cypress defines flakiness very specifically. From their docs:
It sounds like we're defining flaky here as passing and failing over time on the same branch, regardless of other variables (app code changes, test code changes, environment, type of test run, etc.). That may actually align more closely with what devs experience, but we could end up identifying "false positives" of flake when there are underlying app or test code changes. Ultimately, I think an initial flakiness score for Test Suite (run pass/fail) and Test Case (test pass/fail) would be a good starting point.
Both the Flaky Score and Test Runs are in the "Analysis" workflow. I'm not sold on a Test Runs view until we have more value to add for that workflow in terms of highlighting more metadata and trends (agree that it's a duplication of CI/test runner views), and I'm not sure how we would handle navigation into this view. I'm inclined to make sure we can do the "Debugging" workflow well first and better understand what data would be helpful for an "Analysis" workflow before building it. However, if we want to scaffold the view with just fails/flake and then add more metadata later, that would make sense too if there's not too much overhead. Very excited to see a v1!!!
-
Motivation
When the user is trying to debug a failing test, we aid them with action-playwright. That action records the failing tests to minimize the time it takes to start debugging, and hopefully gives them a better debugging experience than they would otherwise have without Replay.
Each failing test corresponds to a replay, so we're providing the user with one replay's worth of data when they go through this flow. But we can do better than that.
A useful thing for the user in that situation is being able to compare the failing replay for that PR with an applicable passing replay. Debugging the two side by side reveals meaningful information that helps the user identify what unexpected behavior led to the test failure.
That's where test view support comes in. Currently, there's no easy way for users to find an applicable replay of a passing test and compare it to their replay of a failing test. If there were, users would get more value out of integrating Replay with their CI.
Addressed workflow
The general workflow here is that the user will do some action that triggers a test run. This can be initiated by any of the following:
Whenever a test run is kicked off, there's a possibility that the user will be informed that one or more of their tests have failed. When that happens, the user will want to understand why and be informed enough to decide on the next course of action (e.g. fix the bug, file an issue, ignore it).
Note that even before we make any changes, we already expose the link to the replay of the failing test, whether that's in the logs or, in the case of a PR, as a comment.
Instead, this work's direct impact will be on what data the user has at hand to start debugging that replay of the failing test. Specifically, exposing replays of a passing test to act as a reference while debugging.
Implementation
We're going to implement two things: a test run view and a test file view.
Test Run View
This is a simple view where we filter the workspace's recordings by the test run's ID. It will display the replays for all of the tests that ran.
The entry point for getting into this view is a link that's provided at the point of failure. For the current implementation, that means it will be left as a log in the GitHub action itself. Actions related to PRs will additionally leave a comment on the PR itself. Those are the two places where the user can grab a link that will show them the test run view.
This view will show the failing tests first, then the passing tests. For now, the point of this view is not to surface any insights or data. Instead, it's like a lobby where you decide what to investigate. In most cases, that means the user will see failing tests and click into them.
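To make the shape of this view concrete, here's a minimal TypeScript sketch of the filtering and ordering described above. The `RecordingSummary` shape, its `metadata.test` fields, and the failing-first ordering are assumptions for illustration only, not the actual Replay schema or implementation.

```ts
// Hypothetical shape of a recording as the Test Run view might consume it.
// Field names are assumptions, not the real Replay metadata schema.
interface RecordingSummary {
  id: string;
  title: string;
  metadata: {
    test?: {
      file: string;                      // e.g. "auth.spec.ts"
      result: "passed" | "failed";
      runId: string;                     // the test run that produced this recording
    };
  };
}

// Test Run view: keep only recordings from the given run, failing tests first.
function getTestRunView(recordings: RecordingSummary[], runId: string): RecordingSummary[] {
  const rank = (r: RecordingSummary) => (r.metadata.test?.result === "failed" ? 0 : 1);
  return recordings
    .filter((r) => r.metadata.test?.runId === runId)
    .sort((a, b) => rank(a) - rank(b));
}
```

The only real logic here is "filter by run ID, then bubble failures to the top", which matches the "lobby" framing: no insights, just a starting point for clicking into failures.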
Test File View
Once a user indicates that they'd like to learn more about a particular failing test, they can click on it and pop open the Test File view for that test. This shows additional information about the replay, displayed on the right-hand side.
In that right sidebar, we can surface relevant links for the user to resolve the failing test. There'll be a link to the replay of the failing test, as well as a link to a replay of a passing test that's applicable to this debugging scenario.
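One way to pin down what "applicable" could mean: the most recent passing replay of the same test file, preferring recordings that came from the main branch. The shapes, the `source.branch` field, and the preference for `main` below are all assumptions for illustration, not a settled definition.

```ts
// Hypothetical recording shape; field names are assumptions for this sketch.
interface TestRecording {
  id: string;
  createdAt: string; // ISO timestamp
  metadata: {
    test: { file: string; result: "passed" | "failed" };
    source?: { branch?: string };
  };
}

// One possible definition of an "applicable" passing replay: same test file,
// most recent first, preferring recordings produced on the main branch.
function findPassingReference(
  failing: TestRecording,
  candidates: TestRecording[]
): TestRecording | undefined {
  const passing = candidates
    .filter(
      (r) =>
        r.metadata.test.file === failing.metadata.test.file &&
        r.metadata.test.result === "passed"
    )
    .sort((a, b) => b.createdAt.localeCompare(a.createdAt));

  return passing.find((r) => r.metadata.source?.branch === "main") ?? passing[0];
}
```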
Reasoning
This is a total hot take and you're probably thinking, "Jaril that's bananas that's not what we talked about!". Hear me out:
Omitted features
Test Runs view
This seemed to be a given, since every test runner dashboard has a reverse-chronological list of test runs and when they happened. It made sense to me to add it to this proposal initially because of that. But after more thinking, I realized that it's a duplication that has limited usefulness.
If we did have a Test Runs view, it would be a 1:1 copy of what the user would see if they were to simply go to GitHub Actions and click on their test suite. We're not providing any additional value that would make them come to Replay first for that list, instead of GitHub Actions.
There's an argument for still pushing this in since we can display a prettier representation of their test runs as compared to GitHub. We could show the pass/fail numbers for each test. But even then, I don't think that's compelling enough to be prioritized, at least for this version.
There's also a separate, somewhat superficial but more convincing argument in support of a Test Runs view — demos, screenshots, and videos. Following a link from a PR that brings you into Replay is much less sexy than going to Replay first, clicking on a test runs view, picking a test run, and going from there. It's not at all a practical user flow, but I could see the argument for getting it in just so we could have better demoware.
Action view
Each GitHub Actions run has an associated commit SHA. Each SHA can have multiple test runs, for example if a user re-runs the tests because they're flaky. We could explore a view where we filter by SHA and show the replays of the tests that correspond to it. But I didn't see any immediate, compelling benefit from that view. Typically, I just care about the latest test run. We can re-evaluate whether this is important later on, but for now, I'm happy to forgo it.
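For what it's worth, "just the latest run per SHA" stays cheap to compute if we ever revisit this. A rough sketch, assuming runs carry a commit SHA and a start timestamp (hypothetical fields):

```ts
// Hypothetical test run shape; fields are assumptions for this sketch.
interface TestRun {
  runId: string;
  commitSha: string;
  startedAt: string; // ISO timestamp, so lexicographic comparison works
}

// Keep only the most recent run for each commit SHA.
function latestRunPerSha(runs: TestRun[]): Map<string, TestRun> {
  const latest = new Map<string, TestRun>();
  for (const run of runs) {
    const current = latest.get(run.commitSha);
    if (!current || run.startedAt > current.startedAt) {
      latest.set(run.commitSha, run);
    }
  }
  return latest;
}
```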
Test file versioning
This would be nice, but it's generally an edge case and unimportant compared to getting the rest of this feature right in the first place. It also introduces some more complexity that I'd like to sidestep if possible.
Flakiness Score
This is very important information that's relevant to the user as they're looking at the test run view and figuring out whether they should take a failing test seriously (or not).
In general, it feels like you could get a rough idea of flakiness by taking the last ~100 test runs for that file on the main branch. But that's a loaded statement — it's possible that the user has only enabled Playwright tests for pushes to a PR, in which case we don't have that data. And even if we did, there's a built-in assumption that every test run on the main branch should be passing, which might be untrue, since it's not unheard of to push code with failing tests to main.
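To make that heuristic concrete, here's a rough sketch of the calculation: the failure rate over the last ~100 main-branch results for a test file. The shapes and field names are hypothetical, and the caveats above (main-branch coverage, failing pushes to main) still apply.

```ts
// Hypothetical per-test result record; fields are assumptions for this sketch.
interface TestResult {
  file: string;
  branch: string;
  passed: boolean;
  finishedAt: string; // ISO timestamp
}

// Naive flakiness estimate: failure rate over the last `window` results
// for this test file on the main branch. Assumes main "should" be green,
// which, as noted above, isn't always true.
function flakinessScore(results: TestResult[], file: string, window = 100): number | null {
  const recent = results
    .filter((r) => r.file === file && r.branch === "main")
    .sort((a, b) => b.finishedAt.localeCompare(a.finishedAt))
    .slice(0, window);

  if (recent.length === 0) {
    return null; // no main-branch data, e.g. tests only run on PR pushes
  }

  const failures = recent.filter((r) => !r.passed).length;
  return failures / recent.length;
}
```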
In any case, I could be persuaded that this is actually easier than I'm making it out to be and that we should include it in V1. But I'm erring on the side of caution to keep the core part of this out.
V2 considerations:
Additional notes
How action-playwright works