
Trace Debugger POC

This proof-of-concept helps debug Playwright test failures using LLMs.

💡 Overview

This tool parses Playwright trace files, summarizes key events, and asks GPT-4 to analyze failures and suggest fixes.
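
To make the idea concrete, here is a minimal sketch of that flow. The function names, the event-type filter, and the prompts are hypothetical illustrations, not the exact code in main.py:

```python
import json
import sys
import zipfile

from openai import OpenAI


def summarize_trace(trace_zip_path: str) -> str:
    """Collect key events from the Playwright trace zip (simplified view of the format)."""
    events = []
    with zipfile.ZipFile(trace_zip_path) as zf:
        for name in zf.namelist():
            if name.endswith(".trace"):
                # .trace entries are newline-delimited JSON, one event per line.
                for line in zf.read(name).decode("utf-8").splitlines():
                    if not line.strip():
                        continue
                    event = json.loads(line)
                    # Keep only action/error-style events (assumed filter, not the POC's exact logic).
                    if event.get("type") in {"before", "after", "error"}:
                        events.append(event)
    return json.dumps(events[:200])  # truncate to keep the prompt small


def analyze_failure(summary: str) -> str:
    """Ask GPT-4 to analyze the summarized trace and suggest fixes."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an expert at debugging Playwright test failures."},
            {"role": "user", "content": f"Analyze this trace summary and suggest fixes:\n{summary}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(analyze_failure(summarize_trace(sys.argv[1])))
```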

Although I'm not strong in Python (I'm not sure I'm strong in any language after so many changes over the last few years), I decided to build this in Python as a CLI, using the OpenAI library for Python. I thought it would be more convenient and would let me focus on solving the actual challenge rather than fighting with setting up Express, an HTML page,... or a notebook that I don't have experience with.

I just really hope that my decision was worth it :)

How to Run

  1. Create a virtual environment from the root directory:
     python -m venv venv
  2. Activate the venv:
     source venv/bin/activate
  3. Install dependencies:
     pip install -r requirements.txt
  4. Set your OpenAI API key:
     export OPENAI_API_KEY=your-key
  5. Run the script:
     python main.py your-trace.zip

Output example

This is an example of the output generated by this solution:

1. Trace Analysis:

The error occurred during the execution of an 'expect' function in your Playwright test. The 'expect' function was expecting a value to be less than 200, but the received value was exactly 200. This caused the 'expect.toBeLessThan' assertion to fail, which in turn caused the test to fail.

The error occurred in the file '/checkly/functions/src/2024-09/node_modules/vm2/lib/bridge.js' at line 485, column 11, in the function 'VM2 Wrapper.apply'. This function is part of the VM2 module, which is a sandbox that can run untrusted code securely.

The error message also mentions the file '/check/7cc47aae-3f06-44c2-b9b0-af0185948b94/script.js' at line 8, column 83. This is likely the location in your test script where the 'expect' function was called.

2. Root Cause:

The root cause of the failure is that the 'expect.toBeLessThan' assertion was expecting a value less than 200, but it received a value of 200. This means that the value being tested was not less than 200, as expected.

3. Suggestions for Fixing the Issue:

You should review the logic of your test to determine why the value being tested was not less than 200. If the value being tested should indeed be less than 200, then there may be a bug in the code under test that is causing it to return or produce a value of 200.

Alternatively, if the value being tested can legitimately be 200, then you may need to change your test assertion to 'expect.toBeLessThanOrEqual', which will pass if the value is less than or equal to 200.

4. Further Investigation:

If changing the test assertion or fixing the code under test does not resolve the issue, you may need to debug your test to understand why the value being tested is 200. This could involve adding console.log statements to your test to print out the value being tested, or using a debugger to step through your test and inspect the value being tested.

Next steps and preparation for a production-ready version

  1. Technology. As I said, I used Python and a CLI for convenience, but this code should be migrated to whatever technology is needed so it can be integrated into a bigger system.
  2. OpenAI error handling. Right now, if the call to OpenAI fails, the script just logs an error and exits the process. We would probably want better handling here, such as a retry policy (see the retry sketch after this list).
  3. Tests. In a production environment I'd add some tests for the logic.
  4. Playwright trace understanding. I took a look at the files in the zip, searched for information, and checked the Trace Viewer. From that I drew my own conclusions about how to deal with the traces, but before going to production I'd like more time, and to talk with someone more experienced with traces, to see whether the pre-processing I'm doing is correct or whether I should do more, do it differently, or add some error handling.
  5. Prompt refinement. Given the challenge, I came up with the system and user prompts I thought I would need. I then refined these prompts using ChatGPT and by testing, but I would also like to talk to someone else to refine the prompts a bit more and see if there is anything else we could add, similar to the point right above.
  6. Sync vs. async. Again, since this is a simple CLI solution, everything is synchronous, but in a production environment I'd consider whether it could be done asynchronously. If the user does not need an immediate response, a message could be sent to an SQS queue so the request is processed asynchronously, and a notification could be sent to the user once the response from OpenAI is received (see the SQS sketch after this list).
  7. Logging. Once this is integrated into the whole system (as mentioned in point 1), add some logging to be able to monitor the performance of this service.
  8. Prompts and model configuration. Depending on the case, extract the system and user prompts and the model configuration out of the code. For instance, in the project I'm currently working on, non-developers besides me can change them, so we have a kind of dashboard where they can update the prompts whenever refinement is needed. That way, the change can be made without a server deployment (a minimal config-loading sketch appears after this list).
  9. More models. Right now this solution only works with OpenAI models; I'm not sure whether it would make sense to make it more flexible and support other LLMs like Google Vertex AI, Perplexity,...
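
For point 2, this is a minimal sketch of what a retry policy with exponential backoff around the OpenAI call could look like. The helper name and parameters are hypothetical, and this is not the POC's current behavior:

```python
import time

from openai import APIError, OpenAI

client = OpenAI()


def ask_openai_with_retries(messages: list[dict], max_attempts: int = 3) -> str:
    """Call the chat completions API, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = client.chat.completions.create(model="gpt-4", messages=messages)
            return response.choices[0].message.content
        except APIError:  # covers rate limits, timeouts, and server errors
            if attempt == max_attempts:
                raise  # give up; the caller decides how to report the failure
            time.sleep(2 ** attempt)  # back off 2s, 4s, ... before retrying
```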
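
For point 6, this is a minimal sketch of how a request could be pushed onto an SQS queue for asynchronous processing, assuming boto3; the queue URL, region, and message structure are made up for illustration:

```python
import json

import boto3

sqs = boto3.client("sqs", region_name="eu-west-1")  # region is an assumption
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/trace-analysis"  # hypothetical


def enqueue_trace_analysis(trace_location: str, user_id: str) -> None:
    """Publish a message so a worker can run the OpenAI analysis and notify the user later."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"trace_location": trace_location, "user_id": user_id}),
    )
```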
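
For point 8, this is a minimal sketch of loading the prompts and model configuration from a file instead of hard-coding them; the file name and structure are assumptions:

```python
import json
from pathlib import Path


def load_llm_config(path: str = "llm_config.json") -> dict:
    """Read prompts and model settings from a JSON file that can be edited outside the code."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumed structure:
    # {"model": "gpt-4", "temperature": 0.2,
    #  "system_prompt": "...", "user_prompt_template": "..."}
    return config
```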

I'm pretty sure there are many more, but this is what came to my mind.

See you soon

I hope you like my solution enough to move forward in the process. See you soon! 👋🏻
