A very simple benchmark to find the fastest browser for local scraping. It is just the tip of the iceberg and leaves a lot uncovered (production evaluation, more complex cases, ...).
Please keep the following in mind:
- This project was created for learning purposes.
- All tests were conducted on a MacBook Pro (M1 Pro, 32GB RAM).
- The benchmarks cover only a few specific scenarios.
- These results should not be used as a definitive reference for serious performance analysis.
- The main focus is on Playwright (Python), with pydoll added as a small twist.
- The code isn't perfect (it never is). If you spot an issue or have an idea for improvement (especially for performance), please feel free to open an Issue or submit a Pull Request.
- For scraping with Playwright, stick with the bundled Chromium. It consistently proves to be fast, reliable, and well-supported. Looks like they know their stuff.
- Using your own installed browser is difficult. Support for using local, "branded" browser installations is limited.
- Safari: You cannot test the official Safari browser directly. Playwright doesn't work with the branded version of Safari since it relies on patches. Instead, you can test using the most recent WebKit build.
- Firefox: You must use the version Playwright installs, not your own. Playwright relies on patches, so it doesn't work with the branded version of Firefox. Zen browser is not supported for the same reason.
- Arc: This Chromium-based browser can work, but it has a major quirk: it will fail to connect if you already have another Arc instance running. You must close all Arc windows before running the benchmark.
The benchmark runs the complete set of tests "retries" times (10 by default). For each individual run, it measures the time taken for each step. Finally, the results are aggregated into mean values for each browser/headless combination.
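The aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record layout (`browser`, `headless`, `timings`) and the function name `aggregate` are assumptions, and the numbers are made-up placeholders rather than measured results.

```python
from collections import defaultdict
from statistics import mean


def aggregate(runs):
    """Average each step's time per (browser, headless) combination.

    `runs` is a list of dicts; this layout is hypothetical and only
    serves the illustration.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for run in runs:
        key = (run["browser"], run["headless"])
        for step, seconds in run["timings"].items():
            grouped[key][step].append(seconds)
    # Collapse every list of per-run timings into its mean value.
    return {
        key: {step: mean(values) for step, values in steps.items()}
        for key, steps in grouped.items()
    }


# Placeholder timings in seconds (not real benchmark results).
runs = [
    {"browser": "chromium", "headless": True,
     "timings": {"browser_launch": 0.5, "new_page": 0.25}},
    {"browser": "chromium", "headless": True,
     "timings": {"browser_launch": 1.5, "new_page": 0.75}},
    {"browser": "chromium", "headless": False,
     "timings": {"browser_launch": 2.0, "new_page": 1.0}},
]

summary = aggregate(runs)
```

Grouping by the `(browser, headless)` pair keeps headless and non-headless runs of the same browser in separate buckets, which matches the "browser/headless combination" wording above.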
The benchmark runs a simple, repeatable process to measure performance.
- Load Configs: When you start a run, the tool first reads `test-sites.yaml` (the test samples). This file tells it which URLs to visit, what XPath selector to use for finding an element, and what the "correct" text should be. It also checks `config.yaml` to find the location of any custom browsers you're testing (you should change these paths to match your machine).
- Run Loops: The script loops through each site for the number of iterations you specified (using `--retries`; 10 is the default).
- Measure: Most importantly, every test cycle is broken down into five distinct, timed steps:
  - Browser Launch: Time to open the browser process.
  - New Page: Time to open a new blank tab.
  - Page Navigation: Time until the target URL reaches the load state (`await page.wait_for_load_state("load")`).
  - Extract Content: Time to find the element and get its text; the element must be visible (`await element_locator.wait_for(state="visible", timeout=15000)`).
  - Browser Close: Time to shut down the browser process.
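The five timed steps could look roughly like the sketch below, using Playwright's async Python API with the bundled Chromium. The helper `timed`, the function `run_cycle`, and the step key names are hypothetical names chosen for this illustration; the project's real implementation may be structured differently.

```python
import asyncio
import time
from contextlib import contextmanager


@contextmanager
def timed(timings, step):
    """Store the wall-clock duration of the enclosed block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start


async def run_cycle(url, xpath, headless=True):
    """Run one test cycle and return (timings, extracted_text)."""
    # Imported here so the timing helper works without Playwright installed.
    from playwright.async_api import async_playwright

    timings = {}
    async with async_playwright() as p:
        with timed(timings, "browser_launch"):
            browser = await p.chromium.launch(headless=headless)
        with timed(timings, "new_page"):
            page = await browser.new_page()
        with timed(timings, "page_navigation"):
            await page.goto(url)
            await page.wait_for_load_state("load")
        with timed(timings, "extract_content"):
            element_locator = page.locator(f"xpath={xpath}")
            # The element must be visible before its text is read.
            await element_locator.wait_for(state="visible", timeout=15000)
            text = await element_locator.inner_text()
        with timed(timings, "browser_close"):
            await browser.close()
    return timings, text
```

A single cycle could then be run with something like `asyncio.run(run_cycle("https://example.com", "//h1"))`, where the URL and XPath are placeholders. Comparing the returned text with the expected text from `test-sites.yaml` is left out of this sketch.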
These browsers are tested in both headless and non-headless mode:
- Chromium (Playwright)
- Firefox (Playwright)
- WebKit (Playwright)
- Installed (app) Helium
- Installed (app) Brave
- Installed (app) Arc
- Chromium (pydoll)
```bash
git clone <REPOSITORY_URL>
cd browser-scraping-benchmark
uv venv venv
source venv/bin/activate
pip install -e .
```

You can now run the benchmark using the `browser-scraping-benchmark` command.

```bash
browser-scraping-benchmark run [BROWSER] [OPTIONS]
```

`[BROWSER]` is the name of the browser to test.
| Browser Name | Description |
|---|---|
| `pydoll` | Runs the test using the pydoll library. |
| `chromium` | Runs the test using Playwright's bundled Chromium. |
| `firefox` | Runs the test using Playwright's bundled Firefox. |
| `webkit` | Runs the test using Playwright's bundled WebKit. |

Note: You can also use custom browser names (e.g., `chrome-beta`, `arc`) if you have defined them in your `config.yaml`.
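One way a custom browser name might be mapped to a Playwright launch call is sketched below. It assumes that custom browsers are Chromium-based and are started through Playwright's `executable_path` launch option. The function `launch_kwargs`, the dictionary layout, and the example path are hypothetical and do not reflect the actual format of `config.yaml`. The `pydoll` runner is not covered here.

```python
# Engines that Playwright ships and installs itself.
BUNDLED = {"chromium", "firefox", "webkit"}


def launch_kwargs(name, custom_browsers, headless=True):
    """Return (engine, kwargs) suitable for a Playwright launch call.

    `custom_browsers` maps a custom name to an executable path
    (a hypothetical structure used only for this sketch).
    """
    if name in BUNDLED:
        return name, {"headless": headless}
    if name in custom_browsers:
        # Assumption: installed browsers are Chromium-based, so they are
        # launched through the chromium engine with a custom executable.
        return "chromium", {
            "headless": headless,
            "executable_path": custom_browsers[name],
        }
    raise ValueError(f"Unknown browser: {name!r}")


# Example path only; adjust it for your own machine.
custom = {"arc": "/Applications/Arc.app/Contents/MacOS/Arc"}
engine, kwargs = launch_kwargs("arc", custom, headless=False)
```

With an open Playwright instance `p`, the result could be used as `await getattr(p, engine).launch(**kwargs)`.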
| Option | Description | Default |
|---|---|---|
| `--output <DIRECTORY>` (Required) | Specifies the folder where the JSON result files will be saved. | N/A |
| `--headless` / `--no-headless` (Optional) | Toggles headless mode. | `--headless` |
| `--retries <NUMBER>` (Optional) | The number of times to run the test suite for each browser. The results will be aggregated. | 1 |
| `--test-sites <FILE>` (Optional) | Path to your test sites configuration. | `test-sites.yaml` |
| `--config <FILE>` (Optional) | Path to your browser executable configuration. | `config.yaml` |

- `--headless` (Default): Runs the browser without a visible UI window.
- `--no-headless`: Runs the browser with a visible UI, which can be useful for debugging.
```bash
browser-scraping-benchmark run chromium --headless --retries 10 --output data/
```
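For readers curious how such a command line could be parsed, here is a minimal `argparse` sketch that mirrors the options table. It is an illustration only: the project's real CLI may be built with a different library, and `build_parser` is a hypothetical name. The defaults are taken from the table (`argparse.BooleanOptionalAction` needs Python 3.9 or newer).

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented `run` command."""
    parser = argparse.ArgumentParser(prog="browser-scraping-benchmark")
    subcommands = parser.add_subparsers(dest="command", required=True)

    run = subcommands.add_parser("run", help="Run the benchmark for one browser")
    run.add_argument("browser", help="Name of the browser to test")
    run.add_argument("--output", required=True,
                     help="Folder where the JSON result files are saved")
    # BooleanOptionalAction generates both --headless and --no-headless.
    run.add_argument("--headless", action=argparse.BooleanOptionalAction,
                     default=True, help="Toggle headless mode")
    run.add_argument("--retries", type=int, default=1,
                     help="Number of times to run the test suite")
    run.add_argument("--test-sites", default="test-sites.yaml",
                     help="Path to the test sites configuration")
    run.add_argument("--config", default="config.yaml",
                     help="Path to the browser executable configuration")
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["run", "chromium", "--headless", "--retries", "10", "--output", "data/"]
)
```

Dashes in option names become underscores on the parsed namespace, so `--test-sites` is read as `args.test_sites`.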