
perf(selector): more caching for faster selector creation #4611

Merged

merged 4 commits into develop from get-selector-perf-fixes on Oct 16, 2024

Conversation

@WilcoFiers (Contributor) commented Oct 15, 2024

While doing some perf testing I discovered axe hits a huge performance bottleneck when there are lots of similar elements on the page. There are essentially three problems with that:

  1. Selectors aren't cached. An element tested more than once has a selector created more than once.
  2. When an element has no unique features, axe creates the same selector and does a QSA (querySelectorAll) with it. If there are lots of those, axe ends up running the same QSA over and over. That can be avoided by caching too.
  3. When that QSA returns lots of similar elements, axe will refine the selector using parents. Each time it does that, it runs .matches on the similar element with the updated selector. With many similar elements, running QSA again is much faster than matching each individually, especially if the parents aren't distinct either, so that the QSA cache can be used.
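The first two fixes boil down to memoizing by selector string. A minimal sketch of that idea (illustrative only, not axe-core's actual implementation; `createQsaCache` is a made-up name):

```javascript
// Illustrative sketch of the caching idea, not axe-core's actual code.
// Memoize querySelectorAll results by selector string, so repeated
// identical selectors hit the cache instead of re-querying the DOM.
function createQsaCache(doc) {
  const cache = new Map();
  return selector => {
    if (!cache.has(selector)) {
      cache.set(selector, doc.querySelectorAll(selector));
    }
    return cache.get(selector);
  };
}
```

The same pattern covers the selector side too: cache the generated selector per element (e.g. in a WeakMap keyed by the element) so an element tested twice only has its selector computed once.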

I've been using the following code to test this out (with different numbers in rows and cols):

```html
<main>
  <h1>Hello World</h1>
  <table></table>
</main>
<script>
  const table = document.querySelector('table');
  for (let row = 0; row < 200; row++) {
    const tr = document.createElement('tr');
    table.appendChild(tr);
    for (let col = 0; col < 40; col++) {
      const td = document.createElement('td');
      const btn = document.createElement('button');
      td.appendChild(btn);
      tr.appendChild(td);
    }
  }
</script>
```

## Perf test results

| btns  | cur    | memo 1 | memo 2 | 50    | 100   | 500  | 600  | 750  | 1000 | 2000 |
|-------|--------|--------|--------|-------|-------|------|------|------|------|------|
| 2000  | 2513   | 1511   | 469    | 519   | 192   | 190  | 195  | 190  | 188  | 464  |
| 2000  | 2585   | 1524   | 470    | 517   | 200   | 190  | 195  | 198  | 203  | 481  |
| 2000  | 2540   | 1501   | 475    | 525   | 194   | 203  | 190  | 191  | 188  | 468  |
| 4000  | 10748  | 6183   | 2500   | 2454  | 2452  | 1433 | 838  | 826  | 823  | 850  |
| 4000  | 10784  | 6199   | 2400   | 2516  | 2452  | 1433 | 839  | 854  | 845  | 819  |
| 4000  | 10701  | 6203   | 2400   | 2497  | 2442  | 1444 | 849  | 848  | 866  | 838  |
| 8000  | 43117  | 26398  | 8166   | 10954 | 10895 | 3680 | 2349 | 2220 | 3800 | 2176 |
| 8000  | 43365  | 26339  | 9747   | 10576 | 10742 | 3775 | 2313 | 2302 | 3826 | 2169 |
| 8000  | 44899  | 26167  | 9723   | 10615 | 10729 | 3772 | 2235 | 2210 | 3765 | 2238 |
| 12000 | 100254 | 61300  | 19291  | 20647 | 20623 | 6036 | 6137 | 6146 | 5974 | 6418 |
| 12000 | 100406 | 60615  | 19543  | 20924 | 20669 | 6122 | 6022 | 6287 | 6094 | 6108 |
| 12000 | 101401 | 61792  | 19522  | 20662 | 20834 | 5978 | 6170 | 6155 | 6102 | 6347 |

Numbers are taken from `Audit reporter` with `performanceTimer: true`.

- cur: current situation
- memo 1: memoize `doc.querySelectorAll`
- memo 2: memoize `doc.querySelectorAll` + `getSelector`
- xxx: memo 2 + re-run QSA if similar elements > xxx
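The numeric columns measure the third fix: past a threshold of similar elements, re-running a (cached) QSA with the refined selector beats calling `.matches()` on every candidate. A hypothetical sketch of that switch (names like `filterSimilar` are illustrative, not axe-core's API):

```javascript
// Hypothetical sketch of the threshold idea, not axe-core's actual code.
// With many similar candidates, one querySelectorAll over the document
// (whose result can itself be memoized) is cheaper than calling
// .matches() on each candidate individually.
function filterSimilar(doc, refinedSelector, candidates, threshold) {
  if (candidates.length > threshold) {
    const matched = new Set(doc.querySelectorAll(refinedSelector));
    return candidates.filter(el => matched.has(el));
  }
  // Few candidates: matching each one individually is fine.
  return candidates.filter(el => el.matches(refinedSelector));
}
```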

## Memory usage

I've tested this approach on 10 different websites and saw no difference in memory usage between this approach and our current approach.

## Performance on real-world websites

I tested this against 20 real-world websites and saw no change in performance on any of them. The scenario where this matters is when there is lots of repetition in a page, such as in very large tables. The important part is that improving performance for this edge case doesn't seem to hurt performance elsewhere.

@WilcoFiers WilcoFiers marked this pull request as ready for review October 16, 2024 12:38
@WilcoFiers WilcoFiers requested a review from a team as a code owner October 16, 2024 12:38
@straker (Contributor) left a comment

Ran memory performance tests on the changes. I feared it would significantly increase the memory size of the cache, but testing shows that's not the case. Even on a very complex page the increase is only a few MB.

To get the results: I ran a heap snapshot on the page after axe-core was loaded on the page, then ran axe-core (with the teardown method disabled). After results came back I ran another heap snapshot to see the difference in memory size of the cache and virtual tree. I did this 3-5 times (depending on the size of the site) and got an average of the difference. Even for a complex website with 35k+ nodes, the new changes only added ~8MB.

| Site | Before: axe-core loaded (MB) | Before: after run (MB) | Before: diff (MB) | After: axe-core loaded (MB) | After: after run (MB) | After: diff (MB) |
|------|------|------|------|------|------|------|
| https://www.amazon.com/ | 17.56 | 36.28 | 18.72 | 22.92 | 41.76 | 18.84 |
| https://js.tensorflow.org/api/latest/ | 20.5 | 318.67 | 298.17 | 28.6 | 335.33 | 306.73 |

@WilcoFiers WilcoFiers merged commit 7e39874 into develop Oct 16, 2024
22 checks passed
@WilcoFiers WilcoFiers deleted the get-selector-perf-fixes branch October 16, 2024 16:15
straker pushed a commit that referenced this pull request Oct 16, 2024