perf(selector): more caching for faster selector creation #4611

WilcoFiers · 2024-10-15T13:29:11Z

While doing some perf testing I discovered axe hits a huge performance bottleneck when there are lots of similar elements on the page. There are essentially three problems with that:

Selectors aren't cached. An element tested more than once has a selector created more than once.
When an element has no unique features, axe creates the same selector and does a QSA with it. If there are lots of those, axe ends up running the same SQA over and over. That can be avoided by caching too
When that QSA returns lots of similar elements, axe will refine the selector using parents. Each time it does that it runs .matches on the similar element with the updated selector. With many similar elements, running QSA again is much faster than matching each individually, especially so if the parent's aren't distinct either, so that the QSA cache can be used.

I've been using the following code to test this out (with different numbers in rows and cols):

<main>
  <h1>Hello World</h1>
  <table></table>
</main>
<script>
  const table = document.querySelector('table');
  for (let row = 0; row < 200; row++) {
    const tr = document.createElement('tr');
    table.appendChild(tr);
    for (let col = 0; col < 40; col++) {
      const td = document.createElement('td');
      const btn = document.createElement('button');
      td.appendChild(btn);
      tr.appendChild(td);
    }
  }
</script>

Perf test results

btns	cur	memo 1	memo 2	50	100	500	600	750	1000	2000
2000	2513	1511	469	519	192	190	195	190	188	464
2000	2585	1524	470	517	200	190	195	198	203	481
2000	2540	1501	475	525	194	203	190	191	188	468
4000	10748	6183	2500	2454	2452	1433	838	826	823	850
4000	10784	6199	2400	2516	2452	1433	839	854	845	819
4000	10701	6203	2400	2497	2442	1444	849	848	866	838
8000	43117	26398	8166	10954	10895	3680	2349	2220	3800	2176
8000	43365	26339	9747	10576	10742	3775	2313	2302	3826	2169
8000	44899	26167	9723	10615	10729	3772	2235	2210	3765	2238
12000	100254	61300	19291	20647	20623	6036	6137	6146	5974	6418
12000	100406	60615	19543	20924	20669	6122	6022	6287	6094	6108
12000	101401	61792	19522	20662	20834	5978	6170	6155	6102	6347

Numbers are taken from Audit reporter with performanceTimer: true.

cur: Current situation
memo 1: memoize doc.querySelectorAll
memo 2: memoize doc.querySelectorAll + getSelector
xxx: memoize 2 + re-run QSA if similar elements > XXX

Memory usage

I've tested this approach on 10 different websites and saw no difference in memory usage between this approach and our current approach.

Performance on real-world websites

I tested this against 20 real-world websites and saw no change in performance between them. The scenario where this matters is when there is lots of repetition in a page such as in very large tables. The important part is that improving performance on for this edge case doesn't seem to hurt performance elsewhere.

straker

Ran memory performance tests on the changes. I feared it would significantly increase memory size of the cache, but testing shows that's not the case. Even on a very complex page the increase is only a few MB.

To get the results: I ran a heap snapshot on the page after axe-core was loaded on the page, then ran axe-core (with the teardown method disabled). After results came back I ran another heap snapshot to see the difference in memory size of the cache and virtual tree. I did this 3-5 times (depending on the size of the site) and got an average of the difference. Even for a complex website with 35k+ nodes, the new changes only added ~8MB.

	Axe-core loaded size (mb)	After run size (mb)	Diff size (mb)	Axe-core loaded size (mb)	After run size (mb)	Diff size (mb)
	Before changes			After changes
https://www.amazon.com/	17.56	36.28	18.72	22.92	41.76	18.84
https://js.tensorflow.org/api/latest/	20.5	318.67	298.17	28.6	335.33	306.73

While doing some perf testing I discovered axe hits a huge performance bottleneck when there are lots of similar elements on the page. There are essentially three problems with that: 1. Selectors aren't cached. An element tested more than once has a selector created more than once. 2. When an element has no unique features, axe creates the same selector and does a QSA with it. If there are lots of those, axe ends up running the same SQA over and over. That can be avoided by caching too 3. When that QSA returns lots of similar elements, axe will refine the selector using parents. Each time it does that it runs .matches on the similar element with the updated selector. With many similar elements, running QSA again is much faster than matching each individually, especially so if the parent's aren't distinct either, so that the QSA cache can be used. I've been using the following code to test this out (with different numbers in rows and cols): ```html <main> <h1>Hello World</h1> <table></table> </main> <script> const table = document.querySelector('table'); for (let row = 0; row < 200; row++) { const tr = document.createElement('tr'); table.appendChild(tr); for (let col = 0; col < 40; col++) { const td = document.createElement('td'); const btn = document.createElement('button'); td.appendChild(btn); tr.appendChild(td); } } </script> ``` ## Perf test results | btns | cur | memo 1 | memo 2 | 50 | 100 | 500 | 600 | 750 | 1000 | 2000 |--------|--------|--------|--------|--------|-------|------|-------|------|-------|------ | 2000 | 2513 | 1511 | 469 | 519 | 192 | 190 | 195 | 190 | 188 | 464 | 2000 | 2585 | 1524 | 470 | 517 | 200 | 190 | 195 | 198 | 203 | 481 | 2000 | 2540 | 1501 | 475 | 525 | 194 | 203 | 190 | 191 | 188 | 468 | 4000 | 10748 | 6183 | 2500 | 2454 | 2452 | 1433 | 838 | 826 | 823 | 850 | 4000 | 10784 | 6199 | 2400 | 2516 | 2452 | 1433 | 839 | 854 | 845 | 819 | 4000 | 10701 | 6203 | 2400 | 2497 | 2442 | 1444 | 849 | 848 | 866 | 838 | 8000 | 43117 | 26398 | 8166 | 10954 | 10895 | 3680 | 2349 | 2220 | 3800 | 2176 | 8000 | 43365 | 26339 | 9747 | 10576 | 10742 | 3775 | 2313 | 2302 | 3826 | 2169 | 8000 | 44899 | 26167 | 9723 | 10615 | 10729 | 3772 | 2235 | 2210 | 3765 | 2238 | 12000 | 100254 | 61300 | 19291 | 20647 | 20623 | 6036 | 6137 | 6146 | 5974 | 6418 | 12000 | 100406 | 60615 | 19543 | 20924 | 20669 | 6122 | 6022 | 6287 | 6094 | 6108 | 12000 | 101401 | 61792 | 19522 | 20662 | 20834 | 5978 | 6170 | 6155 | 6102 | 6347 Numbers are taken from `Audit reporter` with `performanceTimer: true`. - cur: Current situation - memo 1: memoize doc.querySelectorAll - memo 2: memoize doc.querySelectorAll + getSelector - xxx: memoize 2 + re-run QSA if similar elements > XXX ## Memory usage I've tested this approach on 10 different websites and saw no difference in memory usage between this approach and our current approach. ## Performance on real-world websites I tested this against 20 real-world websites and saw no change in performance between them. The scenario where this matters is when there is lots of repetition in a page such as in very large tables. The important part is that improving performance on for this edge case doesn't seem to hurt performance elsewhere.

WilcoFiers and others added 4 commits October 15, 2024 15:28

perf(selector): more caching for faster selector creation

3b60805

Refine magic number

32603d0

✈️✈️✈️

bc5390c

Merge branch 'develop' into get-selector-perf-fixes

02e1aec

WilcoFiers marked this pull request as ready for review October 16, 2024 12:38

WilcoFiers requested a review from a team as a code owner October 16, 2024 12:38

straker approved these changes Oct 16, 2024

View reviewed changes

WilcoFiers merged commit 7e39874 into develop Oct 16, 2024
22 checks passed

WilcoFiers deleted the get-selector-perf-fixes branch October 16, 2024 16:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(selector): more caching for faster selector creation #4611

perf(selector): more caching for faster selector creation #4611

WilcoFiers commented Oct 15, 2024 •

edited

Loading

straker left a comment

perf(selector): more caching for faster selector creation #4611

perf(selector): more caching for faster selector creation #4611

Conversation

WilcoFiers commented Oct 15, 2024 • edited Loading

Perf test results

Memory usage

Performance on real-world websites

straker left a comment

Choose a reason for hiding this comment

WilcoFiers commented Oct 15, 2024 •

edited

Loading