As we onboard more tests, some runners take much longer to finish. Today we create only one runner per architecture, but some runners (namely the nested host runners) are slower and take a while to run the full set of tests.
We could instead improve parallelism by splitting the tests into groups (say, Windows / Linux / Other) and running each group on its own pool, so that we have three child jobs instead. This would increase the number of agents we need per CI run, but should greatly reduce the time it takes to run vmm_tests.
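
A rough sketch of the split, to make the idea concrete. This is not how the pipeline works today: it assumes each test name contains the guest OS somewhere in it, and the test names, `bucket_for`, and `partition` are all made up for illustration.

```python
# Buckets follow the Windows / Linux / Other split suggested above.
BUCKETS = ("windows", "linux", "other")


def bucket_for(test_name: str) -> str:
    """Pick a bucket by looking for a guest OS keyword in the test name.

    Assumption: the guest OS appears in the test name. Anything that
    matches neither keyword falls into "other".
    """
    lowered = test_name.lower()
    for bucket in ("windows", "linux"):
        if bucket in lowered:
            return bucket
    return "other"


def partition(tests):
    """Group test names by bucket; each group would feed one child job."""
    groups = {bucket: [] for bucket in BUCKETS}
    for name in tests:
        groups[bucket_for(name)].append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical test names, not real vmm_tests entries.
    example = [
        "x64_windows_boot",
        "x64_linux_boot",
        "x64_uefi_frontpage",
    ]
    for bucket, names in partition(example).items():
        print(bucket, names)
```

Each resulting group would map to one child job on its own pool. A name-based split will not balance run time on its own, so the buckets may need adjusting if one of them ends up much slower than the others.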