Skip to content
This repository has been archived by the owner on Jan 26, 2023. It is now read-only.

Impact of pseudo-random number generator (PRNG) seed on qbsolv performance #70

Open
myriagon opened this issue Oct 27, 2017 · 1 comment

Comments

@myriagon
Copy link
Contributor

Thank you for accepting recent pull requests #63 and #64. #63 is a safety measure, using const with function parameters. #64 boosts performance by substituting cheaper equivalent operations for more expensive ones in evaluate().

In an attempt to quantify the impact of #64 and to examine the impact of pseudo-random number generator (PRNG) seed on qbsolv performance, I ran hundreds of tests as follows.

From random.org I obtained 800 random bytes generated by atmospheric noise and assembled them into 200 random unsigned 32-bit integers for use as PRNG seeds.

On Amazon Web Services, I used Sun Grid Engine to distribute jobs to c4.4xlarge nodes with hyperthreading turned off and 1 SGE slot per core.

for each test in qbsTestcounties, qbsTest2500, do:
    for each qbsolv version in PR63, PR64, do:
        for each seed in a set of 200 random seeds, do:
            run test
        done
    done
done

Please look at the tables and graphs of results attached below. In the graphs, tests are plotted by speed rank from no. 1, fastest, to no. 200, slowest, on the x axis, with CPU time on the y axis. Note that the y axis is logarithmic.

As you can see, in target mode the PRNG seed has a huge impact on performance. For example, in the counties tests, CPU time ranges from 84 to 4070 sec for the PR #63 version and from 68 to 3081 sec for the PR #64 version. I think I see two or possibly three inflection points in the curve, one at about speed rank 18, another at about speed rank 167, and possibly a third where the extreme outliers shoot up. What do you think might account for the shape of the curve?

In issue #62, Adam Douglass remarks that "For the QA system the marginal cost of additional samples is very small, [so] it would make sense to try to process multiple states returned for a sub-problem."

In a similar spirit, I wonder whether it would make sense to run multiple qbsolv processes concurrently with different random seeds. In target mode, a result might be found more quickly. In non-target mode, giving up before finding an optimal result might be avoided. The Python front-end could fire up and manage multiple qbsolv processes pretty easily, couldn't it?

Perf_counties_200_seeds_graph.pdf
Perf_orlib2500_200_seeds_graph.pdf
Perf_counties_200_seeds_summary_table.pdf
Perf_orlib2500_200_seeds_summary_table.pdf

@myriagon
Copy link
Contributor Author

Here is a zip file containing the Apple Numbers spreadsheets of the detailed test results, including the PRNG seeds.

Perf_qbsolv_200_seeds.zip

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants