University of Sherbrooke's winning miner of CSCoins for the CSGames 2017 competition, which was a competition that began before the event and lasted the whole week-end.
(skip this section if you know how the CSCoins miners work)
This challenge is very similar in nature to how Bitcoins are mined (except that the CSCoins had a central authority). We are given a random challenge to solve where the best we can do is bruteforce possible solutions until we find one that works. When a solution is found and provided to the central authority, we consider the block of the challenge to be mined and no other solutions to the challenge will be accepted by the server. Then, a new challenge is generated where the miners must race to find a solution. The objective is to optimize the miner's attempts per second in order to maximize the probability of mining a block before the other miners.
All challenges have the same structure:
- We are given information about the challenge by the server (type and difficulty, to name some)
- We have control over a nonce (an integer, essentially) to generate a potential solution to the challenge (this is what we are bruteforcing)
- The nonce is used to generate a seed to a pseudo-random number generator
- The sequence of values generated by the pseudo-random number generator tells us what the challenge is (depending on the type of challenge, more details below)
- A potential solution (as a string) to the challenge is found by the miner
- The obtained string is hashed and the starting bytes of the hash are compared to the target hash prefix given by the server
- If they match, we mined the block!
- If they don't, we increment our nonce and try again
The three challenge types were the folllowing:
The server gives us a nb_elements
parameter to the challenge. We generate nb_elements
elements using our pseudo-random number generator and sort the list. The solution to the challenge is then the concatenated elements of the sorted list.
Same thing as the list sort, but sorting in descending order.
Here things get interesting! This is where we can see the real potential for future CSCoins challenges.
We are given grid_size
and nb_blockers
parameters. Those values help us generate a grid_size x grid_size
grid with up to nb_blockers
walls in it. We then must find the shortest path from a randomly generated starting position to a randomly generated ending position. The coordinates of the path are concatenated together as a string as a solution to the challenge.
More information about the challenges and specs on the CSCoins repo.
The optimized (C++) challenge solver is in the miner/cpp directory. solvers.cpp
contains the challenge solvers and grid.h
contains the pathfinding.
Since the bottleneck of the challenge really lies in bruteforcing the nonces, I decided to keep the provided Python
miner and only call optimized code when solving the challenges. Since there was a strong focus on the reliability of the miners (they were running overnight without a possibility of connecting to them) and that providing invalid solutions to challenges was risky (too many invalid solutions would result in a temporary ban), I kept the provided Python
challenge solvers to verify the solutions:
- Try many nonces for the current challenge with the optimized C++ version
- Try (in Python) the nonce solution returned by the optimized version (either a potential solution if found or the last one attempted)
- If the solution is invalid, go back to 1
Therefore, the worst case scenario (buggy corner case with the C++ solver) would be that we would be super slow at attempting to solve the current challenge, but we wouldn't provide invalid submissions. The additional cost of checking a solution in Python
is vastly beaten by the cost of finding a solution in the first place.
There was also a concern about not spending too much time in C++ land -- we need to check if another miner solved the current challenge every once in a while. This is implicitely handled by this design.
On my computer, the provided Python
code mined list sorting challenges at a speed of ~200 attempts/s. Rewriting the solvers in C++ bumped it up to ~600/s. Remembering to compile using -O2
(duh) brought it to ~5k/s. At this point, I figured it would be time to multithread.
This is an embarassingly parallelizable problem. I shied away from exploring GPU-based solutions because of the memory requirements of individual nonce attempts and the uncertainty of the hardware available during the competition. Thus, we launch as many threads as there are cores available and let them explore different nonce spaces, stopping early if an atomic boolean flag gets turned on by a winning thread. This brought us to ~12k/s on my sadly slow i3 processor.
There wasn't much that we could do with the list sorting to speed it up. std::sort
on the fully generated list proved to be faster than inserting elements as they were generated. I got a final speed of ~20-30k attempts/s on a 10 elements list locally and ~1.3-1.5k attempts/s on a 1000 elements list.
For the shortest_path
challenge, I used A*. The server verifies the solution using Dijkstra, so there were some cases where the path found by A* did not exactly match the one found by the server (same path length, different directions), because of A*'s relunctance to reduce its heuristic (example given in the code).
This would result in a different hash, so most likely an invalid solution. The fix for this was to verify that a valid solution found by A* was generated from a path that is the same as one found by Dijkstra (heuristic = 0
). Since we do not do that check often (only when A* finds a good nonce) and the path experimentally rarely mismatches, the speed gain (~4-5x) from using A* is really worth the price.
Next, profiling indicated that std::map
costs were a bottleneck, so I tried using std::unordered_map
until finally settling on std::vector
s and linear searches (much faster on our small containers, thanks to our CPU cache!)
I used std::pair
s for positions on the grid and the comparisons on those eventually were a bottleneck. Encoding them as int
s solved the issue.
Finally, as a final touchup to the performance, I turned on profile guided optimization while compiling, which instruments the code with profiling metrics once ran and can later use that performance profile when compiling to optimize based on the performance characteristics of the program. This gained a nice free ~200 attempts/s.
We won the first place in this competition. We mined 485 blocks out of the total of 816 blocks mined by various teams during the event (as can be seen on the wallet page, where our wallet ID is the one that beings with 2f
). Additionally, we got bonus coins for improving the UI/UX of the wallet.
We got a final balance of 1525 coins, where we spent 350 coins on hints/puzzle hero challenges.
std::vector
is fast on small containers- using
<algorithm>
(or the STL in general) improved the performance most of the time - profile guided optimization is sweet and pretty much free