A Generic hardware solution to a PopulationCount algorithm, written in VHDL. Testbench and diagrams included
This device counts the number of "1"'s in a binary word, it has many applications including crytography and error detection/correction in communications
The structure of this design was inspired by a Divide and Conquer Algorithm explained on stack overflow
Here is a waveform showing what it does:
Figure 1: Behavioral Simulation Waveform
You can find an explanation of this waveform in the testing section
The source is shared on EDAplayground, where it's ready for simulation (sign in and hit the run button). A waveform should pop up after the simulation
Be sure to check the settings on the left panel, they should look like:
- Testbench + Design: VHDL
- Top entity: count_onesTB
- Tools and Simulators
- Aldec Rivera Pro
- Compile Options: 2019 -o
- [ x ] Open EPWave after run
- Aldec Rivera Pro
When the EPwave opens:
- Select Get Signals on the top left, a panel should pop up
- Select count_onesTB under scope on the left
- Select Append All on the bottom right
For more information on how to use EDAplayground please refer to the EDAplayground documentation
Compile VHDL code using ghdl
from the command line and simulate using gtkwave.
- Install ghdl
- Structure your source code in 'work' into hirearchical directories
- Use
make
to compile your code - Use
TESTBENCH=tst/... make test
to view a Gtkwave simulation like this one
Name | Purpose |
---|---|
img | Screen captures showing design & functionality |
src | The source code |
tst | Tests |
Components Available For Individual Download:
The structure of this design was inspired by a Divide and Conquer Algorithm explained on stack overflow, written in C
unsigned int count_bit(unsigned int x)
{
x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
x = (x & 0x0000FFFF) + ((x >> 16)& 0x0000FFFF);
return x;
}
Posted by user abcdabcd987 on Stack Overflow
To approach creating a scalable hardware equivalent, this algorithm can be broken up into 4 distinct problems:
- Generating the Bitmasks
- Generating a circuit to AND two sets of bits
- Generating a circuit to ADD two sets of bits
- Generating a circuit to arithmetically SHIFT(keep the sign bit when shifting) two sets of bits
Only the generation of the bitmasks will be discussed in this README, see this link for a project dedicated to the addition unit.
Once a unit can be made for one set of these operations (one line), then these units can be sequenced together.
Figure 2: Schematic For a Two-bit Population Count Circuit (equivelent to one line of c code)
Figure 3: Schematic For a Four-bit Population Count Circuit (equivelent to three lines of c code)
This design uses registers between the OP units to synchronize the circuits signals and limit timing issues such as propogation delay.
Let's inspect if there is a pattern in the bitmasks:
Mask number | HEX MASK | BIT MASK |
---|---|---|
1 | 0x55555555 | 01010101010101010101010101010101 |
2 | 0x33333333 | 00110011001100110011001100110011 |
3 | 0x0F0F0F0F | 00001111000011110000111100001111 |
4 | 0x00FF00FF | 00000000111111110000000011111111 |
5 | 0x0000FFFF | 00000000000000001111111111111111 |
From the table, we can see:
- The length of the mask is the length of the intended X input
- The number of masks corresponds with the what power of 2 creates the length of the number, in this case 2^5 = 32
- Each mask oscillates between 2^(MASK_NUMBER-1) 1's and 0's
Lets examine Mask number 3:
0000 1111 0000 1111 0000 1111 0000 1111
For the first 12 bits (from right to left), we find that indeces 0,1,2,3,8,9,10,11 all contain '1' values.
Searching this sequence on the Online Encyclopedia of Integer Sequences, we find that it is equivelent to numbers that are congruent to {0, 1, 2, 3} mod 8.
With a little more inspection the general rule looks to be that the ones are every index in which index MOD 2^(MASK_NUMBER) <= 2^(MASK_NUMBER-1)-1
Now we can generate our 2d array of masks:
gen_masks: for MASK_NUMBER in 1 to LOG_BIT_WIDTH generate
gen_bits: for BIT in (2**LOG_BIT_WIDTH)-1 downto 0 generate
one_bits: if BIT mod 2**MASK_NUMBER <= 2**(MASK_NUMBER-1)-1 generate
mask(MASK_NUMBER-1)(BIT) <= '1';
end generate one_bits;
zero_bits: if BIT mod 2**MASK_NUMBER > 2**(MASK_NUMBER-1)-1 generate
mask(MASK_NUMBER-1)(BIT) <= '0';
end generate zero_bits;
end generate gen_bits;
end generate gen_masks;
Tests for VHDL designs are usually refered to as a "Testbench". It can be thought of as a wrapper component used to drive the inputs and tap the outputs of the Unit Under Test (UUT)
We can simulate the testbench and inspect the inputs and outputs using tools available through Vivado or EDAPlayground
Designs can get large and visual inpection of a waveform can become time consuming. Thankfully a testbench can automatically check if the outputs match expected values at the right time with assert statements. If the outputs don't match expectations, we can raise alerts which will show up in the logs of the simulation tool
Figure 1: Behavioral Simulation Waveform
Figure 1 shows three scaled simulations of the design.
- A 2 bit unit
- A 4 bit unit
- A 32 bit unit
The test inputs are labeled Xone-Xthree
The test outputs are labeled COUNTone-COUNTthree
I was looking for a hardware/software internship in 2018 and my brother @ErezBinyamin offered to share my resume with his work on the condition I could pass two challanges. One was solving the first crackme challange using Ghidra, and the other was to devlop a harware equivalent of a population count algorithm from stack overflow
Both were actually really fun and I'm glad I got the chance to build something and learn along the way