-
Notifications
You must be signed in to change notification settings - Fork 396
Bools, Bool Arrays, and Bitfields
char
. int
. long
. unsigned
. float
. double
. bool
. One of these things is not like the others. Hint: it's bool
. All the others are hardware data types. They have standard binary representations and the processor has instructions for loading, storing, and manipulating those representations. bool
(conceptually a single bit) is a C++ data type. The processor has bitwise/boolean instructions like and
, or
, xor
, not
, etc. but they operate on 64-bit integers (or whatever the native integer width of the processor is), not on individual bits. There is no way to load or store an individual bit from or to memory. If that is what you want to do, you have to load or store the char
or int
that contains the bit and then do the bit manipulation on either end to make sure that you are only manipulating the intended bit.
With no fixed hardware representation, the representation of bool
is implementation dependent. Most compilers will implement the a bool
as a char
(g++ does). If you type sizeof(bool)
you will get 1 (byte). That may seem like a waste of seven bits, but for a single bool
it's the best you can do.
That waste of space may start to seem a little more significant if you have an array of bool
's, and so the standard template library (and EPVector and ObjexxFCL) implement arrays of bools as "packed" with each char
holding eight bool
's. This eliminates the space overhead, but unfortunately TANSTAAFL and the price for this space savings is additional operations (i.e., time) to read and write the individual bits. Here are the basic pseudo-instructions for reading a bit for element I
from array BA
into register R1
if each bool
is represented as a char
.
ADD BA, I -> R2
LOAD R2 -> R1
Here is the equivalent sequence if eight bool
's are packed into a char
.
SHIFT-RIGHT I, 3 -> R2 // find the character that contains the bit: shifting right by 3 is the faster equivalent of dividing by 8
ADD BA, R2 -> R2
LOAD R2 -> R2
AND I, 7 -> R1 // find the bit within the character: and'ing with 7 is the faster equivalent of mod'ing by 8
SHIFT-RIGHT R2, R1 -> R1
AND R1, 1 -> R1
Setting or resetting the bit takes a similar sequences of steps. The trade-off is essentially 8X in space savings for a 3X in instructions and time. Is this worth it? Maybe, maybe not. Is there a better way? There is, especially when you are dealing with multiple (arrays of) bool
's as one often does. Consider this realistic example from DataSurfaces.hh
:
Array1D<bool> SurfEMSConstructionOverrideON;
Array1D<bool> SurfEMSOverrideIntConvCoef;
Array1D<bool> SurfEMSOverrideExtConvCoef;
Array1D<bool> SurfOutDryBulbTempEMSOverrideOn;
Array1D<bool> SurfOutWetBulbTempEMSOverrideOn;
Array1D<bool> SurfWindSpeedEMSOverrideOn;
Array1D<bool> SurfWindDirEMSOverrideOn;
Array1D<bool> SurfViewFactorGroundEMSOverrideOn;
SurfWindSpeedEMSOverrideOn.allocate(NumOfSurfaces); // etc.
bool ThisSurfWindSpeedEMSOverride = SurfWindSpeedEMSOverrideOn(SurfNum); // etc.
Ignore the naming inconsistency. If these arrays are accessed in separate contexts, each access requires six instructions. If they are all accessed in the same context, some of the indexing calculations can be shared and so each array can be accessed using only two instructions once the two indices have been calculated. However, this is not likely to be the case.
Here is an alternative implementation, which also saves space and has a much faster implementation. Instead of packing bool
's corresponding to different surfaces into a single char
, we pack bool
's corresponding to different conditions for the same surface into a single char
. This is called the "bitfield" or "flags register" pattern. Ready?
class enum SurfEMSOverrideType : int {
Construction = 0,
IntConvCoef,
ExtConvCoef,
OutDryBulbTemp,
OutWebBulbTemp,
WindSpeed,
WindDir,
ViewFactorGround
};
// These are generic bit manipulation functions, they are not specific to this example
constexpr char bitf(int b) { return 1 << b; } // b is the bit position
constexpr bool read_bool(char c, int b) { return (c & bitf(b)) != 0; }
constexpr void set_bool(char c, int b) { c |= bitf(b); }
constexpr void clear_bool(char c, int b) { c &= ~bitf(b); }
Array1D<char> SurfEMSOverrides;
SurfEMSOverrides.allocate(NumOfSurfaces); // only one array
bool ThisSurfWindSpeedEMSOverride = read_bool(SurfEMSOverrides[SurfNum], static_cast<int>(SurfEMSOverrideType::WindSpeed));
What does read_bool
look like in terms of instructions?
ADD SurfEMSOverrides, SurfNum -> R1
LOAD R1 -> R1
AND R1, 32 -> R1 // Read the bit corresponding to WindSpeed, WindSpeed is bit 5, 32 is 1 << 5 which the compiler evaluated at compile time.
NOTZERO R1 -> R1
Hmmmm. Only four instructions rather than six. Why? Enumerations are compile time constants, and so because the WindSpeed
bit is always in the same position in the char
the compiler can partially evaluate the math for extracting that bit at compile time! The position of SurfNum
is not a compile time constant and so the bit manipulation math behind has to be done at runtime.
This pattern also works well if you have a bunch of bool
's in a struct
/class
. You can represent them all in a single char
(or an int
if you have more than eight).
The moral? If you find yourself using a bunch of bool
's, either in arrays or even in a struct
/class
, consider using the bitfield/flag register pattern instead.
While we are on the subject of bitfields, here is another fun trick.
enum class WinShadingType {
IntShade,
ExtShade,
SwitchableGlazing,
ExtScreen,
IntBlind,
ExtBlind,
BGBlind
};
This is a simplified version of the WinShadingType
enumeration from EnergyPlus. What if you wanted to test the shading variable to see if any type of blind was present? One way to do this is:
if (shading == WinShadingType::IntBlind ||
shading == WinShadingType::ExtBlind ||
shading == WinShadingType::BGBlind)
Assuming shading is in register R1, this compiles to something like the following sequence of pseudo-instructions:
COMPARE-EQ R1, 4 -> R2 // 4 is the value of IntBlind
BRANCH-IF-TRUE R2, THEN-LABEL: // branch to the "if" block since the IntBlind comparison succeeded
COMPARE-EQ R1, 5 -> R2 // 5 is ExtBlind
BRANCH-IF-TRUE R2, THEN-LABEL: // branch to the "if" block since the ExtBlind comparison hit
COMPARE-EQ R1, 5 -> R2 // 6 is BGBlind
BRANCH-IF-TRUE R2, THEN-LABEL: // branch to the "if" block since the BGBlind comparison hit
That's six instructions including three "data-dependent" branches that the processor will have a hard time predicting correctly (here is a tutorial about data-dependent branches, what makes them hard and how to minimize them). Not great!
You could also do this:
if (shading >= WinShadingType::IntBlind && shading <= WinShadingType::BGBlind)
Which would compile to this:
COMPARE-GEQ R1, 4 -> R2 // 4 is the value of IntBlind
BRANCH-IF-FALSE R2, ELSE-LABEL: // branch to the "else" block since the IntBlind comparison failed
COMPARE-LEQ R1, 6 -> R2 // 6 is BGBlind
BRANCH-IF-FALSE R2, ELSE-LABEL: // branch to the "else" block since the BGBlind comparison failed
That's somewhat better, but doing arithmetic on enum
values (i.e., relying on them to be specific values) is considered poor form by the cognescenti, i.e., StackOverflow.
Here's an even better solution however that is not frowned upon. Remember bitf
? Check this out:
if (bitf(shading) & (bitf(WinShadingFlag::IntBlind) | bitf(WinShadingFlag::ExtBlind) | bitf(ShadingFlag::BGBlind)) != 0)
Which compiles to:
SHIFT-RIGHT 1, R1 -> R2 // represent shading as a bitfield
AND R2, 112 -> R2 // 112 in binary is 01110000, i.e., the 4th, 5th, and 6th bits corresponding to IntBlind, ExtBlind, and BGBlind are set
BRANCH-IF-TRUE R2, THEN-BLOCK:
What just happened here? Basically we converted the enum
values to bitfields in a single char
, converted shading
to a bitfield in the same manner, and then tested shading
against all three possibilities at once using and
. Oh, and since all of the enum values are compile time constants, the compiler was able to evaluate this entire expression (bitf(WinShadingFlag::IntBlind) | bitf(WinShadingFlag::ExtBlind) | bitf(ShadingFlag::BGBlind)
to 112 at compile-time. Bang!