
Local Shortcut Variables: When to Use a Value Copy and When to Use a Reference

amirroth edited this page Sep 25, 2022 · 6 revisions

EnergyPlus has a large and deep state data structure, and it is often convenient to use local variables to create shortcuts into parts of that data structure. When should these shortcuts be values and when should they be references (or pointers; at the machine level a reference and a pointer are the same thing with different syntax)? Right now it appears that many existing shortcut variables that were created during the transition to the state structure are references, and the rationale behind this is "Why create a local copy if you don't have to? The compiler will create a local copy if it needs one". This explanation is true in a high-level sense: you certainly don't want to copy data unnecessarily, and yes, the compiler can create local copies of data and optimize around them. However, it misses four important nuances:

  • When you create a reference/pointer, you are also copying something locally. You are just copying a different thing: the address rather than the value.
  • Reading/using a value through a reference/pointer is more expensive than reading/using a value copy if the compiler has not optimized the access.
  • References/pointers disrupt the compiler's ability to optimize.
  • Scalar and non-scalar variables behave differently.

As always, the nuances are often more important than the general high-level rule, and in this case they certainly are.

As with most things pertaining to programming and why idiom X is better than idiom Y, it helps to understand something about how X and Y translate to machine code and how the processor will execute that machine code.

Registers and Memory

The most important thing to understand here is the difference between the two types of storage the processor deals with: registers and memory. [Ed: Before you say "what about disk/network/random-IO-device?" you should know that the processor has no idea that these things exist--these are BIOS/operating system constructs that to the processor look like memory.]

Registers are the fastest kind of storage. In modern processors--pretty much every processor built since the early 1980s--the computation path is laid out in such a way that reading and writing registers essentially has a cost of zero. The flip side of this is that registers are also the only type of storage on which the computation path can operate directly. The processor can add two register values and store the result in a third register. It can read the value of a register and decide whether to branch or not. The processor cannot directly add a memory value to a register value and store the result in a register. To achieve that effect, it has to perform two steps: i) read the memory value into a register (incidentally, to do this the address of the value already has to be in a register), ii) do a register-register add. Depending on the processor, the cost of reading something from memory into a register is somewhere between 1 and 4 cycles--if the memory location happens to be in the on-chip cache, which it will be the majority of the time--and of course there is also the cost of executing the additional instruction.

Of course, the number of registers is limited. The x86_64 architecture has 16 64-bit general purpose registers [Ed: We are going to ignore SSE registers for now], meaning that at any point in time the compiler only has 16 values on which it can tell the processor to operate directly. If it wants more values, it has to shuttle values back and forth between the registers and memory. Meanwhile, the amount of memory available to the compiler is essentially unlimited, 2^64 bytes. [Ed: Of course, the computer doesn't actually have this much memory, but the operating system implements what is called "virtual memory" which makes it look like it does.] [Ed2: Incidentally, this is the meaning of "64-bit architecture", i.e., memory addresses are 64 bits, meaning that the compiler thinks that there are 2^64 bytes worth of memory, and registers are 64 bits wide so that they can hold addresses.]

Value/Copy and Reference/Pointer

Now that we know this about registers and memory, we can think about what value and reference variables look like to the processor. Let's look at this code, a simplified version of UpdateElectricBaseboard:

void UpdateElectricBaseboard(EnergyPlusData &state, int baseboardNum)
{
   auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);
   Real64 TimeStepSys = state.dataHVACGlobal->TimeStepSys;

   thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
   thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
   thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;
}

state and baseboardNum are parameters to the function, so the address of the state structure will be in register R1 and the value of baseboardNum will be in register R2 when the function body is invoked. In this example, the local shortcut thisBaseboard is created as a reference, and it should be clear why. Two bad things would happen if it were created as a copy. First, a struct is too big to fit into a register, so a copy of it would be made on the stack, i.e., function-local memory--and this includes copies of all members that are not needed in this function. Worse, this local copy does not save having to load the struct members that are used in this function into registers when they are needed; they would just be loaded from a different part of memory, the stack rather than the "heap" where state resides. Second, and more importantly, the struct values would only be updated locally, not in state as they should be. In the same example, however, the shortcut TimeStepSys is created as a value copy. Here is what this code will translate into (pseudo-assembly, not x86_64, but close enough).

   // auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);
   LOAD R1, 208 -> R3    // Load state.dataElectBaseboardRad into R3, member dataElectBaseboardRad is at offset 208 in struct state
   LOAD R3, 0 -> R3      // Load R3->ElecBaseboard into R3. Reuse R3 since we don't need dataElectBaseboardRad for anything else
   MULT R2, 40 -> R4     // Multiply baseboardNum by size of ElecBaseboard object to get offset in array
   ADD R3, R4 -> R3      // By adding starting address of array (R3) to offset (R4), we get the address/reference to thisBaseboard into R3

   // Real64 TimeStepSys = state.dataHVACGlobal->TimeStepSys;
   LOAD R1, 200 -> R4    // Load state.dataHVACGlobal into R4, member dataHVACGlobal is at offset 200 in struct state
   LOAD R4, 8 -> R4      // Load R4->TimeStepSys into R4, member TimeStepSys is at offset 8 in struct HVACGlobal. Reuse R4 since we don't need dataHVACGlobal for anything else

   // thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
   LOAD R3, 80 -> R5     // Load R3.Power into R5
   MULT R5, R4 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5   // Multiply by SecInHour, this is a compile time constant so the compiler inserts it into the instruction
   STORE R5 -> R3, 88    // Store R5 into R3.Energy

   // thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
   LOAD R3, 96 -> R5     // Load R3.ConvPower into R5
   MULT R5, R4 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5   // Multiply by SecInHour
   STORE R5 -> R3, 104   // Store R5 into R3.ConvEnergy

   // thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;
   LOAD R3, 112 -> R5    // Load R3.RadPower into R5 (its own offset, distinct from ConvPower at 96)
   MULT R5, R4 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5   // Multiply by SecInHour
   STORE R5 -> R3, 120   // Store R5 into R3.RadEnergy (its own offset, distinct from ConvEnergy at 104)

Now, let's look at a version of this function with the shortcut TimeStepSys declared as a reference.

void UpdateElectricBaseboard(EnergyPlusData &state, int baseboardNum)
{
   auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);
   Real64 &TimeStepSys = state.dataHVACGlobal->TimeStepSys; // Reference instead of value

   thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
   thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
   thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;
}

And let's look at the generated code. This time I will annotate only the lines that are different from the first example.

   // auto &thisBaseboard = state.dataElectBaseboardRad->ElecBaseboard(baseboardNum);
   LOAD R1, 208 -> R3   
   LOAD R3, 0 -> R3
   MULT R2, 40 -> R4     
   ADD R3, R4 -> R3

   // Real64 &TimeStepSys = state.dataHVACGlobal->TimeStepSys;
   LOAD R1, 200 -> R4    
   ADD R4, 8 -> R4       // Member TimeStepSys is at offset 8 in dataHVACGlobal, by adding it to address of dataHVACGlobal we get address of dataHVACGlobal->TimeStepSys into R4

   // thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * DataHVACGlobals::SecInHour;
   LOAD R3, 80 -> R5     
   LOAD R4, 0 -> R6      // Load TimeStepSys into R6    
   MULT R5, R6 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5 
   STORE R5 -> R3, 88    

   // thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * DataHVACGlobals::SecInHour;
   LOAD R3, 96 -> R5     
   LOAD R4, 0 -> R6      // Load TimeStepSys into R6    
   MULT R5, R6 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5    
   STORE R5 -> R3, 104   

   // thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * DataHVACGlobals::SecInHour;
   LOAD R3, 112 -> R5
   LOAD R4, 0 -> R6      // Load TimeStepSys into R6
   MULT R5, R6 -> R5     // Multiply by TimeStepSys
   MULT R5, 3600 -> R5
   STORE R5 -> R3, 120

What happened here? When TimeStepSys was declared as a value, it was loaded from memory into a register once and then used "for free" three times. When it was declared as a reference, its address was computed into a register once (with an add, which is slightly cheaper than a load), but then its value was loaded three times. Why could the compiler not have just loaded its value once and then reused it? Maybe it could have, but to do so it would have had to prove that the value did not change between uses, and the more pointers and references you use the more difficult that becomes, because the value could have been overwritten through any of them. In this function, for instance, each store through thisBaseboard could in principle modify the memory that TimeStepSys refers to. This may be counter-intuitive, but making something into a reference does not give the compiler more options to optimize. In fact, it makes it more difficult for the compiler to optimize. This is the third nuance we talked about at the beginning.
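
The aliasing problem can be sketched in a few lines of C++. This is only an illustration: the function names and the self-check are hypothetical and not part of EnergyPlus, and Real64 is defined locally as a stand-in for the EnergyPlus alias. When factor is a reference, it may name the same storage as out, so it has to be re-read after every write. When it is a value copy, nothing written through out can change it.

```cpp
#include <cassert>

using Real64 = double; // local stand-in for the EnergyPlus alias

// factor is a reference: it may name the same storage as out, so the
// compiler has to reload it after the first write to out.
void scaleTwiceByRef(Real64 &out, Real64 const &factor)
{
    out *= factor;
    out *= factor;
}

// factor is a value copy: nothing written through out can change it, so it
// can stay in a register for both multiplications.
void scaleTwiceByValue(Real64 &out, Real64 factor)
{
    out *= factor;
    out *= factor;
}

// Hypothetical self-check: pass the same variable as both arguments.
void checkAliasing()
{
    Real64 a = 2.0;
    scaleTwiceByRef(a, a); // 2 * 2 = 4, then 4 * 4 = 16
    assert(a == 16.0);

    Real64 b = 2.0;
    scaleTwiceByValue(b, b); // 2 * 2 = 4, then 4 * 2 = 8
    assert(b == 8.0);
}
```

The two functions give different answers when the arguments alias, which is exactly why the compiler cannot treat the reference version as if it were the value version unless it can prove that no aliasing occurs.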

And we've already seen the other three. The fourth nuance is the difference between scalars and non-scalars. Scalars can be copied locally into registers. Non-scalars cannot be held in registers, so they are copied onto the stack, and this doesn't actually save anything: rather than loading members from heap memory, you are copying heap memory to stack memory and then loading from stack memory. You've actually made things worse by introducing a memory copy.
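
Here is a minimal sketch of the non-scalar case. The Baseboard struct below is a hypothetical two-member stand-in for the much larger EnergyPlus struct, and the function names are made up for illustration. It shows the correctness half of the argument: a value-copy shortcut to a struct copies every member and then updates only the copy.

```cpp
#include <cassert>

using Real64 = double; // local stand-in for the EnergyPlus alias

struct Baseboard // hypothetical, far smaller than the real struct
{
    Real64 Power = 0.0;
    Real64 Energy = 0.0;
};

void updateThroughCopy(Baseboard &stored, Real64 seconds)
{
    Baseboard local = stored;             // copies every member onto the stack
    local.Energy = local.Power * seconds; // only the stack copy changes
}

void updateThroughReference(Baseboard &stored, Real64 seconds)
{
    Baseboard &local = stored;            // copies only the address
    local.Energy = local.Power * seconds; // writes into the original
}

// Hypothetical self-check.
void checkCopyVsReference()
{
    Baseboard b;
    b.Power = 2.0;

    updateThroughCopy(b, 3600.0);
    assert(b.Energy == 0.0); // the update was lost with the copy

    updateThroughReference(b, 3600.0);
    assert(b.Energy == 7200.0); // the update reached the original
}
```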

The first and second nuances have to do with scalar values. Addresses are also scalars, so saving an address locally into a register is not really any cheaper than saving the value, but then every time you need the value you have to do an additional load.
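
As a small illustration of the first two nuances, the sketch below (hypothetical names, not EnergyPlus code) creates a value copy, a reference, and a pointer to the same scalar. The reference and the pointer hold the same address, and reading through either one means going back to memory, which is why they observe a later write while the value copy does not.

```cpp
#include <cassert>

using Real64 = double; // local stand-in for the EnergyPlus alias

struct Observation
{
    Real64 viaCopy;      // what the value copy holds after the write
    Real64 viaReference; // what a read through the reference returns
    bool sameAddress;    // whether reference and pointer name the same storage
};

Observation observe(Real64 &source)
{
    Real64 copy = source;  // copies the value
    Real64 &ref = source;  // copies the address
    Real64 *ptr = &source; // copies the address, with pointer syntax
    source += 1.0;         // later write to the original
    return Observation{copy, ref, &ref == ptr};
}

// Hypothetical self-check.
void checkObservation()
{
    Real64 x = 1.0;
    Observation const o = observe(x);
    assert(o.viaCopy == 1.0);      // the copy kept the old value
    assert(o.viaReference == 2.0); // the reference was re-read from memory
    assert(o.sameAddress);
}
```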

The Rules

So what are the rules for local shortcut variables?

  • A local shortcut to a scalar should be a value copy unless you need to write into the scalar, in which case don't use a shortcut at all because it's confusing!
  • A local shortcut to a non-scalar, e.g., a struct, should be a reference to avoid making a memory copy of the struct. If you need to write into a member of the struct, it should be a plain reference; otherwise it should be a const reference.

Notice, these are basically the same rules as we use for passing parameters.
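
Put together, the rules might look like the following sketch. Everything here is an assumption made for illustration: State, Baseboard, updateBaseboard, and reportedEnergy are simplified, hypothetical stand-ins (the real EnergyPlus state is nested more deeply and its arrays are not std::vector), and the index is 0-based.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Real64 = double;               // local stand-in for the EnergyPlus alias
constexpr Real64 SecInHour = 3600.0; // stand-in for DataHVACGlobals::SecInHour

struct Baseboard // hypothetical subset of members
{
    Real64 Power = 0.0, ConvPower = 0.0, RadPower = 0.0;
    Real64 Energy = 0.0, ConvEnergy = 0.0, RadEnergy = 0.0;
};

struct State // hypothetical, flattened stand-in for EnergyPlusData
{
    std::vector<Baseboard> ElecBaseboard;
    Real64 TimeStepSys = 0.0;
};

void updateBaseboard(State &state, std::size_t baseboardNum)
{
    auto &thisBaseboard = state.ElecBaseboard[baseboardNum]; // non-scalar, written: plain reference
    Real64 const TimeStepSys = state.TimeStepSys;            // scalar, read only: value copy

    thisBaseboard.Energy = thisBaseboard.Power * TimeStepSys * SecInHour;
    thisBaseboard.ConvEnergy = thisBaseboard.ConvPower * TimeStepSys * SecInHour;
    thisBaseboard.RadEnergy = thisBaseboard.RadPower * TimeStepSys * SecInHour;
}

Real64 reportedEnergy(State const &state, std::size_t baseboardNum)
{
    auto const &thisBaseboard = state.ElecBaseboard[baseboardNum]; // non-scalar, read only: const reference
    return thisBaseboard.ConvEnergy + thisBaseboard.RadEnergy;
}

// Hypothetical self-check.
void checkRules()
{
    State state;
    state.ElecBaseboard.resize(2);
    state.TimeStepSys = 0.25;
    state.ElecBaseboard[1].Power = 100.0;
    state.ElecBaseboard[1].ConvPower = 60.0;
    state.ElecBaseboard[1].RadPower = 40.0;

    updateBaseboard(state, 1);
    assert(state.ElecBaseboard[1].Energy == 90000.0);
    assert(reportedEnergy(state, 1) == 90000.0);
}
```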
