Optimization_1x4_9
Copy the contents of file MMult_1x4_8.c into a file named MMult_1x4_9.c and change the contents.
Change the first lines in the makefile to
OLD := MMult_1x4_8
NEW := MMult_1x4_9
make run
octave:3> PlotAll % this will create the plot
This time the performance graph will look something like
We now use something called 'indirect addressing'. Notice, for example, the line
c_00_reg += a_0p_reg * *(bp0_pntr+1);
Here `a_0p_reg' holds the element A( 0, p+1 ), not A( 0, p ) as its name suggests (yes, this is a bit confusing. A better name for the variable would be good...)
- `bp0_pntr' points to element B( p, 0 ). Hence `bp0_pntr+1' addresses the element B( p+1, 0 ). There is a special machine instruction to then access the element at `bp0_pntr+1' that does not require the pointer to be updated.
- As a result, the pointers that address the elements in the columns of B only need to be updated once for every four values of p (once per pass through the unrolled loop), rather than after every use.
Interestingly, it appears that the compiler did this optimization automatically, and hence we see no performance improvement...
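To make the difference concrete, here is a minimal sketch of a 1x4 kernel written both ways. It is not the contents of MMult_1x4_9.c: the function names add_dot_1x4_bump and add_dot_1x4_indirect are made up for this illustration, and the sketch assumes column-major storage with A, B, C indexing macros and that k is a multiple of 4. The first version updates each pointer after every use; the second reads through a fixed offset and updates each pointer once per four values of p.

```c
#include <assert.h>

/* Column-major indexing macros (an assumption of this sketch). */
#define A(i, j) a[(j) * lda + (i)]
#define B(i, j) b[(j) * ldb + (i)]
#define C(i, j) c[(j) * ldc + (i)]

/* Hypothetical contrast version: every pointer is bumped after each use. */
void add_dot_1x4_bump(int k, const double *a, int lda,
                      const double *b, int ldb, double *c, int ldc)
{
    double c_00_reg = 0.0, c_01_reg = 0.0, c_02_reg = 0.0, c_03_reg = 0.0;
    double a_0p_reg;
    const double *bp0_pntr = &B(0, 0), *bp1_pntr = &B(0, 1);
    const double *bp2_pntr = &B(0, 2), *bp3_pntr = &B(0, 3);
    int p;

    for (p = 0; p < k; p++) {
        a_0p_reg = A(0, p);
        c_00_reg += a_0p_reg * *bp0_pntr++;   /* read, then advance */
        c_01_reg += a_0p_reg * *bp1_pntr++;
        c_02_reg += a_0p_reg * *bp2_pntr++;
        c_03_reg += a_0p_reg * *bp3_pntr++;
    }
    C(0, 0) += c_00_reg;  C(0, 1) += c_01_reg;
    C(0, 2) += c_02_reg;  C(0, 3) += c_03_reg;
}

/* Hypothetical indirect-addressing version: reads use fixed offsets,
   and each pointer is advanced once per four values of p. */
void add_dot_1x4_indirect(int k, const double *a, int lda,
                          const double *b, int ldb, double *c, int ldc)
{
    double c_00_reg = 0.0, c_01_reg = 0.0, c_02_reg = 0.0, c_03_reg = 0.0;
    double a_0p_reg;
    const double *bp0_pntr = &B(0, 0), *bp1_pntr = &B(0, 1);
    const double *bp2_pntr = &B(0, 2), *bp3_pntr = &B(0, 3);
    int p;

    assert(k % 4 == 0);  /* the loop below is unrolled by four */

    for (p = 0; p < k; p += 4) {
        a_0p_reg = A(0, p);                   /* pairs with B( p, j ) */
        c_00_reg += a_0p_reg * *bp0_pntr;
        c_01_reg += a_0p_reg * *bp1_pntr;
        c_02_reg += a_0p_reg * *bp2_pntr;
        c_03_reg += a_0p_reg * *bp3_pntr;

        a_0p_reg = A(0, p + 1);               /* pairs with B( p+1, j ) */
        c_00_reg += a_0p_reg * *(bp0_pntr + 1);
        c_01_reg += a_0p_reg * *(bp1_pntr + 1);
        c_02_reg += a_0p_reg * *(bp2_pntr + 1);
        c_03_reg += a_0p_reg * *(bp3_pntr + 1);

        a_0p_reg = A(0, p + 2);               /* pairs with B( p+2, j ) */
        c_00_reg += a_0p_reg * *(bp0_pntr + 2);
        c_01_reg += a_0p_reg * *(bp1_pntr + 2);
        c_02_reg += a_0p_reg * *(bp2_pntr + 2);
        c_03_reg += a_0p_reg * *(bp3_pntr + 2);

        a_0p_reg = A(0, p + 3);               /* pairs with B( p+3, j ) */
        c_00_reg += a_0p_reg * *(bp0_pntr + 3);
        c_01_reg += a_0p_reg * *(bp1_pntr + 3);
        c_02_reg += a_0p_reg * *(bp2_pntr + 3);
        c_03_reg += a_0p_reg * *(bp3_pntr + 3);

        /* The only pointer updates in the loop body. */
        bp0_pntr += 4;  bp1_pntr += 4;
        bp2_pntr += 4;  bp3_pntr += 4;
    }
    C(0, 0) += c_00_reg;  C(0, 1) += c_01_reg;
    C(0, 2) += c_02_reg;  C(0, 3) += c_03_reg;
}
```

Both routines compute the same 1x4 block of C, so one can be checked against the other on small inputs. A compiler that performs this rewrite on its own would be expected to generate similar code for both.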