SetUp
- Download the attachment HowToOptimizeGemm.tar.gz (make sure the file is stored as HowToOptimizeGemm.tar.gz)
- Uncompress by executing
gunzip HowToOptimizeGemm.tar.gz
- Expand the tar file by executing
tar xvf HowToOptimizeGemm.tar
- Change into the directory that is created by executing
cd HowToOptimizeGemm
In the directory HowToOptimizeGemm you will find the following files, which you will use to systematically optimize the matrix-matrix multiplication operation:
- makefile: Describes how to compile, link, and execute the driver and implementations. When you type "make", this file is consulted and the commands in it are executed. Note: the makefile contains "tab" characters. These are important: make requires each command line in a rule to begin with a tab.
- Test driver
  - parameters.h: Holds the parameters that control what data is collected.
  - test_MMult.c: Driver routine that tests and times the different implementations. It executes a reference implementation and the current optimization to be timed. Parameters for this routine are set in parameters.h. In particular, that file indicates how many times to repeat each experiment (problem size) and how each of the three dimensions m, n, and k is tied to the problem size being timed.
- Matrix multiplication implementations
  - REF_MMult.c: Reference implementation used to check correctness.
  - MMult0.c: Version 0, the simplest implementation.
  - MMult1.c: Optimization 1.
- Utility routines
  - compare_matrices.c: Compares the contents of two matrices and returns the maximum absolute difference.
  - copy_matrix.c: Copies one matrix to another.
  - dclock.c: Returns elapsed time in seconds.
  - random_matrix.c: Generates a random matrix.
  - print_matrix.c: Prints the contents of a matrix.
- Plotting the results
  - PlotAll.m: Plots graphs corresponding to the data in the files output_old.m and output_new.m.
  - proc_parameters.m: File in which parameters about the architecture are given.
These last routines allow one to use Octave to plot the performance of two different implementations.
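To make the roles of the utility routines concrete, here is a minimal sketch of a correctness check and a timer. It is not the code shipped in compare_matrices.c or dclock.c: the names max_abs_diff and wall_seconds, the ELEM macro, the column-major layout with a leading dimension, and the use of the C11 function timespec_get are all assumptions made for illustration.

```c
#include <stddef.h>
#include <time.h>

/* Assumed layout: column-major storage, where element (i,j) of a matrix
   with leading dimension ld lives at offset j*ld + i. */
#define ELEM(x, ld, i, j) ((x)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Hypothetical stand-in for a compare routine: returns the maximum
   absolute difference between two m x n matrices a and b. */
double max_abs_diff(int m, int n, const double *a, int lda,
                    const double *b, int ldb)
{
    double max_diff = 0.0;
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double diff = ELEM(a, lda, i, j) - ELEM(b, ldb, i, j);
            if (diff < 0.0) diff = -diff;          /* absolute value */
            if (diff > max_diff) max_diff = diff;
        }
    }
    return max_diff;
}

/* Hypothetical stand-in for a timer: returns the current wall-clock time
   in seconds, or -1.0 if the clock cannot be read. Elapsed time is the
   difference between two calls. */
double wall_seconds(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC) return -1.0;
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}
```

A driver in this style would call wall_seconds before and after the implementation being timed, then call max_abs_diff on its result and the reference result; a difference far above rounding error suggests a bug in the optimized version.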
In this exercise, we use the gcc (GNU C) compiler with optimization level -O2. (See the makefile.)
This is neither the best compiler nor the best optimization level. We chose this combination because it gives us a certain level of control:
- Had we used the Intel compiler, the "simple loops" in MMult0.c would probably have yielded quite good performance: this compiler is very good at optimizing a triple loop. Try it!
- Had we used -O3 (optimization level 3), the GNU compiler would have optimized more aggressively, making the effect of the step-by-step optimizations we demonstrate much less predictable.
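For readers who want to see what a "simple loops" version looks like before opening MMult0.c, the following is a minimal sketch of a triple-loop matrix-matrix multiplication. It is an illustration, not a copy of the file: the function name simple_mmult, the accessor macros, and the column-major layout with leading dimensions lda, ldb, and ldc are assumptions.

```c
/* Assumed column-major accessors; lda, ldb, and ldc are the leading
   dimensions of the arrays a, b, and c. */
#define A(i, j) a[(j) * lda + (i)]
#define B(i, j) b[(j) * ldb + (i)]
#define C(i, j) c[(j) * ldc + (i)]

/* Hypothetical name. Computes C := C + A * B, where A is m x k,
   B is k x n, and C is m x n. */
void simple_mmult(int m, int n, int k,
                  const double *a, int lda,
                  const double *b, int ldb,
                  double *c, int ldc)
{
    for (int i = 0; i < m; i++) {          /* rows of C */
        for (int j = 0; j < n; j++) {      /* columns of C */
            for (int p = 0; p < k; p++) {  /* inner (dot product) index */
                C(i, j) += A(i, p) * B(p, j);
            }
        }
    }
}
```

Compiling a loop nest like this once with -O2 and once with -O3, or with a different compiler, and timing both is one way to observe the effect described above.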