This file describes the API of the CLTune auto-tuner for OpenCL and CUDA kernels. Everything is in the cltune namespace.
- `Tuner()`: Initializes a new tuner on platform 0 and device 0.
- `Tuner(size_t platform_id, size_t device_id)`: Initializes a new tuner on platform `platform_id` and device `device_id`. For CUDA, `platform_id` should be set to 0.
- `size_t AddKernel(const std::vector<std::string> &filenames, const std::string &kernel_name, const IntRange &global, const IntRange &local)`: Adds a new kernel to the list of tuning kernels and returns a unique ID (to be used when adding tuning parameters). This loads one or more kernel files given by a vector of `filenames`. The string `kernel_name` gives the entry point of the kernel. The parameters `global` and `local` represent the base 1D, 2D, or 3D thread configuration, with `local` being the size of a work-group/thread-block and `global` being the total number of threads in each dimension.
- `size_t AddKernelFromString(const std::string &source, const std::string &kernel_name, const IntRange &global, const IntRange &local)`: As above, but the kernel is loaded from a string instead of from a file.
- `void AddParameter(const size_t id, const std::string &parameter_name, const std::vector<size_t> &values)`: Adds a new tuning parameter for the kernel with the given `id`. The parameter is named `parameter_name` and takes a list of tunable integer values.
- `void MulGlobalSize(const size_t id, const StringRange range)`: Multiplies the global thread configuration for kernel `id` by one of the specified tuning parameters given as a 1D, 2D, or 3D `range`.
- `void DivGlobalSize(const size_t id, const StringRange range)`: As above, but divides the global thread configuration instead.
- `void MulLocalSize(const size_t id, const StringRange range)`: As above, but multiplies the local thread configuration instead.
- `void DivLocalSize(const size_t id, const StringRange range)`: As above, but divides the local thread configuration instead.
- `template <typename T> void AddArgumentInput(const std::vector<T> &source)`, `template <typename T> void AddArgumentOutput(const std::vector<T> &source)`, and `template <typename T> void AddArgumentScalar(const T argument)`: Functions to add kernel arguments for input or output buffers (given as `std::vector` CPU arrays) and scalars. These should be called in the order in which the arguments appear in the kernel.
- `void Tune()`: Starts the tuning process after everything is set up. This compiles all kernels and runs them for each permutation of the tuning parameters.
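To make the interplay between tuning parameters and thread sizes concrete, the following is a minimal stand-alone sketch. It does not call CLTune. The `Configuration` type and the helpers `AllConfigurations` and `ScaleSize` are hypothetical names introduced for illustration, and the rule that the n-th name in a range scales the n-th dimension is an assumption about how the size modifiers behave.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: one value chosen for every tuning parameter.
using Configuration = std::map<std::string, std::size_t>;

// Enumerates every combination of the given parameter values
// (the "permutations" that a full tuning run would visit).
std::vector<Configuration> AllConfigurations(
    const std::vector<std::pair<std::string, std::vector<std::size_t>>> &parameters) {
  std::vector<Configuration> result{Configuration{}};
  for (const auto &parameter : parameters) {
    std::vector<Configuration> extended;
    for (const auto &partial : result) {
      for (const auto value : parameter.second) {
        Configuration next = partial;
        next[parameter.first] = value;
        extended.push_back(next);
      }
    }
    result = extended;
  }
  return result;
}

// Multiplies or divides a base thread size by the values of the named
// parameters. Assumption: the n-th name scales the n-th dimension, and
// dimensions without a name are left unchanged.
std::vector<std::size_t> ScaleSize(const std::vector<std::size_t> &base,
                                   const std::vector<std::string> &names,
                                   const Configuration &config,
                                   const bool multiply) {
  assert(names.size() <= base.size());
  std::vector<std::size_t> scaled = base;
  for (std::size_t dim = 0; dim < names.size(); ++dim) {
    const std::size_t factor = config.at(names[dim]);
    assert(factor != 0);
    scaled[dim] = multiply ? base[dim] * factor : base[dim] / factor;
  }
  return scaled;
}
```

With three values for one parameter and two for another, `AllConfigurations` yields six configurations, which illustrates why the search space grows quickly as parameters are added.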
- `void AddConstraint(const size_t id, ConstraintFunction valid_if, const std::vector<std::string> &parameters)`: Adds a new constraint (e.g. must be equal to or larger than) to the set of parameters of kernel `id`. The constraint `valid_if` comes in the form of a function object which takes a number of tuning parameters, given as a vector of tuning-parameter names (`parameters`). Their names are later substituted by actual values.
- `void SetLocalMemoryUsage(const size_t id, LocalMemoryFunction amount, const std::vector<std::string> &parameters)`: As above, but for local memory usage. If this method is not called, the local memory usage is assumed to be zero, so no configurations will be excluded for using too much local memory.
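The name-to-value substitution described above can be sketched without the library. In this illustration, the two function-object aliases are assumed to take a vector of `size_t` values in the same order as the parameter names. The helpers `Substitute` and `IsAllowed` and the `device_local_memory` limit are hypothetical and only show how a configuration might be filtered.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical type: one value chosen for every tuning parameter.
using Configuration = std::map<std::string, std::size_t>;

// Assumed shapes of the function objects: values arrive in the same
// order as the parameter names passed alongside them.
using ConstraintFunction = std::function<bool(std::vector<std::size_t>)>;
using LocalMemoryFunction = std::function<std::size_t(std::vector<std::size_t>)>;

// Replaces each parameter name by its value in the given configuration.
std::vector<std::size_t> Substitute(const Configuration &config,
                                    const std::vector<std::string> &names) {
  std::vector<std::size_t> values;
  for (const auto &name : names) {
    values.push_back(config.at(name));
  }
  return values;
}

// A configuration is kept only if the constraint holds and the reported
// local memory usage (in bytes) fits within the device limit.
bool IsAllowed(const Configuration &config,
               const ConstraintFunction &valid_if,
               const std::vector<std::string> &constraint_names,
               const LocalMemoryFunction &amount,
               const std::vector<std::string> &memory_names,
               const std::size_t device_local_memory) {
  assert(valid_if && amount);
  return valid_if(Substitute(config, constraint_names)) &&
         amount(Substitute(config, memory_names)) <= device_local_memory;
}
```

A typical constraint in this style would be a lambda such as `[](std::vector<std::size_t> v) { return v[0] % v[1] == 0; }`, which requires the first named parameter to be a multiple of the second.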
- `void SetReference(const std::vector<std::string> &filenames, const std::string &kernel_name, const IntRange &global, const IntRange &local)`: Sets the reference kernel for automatic verification purposes. Same arguments as the `AddKernel()` method, but in this case there can be only one reference kernel, so no ID is returned. Calling this method again will overwrite the previous reference kernel.
- `void SetReferenceFromString(const std::string &source, const std::string &kernel_name, const IntRange &global, const IntRange &local)`: As above, but the reference kernel is loaded from a string instead of from a file.
- `void AddParameterReference(const std::string &parameter_name, const size_t value)`: For convenience, a tuning 'parameter' `parameter_name` with a single value `value` can be added to the reference kernel as well. This can be useful in case the same kernel is used for tuning and as reference and certain values are not defined. It is not necessary to call this function in case a separate fully functional OpenCL or CUDA kernel is supplied.
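The text above does not say how a tuned kernel's output is compared with the reference output. As a purely illustrative assumption, one plausible check is an element-wise comparison with an absolute tolerance. The helper `MatchesReference` and its `tolerance` parameter are hypothetical and not part of the documented API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if both buffers have the same length and every element of
// `output` lies within `tolerance` of the corresponding reference element.
// Assumption: an absolute tolerance; the actual criterion may differ.
bool MatchesReference(const std::vector<float> &reference,
                      const std::vector<float> &output,
                      const float tolerance) {
  assert(tolerance >= 0.0f);
  if (reference.size() != output.size()) {
    return false;
  }
  for (std::size_t i = 0; i < reference.size(); ++i) {
    // Written as a negated <= so that NaN values count as a mismatch.
    if (!(std::fabs(reference[i] - output[i]) <= tolerance)) {
      return false;
    }
  }
  return true;
}
```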
- `void UseFullSearch()`: Call this method before calling the `Tune()` method. This will use full search, i.e. all configurations will be tested on the device and the best result will be found by the tuner. This is the default behaviour: it is not necessary to call this method except to override a previously set search method.
- `void UseRandomSearch(const double fraction)`: Call this method before calling the `Tune()` method. This will make the tuner explore only a random subset of all configurations. The size of the subset is given as the fraction `fraction`. For example, passing `0.01` will explore 1% of the search space.
- `void UseAnnealing(const double fraction, const double max_temperature)`: Call this method before calling the `Tune()` method. This will make the tuner explore only a subset (size determined by `fraction`) of all configurations according to the simulated annealing algorithm with a maximum 'temperature' of `max_temperature`. Annealing uses randomly generated numbers, so behaviour will change from run to run.
- `void UsePSO(const double fraction, const size_t swarm_size, const double influence_global, const double influence_local, const double influence_random)`: Call this method before calling the `Tune()` method. This will make the tuner explore only a subset (size determined by `fraction`) of all configurations according to the particle swarm optimisation (PSO) algorithm with a swarm size of `swarm_size` and fractional influence values for the global, local, and random search directions. PSO uses randomly generated numbers, so behaviour will change from run to run.
- `void ModelPrediction(const Model model_type, const float validation_fraction, const size_t test_top_x_configurations)`: Call this method after calling the `Tune()` method. Trains a machine learning model of type `model_type` (`kLinearRegression` or `kNeuralNetwork`) based on the search space explored so far. Then, all the missing data points are estimated based on this model. Next, the top `test_top_x_configurations` configurations are tested on the actual device. Training a model is only useful if only a fraction of the search space is explored, as is the case with, for example, random search.
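The effect of `fraction` in a random search can be illustrated with a small stand-alone sketch. It is not the tuner's implementation. The helper `RandomSubset` is hypothetical, and both the rounding rule (round down, but keep at least one configuration) and the explicit seed are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Picks a random subset of configuration indices in [0, num_configurations).
// Assumption: the subset size is num_configurations * fraction, rounded
// down, with a minimum of one configuration when any exist.
std::vector<std::size_t> RandomSubset(const std::size_t num_configurations,
                                      const double fraction,
                                      const unsigned seed) {
  assert(fraction > 0.0 && fraction <= 1.0);
  std::vector<std::size_t> indices(num_configurations);
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  // Shuffle all indices, then keep only the first `keep` of them.
  std::mt19937 generator(seed);
  std::shuffle(indices.begin(), indices.end(), generator);

  const auto count = static_cast<std::size_t>(
      static_cast<double>(num_configurations) * fraction);
  const std::size_t keep =
      std::min(num_configurations, std::max<std::size_t>(count, 1));
  indices.resize(keep);
  return indices;
}
```

Fixing the seed makes a run repeatable in this sketch, whereas the text above notes that the annealing and PSO strategies vary from run to run.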
- `std::unordered_map<std::string, size_t> GetBestResult()`: Retrieves the parameters of the best tuning result and returns them to the caller as a map of strings (parameter names) to integers (parameter values).
- `void OutputSearchLog(const std::string &filename)`: Outputs the search process to the file `filename`.
- `double PrintToScreen() const`: Prints the results of the tuning to screen (stdout). Returns the best-case execution time in milliseconds.
- `void PrintFormatted() const`: Prints the results of the tuning to screen as a formatted table (stdout).
- `void PrintJSON(const std::string &filename, const std::vector<std::pair<std::string,std::string>> &descriptions) const`: Prints the results of the tuning to the file `filename` in JSON format. Additional key-value input can be given as a vector of pairs through the `descriptions` argument.
- `void PrintToFile(const std::string &filename) const`: Prints the results of the tuning to the file `filename` in plain text format.
- `void SuppressOutput()`: Disables all further printing to screen (stdout).
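A map of the shape returned by `GetBestResult()` can be post-processed with ordinary standard-library code. As one possible use, which the API description does not prescribe, the sketch below turns such a map into a string of `-DNAME=value` compiler options. The helper `ToDefines` is a hypothetical name, and sorting by parameter name is a choice made here because `std::unordered_map` has no defined iteration order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Formats a name-to-value map as "-DNAME=value" options, sorted by name
// so that the result does not depend on the map's iteration order.
std::string ToDefines(const std::unordered_map<std::string, std::size_t> &best) {
  std::vector<std::pair<std::string, std::size_t>> sorted(best.begin(), best.end());
  std::sort(sorted.begin(), sorted.end());

  std::string options;
  for (const auto &entry : sorted) {
    assert(!entry.first.empty());
    if (!options.empty()) {
      options += " ";
    }
    options += "-D" + entry.first + "=" + std::to_string(entry.second);
  }
  return options;
}
```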