The purpose of the IMathEngine interface is to isolate the algorithms library from the implementation of the low-level platform-dependent operations. The interface provides methods for memory management and calculations. It is used in blob, layer, and neural network objects.
The NeoML library supports various processing devices and platform technologies:
Platform | CPU | GPU |
---|---|---|
Windows | MKL | CUDA |
Linux | MKL | - |
MacOS | MKL | - |
Android | ARM Neon | Vulkan |
iOS | ARM Neon | Metal |
To work with the library, you only need to create an `IMathEngine` object and destroy it once processing is completed. This section gives general information about the math engine internals, which you will not normally need to access.
The library does not access the memory directly, because it may be allocated on GPU RAM. Because of this, `IMathEngine` manages the memory via special types and functions.
`CMemoryHandle` is the base class for all data descriptors. An instance of this type describes a memory block of arbitrary or unknown type. In a way, this class is similar to the `void*` data type in C/C++.
Two classes are derived from it: `CFloatHandle` and `CIntHandle`; they describe memory blocks with `float` and `int` data, respectively.
The math engine processes only vectors of a specific kind. Both the input data and the result should be in vector form.
Any kind of data may be represented as a vector: a number is a vector of only one element, a matrix is a vector that contains its data written out row by row, and so on for tensors of 3 or more dimensions. This concept allows the math engine to work around the particulars of memory management on different platforms, including GPU.
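As a concrete illustration of the row-by-row layout, here is a minimal sketch in plain C++ that does not use NeoML at all. The helper names `flattenRowMajor` and `flatIndex` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattens a matrix into one contiguous vector, writing it out row by row.
std::vector<float> flattenRowMajor( const std::vector<std::vector<float>>& matrix )
{
    std::vector<float> flat;
    for( const auto& row : matrix ) {
        flat.insert( flat.end(), row.begin(), row.end() );
    }
    return flat;
}

// Position of element (row, col) inside the flattened vector
// for a matrix that has `cols` columns.
std::size_t flatIndex( std::size_t row, std::size_t col, std::size_t cols )
{
    assert( col < cols );
    return row * cols + col;
}
```

For a matrix with 2 rows and 3 columns, element (1, 0) lands at position 1 * 3 + 0 = 3 of the flattened vector. The same idea extends to tensors by adding one multiplication per extra dimension.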
In most cases, the math engine methods have no return value; that is, the type of return value is `void` and no non-constant references or pointers are used as "out-parameters." This helps avoid unnecessary CPU-to-GPU synchronization overhead that could significantly impact performance.
Nevertheless, there are cases when synchronization is needed: for example, when calculations were performed on GPU but the result is required on the "main" system.
CPU and GPU have to be synchronized in the following situations:
- memory allocation and release
- reading data from disk
- writing large blocks of data
These cases will require additional synchronization resources, which will probably reduce speed.
Note that the system has to be "warmed up" before actual processing. As many internal objects use lazy initialization, the first run of the math engine may be slower than the subsequent ones.
However, the warmup is guaranteed to be one-off: if you call the same methods for vectors of the same size many times, only the first call will be slower. For example, when training or running a neural network, all required memory buffers are created on the first run, and subsequent runs are faster.
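The effect of lazy initialization can be sketched with a toy cache that is not part of NeoML. `LazyBufferCache` and its methods are hypothetical names, and the real math engine internals may well work differently; the sketch only shows why repeated calls with the same size pay the allocation cost once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for an engine that creates its work buffers lazily.
class LazyBufferCache {
public:
    // Returns a buffer of the requested size, allocating it only on first use.
    std::vector<float>& Get( std::size_t size )
    {
        assert( size > 0 );
        auto it = buffers.find( size );
        if( it == buffers.end() ) {
            // Slow path: taken once per distinct buffer size.
            ++allocationCount;
            it = buffers.emplace( size, std::vector<float>( size ) ).first;
        }
        return it->second;
    }

    // Number of times the slow path was taken so far.
    int AllocationCount() const { return allocationCount; }

private:
    std::map<std::size_t, std::vector<float>> buffers;
    int allocationCount = 0;
};
```

Calling `Get( 1024 )` three times allocates once; a later `Get( 2048 )` triggers a second allocation because the size is new. This mirrors the advice above: run the workload once with realistic sizes before measuring performance.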
There are several ways to create a math engine or to get a pointer to an existing one.
The default math engine is deleted automatically when the library is unloaded. Every other math engine should be deleted after use. Before deleting it, free all memory it manages by deleting every blob created for this engine; note that blobs may be stored inside layer and network objects, so all those objects should be deleted as well.
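The required order of destruction can be modeled with two toy classes. `ToyEngine` and `ToyBlob` are hypothetical stand-ins, not the NeoML types; the sketch assumes only what is stated above, namely that everything allocated through an engine must be gone before the engine itself is deleted.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a math engine: counts the memory blocks still alive.
class ToyEngine {
public:
    // All memory must have been freed before the engine is destroyed.
    ~ToyEngine() { assert( liveBlocks == 0 ); }
    int LiveBlocks() const { return liveBlocks; }

private:
    friend class ToyBlob;
    int liveBlocks = 0;
};

// Hypothetical stand-in for a blob: owns one memory block of its engine.
class ToyBlob {
public:
    explicit ToyBlob( ToyEngine& owner ) : engine( owner ) { ++engine.liveBlocks; }
    ~ToyBlob() { --engine.liveBlocks; }
    ToyBlob( const ToyBlob& ) = delete;
    ToyBlob& operator=( const ToyBlob& ) = delete;

private:
    ToyEngine& engine;
};

// Creates an engine and a blob, then releases them in the safe order.
// Returns the number of blocks still alive just before the engine is deleted.
int runInSafeOrder()
{
    std::unique_ptr<ToyEngine> engine( new ToyEngine );
    std::unique_ptr<ToyBlob> blob( new ToyBlob( *engine ) );

    blob.reset(); // 1. free everything that uses the engine
    const int remaining = engine->LiveBlocks();
    engine.reset(); // 2. only then delete the engine
    return remaining;
}
```

Because local objects are destroyed in reverse order of declaration, declaring the engine first and the objects that depend on it afterwards gives the same order even without the explicit `reset()` calls.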
By default, when an exceptional situation occurs, NeoML functions throw `std::logic_error`, or `std::bad_alloc` in case of memory allocation failure.
This behavior can be changed by setting the exception handler.
// Exception handler interface
// Use it to change the program's reaction to exceptions
class NEOMATHENGINE_API IMathEngineExceptionHandler {
public:
virtual ~IMathEngineExceptionHandler();
// An error during a method call
// The default action is to throw std::logic_error
virtual void OnAssert( const char* message, const wchar_t* file, int line, int errorCode ) = 0;
// Memory cannot be allocated on device
// The default action is to throw std::bad_alloc
virtual void OnMemoryError() = 0;
};
// Set the exception handler interface for the whole program
// Set this to null to use the default exception handler
// Non-default handler must be destroyed by the caller after use
NEOMATHENGINE_API void SetMathEngineExceptionHandler( IMathEngineExceptionHandler* exceptionHandler );
// Get current exception handler interface
// Returns null if the default handler is in use
NEOMATHENGINE_API IMathEngineExceptionHandler* GetMathEngineExceptionHandler();
To use non-default exception handling, it is recommended to set the exception handler before creating any math engines.
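A custom handler might look like the following sketch. To keep it compilable without NeoML, the interface is re-declared locally inside a `sketch` namespace as a copy of the declaration above; in real code you would include the NeoML header instead. `CRecordingHandler`, its `LastMessage()` method, and the choice of `std::runtime_error` are hypothetical and serve only as an illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace sketch {

// Local copy of the interface shown above (assumption: same method signatures).
class IMathEngineExceptionHandler {
public:
    virtual ~IMathEngineExceptionHandler() {}
    virtual void OnAssert( const char* message, const wchar_t* file, int line, int errorCode ) = 0;
    virtual void OnMemoryError() = 0;
};

// Hypothetical handler: remembers the last assertion message,
// then throws std::runtime_error instead of std::logic_error.
class CRecordingHandler : public IMathEngineExceptionHandler {
public:
    void OnAssert( const char* message, const wchar_t*, int line, int ) override
    {
        assert( message != nullptr );
        lastMessage = std::string( message ) + " (line " + std::to_string( line ) + ")";
        throw std::runtime_error( lastMessage );
    }

    void OnMemoryError() override
    {
        throw std::runtime_error( "device memory exhausted" );
    }

    const std::string& LastMessage() const { return lastMessage; }

private:
    std::string lastMessage;
};

} // namespace sketch
```

With the real library, such a handler would be registered through `SetMathEngineExceptionHandler` before any math engine is created. Since a non-default handler must be destroyed by the caller, a reasonable pattern is to reset the handler to null before the handler object goes out of scope.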
IMathEngine& GetDefaultCpuMathEngine();
Returns a math engine working on CPU that uses only one processing thread and has no memory limitations.
This math engine does not need to be deleted after use (the memory and resources will be freed up automatically on unloading the library).
IMathEngine* GetRecommendedGpuMathEngine( size_t memoryLimit );
Creates a math engine working on the recommended GPU. If no GPU is available, `null` is returned.
- memoryLimit - the memory limitation for the math engine. Set to `0` to use all available memory.
This math engine should be deleted after use.
IMathEngine* CreateCpuMathEngine( size_t memoryLimit );
Creates a math engine working on CPU with the specified memory limitation.
- memoryLimit - the memory limitation for the math engine. Set to `0` to use all available memory.
This math engine should be deleted after use.
IMathEngine* CreateGpuMathEngine( size_t memoryLimit );
Creates a math engine working on GPU with the specified memory limitation.
- memoryLimit - the memory limitation for the math engine. Set to `0` to use all available memory.
This math engine should be deleted after use.
Use the GPU manager to get the information about available GPUs and create a math engine working on one of them.
The manager is represented by the interface:
class IGpuMathEngineManager {
public:
// Get the number of available GPUs
virtual int GetMathEngineCount() const = 0;
// Get the information about the GPU with the specified index
// index can be from 0 to GetMathEngineCount() - 1.
virtual void GetMathEngineInfo( int index, CMathEngineInfo& info ) const = 0;
// Create a math engine on the GPU with the specified index
// index can be from 0 to GetMathEngineCount() - 1.
// memoryLimit is the memory limitation; if the limit is exceeded, IMathEngineExceptionHandler::OnMemoryError() will be called
virtual IMathEngine* CreateMathEngine( int index, size_t memoryLimit ) const = 0;
};
Create or destroy the manager object:
// Creates a GPU manager
IGpuMathEngineManager* CreateGpuMathEngineManager();
// Destroys the GPU manager
void DestroyGpuMathEngineManager( IGpuMathEngineManager* manager );
Any math engine created via the manager should be deleted after use.
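One plausible use of the manager is to pick the GPU with the most available memory. The sketch below is self-contained and does not link against NeoML: the interface is re-declared locally in a reduced form (without `CreateMathEngine`), the `AvailableMemory` field of `CMathEngineInfo` is an assumption about that structure, and `findLargestGpu` and `CFakeManager` are hypothetical names used only to exercise the logic without real hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Assumed shape of the info structure; the field name is an assumption.
struct CMathEngineInfo {
    std::size_t AvailableMemory = 0;
};

// Reduced local copy of the manager interface shown above.
class IGpuMathEngineManager {
public:
    virtual ~IGpuMathEngineManager() {}
    virtual int GetMathEngineCount() const = 0;
    virtual void GetMathEngineInfo( int index, CMathEngineInfo& info ) const = 0;
};

// Returns the index of the GPU with the most available memory,
// or -1 if the manager reports no GPUs. Ties keep the lowest index.
int findLargestGpu( const IGpuMathEngineManager& manager )
{
    int best = -1;
    std::size_t bestMemory = 0;
    for( int i = 0; i < manager.GetMathEngineCount(); ++i ) {
        CMathEngineInfo info;
        manager.GetMathEngineInfo( i, info );
        if( best < 0 || info.AvailableMemory > bestMemory ) {
            best = i;
            bestMemory = info.AvailableMemory;
        }
    }
    return best;
}

// Fake manager that reports a fixed list of memory sizes.
class CFakeManager : public IGpuMathEngineManager {
public:
    explicit CFakeManager( const std::vector<std::size_t>& sizes ) : memory( sizes ) {}

    int GetMathEngineCount() const override { return static_cast<int>( memory.size() ); }

    void GetMathEngineInfo( int index, CMathEngineInfo& info ) const override
    {
        assert( index >= 0 && index < GetMathEngineCount() );
        info.AvailableMemory = memory[static_cast<std::size_t>( index )];
    }

private:
    std::vector<std::size_t> memory;
};

} // namespace sketch
```

With the real manager, the returned index would be passed to `CreateMathEngine( index, memoryLimit )`. A result of -1 means there is nothing to create, so falling back to a CPU math engine is a sensible default. The created engine should be deleted after use, and the manager itself released with `DestroyGpuMathEngineManager`.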