This project implements the Column-to-Image (Col2Im) operation for 5D tensors in C++. The Col2Im operation is a crucial component in convolutional neural networks, particularly in the backward pass of convolution layers. It transforms a column-based representation back into a spatial representation.
- Efficient C++ implementation for 5D tensors
- Supports arbitrary input shapes, block shapes, dilations, paddings, and strides
- Memory-efficient with minimal allocations
- Easy to integrate into existing C++ neural network frameworks
The core of the implementation is the Col2Im5D
class, which contains a static compute
method. This method performs the Col2Im operation with the following parameters:
columns
: Input column dataoutput
: Output tensor datainput_shape
: Shape of the original input tensor (N, C, D, H, W)block_shape
: Shape of the convolution kernel (kD, kH, kW)dilations
: Dilation factors for each spatial dimensionpads
: Padding for the beginning and end of each spatial dimensionstrides
: Stride of the convolution for each spatial dimension
Here's a basic example of how to use the Col2Im5D
class:
#include "col2im_5d.hpp"
#include <vector>
int main() {
// Define input parameters
int input_shape[] = {2, 3, 10, 10, 10}; // N, C, D, H, W
int block_shape[] = {3, 3, 3}; // kD, kH, kW
int dilations[] = {1, 1, 1};
int pads[] = {1, 1, 1, 1, 1, 1};
int strides[] = {1, 1, 1};
// Calculate sizes
int N = input_shape[0], C = input_shape[1], D = input_shape[2], H = input_shape[3], W = input_shape[4];
int kD = block_shape[0], kH = block_shape[1], kW = block_shape[2];
int out_depth = (D + pads[0] + pads[3] - (dilations[0] * (kD - 1) + 1)) / strides[0] + 1;
int out_height = (H + pads[1] + pads[4] - (dilations[1] * (kH - 1) + 1)) / strides[1] + 1;
int out_width = (W + pads[2] + pads[5] - (dilations[2] * (kW - 1) + 1)) / strides[2] + 1;
int columns_size = N * C * kD * kH * kW * out_depth * out_height * out_width;
// Create input and output data
std::vector<float> columns(columns_size);
std::vector<float> output(N * C * D * H * W);
// Perform col2im operation
Col2Im5D::compute(columns.data(), output.data(), input_shape, block_shape, dilations, pads, strides);
// Use the output...
return 0;
}
To integrate this into your project:
- Include the
col2im_5d.hpp
header in your source files. - Ensure that the input
columns
data is correctly formatted from your previous operations. - Allocate the
output
tensor before calling thecompute
method. - Call
Col2Im5D::compute
with the appropriate parameters.
- The implementation uses raw pointers for efficiency.
- A single temporary buffer is allocated for the padded output to minimize memory operations.
- Nested loops are used for clarity, but could be further optimized for specific architectures or use cases.
- SIMD vectorization for improved performance on supported architectures.
- GPU implementation using CUDA or OpenCL for parallel processing.
- Template metaprogramming for compile-time optimizations based on tensor dimensions.
Contributions to improve the implementation or extend its functionality are welcome. Please submit pull requests or open issues for any bugs or feature requests.Please reach out to akansha.bansal@amd.com for any questions
This project is licensed under the MIT License - see the LICENSE file for details.