Automatically Generated Functionality
This isn't PipelineC++ folks. One step at a time.
I want PipelineC code to be compatible with regular C/C++ compilers for easy functional verification of your hardware. This makes custom C-based "simulations" easy to develop. I mostly don't want to add new language features to standard C, or change languages completely, unless I have a parser and software compiler to support them. Would love some help in this area.
With that said, some concepts just beg to be implemented with template types and so really benefit from being implemented as auto generated C code.
- Top Level IO + Modules + Registers + Processes, etc
- u/intN_t Types
- Bit Manipulation
- Bit Math
- Fixed Size Arrays
- Casting to and from bytes
- Clock Crossings
- RAMs
- DSPs
- Operator Overloading
- Pragmas
- VHDL Escape Hatch
Top Level IO + Modules + Registers + Processes, etc
See this documentation.
u/intN_t Types
- Any 'N' is supported, ex. uint13_t
- Generated typedefs in header files "uintN_t.h" and "intN_t.h"
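When the same source is built with a regular C compiler, a width such as 13 has no native type. The sketch below assumes such a type is backed by the next wider standard type and that truncation has to be applied by hand when a result is stored back. The backing type, `UINT13_MASK`, `uint13_wrap` and `uint13_add` are hypothetical names for illustration and are not taken from the generated headers.

```c
#include <assert.h>
#include <stdint.h>

// Assumption: in a software build, a 13-bit type is backed by a 16-bit one.
typedef uint16_t uint13_t;

#define UINT13_MASK 0x1FFFu // 13 ones

// Hypothetical helper: keep only the low 13 bits of a wider value,
// as happens when a result is stored into a 13-bit variable.
static uint13_t uint13_wrap(uint32_t value)
{
  return (uint13_t)(value & UINT13_MASK);
}

// Hypothetical helper: add two 13-bit values and store into 13 bits.
static uint13_t uint13_add(uint13_t a, uint13_t b)
{
  assert(a <= UINT13_MASK && b <= UINT13_MASK);
  return uint13_wrap((uint32_t)a + (uint32_t)b);
}
```

Without the mask, the 16-bit backing type would silently keep bits 13 to 15, so a software run could disagree with 13-bit hardware.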
Bit Manipulation
Unions can be complicated since they can rely on the layout of bytes in memory, and PipelineC has no memory model. So for now there are auto generated bit manipulation functions for built-in types. These functions are implemented as raw VHDL.
- Bit slice/select
- Integer-like variables can be used like a function call, where arguments define the bit slice range. Ex.:
uint32_t x; // x(31 downto 0)
uint1_t y = x(15); // y = x[15]
uint16_t z = x(15, 0); // z = x[15:0]
- The above syntax is sugar for directly calling fully named bit slicing functions:
uint<Y-X+1>_t <type_prefix>_Y_X(<type>)
- Ex.
uint16_t uintX_15_0(uintX_t data); // data[15:0], select bits 15 down to 0
- Bit concatenation
uintX_t <type_prefix_0>_<type_prefix_1>(<type0>, <type1>)
- Result size X is sum of input sizes
- Ex.
uint18_t uint14_uint4(uint14_t x, uint4_t y); // Upper 14 bits and lower 4 bits of a uint18_t
- Bit duplication
uint<Y>_t <type_prefix>_X(<type>)
- Result width Y is X times the input width
- Ex.
uint16_t uint4_4(uint4_t x); // Repeat a uint4_t 4 times to form uint16_t
- Rotate left/right
uint<size>_t rot[l|r]<size>_<amount>(uint<size>_t x)
- Ex.
uint64_t rotl64_7(uint64_t x); // Rotate the uint64_t value to the left by 7
- Bit assignment
base_t <base_type_prefix>_<assignment_type_prefix>_X(<base>, <assignment>)
- Assign data at bit position X in the base value. Result width is same as base.
- Ex.
uint64_t uint64_uint16_2(uint64_t in, uint16_t x); // in[17:2] = x
- Float SEM construction
- Only 32b float supported right now, can easily add other widths
float float_uint<exponent_bits>_uint<mantissa_bits>(sign, exponent, mantissa)
- Ex.
float float_uint8_uint23(uint1_t sign, uint8_t exponent, uint23_t mantissa);
- Float uint32_t construction
- Interpret uint32_t as a 32b float
float float_uint32(uint32_t data);
- Byte swap
- Swap the byte ordering of an unsigned value
uintN_t bswap_<bit_width>(input)
- Ex.
uint32_t bswap_32(uint32_t input);
- Array to unsigned
- Concatenate elements of an array to form a single value
- Element ordering (typically bytes) is either 'big endian' or 'little endian'
uintY_t uintX_arrayN_[be|le](uintX_t input[N])
- Result width is N times input width
- Ex.
uint64_t uint8_array8_be(uint8_t x[8]); // or uint8_array8_le
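For functional verification in regular C, several of the operations listed above can be modeled with shifts and masks. The following is a minimal sketch, not the generated raw VHDL. All function names here (`slice32`, `concat32`, `rotl64`, `assign64`, `bswap32`, `uint8_array8_be_model`) are hypothetical stand-ins, and the big endian model assumes element 0 becomes the most significant byte.

```c
#include <assert.h>
#include <stdint.h>

// x[hi:lo], models the x(hi, lo) bit slice on a 32-bit value
static uint32_t slice32(uint32_t x, unsigned hi, unsigned lo)
{
  assert(hi < 32 && lo <= hi);
  unsigned width = hi - lo + 1;
  uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
  return (x >> lo) & mask;
}

// Concatenation: 'upper' is placed above the low 'lower_width' bits of 'lower'
static uint32_t concat32(uint32_t upper, uint32_t lower, unsigned lower_width)
{
  assert(lower_width > 0 && lower_width < 32);
  return (upper << lower_width) | (lower & ((1u << lower_width) - 1u));
}

// Rotate left, models the rotl64_<amount> family
static uint64_t rotl64(uint64_t x, unsigned amount)
{
  amount %= 64;
  return amount ? ((x << amount) | (x >> (64 - amount))) : x;
}

// Bit assignment: write 'width' bits of 'value' into 'base' starting at bit 'pos'
static uint64_t assign64(uint64_t base, uint64_t value, unsigned width, unsigned pos)
{
  assert(width > 0 && width < 64 && pos + width <= 64);
  uint64_t mask = ((UINT64_C(1) << width) - 1) << pos;
  return (base & ~mask) | ((value << pos) & mask);
}

// Byte swap, models bswap_32
static uint32_t bswap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Array to unsigned; assumption: 'be' puts element 0 in the most significant byte
static uint64_t uint8_array8_be_model(const uint8_t x[8])
{
  uint64_t result = 0;
  for (int i = 0; i < 8; i++)
    result = (result << 8) | x[i];
  return result;
}
```

For instance, `assign64(in, x, 16, 2)` plays the role of `uint64_uint16_2(in, x)`, writing `x` into bits 17 down to 2.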
Bit Math
Little helper functions for common 'math' operations. These functions are implemented in PipelineC.
- Absolute value
uintN_t <type_prefix>_abs(intN_t input)
- Ex.
uint32_t int32_abs(int32_t input); // Absolute value removes sign bit
- N->1 Mux
- Binary tree of multiplexers selecting a single value from N values
<type> <type_prefix>_muxN(select, input0, input1, ... input<N-1>);
- Ex.
uint8_t uint8_mux4(uint2_t select, uint8_t input0, uint8_t input1, uint8_t input2, uint8_t input3);
- Count zeros starting from upper/left bits
<count_type> count0s_<type_prefix>(type)
- Ex.
uint3_t count0s_uint7(uint7_t data); // Max zeros possible is 7
- N->1 Binary Operations
- Binary tree of binary operations
- Only AND, OR, SUM(add) supported right now
- Only float and u/intN_t types supported.
<output_type> <type_prefix>_<operation>N(input0, input1, ... input<N-1>);
- Ex.
uint5_t uint3_sum3(uint3_t input0, uint3_t input1, uint3_t input2);
- uint3 max = 7, 7*3 = 21, stored in uint5_t
- Arrays are supported as well: Ex.
uint5_t uint3_array_sum3(uint3_t input[3]);
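The helpers above can be mirrored in plain C when checking expected values in a software testbench. This is a rough sketch under stated assumptions: the `_model` names are hypothetical, standard 8 and 32-bit types stand in for the narrower PipelineC types, and the bodies are not the PipelineC implementations.

```c
#include <assert.h>
#include <stdint.h>

// Models int32_abs: the unsigned result can also hold |INT32_MIN|
static uint32_t int32_abs_model(int32_t input)
{
  uint32_t u = (uint32_t)input;
  return (input < 0) ? (0u - u) : u;
}

// Models uint8_mux4 as a two level tree of 2->1 muxes
static uint8_t uint8_mux4_model(uint8_t select, uint8_t in0, uint8_t in1, uint8_t in2, uint8_t in3)
{
  assert(select < 4);
  uint8_t low = (select & 1) ? in1 : in0;
  uint8_t high = (select & 1) ? in3 : in2;
  return (select & 2) ? high : low;
}

// Models count0s_uint7: zeros counted from bit 6 downward until the first 1
static uint8_t count0s_uint7_model(uint8_t data)
{
  assert(data < 128);
  uint8_t count = 0;
  for (int bit = 6; bit >= 0 && !((data >> bit) & 1); bit--)
    count++;
  return count;
}

// Models uint3_sum3: the maximum 7+7+7 = 21 fits in 5 bits
static uint8_t uint3_sum3_model(uint8_t in0, uint8_t in1, uint8_t in2)
{
  assert(in0 < 8 && in1 < 8 && in2 < 8);
  return (uint8_t)((in0 + in1) + in2);
}
```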
Fixed Size Arrays
Any type can be generated into fixed size arrays of that type by including a specific header file. This is mostly to support returning fixed size arrays from C functions. Ex.
typedef struct point
{
  uint8_t x;
  uint8_t y;
} point;
#include "point_array_N_t.h"
// Types like this are generated for you
typedef struct point_array_3_t
{
  point data[3];
} point_array_3_t;
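C cannot return a bare array such as `point[3]` from a function, which is why the wrapper struct is useful. The sketch below writes the wrapper type by hand so it compiles on its own, where the text above says it would be generated. `make_diagonal` is a hypothetical function used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct point
{
  uint8_t x;
  uint8_t y;
} point;

// Written by hand here; stands in for the generated type
typedef struct point_array_3_t
{
  point data[3];
} point_array_3_t;

// Hypothetical function returning three points by value through the wrapper struct
static point_array_3_t make_diagonal(uint8_t start)
{
  assert(start <= 253); // keep start + 2 within uint8_t
  point_array_3_t rv;
  for (int i = 0; i < 3; i++)
  {
    rv.data[i].x = (uint8_t)(start + i);
    rv.data[i].y = (uint8_t)(start + i);
  }
  return rv;
}
```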
Casting to and from bytes
Any type can be converted to and from byte arrays by including a specific header file. This is mostly to support moving C structs from software C to PipelineC buffers. Both PipelineC and regular C code (i.e. using pointers instead of fixed size buffers) are generated. Ex.
typedef struct point ... ;
#include "point_bytes_t.h"
// A header like this is generated for you
#define point_bytes_t uint8_t_array_2_t // 2 bytes, auto gen fixed size array struct
#define point_size_t uint2_t // 0-2
point_bytes_t point_to_bytes(point x);
point bytes_to_point(point_bytes_t bytes);
// And similar functions using pointers for real C code
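To make the shape of these conversions concrete, here is a hand-written sketch for the two byte `point` struct. It assumes field `x` lands in byte 0 and field `y` in byte 1; the real generated layout is not stated above. The `_model` and `_ptr` names are hypothetical, and the array wrapper type is declared by hand as a stand-in.

```c
#include <assert.h>
#include <stdint.h>

typedef struct point
{
  uint8_t x;
  uint8_t y;
} point;

// Stand-in for the generated fixed size array struct
typedef struct uint8_t_array_2_t
{
  uint8_t data[2];
} uint8_t_array_2_t;
#define point_bytes_t uint8_t_array_2_t

// Assumed layout: x in byte 0, y in byte 1
static point_bytes_t point_to_bytes_model(point p)
{
  point_bytes_t rv;
  rv.data[0] = p.x;
  rv.data[1] = p.y;
  return rv;
}

static point bytes_to_point_model(point_bytes_t bytes)
{
  point rv;
  rv.x = bytes.data[0];
  rv.y = bytes.data[1];
  return rv;
}

// Pointer flavored variant, in the style of regular software C
static void point_to_bytes_ptr(const point *p, uint8_t *out)
{
  assert(p != 0 && out != 0);
  out[0] = p->x;
  out[1] = p->y;
}
```

Packing field by field, instead of copying raw memory, keeps the byte order independent of compiler padding and host layout.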
Clock Crossings
Clock crossing code is generated by including a specifically named header file. Ex.
message_t in_msg;
#include "clock_crossing/in_msg.h"
The READ and WRITE function signatures generated in that file depend on how and where the READ and WRITE functions are used. See clock domain crossing documentation here.
RAMs
- Described as a state variable (prefer static locals) array with an element type and dimensions
- Ex.
elem_t the_ram[8][8][8][8]; // 4096 elements
- Four 3b indices/addresses are concatenated to form a 12b address
- The state variable is not accessed directly in code, since that would infer registers and muxes instead.
// Declare an array
static uint32_t my_ram[RAM_DEPTH];
// Using array variable directly in code is regs+muxes
uint32_t rdata = my_ram[addr];
if(wr_enable) my_ram[addr] = wr_data;
// Doing this _RAM_SP_RF_0 instead is same thing
// but will let the synthesis tool infer a proper RAM
uint32_t rdata = my_ram_RAM_SP_RF_0(addr, wr_data, wr_enable);
- Access to the RAM takes the form of stateful functions acting on that storage data
elem_t <var_name>_RAM_<RAM_type>(address0,...,addressN, write_data, write_enable)
- Input arguments are read/write data+flags and output is read data
- RAM types:
- SP_RF_<latency>: Single port, read first
- DP_RF_<latency>: Dual port, read first. One port is write-only, the other port is read-only
- Other dual port styles are not built in yet. As a workaround, raw VHDL is used in ram.h.
- 0 clock latency RAMs are implemented as LUTRAMs for same cycle access
- 1,2 clock latency RAMs are implemented as either block RAMs or LUTRAMs (synthesis tool decides).
- Ex.
elem_t the_ram_RAM_SP_RF_2(uint3_t addr0, uint3_t addr1, uint3_t addr2, uint3_t addr3, elem_t write_data, uint1_t write_enable);
- Read and write return values are valid/completed/returned on the second (_2) iteration/call of the function.
- Dual Port, example 1 (notice 1 clock write and read latency)
#include "uintN_t.h"
#pragma MAIN my_func
uint32_t my_func()
{
  static uint32_t my_bram[128];
  static uint32_t waddr = 0;
  static uint32_t wdata = 0;
  static uint32_t raddr = 0;
  uint32_t rdata = my_bram_RAM_DP_RF_1(raddr, waddr, wdata, 1);
  printf("Write: addr=%d,data=%d. Read addr=%d. Read data=%d\n", waddr, wdata, raddr, rdata);
  // Test pattern
  if(wdata > 0){
    raddr += 1;
  }
  waddr += 1;
  wdata += 1;
  return rdata; // Dummy
}
- Simulating with cocotb and ghdl: --sim --comb --cocotb --ghdl
Clock: 0 Write: addr=0,data=0. Read addr=0. Read data=0
Clock: 1 Write: addr=1,data=1. Read addr=0. Read data=0
Clock: 2 Write: addr=2,data=2. Read addr=1. Read data=0
Clock: 3 Write: addr=3,data=3. Read addr=2. Read data=1
Clock: 4 Write: addr=4,data=4. Read addr=3. Read data=2
Clock: 5 Write: addr=5,data=5. Read addr=4. Read data=3
...
- Single Port, example 2, using a multidimensional array with power of 2 sizes
#include "uintN_t.h"
#define elem_t uint8_t
#pragma MAIN_MHZ main 100.0
elem_t main(uint3_t addr0, uint3_t addr1, uint3_t addr2, uint3_t addr3, elem_t write_data, uint1_t write_enable)
{
  static elem_t the_ram[8][8][8][8]; // 4096 elements
  return the_ram_RAM_SP_RF_2(addr0, addr1, addr2, addr3, write_data, write_enable);
}
One 36 Kb primitive fits the 4096 elem_t=uint8_t elements (4096 x 8b = 32 Kb) of the above example
Report Cell Usage:
+------+-----------+------+
|      |Cell       |Count |
+------+-----------+------+
|1     |BUFG       |     1|
|2     |RAMB36E1_1 |     1|
|3     |FDRE       |    29|
|4     |IBUF       |    22|
|5     |OBUF       |     8|
+------+-----------+------+
- Dual Port
#include "uintN_t.h"
#define elem_t uint8_t
#pragma MAIN_MHZ main 100.0
elem_t main(
  uint3_t addr_r0, uint3_t addr_r1, uint3_t addr_r2, uint3_t addr_r3, // Read port
  uint3_t addr_w0, uint3_t addr_w1, uint3_t addr_w2, uint3_t addr_w3, elem_t write_data, uint1_t write_enable) // Write port
{
  static elem_t the_ram[8][8][8][8]; // 4096 elements
  return the_ram_RAM_DP_RF_2(addr_r0, addr_r1, addr_r2, addr_r3,
                             addr_w0, addr_w1, addr_w2, addr_w3, write_data, write_enable);
}
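The read first, 1 clock latency behavior seen in the example 1 log can be approximated in regular C with a single output register. This is only a behavioral sketch inferred from that log, not the code the tool generates. `ram_dp_rf_1_model` and `run_pattern` are hypothetical names, and one function call is assumed to represent one clock cycle.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_DEPTH 128

// Hypothetical model: dual port, read first, 1 clock latency.
// One call is one clock cycle.
static uint32_t ram_dp_rf_1_model(uint32_t raddr, uint32_t waddr, uint32_t wdata, uint8_t we)
{
  static uint32_t mem[RAM_DEPTH]; // zero initialized storage
  static uint32_t read_reg = 0;   // output register, gives the 1 clock latency
  assert(raddr < RAM_DEPTH && waddr < RAM_DEPTH);
  uint32_t rdata = read_reg; // data for the address presented on the previous call
  read_reg = mem[raddr];     // read first: sample before this call's write lands
  if (we)
    mem[waddr] = wdata;
  return rdata;
}

// Replays the test pattern from example 1 for n clocks, storing read data in out[]
static void run_pattern(uint32_t out[], int n)
{
  uint32_t waddr = 0, wdata = 0, raddr = 0;
  assert(n <= RAM_DEPTH);
  for (int clk = 0; clk < n; clk++)
  {
    out[clk] = ram_dp_rf_1_model(raddr, waddr, wdata, 1);
    if (wdata > 0)
      raddr += 1;
    waddr += 1;
    wdata += 1;
  }
}
```

Run for six clocks from a fresh start, this model yields read data 0, 0, 0, 1, 2, 3, matching the "Read data" column in the simulation log above.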
DSPs
By default the PipelineC tool will infer pipelined multipliers from the * operator. In VHDL the multiply operator is surrounded by input and output registers as needed, allowing the synthesis tool to infer as much DSP primitive pipelining as possible. This can be changed for the entire design using the --mult command line option. See -h for more information.
To disable use of inferred multipliers and use FPGA fabric implementations only, use the FUNC_MULT_STYLE pragma to set the fabric multiplier style. Ex.
#pragma FUNC_MULT_STYLE mult fabric // default: infer
uint32_t mult(uint16_t x, uint16_t y)
{
  return x * y;
}
Operator Overloading
See operator name constants near the top of C_TO_LOGIC.py. Make sure to use argument names like left, right, and expr. Finally, multiplies are distinguished by whether they use inferred hard DSP blocks or are implemented in FPGA fabric; see the examples below.
// An example user type
typedef struct complex
{
  float re;
  float im;
} complex;
// Override the '+' operator
complex BIN_OP_PLUS_complex_complex(complex left, complex right)
{
  complex rv;
  rv.re = left.re + right.re;
  rv.im = left.im + right.im;
  return rv;
}
complex BIN_OP_PLUS_complex_float(complex left, float right)
{
  complex rv;
  rv.re = left.re + right;
  rv.im = left.im + right;
  return rv;
}
// Override the '*' operator
// Implementation to use when inferred DSPs for multiplies are used
complex BIN_OP_INFERRED_MULT_complex_complex(complex left, complex right)
{
  complex rv;
  // (a+bi)(c+di) = (ac - bd) + (ad + bc)i
  rv.re = (left.re * right.re) - (left.im * right.im);
  rv.im = (left.re * right.im) + (left.im * right.re);
  return rv;
}
// Implementation to use when multipliers are implemented in logic/FPGA fabric (same here)
complex BIN_OP_MULT_complex_complex(complex left, complex right)
{
  complex rv;
  rv.re = (left.re * right.re) - (left.im * right.im);
  rv.im = (left.re * right.im) + (left.im * right.re);
  return rv;
}
// Override the '!' operator (here it negates both components)
complex UNARY_OP_NOT_complex(complex expr)
{
  complex rv;
  rv.re = expr.re * -1.0;
  rv.im = expr.im * -1.0;
  return rv;
}
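Since standard C has no operator overloading on structs, a regular C compiler needs the named functions called explicitly. The sketch below repeats the two '+' overloads from above so it compiles on its own and chains them in a hypothetical `add_then_offset` function. The comment showing the operator form reflects how the text above describes the '+' override and is otherwise an assumption.

```c
typedef struct complex
{
  float re;
  float im;
} complex;

// Same bodies as the '+' overloads shown above
complex BIN_OP_PLUS_complex_complex(complex left, complex right)
{
  complex rv;
  rv.re = left.re + right.re;
  rv.im = left.im + right.im;
  return rv;
}

complex BIN_OP_PLUS_complex_float(complex left, float right)
{
  complex rv;
  rv.re = left.re + right;
  rv.im = left.im + right;
  return rv;
}

// Hypothetical usage: with the overloads in place, PipelineC source could
// express this as (a + b) + offset; plain C calls the functions by name.
complex add_then_offset(complex a, complex b, float offset)
{
  complex sum = BIN_OP_PLUS_complex_complex(a, b);
  return BIN_OP_PLUS_complex_float(sum, offset);
}
```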
Pragmas
See the pragmas page.
VHDL Escape Hatch
Write raw HDL if you must.