100-core tiled architecture chip capable of performing fixed-point multiply-and-accumulate operations in parallel

henryz2004/ai-accelerator-chip

AI Accelerator Chip

This chip uses a tiled architecture to compute a series of fixed-point multiply-and-accumulate operations in parallel. In other words, the chip takes an input vector and multiplies it by a series of weight matrices, simulating the layer computations of a deep neural network.
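The chip's core operation can be sketched in software as a fixed-point matrix-vector multiply built from multiply-and-accumulate steps. The Q8 fractional format (8 fractional bits) below is an assumption for illustration only; the README does not state the chip's actual word format.

```python
FRAC_BITS = 8  # assumed fractional width (Q8), not the chip's documented format

def to_fixed(x):
    """Quantize a real number to fixed point."""
    return round(x * (1 << FRAC_BITS))

def from_fixed(x):
    """Convert a fixed-point word back to a real number."""
    return x / (1 << FRAC_BITS)

def layer(weights, vec):
    """One layer = a matrix-vector multiply built from multiply-accumulates."""
    out = []
    for row in weights:
        acc = 0
        for w, v in zip(row, vec):
            acc += w * v              # multiply-and-accumulate
        out.append(acc >> FRAC_BITS)  # rescale the Q16 product sum back to Q8
    return out

w = [[to_fixed(0.5), to_fixed(-0.25)],
     [to_fixed(1.0), to_fixed(0.75)]]
x = [to_fixed(2.0), to_fixed(4.0)]
print([from_fixed(v) for v in layer(w, x)])  # [0.0, 5.0]
```

Each product of two Q8 values carries 16 fractional bits, so the accumulated sum is shifted right by 8 to return to the Q8 format before being passed on.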

Architecture

The tiled architecture refers to the chip being composed of a grid of autonomous tiles, each connected to its adjacent tiles. Each tile is analogous to an artificial neuron in a neural network. Information propagates from top to bottom: the input vector is fed into the top row of the grid, and each row's results are passed down to the row below. The final result is the output of the bottom row. Each tile also communicates bidirectionally with its left and right neighbors to assemble the output of the entire layer, which is then propagated downward as the input to the next layer.
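The top-to-bottom flow above can be sketched as follows: each row of tiles acts as one layer, and each layer's assembled output becomes the next layer's input. Integer weights keep the sketch simple; the real chip uses fixed-point words.

```python
def forward(grid_weights, vec):
    """grid_weights[r] holds the weight vectors of the tiles in row r."""
    for row in grid_weights:  # propagate top to bottom, one row per layer
        vec = [sum(w * v for w, v in zip(tile_w, vec)) for tile_w in row]
    return vec                # the output of the bottom row is the final result

grid = [
    [[1, 0], [0, 1]],   # layer 0: identity weights
    [[1, 1], [1, -1]],  # layer 1: sum and difference
]
print(forward(grid, [3, 4]))  # [7, -1]
```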

Core

Within each tile are an outer core and an inner core. The inner core computes the dot product of the input vector with the tile's weight vector, producing the output of one neuron. It sends this result to the outer core, which communicates with neighboring tiles to assemble the entire layer's output; the output of the outer core is therefore the output of an entire layer.
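The inner/outer split can be sketched as two functions: one per-tile dot product (the inner core's job) and one gather step that collects every tile's scalar into the layer output (the outer cores' job). The function names here are illustrative, not the repository's module names.

```python
def inner_core(weight_vec, input_vec):
    """One neuron: dot product of the input with this tile's weight vector."""
    return sum(w * v for w, v in zip(weight_vec, input_vec))

def assemble_layer(tile_weights, input_vec):
    """Outer cores gather each tile's scalar into the full layer output."""
    return [inner_core(w, input_vec) for w in tile_weights]

weights = [[2, 0, 1],   # tile 0's weight vector
           [1, 1, 1]]   # tile 1's weight vector
print(assemble_layer(weights, [1, 2, 3]))  # [5, 6]
```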

Tile

The tile is a wrapper module around the outer core that handles serial communication and daisy-chaining with neighboring tiles.

Communication

Communication happens in a daisy-chained manner: each tile sends its own output as well as any outputs it receives from its neighbors. This allows the output of a single neuron to propagate to every other tile in the layer, so that each tile can pass the complete layer result down to the next layer. This is necessary because each tile expects an entire vector (the output of the previous layer) as its input.
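A cycle-level sketch of this exchange, under the assumption that each step a tile forwards everything it currently knows to its immediate left and right neighbors: starting from knowing only its own neuron's output, every tile holds the whole layer's vector after n-1 steps.

```python
def daisy_chain(outputs):
    """Simulate the left/right daisy-chained exchange across a row of tiles."""
    n = len(outputs)
    # known[i][j] is tile i's copy of tile j's output (None = not yet received)
    known = [[outputs[i] if i == j else None for j in range(n)]
             for i in range(n)]
    for _ in range(n - 1):                      # n-1 steps fill the whole row
        snapshot = [row[:] for row in known]    # all tiles exchange in lockstep
        for i in range(n):
            for nbr in (i - 1, i + 1):          # left and right neighbors
                if 0 <= nbr < n:
                    for j in range(n):
                        if snapshot[nbr][j] is not None:
                            known[i][j] = snapshot[nbr][j]
    return known

result = daisy_chain([10, 20, 30, 40])
print(result[0])  # tile 0 now holds the full layer output: [10, 20, 30, 40]
```

The values farthest apart (the two end tiles) need n-1 hops to reach each other, which is why n-1 exchange steps suffice for every tile in a row of n.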

Simulation

This project was built with Intel Quartus and simulated with ModelSim. Each module has a testbench to verify its correctness. The outer_core_tb testbench simulates a 4x4 grid of cores using non-serial communication; in the screenshot of the waveform simulation, the outputs of the last layer (/outer_tb/layer_outg[3]) match the expected results.
