Simplified Google TPU RTL with systolic array and memory controller for convolution.
The following figure represents the design of a systolic array.
Mac.sv has the RTL of a single MAC unit.
MacArray.sv creates an array of MAC units that comprises the systolic array.
MemoryController.sv controls the flow of weights, input feature maps, and output feature maps so that convolution is performed as a GEMM (General Matrix Multiply) operation.
Systolic.sv is a wrapper for the systolic array.
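
To illustrate how an array of MAC units can compute a matrix product, here is a small behavioral model in Python. It is only a sketch and is not derived from the RTL in this repository: the names `Mac` and `systolic_matmul` are hypothetical, and the output-stationary dataflow (each cell keeps its own accumulator while operands move right and down) is an assumption, since the actual dataflow of MacArray.sv may differ.

```python
class Mac:
    """Behavioral multiply-accumulate cell (hypothetical model, not Mac.sv)."""

    def __init__(self):
        self.acc = 0  # running sum held inside the cell
        self.a = 0    # operand register forwarded to the right neighbour
        self.b = 0    # operand register forwarded to the neighbour below

    def step(self, a_in, b_in):
        # Latch the incoming operands and accumulate their product.
        self.a, self.b = a_in, b_in
        self.acc += a_in * b_in


def systolic_matmul(A, B):
    """Compute A @ B on an m x n grid of Mac cells, one clock cycle per loop pass.

    Assumed dataflow: row i of A enters from the left delayed by i cycles,
    column j of B enters from the top delayed by j cycles.
    """
    m, k, n = len(A), len(B), len(B[0])
    cells = [[Mac() for _ in range(n)] for _ in range(m)]

    for t in range(m + n + k - 2):
        # Snapshot the registers so every cell sees last cycle's values.
        prev_a = [[c.a for c in row] for row in cells]
        prev_b = [[c.b for c in row] for row in cells]
        for i in range(m):
            for j in range(n):
                if j == 0:
                    a_in = A[i][t - i] if 0 <= t - i < k else 0
                else:
                    a_in = prev_a[i][j - 1]
                if i == 0:
                    b_in = B[t - j][j] if 0 <= t - j < k else 0
                else:
                    b_in = prev_b[i - 1][j]
                cells[i][j].step(a_in, b_in)

    return [[c.acc for c in row] for row in cells]


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this model the skewed inputs make the matching elements of a row of A and a column of B meet in the same cell on the same cycle, so the whole product finishes after m + n + k - 2 cycles.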
The current design only supports convolution of 16x16 images with 3x3 weights, and reaches 190 MHz when synthesized for Cyclone IV in Quartus.
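
As a rough illustration of GEMM-based convolution at this size, the Python sketch below unrolls every 3x3 window of a 16x16 image into one row of a matrix (often called im2col) and multiplies it by the flattened weights. The function names are hypothetical, and stride 1 with no padding is an assumption, which gives a 14x14 output; the way MemoryController.sv actually orders the data is not described here. Like most CNN implementations, the sketch does not flip the kernel.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw window of the image into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    rows = []
    for r in range(h - kh + 1):
        for c in range(w - kw + 1):
            rows.append([image[r + dr][c + dc]
                         for dr in range(kh) for dc in range(kw)])
    return rows


def conv2d_gemm(image, kernel):
    """Convolve by multiplying the patch matrix with the flattened kernel."""
    kh, kw = len(kernel), len(kernel[0])
    patches = im2col(image, kh, kw)               # (out_h * out_w) x (kh * kw)
    flat = [wt for row in kernel for wt in row]   # kh * kw weights
    out_w = len(image[0]) - kw + 1
    # Matrix-vector product: one dot product per output pixel.
    out = [sum(p * wt for p, wt in zip(patch, flat)) for patch in patches]
    # Fold the flat result back into rows of the output feature map.
    return [out[i:i + out_w] for i in range(0, len(out), out_w)]


if __name__ == "__main__":
    image = [[1] * 16 for _ in range(16)]
    kernel = [[1] * 3 for _ in range(3)]
    result = conv2d_gemm(image, kernel)
    print(len(result), len(result[0]), result[0][0])
```

Under these assumptions the patch matrix has 196 rows of 9 values each, so a single 3x3 filter reduces to 196 dot products of length 9.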