
The following lists the key features of the architecture:
• Storage for matrices A and B is double buffered so that processing can proceed in parallel with data loading.
• When the number of columns of A (A_COLUMNS), which equals the number of rows of B, is greater than
the size of the dot product (VECTOR_SIZE), the rows of A and the columns of B are divided into sub rows
and sub columns respectively, each containing VECTOR_SIZE elements. In this case,
A_COLUMNS/VECTOR_SIZE iterations are needed to compute the full dot product for a single output
element.
• Matrix B memory has sufficient bandwidth so that all the data needed for the dot product can be
loaded at once.
• Matrix A memory is allocated less bandwidth. You control the bandwidth of the matrix A memory with
the NUM_BLOCKS parameter. A sub row of matrix A is loaded into local registers over a number
of cycles before an iteration of the dot product. Once a sub row of matrix A has been loaded into local
registers, all partial dot products involving that sub row are computed before another sub row is
loaded.
• On Arria 10 devices, which have hardened single-precision floating-point DSP blocks, those blocks are
used for single-precision floating-point arithmetic.
The matrix multiply architecture is not optimized for sparse matrices or constant matrices.
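The sub row and sub column scheme described in the list above can be illustrated with a small software model. This is only a sketch of the arithmetic ordering, not the IP core's implementation: the function name blocked_matmul and its arguments are hypothetical, the model assumes that A_COLUMNS is an exact multiple of VECTOR_SIZE, and it does not model double buffering, NUM_BLOCKS, or cycle timing.

```python
def blocked_matmul(a, b, vector_size):
    """Software model (illustrative only) of a matrix multiply built from
    partial dot products of vector_size elements each.

    a is a list of rows (A_ROWS x A_COLUMNS).
    b is a list of rows (A_COLUMNS x B_COLUMNS).
    """
    a_columns = len(a[0])
    b_columns = len(b[0])
    if len(b) != a_columns:
        raise ValueError("rows of B must equal columns of A")
    # Assumption for this sketch: A_COLUMNS divides evenly by VECTOR_SIZE.
    if a_columns % vector_size != 0:
        raise ValueError("A_COLUMNS must be a multiple of VECTOR_SIZE")

    # Number of partial dot products accumulated per output element.
    iterations = a_columns // vector_size
    c = [[0.0] * b_columns for _ in range(len(a))]

    for i, row in enumerate(a):
        for k in range(iterations):
            start = k * vector_size
            # Stand-in for loading one sub row of A into local registers.
            sub_row = row[start:start + vector_size]
            # Every partial dot product that uses this sub row is computed
            # before the next sub row is taken.
            for j in range(b_columns):
                sub_col = [b[start + t][j] for t in range(vector_size)]
                c[i][j] += sum(x * y for x, y in zip(sub_row, sub_col))
    return c
```

For example, with a 2 x 4 matrix A and VECTOR_SIZE set to 2, each output element is accumulated over 4/2 = 2 iterations. The result is the same as with VECTOR_SIZE set to 4, where a single iteration covers the whole row.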
ALTERA_FP_MATRIX_MULT Signals
Figure 3-4: ALTERA_FP_MATRIX_MULT Signals
This figure shows the signals for the ALTERA_FP_MATRIX_MULT IP core.
[Figure: block symbol of the ALTERA_FP_MATRIX_MULT instance (inst) with ports clk, reset_n, a_ready, a_valid, a_data, b_ready, b_valid, b_data, c_valid, c_ready, and c_data.]
These tables list the signals for the ALTERA_FP_MATRIX_MULT IP core.
Table 3-2: ALTERA_FP_MATRIX_MULT Input Signals
Port Name    Required    Description
clk          Yes         The clock input port for the IP core.
UG-01058
2014.12.19
Altera Corporation
ALTERA_FP_MATRIX_MULT IP Core