is a read-only bus; the other is a read-write bus. The FB has to send enough data
per cycle to a whole row or column of the RC Array. Since there are eight RCs, each
needing two 8-bit operands, a total of 128 bits (8 RCs * 2 operands/RC * 8 bits/operand =
128 bits) is necessary, hence the two 64-bit read buses. One 64-bit bus is needed to
write data back to the FB from the RC Array because each RC produces an 8-bit output (8
RCs * 1 output/RC * 8 bits/output = 64 bits). Since one of the data buses between
the RC Array and the FB is used for both reading and writing, a read and a write between
these two modules cannot occur at the same time. However, the DMA Controller and RC Array
buses are independent of each other and may each read or write at will, with the
constraint presented below.
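The bus-width arithmetic above can be checked with a short sketch. The constant and function names are illustrative only and do not come from the design itself.

```python
# Illustrative check of the bus-width arithmetic; names are hypothetical.
NUM_RCS = 8          # RCs in one row or column of the RC Array
OPERANDS_PER_RC = 2  # operands each RC reads per cycle
OUTPUTS_PER_RC = 1   # results each RC writes per cycle
VALUE_BITS = 8       # width of one operand or one output
BUS_BITS = 64        # width of a single FB <-> RC Array bus


def bits_per_cycle(cells, values_per_cell, bits_per_value):
    """Bits that must move in one cycle for a whole row or column."""
    return cells * values_per_cell * bits_per_value


read_bits = bits_per_cycle(NUM_RCS, OPERANDS_PER_RC, VALUE_BITS)
write_bits = bits_per_cycle(NUM_RCS, OUTPUTS_PER_RC, VALUE_BITS)

read_buses = read_bits // BUS_BITS    # buses needed to feed operands
write_buses = write_bits // BUS_BITS  # buses needed to return results

print(read_bits, read_buses, write_bits, write_buses)
```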
The FB is divided into two separate sets (memory buffers). This configuration allows
the DMA Controller to access one set while the RC Array is accessing the other. Each
set may be accessed by either the DMA Controller or by the RC Array, but not by both
simultaneously. Each set is further divided into two banks (Figure 3), each 64 bits
wide. The DMA Controller accesses one bank at a time, while the RC Array accesses
both banks within the same set at the same time. Thus, to keep pace with the RC Array,
the DMA Controller would have to deliver data to the FB at least twice as often as the
RC Array reads it. Fortunately, this stipulation does not degrade the performance of the
RC Array, because many typical applications require the RC Array to perform several
operations on the same set of data before the desired result is obtained (the DMA
Controller can fill one set of the FB before the RC Array needs another set of data).
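The two-set, two-bank organization behaves like a double buffer. The following is a minimal behavioral sketch, not a model of the actual hardware: the class name, method names, and the bank depth of 4 are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # each bank is 64 bits wide


class FrameBufferSketch:
    """Hypothetical model: 2 sets x 2 banks, each bank a list of 64-bit words."""

    def __init__(self, depth=4):  # depth is an assumed value
        # sets[set_index][bank_index][row] -> one 64-bit word
        self.sets = [[[0] * depth for _ in range(2)] for _ in range(2)]
        self.dma_set = 0  # set currently accessed by the DMA Controller

    @property
    def rc_set(self):
        # The RC Array always uses the set the DMA Controller is not using,
        # so the two never access the same set simultaneously.
        return 1 - self.dma_set

    def dma_write(self, bank, row, word):
        """DMA Controller access: one bank at a time."""
        self.sets[self.dma_set][bank][row] = word & MASK64

    def rc_read(self, row):
        """RC Array access: both banks of its set at once (2 x 64 bits)."""
        banks = self.sets[self.rc_set]
        return banks[0][row], banks[1][row]

    def swap(self):
        """Exchange the roles of the two sets."""
        self.dma_set = 1 - self.dma_set


fb = FrameBufferSketch()
fb.dma_write(bank=0, row=0, word=0x11)  # two DMA accesses fill ...
fb.dma_write(bank=1, row=0, word=0x22)
before = fb.rc_read(0)                  # RC Array still sees the other set
fb.swap()
after = fb.rc_read(0)                   # ... what one RC Array access reads
```

In this sketch the DMA Controller needs two writes to supply what a single RC Array read consumes, which is the rate mismatch described above.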
Figure 3: FB Internal Block Diagram
Direct Memory Access Controller:
The DMA Controller acts as the interface between the main memory of the processor and
the FB and RC Array modules. The data bus between the DMA Controller and the FB is a
64-bit read-write bus, while the data bus between the DMA Controller and the RC Array
is 32 bits wide (Figure 2). Since the data bus to and from memory is 32 bits wide, the
DMA Controller needs two cycles to assemble 64 bits of data from the memory for the FB,
and one cycle to assemble the 32-bit data for the RC Array.
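The two-cycle assembly of a 64-bit FB word from 32-bit memory words can be sketched as follows. The function names are hypothetical, and placing the first word in the upper half is an assumption; the text does not state the word order.

```python
MASK32 = 0xFFFFFFFF


def pack_words(first, second):
    """Combine two 32-bit memory words into one 64-bit FB word.

    Assumption: the word fetched first occupies the upper 32 bits.
    """
    return ((first & MASK32) << 32) | (second & MASK32)


def assemble_for_fb(memory_words):
    """Group a stream of 32-bit words into 64-bit FB words.

    Returns (fb_words, cycles), counting one memory word per cycle,
    i.e. two cycles per 64-bit FB word.
    """
    if len(memory_words) % 2 != 0:
        raise ValueError("need an even number of 32-bit words")
    fb_words = [
        pack_words(memory_words[i], memory_words[i + 1])
        for i in range(0, len(memory_words), 2)
    ]
    return fb_words, len(memory_words)


fb_words, cycles = assemble_for_fb([0xDEADBEEF, 0x01234567, 0x1, 0x2])
```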
The DMA Controller has three main components: the Data Packing Register,
the Address Generator Unit, and the State Controller. The Data Packing Register assembles the 64
bits of data for the FB. The Address Generator Unit generates and tracks addresses for the
memory, FB, and RC Array. The State Controller receives information from the Tiny
RISC processor and uses it to determine the subsequent sequence of data transfers to and
from the FB and RC Array. The amount of data to transfer is also specified by the
information from Tiny RISC.
Figure 4: RC Array - 8x8 array of RCs
Reconfigurable Cell Array:
The RC Array is a Single-Instruction Multiple-Data (SIMD) multiprocessor.
It consists of an 8x8 array (Figure 4) of processing units (called RCs). The
array is row or column reconfigurable, meaning that a whole row or column can be
reconfigured at the same time, with the same context across all eight cells. With
the same context, each row or column executes the same instruction on different data,
hence making each row or column a SIMD multiprocessor. Each RC stores a copy of its
current context in its Context Register, which is internal to each RC and separate from
the Context Memory. Individual RCs can also be reconfigured. The power of
the RC Array lies in the fact that, depending on a specific application's needs, it can be
configured to be a row or a column of eight multi