Coprocessors and Processor Arrays
With the continued advance of VLSI technology, it has become feasible
to build large arrays of simple processing elements and functional units
on a single chip (e.g., an FPGA) to execute recurrent algorithms. This
approach, however, may be restricted by the limited number of
input/output pins that connect the chip to external memory
and peripherals.
Our focus in this area is on the automated synthesis of VLSI arrays
of processing elements and functional units, and the automated mapping
of recurrent algorithms (e.g., loops) onto these VLSI arrays.
To cope with the problem of limited I/O, we have studied the use of
on-chip memory and of mapping algorithms that maximize the reuse of
on-chip data. The issues studied include
the amount of chip area allocated to on-chip memory, the interconnection
topology of processing elements and functional units, their
reconfigurability to adapt to applications with different behavior,
and the design of compilers to map recurrent algorithms onto the system.
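
To make the trade-off between on-chip memory and off-chip traffic concrete, the following is a minimal sketch and not the mapping algorithm developed in this work. It assumes a simple model in which blocks of size tile x tile are staged in an on-chip buffer, and it counts every matrix element copied into that buffer as one off-chip load. The function name tiled_matmul and the cost model are illustrative assumptions.

```python
def tiled_matmul(a, b, tile):
    """Multiply square matrices a and b, staging tile x tile blocks "on chip".

    Returns (product, off_chip_loads). off_chip_loads counts every matrix
    element copied from the full arrays into an on-chip block (an assumed,
    simplified cost model; results written back are not counted).
    """
    n = len(a)
    assert n % tile == 0, "this sketch assumes the tile size divides n"
    c = [[0] * n for _ in range(n)]
    loads = 0
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            # The accumulator block stays on chip for the whole kk loop.
            acc = [[0] * tile for _ in range(tile)]
            for kk in range(0, n, tile):
                # Stage one block of a and one block of b (off-chip traffic).
                a_blk = [row[kk:kk + tile] for row in a[ii:ii + tile]]
                b_blk = [row[jj:jj + tile] for row in b[kk:kk + tile]]
                loads += 2 * tile * tile
                # Each staged element is reused `tile` times in this loop nest.
                for i in range(tile):
                    for j in range(tile):
                        for k in range(tile):
                            acc[i][j] += a_blk[i][k] * b_blk[k][j]
            # Copy the finished block into the result matrix.
            for i in range(tile):
                c[ii + i][jj:jj + tile] = acc[i]
    return c, loads
```

Under this model the number of loads is 2n^3/tile, so for 8 x 8 matrices it falls from 1024 with tile = 1 to 128 with tile = 8, at the cost of on-chip storage for three tile x tile blocks. This is the kind of balance between chip area for memory and off-chip access that the issues above refer to.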
-
Coprocessor system.
We have studied the design of a super vector coprocessor
for executing nested DO loops. The coprocessor consists of
a collection of processing elements or functional units whose
topology can be reconfigured dynamically before a recurrent
algorithm is evaluated. Research issues studied include the
allocation of chip area for on-chip memory, architectural
support for dynamic reconfiguration, and the design of a
compiler to generate code that minimizes completion time and
accesses to off-chip data. The idea has been applied to
the design of next-generation signal processors.
-
Processor arrays: optimal synthesis.
This early work is on the synthesis of regular fine-grain processor
arrays for application-specific affine (uniform) recurrences.
A uniform dependence algorithm consists of a set of indexed
computations and a set of constant dependence vectors that are
independent of the indices of the computations.
We have developed a polynomial-time search algorithm
for mapping high-dimensional recurrences onto lower-dimensional
processor arrays. Originally, the problem was formulated by
others as a nonlinear integer programming problem with
exponential complexity. The new approach allows optimal
trade-offs between completion time and hardware complexity.
-
Processor arrays: applications.
The polynomial-time method for optimal synthesis of processor
arrays has been applied to the design of VLSI array processors
for dynamic programming, image processing,
signal processing, and matrix operations.
-
Processor arrays: interfacing.
This work addresses issues in interfacing VLSI array
processors to host computers.
-
Other work in architecture.
This work addresses the design of associative memories and
interleaved memories.
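
As an illustration of the processor-array mapping problem summarized under "Processor arrays: optimal synthesis", the sketch below maps the three-dimensional matrix-multiplication recurrence, whose uniform dependence vectors are the three unit vectors, onto a two-dimensional and a one-dimensional array using a linear schedule vector and a linear allocation matrix. It uses plain exhaustive search over small integer schedule vectors, not the polynomial-time algorithm mentioned above. All names (DEPS, best_schedule, and so on) and the cubic index set are assumptions made for the example.

```python
from itertools import product

# Uniform dependence vectors of the matrix-multiplication recurrence
# over index points (i, j, k): constant, independent of the index.
DEPS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def is_valid_schedule(pi, deps):
    """A linear schedule must run each computation after its predecessors."""
    return all(dot(pi, d) >= 1 for d in deps)


def has_conflict(pi, alloc, n):
    """True if two index points share both a time step and a processor."""
    seen = set()
    for p in product(range(n), repeat=3):
        key = (dot(pi, p), tuple(dot(row, p) for row in alloc))
        if key in seen:
            return True
        seen.add(key)
    return False


def completion_time(pi, n):
    """Number of time steps of schedule pi over the cube {0..n-1}^3."""
    return sum(abs(x) * (n - 1) for x in pi) + 1


def processor_count(alloc, n):
    """Number of distinct processors used by allocation matrix alloc."""
    return len({tuple(dot(row, p) for row in alloc)
                for p in product(range(n), repeat=3)})


def best_schedule(alloc, deps, n, bound):
    """Exhaustively search schedule vectors with entries in [-bound, bound]."""
    best = None
    for pi in product(range(-bound, bound + 1), repeat=3):
        if not is_valid_schedule(pi, deps) or has_conflict(pi, alloc, n):
            continue
        t = completion_time(pi, n)
        if best is None or t < best[1]:
            best = (pi, t)
    return best


n = 4
planar = [(1, 0, 0), (0, 1, 0)]   # processor (i, j): a 2-D array
linear = [(1, 0, 0)]              # processor i: a 1-D array
planar_result = best_schedule(planar, DEPS, n, bound=2)
linear_result = best_schedule(linear, DEPS, n, bound=n)
```

For n = 4 within these search bounds, the two-dimensional array uses 16 processors and finishes in 10 steps, while the one-dimensional array uses 4 processors and needs 19 steps. This is a small instance of the trade-off between completion time and hardware complexity; the exhaustive search here grows quickly with the bound and the dimension, which is the cost a polynomial-time method is meant to avoid.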