Coprocessors and Processor Arrays

With the continuous growth of VLSI technology, it has become feasible to build large arrays of simple processing elements and functional units on a single chip (e.g., an FPGA) to execute recurrent algorithms. The approach, however, can be limited by the small number of input/output pins that connect the chip to external memory and peripherals.

Our focus in this area is on the automated synthesis of VLSI arrays of processing elements and functional units, and on the automated mapping of recurrent algorithms (e.g., loops) onto these arrays. To cope with limited I/O, we have studied the use of on-chip memory, together with mapping algorithms that maximize the reuse of on-chip data. The issues studied include the amount of chip area allocated to on-chip memory, the interconnection topology of the processing elements and functional units, their reconfigurability to suit applications with different behavior, and the design of compilers that map recurrent algorithms onto the system.
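
The trade-off between on-chip memory and off-chip traffic can be illustrated with a small cost model. The sketch below is not the mapping algorithm studied here; it assumes a hypothetical n x n matrix product computed tile by tile, with one tile each of A, B, and C held on chip, and simply counts the words that cross the chip boundary. The function names and the cost model are assumptions made for illustration.

```python
def off_chip_words(n, tile):
    """Words moved across the chip boundary for C = A * B (n x n matrices),
    assuming one tile of A, one of B, and one of C are resident on chip."""
    if n % tile != 0:
        raise ValueError("tile must divide n in this simplified model")
    blocks = n // tile
    words = 0
    for _bi in range(blocks):
        for _bj in range(blocks):
            # The C tile stays on chip for the whole inner loop and is
            # written back once when it is complete.
            words += tile * tile
            for _bk in range(blocks):
                # One tile of A and one tile of B are fetched per step.
                words += 2 * tile * tile
    return words


def on_chip_words(tile):
    """On-chip storage needed to hold the three resident tiles."""
    return 3 * tile * tile


if __name__ == "__main__":
    for tile in (1, 8, 16):
        print(tile, on_chip_words(tile), off_chip_words(64, tile))
```

Under this model, larger tiles need more on-chip storage but move fewer words off chip, which is the kind of balance that the chip-area allocation has to strike.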

Coprocessor system. We have studied the design of a super vector coprocessor for executing nested DO loops. The coprocessor consists of a collection of processing elements or functional units whose topology can be reconfigured dynamically before a recurrent algorithm is evaluated. Research issues studied include the allocation of chip area to on-chip memory, architectural support for dynamic reconfiguration, and the design of a compiler that generates code minimizing both completion time and accesses to off-chip data. The idea has been applied to the design of next-generation signal processors.
Processor arrays: optimal synthesis. This early work is on the synthesis of application-specific affine (uniform) recurrences on regular fine-grain processor arrays. Such an algorithm consists of a set of indexed computations and a set of uniform dependence vectors, i.e., vectors that are independent of the indices of the computations. We have developed a polynomial-time search algorithm for mapping high-dimensional recurrences onto lower-dimensional processor arrays. The problem had originally been formulated by others as a nonlinear integer programming problem with exponential complexity. The new approach allows optimal trade-offs between completion time and hardware complexity.
Processor arrays: applications. The polynomial-time method for optimal synthesis of processor arrays has been applied to the design of VLSI array processors for dynamic programming, image processing, signal processing, and matrix operations.
Processor arrays: interfacing. This work addresses issues in interfacing VLSI array processors to host computers.
Other work in architecture. This work addresses the design of associative and interleaved memories.
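
To make the mapping problem in the optimal-synthesis item concrete, the sketch below enumerates linear schedules for a uniform recurrence and reports the completion time and processor count of the best one. It is a brute-force illustration of the problem formulation only, not the polynomial-time search algorithm described above. The example recurrence (a 4 x 4 x 4 index set with unit dependence vectors, as in matrix multiplication), the candidate projection directions, and all function names are assumptions made for illustration.

```python
from itertools import product


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_causal(pi, deps):
    """A linear schedule pi respects the dependences if pi . d >= 1 for all d."""
    return all(dot(pi, d) >= 1 for d in deps)


def completion_time(pi, bounds):
    """Time steps needed for the box-shaped index set 0 <= i_k < bounds[k]."""
    return 1 + sum(abs(p) * (n - 1) for p, n in zip(pi, bounds))


def processor_count(u, bounds):
    """Processors used when index points are projected along direction u.

    Each line of points along u becomes one processor; we count the first
    point of every line, i.e. points whose predecessor p - u is outside the box.
    """
    def inside(p):
        return all(0 <= x < n for x, n in zip(p, bounds))

    count = 0
    for p in product(*(range(n) for n in bounds)):
        prev = tuple(x - y for x, y in zip(p, u))
        if not inside(prev):
            count += 1
    return count


def search(deps, bounds, projections, max_coeff=2):
    """Exhaustively pick the (time, processors) pair that is smallest in
    lexicographic order over small integer schedules and given projections."""
    best = None
    coeffs = range(-max_coeff, max_coeff + 1)
    for pi in product(coeffs, repeat=len(bounds)):
        if not is_causal(pi, deps):
            continue
        for u in projections:
            # pi . u != 0 ensures two points on the same processor
            # are never scheduled in the same time step.
            if dot(pi, u) == 0:
                continue
            cand = (completion_time(pi, bounds), processor_count(u, bounds), pi, u)
            if best is None or cand < best:
                best = cand
    return best


# Hypothetical example: 3-D uniform recurrence mapped onto a 2-D array.
deps = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
bounds = (4, 4, 4)
projections = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
time_steps, processors, pi, u = search(deps, bounds, projections)

if __name__ == "__main__":
    print(time_steps, processors, pi, u)
```

The exhaustive enumeration grows quickly with the coefficient range and the number of dimensions, which is why a more efficient search is of interest. Ranking candidates by processor count first instead of completion time would explore the other end of the time-versus-hardware trade-off.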