摘要:
Commercial data processors are available that include a capability of extending their instruction set for a specified application, i.e. of introducing customized functional units in the interest of enhanced processing performance. For such processors there is a need for automatically forming the extensions from high-level application code. A technique is described for selecting maximal-speedup convex subgraphs of the application dataflow graph under micro-architectural constraints.
摘要:
Customisable embedded processors that are available on the market make it possible for designers to speed up execution of applications by using Application-specific Functional Units (AFUs), implementing Instruction-Set Extensions (ISEs). Furthermore, techniques for automatic ISE identification have been improving; many algorithms have been proposed for choosing, given the application's source code, the best ISEs under various constraints. Read and write ports between the AFUs and the processor register file are an expensive asset, fixed in the micro-architecture—some processors indeed only allow two read ports and one write port—and yet, on the other hand, a large availability of inputs and outputs to and from the AFUs exposes high speedup. Here we present a solution to the limitation of actual register file ports by serialising register file access and therefore addressing multi-cycle read and write. It does so in an innovative way for two reasons: (1) it exploits and brings forward the progress in ISE identification under constraint, and (2) it combines register file access serialisation with pipelining in order to obtain the best global solution. Our method consists of scheduling graphs—corresponding to ISEs—under input/output constraint
摘要:
A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).
摘要:
Access to a memory is optimized by monitoring physical memory addresses and by detecting a memory access conflict based on the monitored physical memory addresses. The data stored at a physical address for which a conflict was detected is transferred to a new physical address.
摘要:
A memory system comprises a first memory having associated therewith a first local memory access controller configured to access the first local memory using physical memory addresses and a second memory having associated therewith a second local memory access controller configured to access the second local memory using physical memory addresses. A global controller coupled to the first and second local controllers is configured to communicate virtual memory addresses to the first and second local memory controllers.
摘要:
New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is an AND-Inverter Cone (AIC), which is a binary tree including one or more AND gates with a programmable conditional inversion and a number of intermediary outputs. Compared to LUTs, AICs are richer in terms of input and output bandwidth, because the area of the AICs grows only linearly with the number of inputs. Also, the delay grows only logarithmically with the input count. The new logic blocks can map circuits more efficiently than LUTs, because the AICs are multi-output blocks and can cover more logic depth due to the higher input bandwidth.
摘要:
A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).
摘要:
New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is an AND-Inverter Cone (AIC), which is a binary tree including one or more AND gates with a programmable conditional inversion and a number of intermediary outputs. Compared to LUTs, AICs are richer in terms of input and output bandwidth, because the area of the AICs grows only linearly with the number of inputs. Also, the delay grows only logarithmically with the input count. The new logic blocks can map circuits more efficiently than LUTs, because the AICs are multi-output blocks and can cover more logic depth due to the higher input bandwidth.
摘要:
A memory system comprises a first memory having associated therewith a first local memory access controller configured to access the first local memory using physical memory addresses and a second memory having associated therewith a second local memory access controller configured to access the second local memory using physical memory addresses. A global controller coupled to the first and second local controllers is configured to communicate virtual memory addresses to the first and second local memory controllers.