摘要:
A pipelined or superscalar processor (10) that executes operations utilizing operand data of variable bit widths improves parallel performance by partitioning a fixed bit width operand (200) into several partial operand fields (215, 216 and 217), and checking for data dependencies, tagging and forwarding data in these fields independently of one another. An instruction decoder (18) concurrently dispatches multiple ROPs to various functional units (20, 21, 22 and 80). Conflicts which arise with respect to register resources are resolved through register renaming. However, implementation of register renaming is difficult when register structures are overlapping. The present invention supports independent dependency checking, tagging and forwarding of partial bit fields of a register operand which, in combination, allow renaming of registers. Therefore, the variable width register operand structure greatly assists the processor to resolve data dependencies. Operands are tagged by a reorder buffer (26) and supplied with data when it becomes available without regard for the type of data. This method of dependency resolution supports parallel performance of operations and provides a substantial improvement in overall speed of processing. Thus, the processor promotes parallel processing of operations that act upon overlapping data structures which otherwise resist parallel handling.
摘要:
A pipelined or superscalar processor (10) that executes operations utilizing operand data of variable bit widths improves parallel performance by partitioning a fixed bit width operand (200) into several partial operand fields (215, 216 and 217), and checking for data dependencies, tagging and forwarding data in these fields independently of one another. An instruction decoder (18) concurrently dispatches multiple ROPs to various functional units (20, 21, 22 and 80). Conflicts which arise with respect to register resources are resolved through register renaming. However, implementation of register renaming is difficult when register structures are overlapping. The present invention supports independent dependency checking, tagging and forwarding of partial bit fields of a register operand which, in combination, allow renaming of registers. Therefore, the variable width register operand structure greatly assists the processor to resolve data dependencies. Operands are tagged by a reorder buffer (26) and supplied with data when it becomes available without regard for the type of data. This method of dependency resolution supports parallel performance of operations and provides a substantial improvement in overall speed of processing. Thus, the processor promotes parallel processing of operations that act upon overlapping data structures which otherwise resist parallel handling.
摘要:
Traditionally, providing parallel processing within a multi-core system has been very difficult. Here, however, a system in provided where serial source code is automatically converted into parallel source code, and a processing cluster is reconfigured “on the fly” to accommodate the parallelized code based on an allocation of memory and compute resources. Thus, the processing cluster and its corresponding system programming tool provide a system that can perform parallel processing from a serial program that is transparent to a user.
摘要:
Traditionally, providing parallel processing within a multi-core system has been very difficult. Here, however, a system is provided where serial source code is automatically converted into parallel source code, and a processing cluster is reconfigured “on the fly” to accommodate the parallelized code based on an allocation of memory and compute resources. Thus, the processing cluster and its corresponding system programming tool provide a system that can perform parallel processing from a serial program that is transparent to a user. Generally, a control node connected to the address and data leads of a host processor uses messages to control the processing of data in a processing cluster. The cluster includes nodes of parallel processors, shared function memory, a global load/store, and hardware accelerators all connected to the control node by message busses. A crossbar data interconnect routes data to the cluster circuits separate from the message busses.
摘要:
A memory system for operation with a processor, such as a digital signal processor, includes a high speed pipelined memory, a store buffer for holding store access requests from the processor, a load buffer for holding load access requests from the processor, and a memory control unit for processing access requests from the processor, from the store buffer and from the load buffer. The memory control unit may include prioritization logic for selecting access requests in accordance with a priority scheme and bank conflict logic for detecting and handling conflicts between access requests. The pipelined memory may be configured to output two load results per clock cycle at very high speed.
摘要:
Disclosed herein is a method for matching dependency coordinates and an efficient apparatus for performing the dependency coordinate matching very quickly. A plurality of buffers to store instructions is set forth. Each storage location of a buffer corresponds to a particular pair of dependency coordinates. Dependency matching logic receives the dependency coordinates for a buffered instruction and scheduling information pertaining to dispatched instructions. The dependency matching logic indicates whether a dependency precludes scheduling of the corresponding buffered instruction. Dependency checking logic produces a ready signal for the buffered instruction when no such dependency is indicated by the dependency matching logic.