摘要:
The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit. Thus, the three main tasks of the load store unit are: (1) handling out of order cache requests; (2) detecting address collisions; and (3) alignment of data.
摘要:
The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit. Thus, the three main tasks of the load store unit are: (1) handling out of order cache requests; (2) detecting address collisions; and (3) alignment of data.
摘要:
A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
摘要:
A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
摘要:
The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available. Unified scheduling is performed across multiple execution data paths, where each execution data path, and corresponding functional units, is generally optimized for the type of computational function that is to be performed on the data: integer, floating point, and boolean. The number, type and computational specifics of the functional units provided in each data path, and as between data paths, are mutually independent.
摘要:
A high-performance, superscalar-based computer system with out-of-order instruction execution and concurrent results distribution for enhanced resource utilization and performance throughput. The computer system architecture includes an instruction fetch unit for fetching program instruction sets. Each instruction set includes a plurality of fixed length instructions with a prescribed program order (in-order). The architecture also includes an instruction execution unit for dynamically examining the instruction sets and scheduling instructions for execution, including out-of-order execution, among a plurality of functional units. The data results of the executed instructions are concurrently distributed to a temporary buffer and a register file array and managed by associated control logic, including a register renaming unit, a dependency checker unit, done control unit, and retirement control unit. The architecture also optimizes the scheduling of data paths in accordance with the type of computational function, including integer, floating point, and boolean.
摘要:
A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
摘要:
A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
摘要:
A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.
摘要:
The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available. Unified scheduling is performed across mutliple execution data paths, where each execution data path, and corresponding functional units, is generally optimized for the type of computational function that is to be performed on the data: integer, floating point, and boolean. The number, type and computational specifics of the functional units provided in each data path, and as between data paths, are mutually independent.