摘要:
A data processing system including a processor having a load/store unit and a method for correcting effective address aliasing. In the load/store unit within the processor, load and store instructions are executed out of order. The load and store instructions are assigned tags in a predetermined manner, and then assigned to load and store reorder queues for keeping track of the program order of the load and store instructions. A real address tag is utilized to correct for effective address aliasing within the load/store unit.
摘要:
A method and system for atomic memory accesses in a processor system, wherein the processor system is able to issue and execute multiple instructions out of order with respect to a particular program order. A first reservation instruction is speculatively issued to an execution unit of the processor system. Upon issuance, instructions queued for the execution unit which occur after the first reservation instruction in the program order are flushed from the execution unit, in response to detecting any previously executed reservation instructions in the execution unit which occur after the first reservation instruction in the program order. The first reservation instruction is speculatively executed by placing a reservation for a particular data address of the first reservation instruction, in response to completion of instructions queued for the execution unit which occur prior to the first reservation instruction in the program order, such that reservation instructions which are speculatively issued and executed in any order are executed in-order with respect to a partnering conditional store instruction.
摘要:
A method and system for permitting nonsequential instruction dispatch in a superscalar processor system which dispatches sequentially ordered multiple instructions simultaneously to a group of execution units on an opportunistic basis for execution and placement of results thereof within specified general purpose registers. Each instruction generally includes at least one source operand and one destination operand. A plurality of intermediate storage buffers are provided and each time an instruction is dispatched to an available execution unit, a particular one of the intermediate storage buffers is assigned to any destination operand within the dispatched instruction, permitting the results of the execution of each instruction to be stored within an intermediate storage buffer. An indication of the status of each instruction is maintained within a completion buffer and thereafter utilized to selectively transfer results within the intermediate storage buffers to selected general purpose registers in an order consistent with an application specified sequential order. The occurrence of an interrupt which prohibits completion of a selected instruction can therefore be accurately identified within the completion buffer.
摘要:
A method and system for increased instruction dispatch efficiency in a superscalar processor system having an instruction queue for receiving a group of instructions in an application specified sequential order and an instruction dispatch unit for dispatching instructions from an associated instruction buffer to multiple execution units on an opportunistic basis. The dispatch status of instructions within the associated instruction buffer is periodically determined and, in response to a dispatch of the instructions at the beginning of the instruction buffer, the remaining instructions are shifted within the instruction buffer in the application specified sequential order and a partial group of instructions are loaded into the instruction buffer from the instruction queue utilizing a selectively controlled multiplex circuit. In this manner additional instructions may be dispatched to available execution units without requiring a previous group of instructions to be dispatched completely.
摘要:
A method and system for input/output control in a multiprocessor system having multiprocessors coupled to a system memory via a common wide bus. The common wide bus is subdivided into multiple sub-buses which may be accessed individually or in groups by a selected processor, or individual sub-buses may be accessed by multiple processors simultaneously in response to one or more transfer requests. In response to a transfer request having a data address associated therewith, a particular target device is identified. The data address is then written into an address queue. Thereafter, one or more of the multiple sub-buses are utilized to transfer data to or from a single processor in response to a transfer request from a single processor. In response to a transfer request from multiple processors, one or more of the multiple sub-buses may be utilized separately to simultaneously transfer data to or from multiple processors.
摘要:
A method and system for increased instruction synchronization efficiency in a superscalar processor system which includes instructions having multiple source and destination operands. Simultaneous dispatching of multiple instructions creates a source-to-destination data dependency problem in that the results of one instruction may be necessary to accomplish execution of a second instruction. Data dependency hazards may be eliminated by prohibiting each instruction from dispatching until all possible data dependencies have been eliminated by the completion of preceding instructions; however, instruction dispatch efficiency is substantially decreased utilizing this technique. Data dependency interlock circuitry may be utilized to clear possible data dependency hazards; however, the complexity of such circuitry increases dramatically as the number of interlocked sources and destinations increases. The method and system of the present invention utilizes data dependency interlock circuitry capable of interlocking two source operands by two destination operands for each instruction. Instructions having three or more source operands are interlocked at the dispatch stage for the first two source operands utilizing existing data dependency interlock circuitry. Thereafter, the instruction is dispatched only after data dependency hazards are cleared for the first two source operands, utilizing the data dependency interlock circuitry, and all instructions preceding the instruction have been completed, eliminating possible data dependency hazards for the third source operand. In this manner, instructions which include three source operands may be synchronized without requiring a substantial increase in data dependency interlock circuitry and with only a slight degradation in system efficiency.
摘要:
Disclosed is an apparatus which deactivates both the AC as well as the DC component of power for various functions in a CPU. The CPU partitions dataflow registers and arithmetic units such that voltage can be removed from the upper portion of dataflow registers when the software is not utilizing same. Clock signals are also prevented from being applied to these non-utilized components. As an example, if a 64 bit CPU (processor unit) is to be used with both 32 and 64 bit software, the mentioned components may be partitioned in equal sized upper and lower portions. The logic signal for activating the removal of voltage may be obtained from a software-accessible architected control register designated as a machine state register in some CPUs. The same logic may be used in connection with removing voltage and clocks from other specialized functional components such as the floating point unit when software instructions do not presently require same.
摘要:
A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network. A system and method for creating a dedicated pipeline for processing streaming data also are provided.
摘要:
A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network.
摘要:
A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network.