摘要:
Two processor controls for supporting efficient Firm Consistency while allowing out-of-order execution of Load instructions is provided. The Touch control operates when the processor stores a subsequent Store in a pending Store buffer while awaiting any outstanding Loads or Stores. The efficiency of the pending Store is improved by issuing a Touch of the data which pre-loads the line of data in the cache that is the subject of the store. The processor can complete out-of-order execution of a subsequently issued Load relative to a prior Load, but only to its finished state. The subsequently issued Load is not allowed to complete until the prior Load is completed. The Finished Load Cancellation control ensures that Firm Consistency is maintained by canceling any finished Loads, and subsequent instructions, when the subject of the Load is the same as an invalidation request from a multiprocessor.
摘要:
A computing system uses specialized “Set Associative Transaction Tables” and additional “Summary Transaction Tables” to speed the processing of common transactional memory conflict cases and those which employ assist threads using an Address History Table and processes memory transactions with a Transaction Table in memory for parallel processing of multiple threads of execution by support of which an application need not be aware. Special instructions may mark the boundaries of a transaction and identify memory locations applicable to a transaction. A ‘private to transaction’ (PTRAN) tag, directly addressable as part of the main data storage memory location, enables a quick detection of potential conflicts with other transactions that are concurrently executing on another thread of said computing system. The tag indicates whether (or not) a data entry in memory is part of a speculative memory state of an uncommitted transaction that is currently active in the system.
摘要:
A method, system, and computer program product for enhancing performance of an in-order microprocessor with long stalls. In particular, the mechanism of the present invention provides a data structure for storing data within the processor. The mechanism of the present invention comprises a data structure including information used by the processor. The data structure includes a group of bits to keep track of which instructions preceded a rejected instruction and therefore will be allowed to complete and which instructions follow the rejected instruction. The group of bits comprises a bit indicating whether a reject was a fast or slow reject; and a bit for each cycle that represents a state of an instruction passing through a pipeline. The processor speculatively continues to execute a set bit's corresponding instruction during stalled periods in order to generate addresses that will be needed when the stall period ends and normal dispatch resumes.
摘要:
A method of prefetching data in a microprocessor includes identifying a data stream associated with a process and determining a depth associated with the data stream based upon prefetch factors including the number of currently concurrent data streams and data consumption rates associated with the concurrent data streams. Data prefetch requests are allocated with the data stream to reflect the determined depth of the data stream. Allocating data prefetch requests may include allocating prefetch requests for a number of cache lines away from the cache line currently being referenced, wherein the number of cache lines is equal to the determined depth. The method may include, responsive to determining the depth associated with a data stream, configuring prefetch hardware to reflect the determined depth for the identified data stream. Prefetch control bits in an instruction executed by the processor control the prefetch hardware configuration.
摘要:
An apparatus and method for self-initiated instruction issuing are implemented. In a central processing unit (CPU) having a pipelined architecture, instructions are queued for issuing to the execution unit which will execute them. Instructions are issued each cycle, and an instruction should be selectable for issuing as soon as its source operands are available. An instruction in the issue queue having source operands depending on other, target, instructions to determine their value are signaled to the target instruction by a link mask in the queue entry corresponding to the target instruction. A bit in the link mask identifies the queue entry corresponding to the dependent instruction. When the target instruction issues to the execution unit, a bit is set in a predetermined portion of the queue entry containing the dependent instruction. The portion of the queue entry is associated with the source operand depending on the issuing instruction. This bit informs selection logic circuitry that the dependency is resolved by the issuing instruction, and the dependent instruction may be selected for issuing.
摘要:
A method and system for utilizing a completion table in a superscalar processor is disclosed. The method and system comprises providing a plurality of threads to the processor and associating a link list with each of the threads, wherein each entry associated with a thread is linked to a next entry. A method and system in accordance with the present invention implements the completion table as link lists. Each entry in the completion table in a thread is linked to the next entry via a pointer that is stored in a link list. In a second aspect a method of determining the relative order between instructions is provided. A method and system in accordance with the present invention implements a flush mask array which is accessed to determine the relative order of entries in the said completion table. A method and system in accordance with the present invention implements a restore head pointer table to save and restore the state of the pointer of said completion table.
摘要:
A method and apparatus for patching a problematic instruction within a pipelined processor in a data processing system is presented. A plurality of instructions are fetched, and the plurality of instructions are matched against at least one match condition to generate a matched instruction. The match conditions may include matching the opcode of an instruction, the pre-decode bits of an instruction, a type of instruction, or other conditions. A matched instruction may be marked using a match bit that accompanies the instruction through the instruction pipeline. The matched instruction is then replaced with an internal opcode or internal instruction that causes the instruction scheduling unit to take a special software interrupt. The problematic instruction is then patched through the execution of a set of instructions that cause the desired logical operation of the problematic instruction.
摘要:
An ISYNC instruction does not cause a flush of speculatively dispatched or fetched instructions (instructions that are dispatched or fetched after the ISYNC instruction) unconditionally. The present invention detects the occurrence of any instruction that changes the state of the machine and requires a context synchronizing complete; these instructions are called context-synchronizing-required instructions. When a context-synchronizing-required instruction completes, the present invention sets a flag to note the occurrence of that condition. When an ISYNC instruction completes, the present invention causes a flush and refetches the instruction after the ISYNC if the context-synchronizing-required flag is active. The present invention then resets the context-synchronizing-required flag. If the context-synchronizing-required flag is not active, then the present invention does not generate a flush operation.
摘要:
An XER scoreboard function is provided by utilizing the instruction sequencer unit scoreboard. A scoreboard bit is set if the XER is being used by a previous instruction. If a new instruction is fetched that uses the XER, a dummy read to the XER is generated to test the scoreboard bit to determine if the scoreboard bit is set. If the scoreboard bit is not set when the dummy read is executed, the X-form string proceeds to execution. If the scoreboard bit is set when the dummy is executed, the pipeline is stalled until the scoreboard bit is cleared, and then the X-form string padded with generated padding IOPs (Dummy or NOPs) is executed. After an accessing instruction is executed, the scoreboard bit is cleared.
摘要:
A method and a system in a data processing system for managing registers in a register array wherein the data processing system has M architected registers and the register array has greater than M registers. A first physical register address is selected from a group of available physical register addresses in a renamed table in response to dispatching a register-modifying instruction that specifies an architected target register address. The architected target register address is then associated with the first physical register address, and a result of executing the register-modifying instruction is stored in a physical register pointed to by the first physical register address. In response to completing the register-modifying instruction, the first physical address in the rename table is exchanged with a second physical address in a completion renamed table, wherein the second physical address is located in the completion rename table at a location pointed to by the architected target register address. Therefore, upon instruction completion, the completion rename table contains pointers that map architected register addresses to physical register addresses. The rename table maps architected register addresses to physical register addresses for instructions currently being executed, or for instructions that have “finished” and have not yet been “completed.” Bits indicating the validity of an association between an architected register address and a physical register address are stored before instructions are speculatively executed following an unresolved conditional branch.