摘要:
Disclosed are a method and a system for grouping processor instructions for execution by a processor, where the group of processor instructions includes at least two branch processor instructions. In one or more embodiments, an instruction buffer can decouple an instruction fetch operation from an instruction decode operation by storing fetched processor instructions in the instruction buffer until the fetched processor instructions are ready to be decoded. Group formation can involve removing processor instructions from the instruction buffer and routing the processor instruction to latches that convey the processor instructions to decoders. Processor instructions that are removed from instruction buffer in a single clock cycle can be called a group of processor instructions. In one or more embodiments, the first instruction in the group must be the oldest instruction in the instruction buffer and instructions must be removed from the instruction buffer ordered from oldest to youngest.
摘要:
Disclosed are a method and a system for grouping processor instructions for execution by a processor, where the group of processor instructions includes at least two branch processor instructions. In one or more embodiments, an instruction buffer can decouple an instruction fetch operation from an instruction decode operation by storing fetched processor instructions in the instruction buffer until the fetched processor instructions are ready to be decoded. Group formation can involve removing processor instructions from the instruction buffer and routing the processor instruction to latches that convey the processor instructions to decoders. Processor instructions that are removed from instruction buffer in a single clock cycle can be called a group of processor instructions. In one or more embodiments, the first instruction in the group must be the oldest instruction in the instruction buffer and instructions must be removed from the instruction buffer ordered from oldest to youngest.
摘要:
Mechanisms, in a data processing system, are provided for tracking effective addresses through a processor pipeline of the data processing system. The mechanisms comprise logic for fetching an instruction from an instruction cache and associating, by an effective address table logic in the data processing system, an entry in an effective address table (EAT) data structure with the fetched instruction. The mechanisms further comprise logic for associating an effective address tag (eatag) with the fetched instruction, the eatag comprising a base eatag that points to the entry in the EAT and an eatag offset. Moreover, the mechanisms comprise logic for processing the instruction through the processor pipeline by processing the eatag.
摘要:
A method and apparatus for dynamically managing instruction buffer depths for non-predicted branches reduces wasted energy and resources associated with low confidence branch prediction conditions. A portion of the instruction buffer for a instruction thread is allocated for storing predicted branch instruction streams and another portion, which may be zero-sized during high prediction confidence conditions, is allocated to the non-predicted branch instruction stream. The size of the buffers is adjusted dynamically in conformity with an on-going prediction confidence that provides a measure of how well branch prediction mechanisms are working for a given instruction thread. An alternate instruction fetch address table can be maintained and multiplexed with the main fetch address register for addressing the instruction cache, so that the instruction stream can be quickly shifted to the non-predicted path when a branch instruction is resolved to the non-predicted path.
摘要:
Mechanisms, in a data processing system, are provided for tracking effective addresses through a processor pipeline of the data processing system. The mechanisms comprise logic for fetching an instruction from an instruction cache and associating, by an effective address table logic in the data processing system, an entry in an effective address table (EAT) data structure with the fetched instruction. The mechanisms further comprise logic for associating an effective address tag (eatag) with the fetched instruction, the eatag comprising a base eatag that points to the entry in the EAT and an eatag offset. Moreover, the mechanisms comprise logic for processing the instruction through the processor pipeline by processing the eatag.
摘要:
A method and apparatus for dynamically managing instruction buffer depths for non-predicted branches reduces wasted energy and resources associated with low confidence branch prediction conditions. A portion of the instruction buffer for a instruction thread is allocated for storing predicted branch instruction streams and another portion, which may be zero-sized during high prediction confidence conditions, is allocated to the non-predicted branch instruction stream. The size of the buffers is adjusted dynamically in conformity with an on-going prediction confidence that provides a measure of how well branch prediction mechanisms are working for a given instruction thread. An alternate instruction fetch address table can be maintained and multiplexed with the main fetch address register for addressing the instruction cache, so that the instruction stream can be quickly shifted to the non-predicted path when a branch instruction is resolved to the non-predicted path.
摘要:
Mechanisms, in a processor, are provided for detecting and handling short forward branch conversion candidates. The mechanisms identify a conditional branch in the computer code and determine if the short forward conditional branch is to be converted to a non-branching conditional sequence of instructions. Moreover, the mechanisms convert the conditional branch to a non-branching conditional sequence of instructions comprising a resolve instruction and one or more conditional instructions dependent on the resolve instruction. In addition, the mechanisms execute the non-branching conditional sequence of instructions in place of the conditional branch in the computer code and generate an output of the computer code based on the execution of the non-branching conditional sequence of instructions.
摘要:
A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. The set of helper thread binaries and the set of main thread binaries are partitioned according to common instruction boundaries. As a first partition in the set of main thread binaries executes within a first core, a second partition in the set of helper thread binaries executes within a second core, thus “warming up” the cache in the second core. When the first partition of the main completes execution, a second partition of the main core moves to the second core, and executes using the warmed up cache in the second core.
摘要:
A method and data processing system for managing running of instructions in a program. A processor of the data processing system receives a monitoring instruction of a monitoring unit. The processor determines if at least one secondary thread of a set of secondary threads is available for use as an assist thread. The processor selects the at least one secondary thread from the set of secondary threads to become the assist thread in response to a determination that the at least one secondary thread of the set of secondary threads is available for use as an assist thread. The processor changes profiling of running of instructions in the program from the main thread to the assist thread.
摘要:
A method and a data processing system by which population count (popcount) operations are efficiently performed without incurring the latency and loss of critical processing cycles and bandwidth of real time processing. The method comprises: identifying data to be stored to memory for which a popcount may need to be determined; speculatively performing a popcount operation on the data as a background process of the processor while the data is being stored to memory; storing the data to a first memory location; and storing a value of the popcount generated by the popcount operation within a second memory location. The method further comprises: determining a size of data; determining a granular level at which the popcount operation on the data will be performed; and reserving a size of said second memory location that is sufficiently large to hold the value of the popcount.