摘要:
The invention relates to a method and apparatus for execution scheduling of a program thread of an application program and executing the scheduled program thread on a data processing system. The method includes: providing an application program thread priority to a thread execution scheduler; selecting for execution the program thread from a plurality of program threads inserted into the thread execution queue, wherein the program thread is selected for execution using a round-robin selection scheme, and wherein the round-robin selection scheme selects the program thread based on an execution priority associated with the program thread bit; placing the program thread in a data processing execution queue within the data processing system; and removing the program thread from the thread execution queue after a successful execution of the program thread by the data processing system.
摘要:
The invention relates to a method and apparatus for execution scheduling of a program thread of an application program and executing the scheduled program thread on a data processing system. The method includes: providing an application program thread priority to a thread execution scheduler; selecting for execution the program thread from a plurality of program threads inserted into the thread execution queue, wherein the program thread is selected for execution using a round-robin selection scheme, and wherein the round-robin selection scheme selects the program thread based on an execution priority associated with the program thread bit; placing the program thread in a data processing execution queue within the data processing system; and removing the program thread from the thread execution queue after a successful execution of the program thread by the data processing system.
摘要:
An SMT system has a single thread mode and an SMT mode. Instructions are alternately selected from two threads every clock cycle and loaded into the IFAR in a three cycle pipeline of the IFU. If a branch predicted taken instruction is detected in the branch prediction circuit in stage three of the pipeline, then in the single thread mode a calculated address from the branch prediction circuit is loaded into the IFAR on the next clock cycle. If the instruction in the branch prediction circuit detects a branch predicted taken in the SMT mode, then the selected instruction address is loaded into the IFAR on the first clock cycle following branch predicted taken detection. The calculated target address is fed back and loaded into the IFAR in the second clock cycle following branch predicted taken detection. Feedback delay effectively switches the pipeline from three stages to four stages.
摘要:
In an instruction fetch unit for an information handling system which decodes instructions, calculates target addresses of multiple branch instructions, and resolves multiple branch instructions in parallel instead of sequentially, the critical path through a multiple way set associative instruction cache is through a directory and compare circuit which selects which way instructions will be retrieved. This patch is known as the late select path. A multi-ported effective address (EA) directory is provided and is accessed prior to selection of a fetch address which fetches the next set of instructions from the cache. In this manner, the time required for the late select path can be reduced.
摘要:
An improved method of addressing within a pipelined processor having an address bit width of m+n bits is disclosed, which includes storing m high order bits corresponding to a first range of addresses, which encompasses a selected plurality of data executing within the pipelined processor. The n low order bits of addresses associated with each of the selected plurality of data are also stored. After determining the address of a subsequent datum to be executed within the processor, the subsequent datum is fetched. In response to fetching a subsequent datum having an address outside of the first range of addresses, a status register is set to a first of two states to indicate that an update to the first address register is required. In response to the status register being set to the second of the two states, the subsequent datum is dispatched for execution within the pipelined processor. The n low order bits of the subsequent datum are then stored, such that memory required to store addresses of instructions executing within the pipelined processor is thereby decreased.
摘要:
Methods for storing branch information in an address table of a processor are disclosed. A processor of the disclosed embodiments may generally include an instruction fetch unit connected to an instruction cache, a branch execution unit, and an address table being connected to the instruction fetch unit and the branch execution unit. The address table may generally be adapted to store a plurality of entries with each entry of the address table being adapted to store a base address and a base instruction tag. In a further embodiment, the branch execution unit may be adapted to determine the address of a branch instruction having an instruction tag based on the base address and the base instruction tag of an entry of the address table associated with the instruction tag. In some embodiments, the address table may further be adapted to store branch information.
摘要:
Systems and methods for handling the event of a wrong branch prediction and an instruction rejection in a digital processor are disclosed. More particularly, hardware and software are disclosed for detecting a condition where a branch instruction was mispredicted and an instruction that preceded the branch instruction is rejected after the branch instruction is executed. When the condition is detected, the branch instruction and rejected instruction are recirculated for execution. Until, the branch instruction is re-executed, control circuitry can prevent instructions from being received into an instruction buffer that feeds instructions to the execution units of the processor by fencing the instruction buffer from the fetcher. The instruction fetcher may continue fetching instructions along the branch target path into a local cache until the fence is dropped.
摘要:
Branch prediction logic is enhanced to provide a monitoring function for certain conditions which indicate that the use of separate BHTs and predicted target address cache would provide better results for branch prediction. The branch prediction logic responds to the occurrence of the monitored condition by logically splitting the BHTs and count cache so that half of the address space is allocated to a first thread and the second half is allocated to the next thread. Prediction-generated addresses that belong to the first thread are then directed to the half of the array that is allocated to that thread and prediction-generated addresses that belong to the second thread are directed to the next half of the array that is allocated to the second thread. In order to split the array, the highest order bit in the array is utilized to uniquely identify addresses of the first and the second threads.
摘要:
In a branch instruction target address cache, an entry associated with a fetched block of instructions includes a target address of a branch instruction residing in the next sequential block of instructions. The entry will include a sequential address associated with the branch instruction and a prediction of whether the target address is taken or not taken.
摘要:
A Content Addressable Memory (CAM) includes a plurality of conventional CAM cells arranged in rows and columns, the CAM cells being coupled together by word lines and bit lines for reading from and writing information into the CAM cells. Each CAM cell in a row is connected to a match line which provides an indication when an input word on the bit lines matches the contents of the cells coupled to the match line. A combinatorial logic circuit including a parity generator is coupled to the word lines and provides an indication when multiple words have been selected by an input word. The combinatorial logic circuit includes an OR circuit providing a first input to an AND circuit and a parity generator providing a second input to the AND which only provides an output when multiple word lines have been selected by an input word.