摘要:
A method for repairing a pipeline in response to a branch instruction having a branch, includes the steps of providing a branch repair table having a plurality of entries, allocating an entry in the branch repair table for the branch instruction, storing a target address, a fall-through address, and repair information in the entry in the branch repair table, processing the branch instruction to determine whether the branch was taken, and repairing the pipeline in response to the repair information and the fall-through address in the entry in the branch repair table when the branch was not taken.
摘要:
A method and apparatus for performing multiple branch predictions per cycle is disclosed. The method and apparatus according to the present invention determine, within one fetch cycle, which instructions in a plurality of fetch instructions are branches and whether such branches are taken or not taken thereby finding the oldest taken branch, which has a target address that is fetched within the same fetch cycle.
摘要:
One embodiment of the present invention provides a method and an apparatus for predicting the target of a branch instruction. This method and apparatus operate by using a translation lookaside buffer (TLB) to store page numbers for predicted branch target addresses. In this embodiment, a branch target address table stores a small index to a location in the translation lookaside buffer, and this index is used retrieve a page number from the location in the translation lookaside buffer. This page number is used as the page number portion of a predicted branch target address. Thus, a small index into a translation lookaside buffer can be stored in a predicted branch target address table instead of a larger page number for the predicted branch target address. This technique effectively reduces the size of a predicted branch target table by eliminating much of the space that is presently wasted storing redundant page numbers. Another embodiment maintains coherence between the branch target address table and the translation lookaside buffer. This makes it possible to detect a miss in the translation lookaside buffer at least one cycle earlier by examining the branch target address table.
摘要:
A cache memory array stores two-way set associative data. An odd set data bank stores odd number sets of the two-way set associative data, where the two ways of each odd number set are aligned horizontally within the odd set data bank. An even set data bank stores even number sets of the two-way set associative data, where the two ways of each even number set are aligned horizontally within the even set data bank. Also, the odd set data bank is aligned horizontally with the even set data bank such that each odd number set is aligned horizontally with a next even number set. The horizontally aligned ways are interleaved for data path width reduction. Set and way selection circuits extract lines of data from the array. The array may be structurally implemented by single-ported RAM cells.
摘要:
Two-way set associative data is stored in a cache memory array. An odd set data bank stores odd number sets of the two-way set associative data, where the two ways of each odd number set are aligned horizontally within the odd set data bank. An even set data bank stores even number sets of the two-way set associative data, where the two ways of each even number set are aligned horizontally within the even set data bank. Also, the odd set data bank is aligned horizontally with the even set data bank such that each odd number set is aligned horizontally with a next even number set. The horizontally aligned ways are interleaved for data path width reduction. Set and way selection circuits extract lines of data from the array. The array may be structurally implemented by single-ported RAM cells.
摘要:
One embodiment of the present invention provides a system for predicting an address of an instruction following a branch instruction in a computer instruction stream. This system receives a current address specifying an address of a current instruction. It uses this current address (or possibly a preceding address) to generate a first select signal, which is used to select a first predicted address of an instruction following the current instruction in the computer instruction stream. At the same time the system generates a second select signal, which takes more time to generate than the first select signal but achieves a more accurate selection for a predicted address of the instruction following the current instruction. The system assumes that the first predicted address is correct and proceeds with a subsequent instruction fetch operation using the first predicted address. Next, the system compares the first select signal with the second select signal. If the first select signal is the same as the second select signal, the system allows the subsequent instruction fetch operation to proceed using the first predicted address. Otherwise, the system uses the second select signal to select a second predicted address, and delays the subsequent instruction fetch operation so that the instruction fetch operation can proceed using the second predicted address. This bi-level architecture allows branch prediction work efficiently even at the higher clock frequencies that arise as semiconductor technologies continue to improve.
摘要:
One embodiment of the present invention provides a system for predicting an address of an instruction following a branch instruction in a computer instruction stream. This system concurrently performs a fast single-cycle branch prediction operation to produce a first predicted address, and a more-accurate multiple-cycle branch prediction operation to produce a second predicted address. The system assumes that the first predicted address is correct and proceeds with a subsequent instruction fetch operation using the first predicted address. If the first predicted address is the same as the second predicted address, the subsequent instruction fetch operation is allowed to proceed using the first predicted address. Otherwise, the subsequent fetch operation is delayed so that it can proceed using the second predicted address. In this way, the system will typically perform a fast instruction fetch operation using the first predicted address, and will less frequently have to wait for the more-accurate second predicted address. This bi-level architecture allows branch prediction work efficiently even at the higher clock frequencies that arise as semiconductor technologies continue to improve. In accordance with one feature of the above embodiment, the multiple-cycle branch prediction operation involves selecting the second predicted address from between a branch target address, a next sequential address and a return address from a function call. In accordance with another feature, the second predicted address is selected using information from a branch type table, which contains information specifying the type of branch instructions located at particular addresses.
摘要:
An instruction fetch unit for fetching instructions from an instruction cache of a processor. The fetch unit includes a next fetch address mechanism generating predicted next fetch addresses, the next fetch address mechanism generating a next fetch address for a fetch bundle over at least two cycles of the processor. The next fetch address mechanism determines the next fetch address based on whether a control transfer instruction from an intermediate set of fetched instructions is taken.
摘要:
In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the requesting. One or more commands are communicated to the hardware accelerator by the user-privileged thread without intervention by higher-privileged threads and responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprises instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as is a hardware accelerator and a processor coupled to the hardware accelerator.
摘要:
A method of handling an exception in a processor includes setting a state upon detection of an exception, signaling a trap for the exception if the state is set, and based on a class of the exception, processing the exception differently before signaling the trap. The method may include replaying an instruction causing the exception before signaling the trap for the exception based on the class of the exception. The method may include replaying the instruction causing the exception after the instruction causing the exception becomes an oldest, unretired instruction. The method may include signaling the trap for the exception after an instruction causing the exception becomes an oldest, unretired instruction. The method may include marking an instruction causing the exception as complete without issuing the instruction causing the exception. An apparatus for handling exceptions in a processor includes an instruction scheduler for setting a state upon detection of an exception and signaling a trap for the exception if the state is set. The instruction scheduler, based on a class of the exception, processes the exception differently before signaling the trap.