Abstract:
An apparatus is disclosed, the apparatus including a branch target cache memory configured to store one or more entries. Each entry of the one or more entries may include an address tag and a corresponding target address. The apparatus may also include a control circuit configured to check for at least one taken branch instruction in a group of one or more instructions fetched using a current address. The control circuit may be further configured to generate an address tag corresponding to the group of one or more instructions using another address used prior to the current address in response to a determination that the group of one or more instructions includes a taken branch instruction. In addition, the control circuit may be configured to store the corresponding address tag and a target address associated with the taken branch instruction in a particular entry in the branch target cache memory.
Abstract:
Embodiments for a processor that selectively enables and disables branch prediction are disclosed. The processor may include counters to track a number of fetched instructions, a number of branches, and a number of mispredicted branches. A misprediction threshold may be calculated dependent upon the tracked number of branches and a predefined misprediction ratio. Branch prediction may then be disabled when the number of mispredictions exceed the determined threshold value and dependent upon the branch rate.
Abstract:
A system is disclosed, including a plurality of access units, a plurality of circuit nodes each coupled to a respective access unit, and a plurality of data processing nodes each coupled to a respective access unit. A particular data processing node may be configured to generate a plurality of data transactions. The particular data processing node may also be configured to determine an availability of a coupled access unit. In response to a determination that the coupled access unit is unavailable, the particular data processing node may be configured to halt a transfer of the plurality of data transactions to the coupled access unit and assert a halt indicator signal. In response to a determination that the coupled access unit is available, the particular data processing node may be configured to transfer the particular data transaction to the coupled access unit.
Abstract:
A system for generating predictions for a hardware table walk to find a map of a given virtual address to a corresponding physical address is disclosed. The system includes a plurality of memories, which each includes respective plurality of entries, each of which includes a prediction of a particular one of a plurality of buffers which includes a portion of a virtual to physical address translation map. A first circuit may generate a plurality of hash values to retrieve a plurality of predictions from the plurality of memories, where each has value depends on a respective address and information associated with a respective thread. A second circuit may select a particular prediction of the retrieved predictions to use based on a history of previous predictions.
Abstract:
A method and apparatus for performing branch prediction is disclosed. A branch predictor includes a history buffer configured to store a branch history table indicative of a history of a plurality of previously fetched branch instructions. The branch predictor also includes a branch target cache (BTC) configured to store branch target addresses for fetch addresses that have been identified as including branch instructions but have not yet been predicted. A hash circuit is configured to form a hash of a fetch address, history information received from the history buffer, and hit information received from the BTC, wherein the fetch address includes a branch instruction. A branch prediction unit (BPU) configured to generate a branch prediction for the branch instruction included in the fetch address based on the hash formed from the fetch address, history information, and BTC hit information.
Abstract:
A system that for storing program counter values is disclosed. The system may include a program counter, a first memory including a plurality of sectors, a first circuit configured to retrieve a program instruction from a location in memory dependent upon a value of the program counter, send the value of the program counter to an array for storage and determination a predicted outcome of the program instruction in response to a determination that execution of the program instruction changes a program flow. The second circuit may be configured to retrieve the value of the program counter from a given entry in a particular sector of the array, and determine an actual outcome of the program instruction dependent upon the retrieved value of the program counter.
Abstract:
A system is disclosed, including a plurality of access units, a plurality of circuit nodes each coupled to a respective access unit, and a plurality of data processing nodes each coupled to a respective access unit. A particular data processing node may be configured to generate a plurality of data transactions. The particular data processing node may also be configured to determine an availability of a coupled access unit. In response to a determination that the coupled access unit is unavailable, the particular data processing node may be configured to halt a transfer of the plurality of data transactions to the coupled access unit and assert a halt indicator signal. In response to a determination that the coupled access unit is available, the particular data processing node may be configured to transfer the particular data transaction to the coupled access unit.
Abstract:
An apparatus is disclosed, the apparatus including a branch target cache configured to store one or more branch addresses, a memory configured to store a return target stack, and a circuit. The circuit may be configured to determine, for a group of one or more fetched instructions, a prediction value indicative of whether the group includes a return instruction. In response to the prediction value indicating that the group includes a return instruction, the circuit may be further configured to select a return address from the return target stack. The circuit may also be configured to determine a hit or miss indication in the branch target cache for the group, and to, in response to receiving a miss indication from the branch target cache, select the return address as a target address for the return instruction.
Abstract:
An apparatus and method for saving power during TLB searches is disclosed. In one embodiment, a TLB includes a CAM having a plurality of entries each storing a virtual address, and enable logic coupled to the CAM. Responsive to initiation of a TLB query by a thread executing on a processor that includes the TLB, the enable logic is configured to enable only those CAM entries that are associated with the initiating thread. Entries in the CAM not associated with the thread are not enabled. Accordingly, an initial search of the TLB for responsive to the query is conducted only in the CAM entries that are associated with the thread. Those CAM entries that are not associated with the thread are not searched. As a result, dynamic power consumption during TLB searches may be reduced.
Abstract:
An apparatus includes a branch target cache configured to store one or more branch addresses, a memory configured to store a return target stack, and a circuit. The circuit may be configured to determine, for a group of one or more fetched instructions, a prediction value indicative of whether the group includes a return instruction. In response to the prediction value indicating that the group includes a return instruction, the circuit may be further configured to select a return address from the return target stack. The circuit may also be configured to determine a hit or miss indication in the branch target cache for the group, and to, in response to receiving a miss indication from the branch target cache, select the return address as a target address for the return instruction.