Abstract:
A method and system are disclosed for software manipulation of the hardware branch prediction mechanism in a data processor that supports software prediction. The hardware branch prediction mechanism is enhanced with at least two bits for path prediction. These bits are settable by software and can override the hardware branch prediction mechanism. Branch prediction information is encoded into a branch instruction in the software. This information includes a predetermined value for each bit. Finally, the branch path of the instruction is predicted based on the values of the bits.
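As an illustrative sketch only, the interaction between such software-settable bits and a hardware predictor might look like the following Python model; the two-bit encoding, the function names, and the saturating-counter stand-in for the hardware mechanism are assumptions, not details from the disclosure.

# Hypothetical two-bit software hint overriding a hardware branch predictor.
# Encoding assumed for illustration: 0b00 = no hint (defer to hardware),
# 0b10 = software predicts not taken, 0b11 = software predicts taken.

HINT_NONE      = 0b00
HINT_NOT_TAKEN = 0b10
HINT_TAKEN     = 0b11

def hardware_predict(branch_address, bht):
    """Simple 2-bit saturating-counter lookup standing in for the hardware mechanism."""
    counter = bht.get(branch_address, 1)   # weakly not-taken by default
    return counter >= 2                    # taken if the counter is 2 or 3

def predict_branch(branch_address, hint_bits, bht):
    """Software hint bits, when present, override the hardware prediction."""
    if hint_bits == HINT_TAKEN:
        return True
    if hint_bits == HINT_NOT_TAKEN:
        return False
    return hardware_predict(branch_address, bht)

if __name__ == "__main__":
    bht = {0x4000: 3}                                   # hardware strongly predicts taken
    print(predict_branch(0x4000, HINT_NONE, bht))       # True  (hardware decides)
    print(predict_branch(0x4000, HINT_NOT_TAKEN, bht))  # False (software override)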
Abstract:
Methods for storing branch information in an address table of a processor are disclosed. A processor of the disclosed embodiments may generally include an instruction fetch unit connected to an instruction cache, a branch execution unit, and an address table being connected to the instruction fetch unit and the branch execution unit. The address table may generally be adapted to store a plurality of entries with each entry of the address table being adapted to store a base address and a base instruction tag. In a further embodiment, the branch execution unit may be adapted to determine the address of a branch instruction having an instruction tag based on the base address and the base instruction tag of an entry of the address table associated with the instruction tag. In some embodiments, the address table may further be adapted to store branch information.
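As a rough illustration of the address determination described above, a branch's address could be recovered from an entry's base address and base instruction tag by scaling the tag distance by the instruction width; the fixed 4-byte instruction size, the 9-bit tag wrap, and all names in this Python sketch are assumptions.

# Hypothetical address-table entry: a base address paired with the instruction
# tag (itag) assigned to the instruction at that address, plus optional branch info.

INSTR_SIZE = 4          # assumed fixed instruction width in bytes
ITAG_WRAP  = 1 << 9     # assumed itag width (9 bits) for modular distance

class AddressTableEntry:
    def __init__(self, base_address, base_itag, branch_info=None):
        self.base_address = base_address
        self.base_itag = base_itag
        self.branch_info = branch_info   # e.g. prediction bits, illustrative only

def branch_address(entry, itag):
    """Recover a branch's address from its itag and the entry's base values."""
    distance = (itag - entry.base_itag) % ITAG_WRAP
    return entry.base_address + distance * INSTR_SIZE

if __name__ == "__main__":
    entry = AddressTableEntry(base_address=0x1000, base_itag=40)
    print(hex(branch_address(entry, 43)))   # 0x100c: three instructions past the base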
Abstract:
In a branch instruction target address cache, an entry associated with a fetched block of instructions includes a target address of a branch instruction residing in the next sequential block of instructions. The entry also includes a sequential address associated with the branch instruction and a prediction of whether the branch to the target address is taken or not taken.
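A minimal Python sketch of such an entry and its use follows; the 32-byte fetch-block size, the dictionary-based cache, and the one-block-ahead redirect sequence are simplifying assumptions made for illustration.

# Hypothetical branch target address cache (BTAC) keyed by the address of a
# fetched block, whose entry describes a branch in the *next* sequential block.

BLOCK_SIZE = 32   # assumed fetch-block size in bytes

class BTACEntry:
    """Entry stored for a fetched block, describing a branch in the next sequential block."""
    def __init__(self, branch_sequential_address, target_address, predicted_taken):
        self.branch_sequential_address = branch_sequential_address
        self.target_address = target_address
        self.predicted_taken = predicted_taken

def fetch_addresses(start, btac, count):
    """Produce a short sequence of fetch-block addresses using the one-block-ahead entries."""
    sequence = [start]
    pending_target = None                      # redirect that applies after the next block
    while len(sequence) < count:
        current = sequence[-1]
        if pending_target is not None:
            sequence.append(pending_target)    # the branch in this block was predicted taken
            pending_target = None
            continue
        entry = btac.get(current)
        sequence.append(current + BLOCK_SIZE)  # the next sequential block is still fetched
        if entry is not None and entry.predicted_taken:
            pending_target = entry.target_address
    return sequence

if __name__ == "__main__":
    # Entry for block 0x2000 describes a taken branch inside block 0x2020.
    btac = {0x2000: BTACEntry(0x2024, 0x5000, True)}
    print([hex(a) for a in fetch_addresses(0x2000, btac, 4)])
    # ['0x2000', '0x2020', '0x5000', '0x5020']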
Abstract:
An SMT system has a single thread mode and an SMT mode. Instructions are alternately selected from two threads every clock cycle and loaded into the IFAR in a three-cycle pipeline of the IFU. If a branch instruction predicted taken is detected by the branch prediction circuit in stage three of the pipeline, then in the single thread mode a calculated address from the branch prediction circuit is loaded into the IFAR on the next clock cycle. If the branch prediction circuit detects a branch predicted taken in the SMT mode, then the selected instruction address is loaded into the IFAR on the first clock cycle following the branch-predicted-taken detection. The calculated target address is fed back and loaded into the IFAR on the second clock cycle following the detection. The feedback delay effectively switches the pipeline from three stages to four stages.
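The timing difference between the two modes can be summarized by a toy model such as the following Python sketch; the cycle arithmetic is a simplification of the description above, and nothing about the actual IFU logic is implied.

# Toy timing model for when the calculated branch target reaches the IFAR after a
# predicted-taken branch is detected in stage three of the fetch pipeline.

def target_load_cycle(detect_cycle, smt_mode):
    """Cycle on which the calculated target address is loaded into the IFAR."""
    if not smt_mode:
        # Single-thread mode: the target goes into the IFAR on the very next cycle.
        return detect_cycle + 1
    # SMT mode: the alternately selected thread's address is loaded on the first
    # cycle after detection and the fed-back target on the second, so the fetch
    # pipeline effectively behaves as four stages instead of three.
    return detect_cycle + 2

if __name__ == "__main__":
    print(target_load_cycle(detect_cycle=5, smt_mode=False))  # 6
    print(target_load_cycle(detect_cycle=5, smt_mode=True))   # 7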
Abstract:
Branch prediction logic is enhanced to provide a monitoring function for certain conditions that indicate that separate BHTs and a separate predicted-target-address cache would provide better branch prediction results. The branch prediction logic responds to the occurrence of the monitored condition by logically splitting the BHTs and count cache so that half of the address space is allocated to a first thread and the second half is allocated to the second thread. Prediction-generated addresses that belong to the first thread are then directed to the half of the array allocated to that thread, and prediction-generated addresses that belong to the second thread are directed to the other half of the array. To split the array, the highest-order bit of the array index is used to uniquely identify addresses of the first and second threads.
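A minimal sketch of the index split, assuming a 1024-entry array and a simple modulo index; the real array size and hashing are not specified here, so everything below is illustrative.

# Hypothetical split of a single shared array (e.g. a BHT or count cache) into
# per-thread halves by forcing the highest-order index bit to the thread id.

ARRAY_BITS = 10                    # assumed: 1024-entry array
ARRAY_SIZE = 1 << ARRAY_BITS

def shared_index(address):
    """Index used while both threads share the whole array."""
    return address % ARRAY_SIZE

def split_index(address, thread_id):
    """Index used after the split: the top bit uniquely identifies the thread."""
    low_bits = address % (ARRAY_SIZE // 2)          # half the address space per thread
    return (thread_id << (ARRAY_BITS - 1)) | low_bits

if __name__ == "__main__":
    addr = 0x1F37C
    print(shared_index(addr))               # both threads can collide at this index
    print(split_index(addr, thread_id=0))   # lands in the lower half
    print(split_index(addr, thread_id=1))   # lands in the upper half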
Abstract:
Method, system and computer program product for determining the targets of branches in a data processing system. A method for determining the target of a branch in a data processing system includes performing at least one pre-calculation relating to determining the target of the branch prior to writing the branch into a Level 1 (L1) cache to provide a pre-decoded branch, and then writing the pre-decoded branch into the L1 cache. By pre-calculating matters relating to the targets of branches before the branches are written into the L1 cache, for example, by re-encoding relative branches as absolute branches, a reduction in branch redirect delay can be achieved, thus providing a substantial improvement in overall processor performance.
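As a rough sketch of the pre-decode idea, a PC-relative branch could be re-encoded with its absolute target at cache-fill time so that the redirect path needs no further address calculation; the instruction representation and names below are invented for illustration.

# Hypothetical pre-decode at L1 instruction-cache fill time: a PC-relative branch
# is re-encoded with its absolute target so the redirect path needs no adder.

from collections import namedtuple

RawBranch = namedtuple("RawBranch", "address offset")                 # relative form in memory
PreDecodedBranch = namedtuple("PreDecodedBranch", "address target")   # absolute form in the L1 cache

def predecode(raw):
    """Compute the absolute target before the branch is written into the L1 cache."""
    return PreDecodedBranch(address=raw.address, target=raw.address + raw.offset)

def redirect_target(cached_branch):
    """On a taken branch, the target is read directly; no address calculation remains."""
    return cached_branch.target

if __name__ == "__main__":
    raw = RawBranch(address=0x4000, offset=-0x40)
    cached = predecode(raw)
    print(hex(redirect_target(cached)))   # 0x3fc0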
Abstract:
A transfer tag is generated by the Instruction Fetch Unit and passed to the decode unit in the instruction pipeline with each group of instructions fetched during a branch prediction by a fetcher. Individual instructions within the fetched group for the branch pipeline are assigned a concatenated version of the transfer tag (the group tag concatenated with the instruction lane), which is used to match on requests to flush any newer instructions. All potential instruction or internal operation latches in the decode pipeline must perform a match, and if a match is encountered, all valid bits associated with newer instructions or internal operations upstream from the match are cleared. The transfer tag representing the next instruction to be processed in the branch pipeline is passed to the Instruction Dispatch Unit. The Instruction Dispatch Unit queries the branch pipeline to compare its transfer tag with the transfer tags of instructions in the branch pipeline. If the transfer tag matches a branch instruction tag, the Instruction Decode Unit is stalled until the branch instruction is processed, thus providing a synchronization method for the parallel pipelines.
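A small Python sketch of the two uses of the transfer tag described above, flushing everything newer than a matching tag and stalling while the tag still names an unresolved branch; the tag layout (group tag times lanes-per-group plus lane) and the assumption that larger tags are newer are illustrative only.

# Hypothetical transfer tag: a group tag concatenated with the instruction's lane
# within the fetched group.  Larger tags are assumed to be newer, for illustration.

def make_transfer_tag(group_tag, lane, lanes_per_group=8):
    return group_tag * lanes_per_group + lane

def flush_newer(latches, flush_tag):
    """Clear the valid bit of every latched instruction newer than the flush tag."""
    for entry in latches:
        if entry["tag"] > flush_tag:
            entry["valid"] = False
    return latches

def dispatch_must_stall(next_tag, branch_pipeline_tags):
    """Stall decode/dispatch while the next instruction is still an unresolved branch."""
    return next_tag in branch_pipeline_tags

if __name__ == "__main__":
    latches = [{"tag": make_transfer_tag(3, lane), "valid": True} for lane in range(4)]
    flush_newer(latches, flush_tag=make_transfer_tag(3, 1))
    print([e["valid"] for e in latches])                      # [True, True, False, False]
    print(dispatch_must_stall(make_transfer_tag(3, 1), {make_transfer_tag(3, 1)}))  # True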