Abstract:
Establishing a branch target instruction cache (BTIC) entry for subroutine returns to reduce pipeline bubbles, and related systems, methods, and computer-readable media are disclosed. In one embodiment, a method of establishing a BTIC entry includes detecting a subroutine call in an execution pipeline. In response, at least one instruction fetched sequential to the subroutine call is written as a branch target instruction in a BTIC entry for a subroutine return. A next instruction fetch address is calculated, and is written into a next instruction fetch address field in the BTIC entry. In this manner, the BTIC may provide correct branch target instruction and next instruction fetch address data for the subroutine return, even if the subroutine return is encountered for the first time or the subroutine is called from different calling locations.
Abstract:
Techniques are described for a multi-processor having two or more processors that increases the opportunity for a load-exclusive command to take a cache line in an Exclusive state, which results in increased performance when a store-exclusive is executed. A new bus operation read prefer exclusive is used as a hint to other caches that a requesting master is likely to store to the cache line, and, if possible, the other cache should give the line up. In most cases, this will result in the other master giving the line up and the requesting master taking the line Exclusive. In most cases, two or more processors are not performing a semaphore management sequence to the same address at the same time. Thus, a requesting master's load-exclusive is able to take a cache line in the Exclusive state an increased number of times.
Abstract:
Fusing conditional write instructions having opposite conditions in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first conditional write instruction writing a first value to a target register based on evaluating a first condition is detected by an instruction processing circuit. The circuit also detects a second conditional write instruction writing a second value to the target register based on evaluating a second condition that is a logical opposite of the first condition. Either the first condition or the second condition is selected as a fused instruction condition, and corresponding values are selected as if-true and if-false values. A fused instruction is generated for selectively writing the if-true value to the target register if the fused instruction condition evaluates to true, and selectively writing the if-false value to the target register if the fused instruction condition evaluates to false.
Abstract:
Predicting memory instruction punts in a computer processor using a punt avoidance table (PAT) are disclosed. In one aspect, an instruction processing circuit accesses a PAT containing entries each comprising an address of a memory instruction. Upon detecting a memory instruction in an instruction stream, the instruction processing circuit determines whether the PAT contains an entry having an address of the memory instruction. If so, the instruction processing circuit prevents the detected memory instruction from taking effect before at least one pending memory instruction older than the detected memory instruction, to preempt a memory instruction punt. In some aspects, the instruction processing circuit may determine, upon execution of a pending memory instruction, whether a hazard associated with the detected memory instruction has occurred. If so, an entry for the detected memory instruction is generated in the PAT.
Abstract:
Fusing immediate value, write-based instructions in instruction processing circuits, and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction indicating an operation writing an immediate value to a register is detected by an instruction processing circuit. The circuit also detects at least one subsequent instruction indicating an operation that overwrites at least one first portion of the register while maintaining a value of a second portion of the register. The at least one subsequent instruction is converted (or replaced) with a fused instruction(s), which indicates an operation writing the at least one first portion and the second portion of the register. In this manner, conversion of multiple instructions for generating a constant into the fused instruction(s) removes the potential for a read-after-write hazard and associated consequences caused by dependencies between certain instructions, while reducing a number of clock cycles required to process the instructions.
Abstract:
Techniques are described for a multi-processor having two or more processors that increases the opportunity for a load-exclusive command to take a cache line in an Exclusive state, which results in increased performance when a store-exclusive is executed. A new bus operation read prefer exclusive is used as a hint to other caches that a requesting master is likely to store to the cache line, and, if possible, the other cache should give the line up. In most cases, this will result in the other master giving the line up and the requesting master taking the line Exclusive. In most cases, two or more processors are not performing a semaphore management sequence to the same address at the same time. Thus, a requesting master's load-exclusive is able to take a cache line in the Exclusive state an increased number of times.
Abstract:
A processor includes a queue for storing instructions processed within the context of a current value of a register field, where for some embodiments the instruction is undefined or defined, depending upon the register field at time of processing. After a write instruction (an instruction that writes to the register field) executes, the queue is searched for any entries that contain instructions that depend upon the executed write instruction. Each such entry stores the value of the register field at the time the instruction in the entry was processed. If such an entry is found in the queue and its stored value of the register field does not match the value that the write instruction wrote to the register field, then the processor flushes the pipeline and restarts at a state so as to correctly execute the instruction.
Abstract:
A processor to a store constant value (immediate or literal) in a cache upon decoding a move immediate instruction in which the immediate is to be moved (copied or written) to an architected register. The constant value is stored in an entry in the cache. Each entry in the cache includes a field to indicate whether its stored constant value is valid, and a field to associate the entry with an architected register. Once a constant value is stored in the cache, it is immediately available for forwarding to a processor pipeline where a decoded instruction may need the constant value as an operand.
Abstract:
An instruction in an instruction cache line having a first portion that is cacheable, a second portion that is from a page that is non-cacheable, and crosses a cache line is prevented from executing from the instruction cache. An attribute associated with the non-cacheable second portion is tracked separately from the attributes of the rest of the instructions in the cache line. If the page crossing instruction is reached for execution, the page crossing instruction and instructions following are flushed and a non-cacheable request is made to memory for at least the second portion. Once the second portion is received, the whole page crossing instruction is reconstructed from the first portion saved in the previous fetch group. The page crossing instruction or portion thereof is returned with the proper attribute for a non-cached fetched instruction and the reconstructed instruction can be executed without being cached.
Abstract:
Fusing flag-producing and flag-consuming instructions in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a flag-producing instruction indicating a first operation generating a first flag result is detected in an instruction stream by an instruction processing circuit. The instruction processing circuit also detects a flag-consuming instruction in the instruction stream indicating a second operation consuming the first flag result as an input. The instruction processing circuit generates a fused instruction indicating the first operation generating the first flag result and indicating the second operation consuming the first flag result as the input. In this manner, as a non-limiting example, the fused instruction eliminates a potential for a read-after-write hazard between the flag-producing instruction and the flag-consuming instruction.