摘要:
An apparatus for emulating the branch prediction behavior of an explicit subroutine call is disclosed. The apparatus includes a first input which is configured to receive an instruction address and a second input. The second input is configured to receive predecode information which describes the instruction address as being related to an implicit subroutine call to a subroutine. In response to the predecode information, the apparatus also includes an adder configured to add a constant to the instruction address defining a return address, causing the return address to be stored to an explicit subroutine resource, thus, facilitating subsequent branch prediction of a return call instruction.
摘要:
In a processor executing instructions from a variable-length instruction set, a preload instruction is operative to retrieve from memory a data block corresponding to an instruction cache line, pre-decode instructions from a variable-length instruction set in the data block, and load the instructions and pre-decode information into the instruction cache. An instruction execution unit indicates to a pre-decoder the position within the data block of a first valid instruction. The pre-decoder successively determines the length of each instruction and hence the instruction boundaries. An instruction cache line offset indicator that identifies the position of the first valid instruction may be generated and provided to the pre-decoder in a variety of ways.
摘要:
Intermediate results are passed between constituent instructions of an expanded instruction using register renaming resources and control logic. A first constituent instruction generates intermediate results and is assigned a PRN in a constituent instruction rename table, and writes intermediate results to the identified physical register. A second constituent instruction performs a look up in the constituent instruction rename table and reads the intermediate results from the physical register. Constituent instruction rename logic tracks the constituent instructions through the pipeline, and delete the constituent instruction rename table entry and returns the PRN to a free list when the second constituent instruction has read the intermediate results.
摘要:
A processor is operative to execute two or more instruction sets, each in a different instruction set operating mode. As each instruction is executed, debug circuit comparison the current instruction set operating mode to a target instruction set operating mode sent by a programmer, and outputs an alert or indication in they match. The alert or indication may additionally be dependent upon the instruction address following within a predetermined target address range. The alert or indication may comprise a breakpoint signal that halts execution and/or it is output as an external signal of the processor. The instruction address at which the processor detects a match in the instruction set operating modes may additionally be output. Additionally or alternatively, the alert or indication may comprise starting or stopping a trace operation, causing an exception, or any other known debugger function.
摘要:
A processor performs a prefetch operation on non-sequential instruction addresses. If a first instruction address misses in an instruction cache and accesses a higher-order memory as part of a fetch operation, and a branch instruction associated with the first instruction address or an address following the first instruction address is detected and predicted taken, a prefetch operation is performed using a predicted branch target address, during the higher-order memory access. If the predicted branch target address hits in the instruction cache during the prefetch operation, associated instructions are not retrieved, to conserve power. If the predicted branch target address misses in the instruction cache during the prefetch operation, a higher-order memory access may be launched, using the predicted branch instruction address. In either case, the first instruction address is re-loaded into the fetch stage pipeline to await the return of instructions from its higher-order memory access.
摘要:
Systems and method for memory management units (MMUs) configured to automatically pre-fill a translation lookaside buffer (TLB) with address translation entries expected to be used in the future, thereby reducing TLB miss rate and improving performance. The TLB may be pre-filled with translation entries, wherein addresses corresponding to the pre-fill may be selected based on predictions. Predictions may be derived from external devices, or based on stride values, wherein the stride values may be a predetermined constant or dynamically altered based on access patterns. Pre-filling the TLB may effectively remove latency involved in determining address translations for TLB misses from the critical path.
摘要:
A first processing unit and a second processing unit can access a system memory that stores a common page table that is common to the first processing unit and the second processing unit. The common page table can store virtual memory addresses to physical memory addresses mapping for memory chunks accessed by a job of an application. A page entry, within the common page table, can include a first set of attribute bits that defines accessibility of the memory chunk by the first processing unit, a second set of attribute bits that defines accessibility of the same memory chunk by the second processing unit, and physical address bits that define a physical address of the memory chunk.
摘要:
Configuring a surrogate memory accessing agent using an instruction for translating and storing a data value is described. In one embodiment, the instruction is received that includes a first operand specifying a data value to be translated and a second operand specifying a virtual address associated with a location of a surrogate memory accessing agent register in which to store the data value. The data value can be translated to a first physical address. The virtual address can be translated to a second physical address. The first physical address is stored in the surrogate memory accessing agent register based on the second physical address.
摘要:
A first processing unit and a second processing unit can access a system memory that stores a common page table that is common to the first processing unit and the second processing unit. The common page table can store virtual memory addresses to physical memory addresses mapping for memory chunks accessed by a job of an application. A page entry, within the common page table, can include a first set of attribute bits that defines accessibility of the memory chunk by the first processing unit, a second set of attribute bits that defines accessibility of the same memory chunk by the second processing unit, and physical address bits that define a physical address of the memory chunk.
摘要:
Systems and method for memory management units (MMUs) configured to automatically pre-fill a translation lookaside buffer (TLB) with address translation entries expected to be used in the future, thereby reducing TLB miss rate and improving performance. The TLB may be pre-filled with translation entries, wherein addresses corresponding to the pre-fill may be selected based on predictions. Predictions may be derived from external devices, or based on stride values, wherein the stride values may be a predetermined constant or dynamically altered based on access patterns. Pre-filling the TLB may effectively remove latency involved in determining address translations for TLB misses from the critical path.