摘要:
A processing device implementing stride-based translation lookaside buffer (TLB) prefetching with adaptive offset is disclosed. A processing device of the disclosure includes a data prefetcher to generate a data prefetch address based on a linear address, a stride, or a prefetch distance, the data prefetch address associated with a data prefetch request, and a TLB prefetch address computation component to generate a TLB prefetch address based on the linear address, the stride, the prefetch distance, or an adaptive offset. The processing device also includes a cross page detection component to determine that the data prefetch address or the TLB prefetch address cross a page boundary associated with the linear address, and cause a TLB prefetch request to be written to a TLB request queue, the TLB prefetch request for translation of an address of a linear page number (LPN) based on the data prefetch address or the TLB prefetch address.
摘要:
A processing device implementing stride-based translation lookaside buffer (TLB) prefetching with adaptive offset is disclosed. A processing device of the disclosure includes a data prefetcher to generate a data prefetch address based on a linear address, a stride, or a prefetch distance, the data prefetch address associated with a data prefetch request, and a TLB prefetch address computation component to generate a TLB prefetch address based on the linear address, the stride, the prefetch distance, or an adaptive offset. The processing device also includes a cross page detection component to determine that the data prefetch address or the TLB prefetch address cross a page boundary associated with the linear address, and cause a TLB prefetch request to be written to a TLB request queue, the TLB prefetch request for translation of an address of a linear page number (LPN) based on the data prefetch address or the TLB prefetch address.
摘要:
Techniques and mechanisms for a processor to determine whether a commit action is to be performed. In an embodiment, a processor performs operations to determine whether a commit instruction is for contingent performance of a commit action. In another embodiment, one or more conditions of processor state are evaluated in response to determining that the commit instruction is for contingent performance of the commit action, where the evaluation is performed to determine whether the commit action indicated by the commit instruction is to be performed.
摘要:
Disclosed is an apparatus and method generally related to controlling a multimedia extension control and status register (MXCSR). A processor core may include a floating point unit (FPU) to perform arithmetic functions; and a multimedia extension control register (MXCR) to provide control bits to the FPU. Further an optimizer may be used to select a speculative multimedia extension status register (SPEC_MXSR) from a plurality of SPEC_MXSRs to update a multimedia extension status register (MXSR) based upon an instruction.
摘要:
A computer-readable storage medium, method and system for optimization-level aware branch prediction is described. A gear level is assigned to a set of application instructions that have been optimized. The gear level is also stored in a register of a branch prediction unit of a processor. Branch prediction is then performed by the processor based upon the gear level.
摘要:
A combination of hardware and software collect profile data for asynchronous events, at code region granularity. An exemplary embodiment is directed to collecting metrics for prefetching events, which are asynchronous in nature. Instructions that belong to a code region are identified using one of several alternative techniques, causing a profile bit to be set for the instruction, as a marker. Each line of a data block that is prefetched is similarly marked. Events corresponding to the profile data being collected and resulting from instructions within the code region are then identified. Each time that one of the different types of events is identified, a corresponding counter is incremented. Following execution of the instructions within the code region, the profile data accumulated in the counters are collected, and the counters are reset for use with a new code region.
摘要:
Disclosed is an apparatus and method to manage instruction cache prefetching from an instruction cache. A processor may comprise: a prefetch engine; a branch prediction engine to predict the outcome of a branch; and dynamic optimizer. The dynamic optimizer may be used to control: identifying common instruction cache misses and inserting a prefetch instruction from the prefetch engine to the instruction cache.
摘要:
Disclosed is an apparatus and method to manage instruction cache prefetching from an instruction cache. A processor may comprise: a prefetch engine; a branch prediction engine to predict the outcome of a branch; and dynamic optimizer. The dynamic optimizer may be used to control: indentifying common instruction cache misses and inserting a prefetch instruction from the prefetch engine to the instruction cache.
摘要:
A combination of hardware and software collect profile data for asynchronous events, at code region granularity. An exemplary embodiment is directed to collecting metrics for prefetching events, which are asynchronous in nature. Instructions that belong to a code region are identified using one of several alternative techniques, causing a profile bit to be set for the instruction, as a marker. Each line of a data block that is prefetched is similarly marked. Events corresponding to the profile data being collected and resulting from instructions within the code region are then identified. Each time that one of the different types of events is identified, a corresponding counter is incremented. Following execution of the instructions within the code region, the profile data accumulated in the counters are collected, and the counters are reset for use with a new code region.
摘要:
Techniques are described for providing an enhanced cache coherency protocol for a multi-core processor that includes a Speculative Request For Ownership Without Data (SRFOWD) for a portion of cache memory. With a SRFOWD, only an acknowledgement message may be provided as an answer to a requesting core. The contents of the affected cache line are not required to be a part of the answer. The enhanced cache coherency protocol may assure that a valid copy of the current cache line exists in case of misspeculation by the requesting core. Thus, an owner of the current copy of the cache line may maintain a copy of the old contents of the cache line. The old contents of the cache line may be discarded if speculation by the requesting core turns out to be correct. Otherwise, in case of misspeculation by the requesting core, the old contents of the cache line may be set back to a valid state.