Abstract:
A method, system, and computer program product are provided for verifying out-of-order instruction address (IA) stride prefetch performance in a processor design having more than one level of cache hierarchy. Multiple instruction streams are generated, and the instructions loop back to corresponding instruction addresses. The multiple instruction streams are dispatched to a processor and a simulation application for processing. When a particular instruction is dispatched, that instruction's instruction address and operand address are recorded in a queue. The processor is monitored to determine whether it executes fetch and prefetch commands in accordance with the simulation application, and a check is made to determine whether prefetch commands are issued for instructions having three or more strides.
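A minimal Python sketch of the stride check implied above: operand addresses are recorded per instruction address, and a prefetch is expected once three or more identical strides are observed. The class name, the queue layout, and the example addresses are assumptions made for illustration, not details taken from the patent.

    from collections import defaultdict

    class StrideChecker:
        """Track operand addresses per instruction address (IA) and flag when
        a prefetch should have been issued (three or more equal strides)."""
        def __init__(self):
            self.history = defaultdict(list)   # IA -> recorded operand addresses

        def record(self, ia, operand_addr):
            self.history[ia].append(operand_addr)

        def expect_prefetch(self, ia):
            addrs = self.history[ia]
            if len(addrs) < 4:
                return False
            strides = [b - a for a, b in zip(addrs, addrs[1:])]
            last_three = strides[-3:]
            # three or more identical, non-zero strides -> a prefetch is expected
            return len(set(last_three)) == 1 and last_three[0] != 0

    checker = StrideChecker()
    for addr in (0x1000, 0x1040, 0x1080, 0x10C0):
        checker.record(ia=0x200, operand_addr=addr)
    assert checker.expect_prefetch(0x200)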
Abstract:
A method for selectively accelerating early instruction processing includes receiving instruction data that is normally processed in an execution stage of a processor pipeline, where a configuration of the instruction data allows its processing to be accelerated from the execution stage to an address generation stage that occurs earlier in the pipeline. The method determines whether the instruction data can be dispatched to the address generation stage without being delayed by the unavailability of a processing resource needed for its processing in that stage. The instruction data is dispatched to be processed in the address generation stage if it can be dispatched without such a delay, and is dispatched to be processed in the execution stage otherwise. The processing of the instruction data is selectively accelerated using an address generation interlock scheme. A corresponding system and computer program product are also provided.
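The dispatch decision can be pictured with a short Python sketch; the function name, the "accelerable" flag, and the stage labels are illustrative assumptions rather than the patent's terminology.

    def dispatch(instruction, agen_resources_free):
        """Send an eligible instruction to the earlier address-generation stage
        only if doing so would not stall on an unavailable resource."""
        if instruction.get("accelerable") and agen_resources_free:
            return "address_generation_stage"
        return "execution_stage"

    print(dispatch({"op": "add", "accelerable": True}, agen_resources_free=True))   # address_generation_stage
    print(dispatch({"op": "add", "accelerable": True}, agen_resources_free=False))  # execution_stage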
Abstract:
Portions of data in a processor system are stored in a slower main memory and are transferred to a faster memory comprising a hierarchy of cache structures between one or more processors and the main memory. In a system with L2 cache(s) shared between the processor(s) and the main memory, an individual L1 cache of a processor must first communicate with its associated L2 cache(s) to obtain a copy of a particular line before, or upon, modifying or appropriating data at a given cached location. The individual L1 cache further includes provisions for notifying the L2 cache(s) when the data stored in a particular cache line of the L1 cache has been replaced. When a particular cache line is disowned by an L1 cache, the L2 cache is updated to change the state of that cache line from exclusive to a particular identified CPU to exclusive to no CPU, thereby reducing cross-interrogate delays when another processor acquires the same cache line.
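A toy Python model of the L2 ownership update described above may help: when the L1 reports that it has replaced a line, the L2 keeps the line but marks it exclusive to no CPU, so a later requester does not have to cross-interrogate the previous owner. The class, method names, and example addresses are assumptions for illustration only.

    class L2Directory:
        def __init__(self):
            self.owner = {}   # line address -> owning CPU id, or None

        def grant_exclusive(self, line, cpu):
            self.owner[line] = cpu

        def disown(self, line):
            # The L1 has notified the L2 that the line was replaced: keep the
            # line but mark it exclusive to no CPU.
            if line in self.owner:
                self.owner[line] = None

        def needs_cross_interrogate(self, line):
            return self.owner.get(line) is not None

    l2 = L2Directory()
    l2.grant_exclusive(0x80, cpu=1)
    l2.disown(0x80)
    assert not l2.needs_cross_interrogate(0x80)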
Abstract:
An embodiment of the invention is a processor for providing simultaneous access to the same data for a plurality of requests. The processor includes cache storage having an address-sliced directory lookup structure. A same-line detection unit receives a plurality of first instruction fields and a plurality of second instruction fields, and generates a same-line signal in response to the first instruction fields and the second instruction fields. The cache storage simultaneously reads data from a single line in the cache storage in response to the same-line signal.
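The same-line decision itself reduces to a line-address comparison; below is a minimal Python sketch in which the line size and helper name are assumptions, since the abstract does not specify them.

    LINE_SIZE = 256  # bytes per cache line (assumed for the example)

    def same_line(addr_a, addr_b):
        """True when two operand addresses fall in the same cache line, so a
        single line read can satisfy both requests."""
        return (addr_a // LINE_SIZE) == (addr_b // LINE_SIZE)

    assert same_line(0x1010, 0x10F0)       # same 256-byte line
    assert not same_line(0x1010, 0x1110)   # different lines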
Abstract:
Embodiments relate to prefetch address translation in a computer processor. An aspect includes issuing, by prefetch logic, a prefetch request comprising a virtual page address. Another aspect includes, based on the prefetch request missing in the translation lookaside buffer (TLB) and the address translation logic of the processor being busy performing a current translation request, comparing a page of the prefetch request to a page of the current translation request. Yet another aspect includes, based on the page of the prefetch request matching the page of the current translation request, storing the prefetch request in a prefetch buffer.
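A hedged Python sketch of the decision described above: when a prefetch misses the TLB while the translator is busy, the prefetch is buffered only if it targets the same page as the translation already in flight. The page size, buffer, and function names are assumptions for illustration.

    PAGE_SIZE = 4096
    prefetch_buffer = []

    def handle_prefetch_tlb_miss(prefetch_vaddr, translator_busy, current_translation_vaddr):
        same_page = (prefetch_vaddr // PAGE_SIZE) == (current_translation_vaddr // PAGE_SIZE)
        if translator_busy and same_page:
            # The in-flight translation will cover this page; hold the prefetch
            # rather than dropping it or starting a second translation.
            prefetch_buffer.append(prefetch_vaddr)
            return "buffered"
        return "request_translation"

    print(handle_prefetch_tlb_miss(0x7F001A0, True, 0x7F00F00))  # buffered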
Abstract:
A computer system includes a simultaneous multi-threading processor and memory in operable communication with the processor. The processor is configured to perform a method including running multiple threads simultaneously, detecting a hardware error in one or more hardware structures of the processor, and identifying one or more victim threads of the multiple threads. The processor is further configured to identify a plurality of hardware structures associated with execution of the one or more victim threads, isolate the one or more victim threads from the rest of the multiple threads by preventing access to the plurality of hardware structures by the multiple threads, flush the one or more victim threads by resetting hardware states of the plurality of hardware structures, and restore the one or more victim threads by restoring the plurality of hardware structures to a known safe state.
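The isolate/flush/restore sequence can be sketched in a few lines of Python; the data structures, field names, and thread labels are illustrative assumptions, not the patent's.

    def recover(victim_threads, structures_by_thread):
        for t in victim_threads:
            structures = structures_by_thread[t]
            for s in structures:
                s["locked"] = True          # isolate: block access by any thread
            for s in structures:
                s["state"] = "reset"        # flush: reset the hardware state
            for s in structures:
                s["state"] = "known_safe"   # restore to a known safe state
                s["locked"] = False

    structures = {"T1": [{"name": "gpr_file"}, {"name": "lsq"}]}
    recover(["T1"], structures)
    print(structures["T1"])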
Abstract:
Embodiments relate to automatic pattern-based operand prefetching. An aspect includes receiving, by prefetch logic in a processor, an operand cache miss from a pipeline of the processor. Another aspect includes determining, based on an instruction address of the operand cache miss, that an entry corresponding to the operand cache miss exists in a history table. Yet another aspect includes, based on determining that the entry corresponding to the operand cache miss exists in the history table, issuing a prefetch instruction for a second operand based on the determined entry in the history table, and writing the determined entry into a miss buffer.
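A minimal Python sketch of the history-table lookup described above; the table layout, stride field, and example addresses are assumptions made for the example.

    history_table = {0x4000: {"stride": 64}}
    miss_buffer = []

    def on_operand_cache_miss(instr_addr, operand_addr):
        entry = history_table.get(instr_addr)
        if entry is not None:
            prefetch_addr = operand_addr + entry["stride"]
            miss_buffer.append(entry)          # remember the pattern that fired
            return ("prefetch", prefetch_addr)
        return ("no_prefetch", None)

    print(on_operand_cache_miss(0x4000, 0x9040))  # ('prefetch', 36992), i.e. 0x9080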
Abstract:
Embodiments of the invention relate to process identifier (PID) based cache information transfer. An aspect of the invention includes sending, by a first core of a processor, a PID associated with a cache miss in a first local cache of the first core to a second cache of the processor. Another aspect of the invention includes determining that the PID associated with the cache miss is listed in a PID table of the second cache. Yet another aspect of the invention includes, based on the PID being listed in the PID table of the second cache, determining a plurality of entries in a cache directory of the second cache that are associated with the PID. A further aspect of the invention includes pushing cache information associated with each of the determined plurality of entries in the cache directory from the second cache to the first local cache.
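The push step lends itself to a toy Python sketch: on a local cache miss, if the requesting PID is known to the second cache, every directory entry tagged with that PID is pushed down to the requesting cache. The data structures and names are assumptions for illustration only.

    def push_on_miss(pid, l2_pid_table, l2_directory, l1_cache):
        if pid not in l2_pid_table:
            return 0
        pushed = 0
        for line_addr, entry in l2_directory.items():
            if entry["pid"] == pid:
                l1_cache[line_addr] = entry["data"]
                pushed += 1
        return pushed

    l2_dir = {0x100: {"pid": 7, "data": b"A"}, 0x200: {"pid": 9, "data": b"B"}}
    l1 = {}
    print(push_on_miss(7, {7}, l2_dir, l1), l1)   # 1 {256: b'A'}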
Abstract:
Embodiments of the invention relate to performing run-time instrumentation. Run-time instrumentation data is captured, by a processor, based on an instruction stream of instructions of an application program executing on the processor. The capturing includes storing the run-time instrumentation data in a collection buffer of the processor. A run-time instrumentation sample point trigger is detected by the processor. Contents of the collection buffer are copied into a program buffer as a reporting group based on detecting the run-time instrumentation sample point trigger. The program buffer is located in main storage in an address space that is accessible by the application program.
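A short Python sketch of the collection-and-reporting flow described above; the buffer representations and record fields are assumptions, and the program buffer here merely stands in for the application-visible buffer in main storage.

    collection_buffer = []
    program_buffer = []   # stands in for the buffer in the application's address space

    def capture(record):
        collection_buffer.append(record)

    def on_sample_point():
        # Copy the collection buffer into the program buffer as one reporting group.
        program_buffer.append(list(collection_buffer))

    capture({"ia": 0x1000, "event": "taken_branch"})
    capture({"ia": 0x1008, "event": "cache_miss"})
    on_sample_point()
    print(program_buffer)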
Abstract:
Embodiments of the invention relate to executing a run-time-instrumentation EMIT (RIEMIT) instruction. A processor is configured to capture run-time-instrumentation information for a stream of instructions. The RIEMIT instruction is fetched and executed. It is determined whether the current run-time-instrumentation controls are configured to permit capturing and storing of run-time-instrumentation information in a run-time-instrumentation program buffer. If the controls permit such storing, a value specified by the RIEMIT instruction is stored as an emit record of a reporting group in the run-time-instrumentation program buffer.
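A hedged Python sketch of the RIEMIT behaviour described above; the control flags and record shape are assumptions made for the example, not the architected format.

    def riemit(value, controls, reporting_group):
        """Store the instruction-specified value as an emit record of the current
        reporting group only if the controls permit capturing and storing."""
        if controls.get("capture_enabled") and controls.get("store_enabled"):
            reporting_group.append({"type": "emit", "value": value})
            return True
        return False

    group = []   # stands in for a reporting group in the program buffer
    riemit(0x1234, {"capture_enabled": True, "store_enabled": True}, group)
    print(group)   # [{'type': 'emit', 'value': 4660}]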