Abstract:
A data processing system includes a plurality of processors, a plurality of local memories each associated with a corresponding processor, and at least one inter-processor link. In response to a first processor performing a load or store operation on an address of its corresponding local memory that is not currently in its local cache, the local cache allocates a first cache line and encodes a local state with the first cache line. In response to a load operation from an address of a remote memory that is not currently in the local cache, the local cache allocates a second cache line and encodes a remote state with the second cache line. The first processor performs subsequent loads and stores on the first cache line in the local cache in response to the local state, and subsequent loads from the second cache line in the local cache in response to the remote state.
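A minimal C sketch of the two-state allocation policy described above; the type and function names (line_state_t, cache_allocate, cache_store_allowed) are illustrative assumptions, not the patent's actual design:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { STATE_INVALID, STATE_LOCAL, STATE_REMOTE } line_state_t;

typedef struct {
    uint64_t     tag;
    line_state_t state;
    bool         valid;
} cache_line_t;

/* On a miss, allocate a line and encode LOCAL or REMOTE depending on
   whether the address belongs to this processor's local memory. */
static void cache_allocate(cache_line_t *line, uint64_t addr, bool addr_is_local) {
    line->tag   = addr;
    line->valid = true;
    line->state = addr_is_local ? STATE_LOCAL : STATE_REMOTE;
}

/* Per the abstract, stores proceed on lines in the LOCAL state, while
   lines in the REMOTE state serve subsequent loads. */
static bool cache_store_allowed(const cache_line_t *line) {
    return line->valid && line->state == STATE_LOCAL;
}

int main(void) {
    cache_line_t a = {0}, b = {0};
    cache_allocate(&a, 0x1000, true);   /* miss on local memory  -> LOCAL  */
    cache_allocate(&b, 0x2000, false);  /* miss on remote memory -> REMOTE */
    printf("store to a allowed: %d\n", cache_store_allowed(&a)); /* 1 */
    printf("store to b allowed: %d\n", cache_store_allowed(&b)); /* 0 */
    return 0;
}
```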
Abstract:
A method and apparatus for reducing TLB shootdown overhead in accelerator-based computing systems are described. The disclosed method and apparatus may also be used in near-memory and in-memory computing, where near-memory or in-memory compute units may need to share a host CPU's virtual address space. Metadata is associated with page table entries (PTEs), and mechanisms use this metadata to limit the number of processing elements that participate in a TLB shootdown operation.
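One plausible form of such metadata is a per-PTE sharer bitmask, so that invalidations are sent only to processing elements recorded as sharers rather than broadcast to all. The sketch below assumes this encoding; the field and function names are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_PES 64  /* processing elements: CPU cores, accelerators, etc. */

typedef struct {
    uint64_t pfn;       /* physical frame number */
    uint64_t sharers;   /* metadata: bit i set => PE i may cache this PTE */
} pte_t;

/* Record that a processing element has cached a translation. */
static void pte_note_sharer(pte_t *pte, unsigned pe) {
    pte->sharers |= 1ULL << pe;
}

/* On unmap or permission change, invalidate only on recorded sharers
   instead of involving every processing element. */
static void tlb_shootdown(pte_t *pte) {
    for (unsigned pe = 0; pe < MAX_PES; pe++) {
        if (pte->sharers & (1ULL << pe))
            printf("invalidate TLB on PE %u\n", pe); /* stand-in for an IPI */
    }
    pte->sharers = 0;
}

int main(void) {
    pte_t pte = { .pfn = 42, .sharers = 0 };
    pte_note_sharer(&pte, 3);
    pte_note_sharer(&pte, 17);
    tlb_shootdown(&pte); /* only PEs 3 and 17 participate */
    return 0;
}
```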
Abstract:
Methods and apparatus are described. A method includes an accelerated processing device running a process. When a maximum time interval during which the process is permitted to run expires before the process completes, the accelerated processing device receives an operating-system-initiated instruction to stop running the process. The accelerated processing device stops the process from running in response to the received operating-system-initiated instruction.
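The flow can be pictured as an OS-side watchdog that arms a deadline and delivers a stop instruction to the accelerated processing device (APD) when the deadline expires before completion. This is a minimal sketch under that assumption; all names are illustrative:

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef struct {
    bool running;
} apd_process_t;

/* Stand-in for the operating-system-initiated instruction delivered
   to the APD; the APD stops the process in response. */
static void apd_stop(apd_process_t *p) {
    p->running = false;
    printf("APD: process stopped by OS-initiated instruction\n");
}

/* OS-side watchdog: permit the process to run at most max_interval
   seconds before issuing the stop instruction. */
static void os_run_with_deadline(apd_process_t *p, double max_interval) {
    time_t start = time(NULL);
    p->running = true;
    while (p->running) {
        if (difftime(time(NULL), start) >= max_interval)
            apd_stop(p);  /* deadline expired before the process completed */
        /* ... the APD would make progress on the process here ... */
    }
}

int main(void) {
    apd_process_t p;
    os_run_with_deadline(&p, 1.0); /* 1-second maximum run interval */
    return 0;
}
```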
Abstract:
A method and apparatus are described for performing sprinting in a processor. An analyzer in the processor may monitor the thermal capacity remaining in the processor while not sprinting. When the remaining thermal capacity is sufficient to support sprinting, the analyzer may sprint a new workload when the benefit derived by sprinting the new workload exceeds a threshold and sprinting would not exhaust the remaining thermal capacity in the processor. The analyzer may perform sprinting of the new workload in accordance with sprinting parameters determined for the new workload. The analyzer may continue to monitor the remaining thermal capacity while not sprinting when the benefit derived by sprinting the new workload does not exceed the threshold.
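The admission decision reduces to two checks: benefit above a threshold, and thermal cost within remaining capacity. A short sketch of that decision follows; the sprinting parameters, numeric values, and field names are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    double duration_s;   /* sprinting parameter: how long to sprint */
    double freq_ghz;     /* sprinting parameter: boosted frequency   */
    double thermal_cost; /* thermal budget the sprint would consume  */
    double benefit;      /* estimated benefit of sprinting           */
} sprint_params_t;

static bool should_sprint(double remaining_capacity,
                          const sprint_params_t *w,
                          double benefit_threshold) {
    return w->benefit > benefit_threshold &&
           w->thermal_cost < remaining_capacity; /* must not exhaust capacity */
}

int main(void) {
    double remaining_capacity = 10.0;   /* measured while not sprinting */
    sprint_params_t w = { .duration_s = 0.5, .freq_ghz = 4.5,
                          .thermal_cost = 6.0, .benefit = 3.2 };
    if (should_sprint(remaining_capacity, &w, 2.0))
        printf("sprint for %.1fs at %.1f GHz\n", w.duration_s, w.freq_ghz);
    else
        printf("keep monitoring remaining thermal capacity\n");
    return 0;
}
```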
Abstract:
Apparatus, computer readable medium, and method of servicing memory requests are presented. A first plurality of memory requests are associated together, wherein each of the first plurality of memory requests is generated by a corresponding one of a first plurality of processors, and wherein each of the first plurality of processors is executing a first same instruction. A second plurality of memory requests are associated together, wherein each of the second plurality of memory requests is generated by a corresponding one of a second plurality of processors, and wherein each of the second plurality of processors is executing a second same instruction. A determination is made to service the first plurality of memory requests before the second plurality of memory requests, and the first plurality of memory requests is then serviced before the second plurality of memory requests.
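In SIMD-style hardware this amounts to grouping per-lane requests by the instruction that issued them and arbitrating between groups. A minimal sketch under that assumption follows; the grouping key and the arbitration rule are illustrative, not the patent's specific mechanism:

```c
#include <stdint.h>
#include <stdio.h>

#define LANES 4  /* one request per processor/lane in the group */

typedef struct {
    uint64_t instr_id;     /* the "same instruction" shared by the group */
    uint64_t addr[LANES];  /* associated memory requests */
} request_group_t;

static void service_group(const request_group_t *g) {
    for (int lane = 0; lane < LANES; lane++)
        printf("instr %llu: service lane %d addr 0x%llx\n",
               (unsigned long long)g->instr_id, lane,
               (unsigned long long)g->addr[lane]);
}

int main(void) {
    request_group_t first  = { .instr_id = 1, .addr = {0x100, 0x104, 0x108, 0x10c} };
    request_group_t second = { .instr_id = 2, .addr = {0x200, 0x204, 0x208, 0x20c} };
    /* Determination: service the first group before the second. */
    service_group(&first);
    service_group(&second);
    return 0;
}
```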
Abstract:
The described embodiments include a cache with a plurality of banks that includes a cache controller. In these embodiments, the cache controller determines a value representing non-native cache blocks stored in at least one bank in the cache, wherein a cache block is non-native to a bank when a home for the cache block is in a predetermined location relative to the bank. Then, based on the value representing non-native cache blocks stored in the at least one bank, the cache controller determines at least one bank in the cache to be transitioned from a first power mode to a second power mode. Next, the cache controller transitions the determined at least one bank in the cache from the first power mode to the second power mode.
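The controller's decision can be sketched as tracking a per-bank count of non-native blocks and transitioning a bank when that count crosses a threshold. The threshold, the mode names, and the policy direction (fewer non-native blocks favoring a low-power mode) are all assumptions for illustration:

```c
#include <stdio.h>

typedef enum { POWER_FULL, POWER_LOW } power_mode_t;

typedef struct {
    unsigned     non_native;  /* blocks whose home is elsewhere relative to the bank */
    power_mode_t mode;
} bank_t;

static void maybe_transition(bank_t *bank, unsigned threshold) {
    /* Assumed policy: a bank holding few non-native blocks mostly serves
       its own home region and is a candidate for a lower power mode. */
    if (bank->mode == POWER_FULL && bank->non_native < threshold) {
        bank->mode = POWER_LOW;
        printf("bank transitioned from full-power to low-power mode\n");
    }
}

int main(void) {
    bank_t bank = { .non_native = 5, .mode = POWER_FULL };
    maybe_transition(&bank, 16);
    return 0;
}
```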