Abstract:
Mechanism for accurately measuring the useful capacity of a processor allocated to each thread in a simultaneous multi-threading data processing system. Instructions dispatched from multiple threads are executed by the processor on the same clock cycle. A determination is made whether Time Base (TB) register bit (60) is changing. A dispatch charge value is determined for each thread and added to the Processor Utilization Resource Register for each thread when TB bit (60) changes.
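The charging step can be pictured in software. Below is a minimal C sketch, not the patented hardware, assuming two threads and a simple proportional dispatch charge applied whenever TB bit (60) toggles; the function and variable names are illustrative.

```c
/* Illustrative sketch only: per-thread PURR counters are charged in
 * proportion to each thread's dispatches whenever TB bit (60) toggles. */
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 2

static uint64_t purr[NUM_THREADS];        /* per-thread Processor Utilization Resource Register */
static uint64_t dispatched[NUM_THREADS];  /* instructions dispatched in the current TB period   */

static void on_tb_bit60_toggle(void)
{
    uint64_t total = 0;
    for (int t = 0; t < NUM_THREADS; t++)
        total += dispatched[t];

    for (int t = 0; t < NUM_THREADS; t++) {
        /* charge each thread in proportion to its share of dispatches;
         * split the period evenly when nothing was dispatched */
        uint64_t charge = total ? (dispatched[t] * 100) / total : 100 / NUM_THREADS;
        purr[t] += charge;
        dispatched[t] = 0;
    }
}

int main(void)
{
    dispatched[0] = 30;
    dispatched[1] = 10;
    on_tb_bit60_toggle();
    printf("PURR[0]=%llu PURR[1]=%llu\n",
           (unsigned long long)purr[0], (unsigned long long)purr[1]);
    return 0;
}
```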
Abstract:
A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. The set of helper thread binaries and the set of main thread binaries are partitioned according to common instruction boundaries. As a first partition in the set of main thread binaries executes within a first core, a second partition in the set of helper thread binaries executes within a second core, thus “warming up” the cache in the second core. When the first partition of the main thread binaries completes execution, a second partition of the main thread binaries moves to the second core and executes using the warmed-up cache in the second core.
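A rough software analogue of this partition hand-off is sketched below, assuming two cores modeled as POSIX threads and a shared array standing in for cached data; the helper touches the next partition's data while the current partition runs, approximating the cache warm-up described above. Names and sizes are illustrative.

```c
/* Illustrative sketch: while main_execute() works on partition p, a helper
 * thread touches partition p+1 so its data is warm when execution moves on. */
#include <pthread.h>
#include <stdio.h>

#define N_PART    4
#define PART_SIZE 1024

static int data[N_PART][PART_SIZE];

static void *helper_warm(void *arg)          /* runs on the "other core" */
{
    int p = *(int *)arg;
    volatile int sink = 0;
    for (int i = 0; i < PART_SIZE; i++)
        sink += data[p][i];                  /* touch data: warms that core's cache */
    (void)sink;
    return NULL;
}

static void main_execute(int p)
{
    long sum = 0;
    for (int i = 0; i < PART_SIZE; i++)
        sum += data[p][i] * 2;
    printf("partition %d done (sum=%ld)\n", p, sum);
}

int main(void)
{
    for (int p = 0; p < N_PART; p++) {
        pthread_t helper;
        int next = p + 1;
        if (next < N_PART)                   /* warm the next partition in parallel */
            pthread_create(&helper, NULL, helper_warm, &next);
        main_execute(p);                     /* execute the current partition */
        if (next < N_PART)
            pthread_join(&helper, NULL);
    }
    return 0;
}
```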
Abstract:
A mechanism is provided for thread completion arbitration. The mechanism comprises executing more than two threads of instructions simultaneously in the processor, selecting a first thread from a first subset of threads, in the more than two threads, for completion of execution within the processor, and selecting a second thread from a second subset of threads, in the more than two threads, for completion of execution within the processor. The mechanism further comprises completing execution of the first and second threads by committing results of the execution of the first and second threads to a storage device associated with the processor. At least one of the first subset of threads or the second subset of threads comprises two or more threads from the more than two threads. The first subset of threads and the second subset of threads have different threads from one another.
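One way to picture the arbitration is a round-robin pick from each subset per completion cycle. The C sketch below assumes four threads split into two fixed subsets and a simple readiness flag, none of which is specified by the abstract.

```c
/* Illustrative sketch: each completion cycle fills one slot from subset A and
 * one from subset B, chosen round-robin among the ready threads in each subset. */
#include <stdbool.h>
#include <stdio.h>

#define N_THREADS 4
static bool ready[N_THREADS] = { true, false, true, true };

static int pick_from(const int *subset, int len, int *rr)
{
    for (int i = 0; i < len; i++) {          /* round-robin scan of the subset */
        int t = subset[(*rr + i) % len];
        if (ready[t]) {
            *rr = (*rr + i + 1) % len;
            return t;
        }
    }
    return -1;                               /* nothing ready in this subset */
}

int main(void)
{
    int subset_a[] = { 0, 1 };               /* assumed grouping */
    int subset_b[] = { 2, 3 };
    int rr_a = 0, rr_b = 0;

    int a = pick_from(subset_a, 2, &rr_a);
    int b = pick_from(subset_b, 2, &rr_b);
    printf("completing thread %d from subset A and thread %d from subset B\n", a, b);
    return 0;
}
```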
Abstract:
A processor includes at least one execution unit that executes instructions, at least one register file, coupled to the at least one execution unit, that buffers operands for access by the at least one execution unit, and an instruction sequencing unit that fetches instructions for execution by the execution unit. The processor further includes an operand data structure and an address generation accelerator. The operand data structure specifies a first relationship between addresses of sequential accesses within a first address region and a second relationship between addresses of sequential accesses within a second address region. The address generation accelerator computes a first address of a first memory access in the first address region by reference to the first relationship and a second address of a second memory access in the second address region by reference to the second relationship.
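The operand data structure and address generation accelerator can be illustrated with a small software model, assuming a per-region stride as the "relationship between addresses of sequential accesses"; the struct fields and region values below are invented for illustration.

```c
/* Illustrative sketch: a per-region table records the stride between
 * sequential accesses, and the next address for a region is computed
 * from that relationship rather than re-executed address arithmetic. */
#include <stdint.h>
#include <stdio.h>

struct region_entry {
    uint64_t base;     /* start of the address region              */
    uint64_t limit;    /* end of the address region (exclusive)    */
    int64_t  stride;   /* relationship between sequential accesses */
    uint64_t next;     /* next address to be generated             */
};

static uint64_t generate_address(struct region_entry *r)
{
    uint64_t addr = r->next;
    r->next += r->stride;          /* apply the region's access relationship */
    return addr;
}

int main(void)
{
    struct region_entry a = { 0x1000, 0x2000,  8, 0x1000 };  /* region 1: stride 8  */
    struct region_entry b = { 0x8000, 0x9000, 64, 0x8000 };  /* region 2: stride 64 */

    for (int i = 0; i < 3; i++)
        printf("region1 -> 0x%llx, region2 -> 0x%llx\n",
               (unsigned long long)generate_address(&a),
               (unsigned long long)generate_address(&b));
    return 0;
}
```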
Abstract:
A system and method for latency-aware thread scheduling in non-uniform cache architecture are provided. Instructions may be provided to the hardware specifying in which banks to store data. Information as to which banks store which data may also be provided, for example, by the hardware. This information may be used to schedule threads on one or more cores. A selected bank in cache memory may be reserved strictly for selected data.
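A minimal sketch of the scheduling decision follows, assuming the hardware has reported which bank holds a thread's data and that a static core-to-bank latency table is available; the latency values and table shape are illustrative.

```c
/* Illustrative sketch: schedule a thread on the core with the lowest
 * latency to the cache bank that holds its data. */
#include <stdio.h>

#define N_CORES 4
#define N_BANKS 4

/* assumed core-to-bank access latencies, in cycles */
static const int latency[N_CORES][N_BANKS] = {
    {  5, 12, 18, 25 },
    { 12,  5, 12, 18 },
    { 18, 12,  5, 12 },
    { 25, 18, 12,  5 },
};

static int pick_core(int data_bank)
{
    int best = 0;
    for (int c = 1; c < N_CORES; c++)
        if (latency[c][data_bank] < latency[best][data_bank])
            best = c;
    return best;
}

int main(void)
{
    int bank = 2;                  /* bank reported to hold the thread's data */
    printf("schedule thread on core %d for data in bank %d\n", pick_core(bank), bank);
    return 0;
}
```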
Abstract:
A data processing system has a processor, a memory, and an instruction set architecture (ISA) that includes: (1) an asynchronous memory mover (AMM) store (ST) instruction that initiates an asynchronous memory move operation moving data from a first memory location having a first real address to a second memory location having a second real address by: (a) first performing a move of the data in virtual address space utilizing a source effective address and a destination effective address; and (b) when the move is completed, completing a physical move of the data to the second memory location, independent of the processor. The ISA further provides (2) an AMM terminate ST instruction for stopping an ongoing AMM operation before completion of the AMM operation, and (3) a LD CMP instruction for checking the status of an AMM operation.
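The three instructions define a programming model that can be mimicked in software. The sketch below is a pthreads analogue, not the ISA itself: amm_st(), ld_cmp(), and amm_terminate_st() are invented stand-ins for the AMM ST, LD CMP, and AMM terminate ST instructions.

```c
/* Illustrative sketch: start an asynchronous copy, poll its status, and
 * optionally cancel it before completion. All names are invented. */
#include <pthread.h>
#include <string.h>
#include <stdio.h>

enum amm_status { AMM_IDLE, AMM_BUSY, AMM_DONE, AMM_TERMINATED };

static struct {
    char *src, *dst;
    size_t len;
    volatile enum amm_status status;
    volatile int cancel;
    pthread_t worker;
} amm;

static void *amm_worker(void *unused)
{
    (void)unused;
    for (size_t i = 0; i < amm.len; i++) {
        if (amm.cancel) { amm.status = AMM_TERMINATED; return NULL; }
        amm.dst[i] = amm.src[i];               /* the physical move, off the critical path */
    }
    amm.status = AMM_DONE;
    return NULL;
}

static void amm_st(char *src, char *dst, size_t len)      /* stand-in for AMM ST */
{
    amm.src = src; amm.dst = dst; amm.len = len;
    amm.cancel = 0; amm.status = AMM_BUSY;
    pthread_create(&amm.worker, NULL, amm_worker, NULL);
}

static enum amm_status ld_cmp(void) { return amm.status; } /* stand-in for LD CMP */
static void amm_terminate_st(void)  { amm.cancel = 1; }    /* stand-in for AMM terminate ST */

int main(void)
{
    static char src[1 << 20] = "hello", dst[1 << 20];
    amm_st(src, dst, sizeof src);
    while (ld_cmp() == AMM_BUSY)
        ;                                      /* the processor continues other work here */
    pthread_join(amm.worker, NULL);
    printf("status=%d dst=%s\n", ld_cmp(), dst);
    return 0;
}
```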
Abstract:
A data processing system has an asynchronous memory mover, which includes multiple sets of registers for storing the addressing and control parameters utilized to generate one or more asynchronous memory move (AMM) operations. The memory mover detects receipt of a first set of parameters in a first set of registers from the processor. The processor forwards the parameters after it initiates a data move in virtual address space, utilizing a source effective address and a destination effective address. The memory mover responds to receiving the first set of parameters by generating and launching a first AMM operation. When the memory mover receives a second set of parameters in a second set of registers before the first AMM operation completes, it generates and launches a second AMM operation concurrently with the first, provided no address conflicts exist.
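The address-conflict test that gates the concurrent launch can be sketched as a simple range-overlap check; the parameter layout and the specific ranges below are assumptions, not the patent's register format.

```c
/* Illustrative sketch: a second AMM operation launches concurrently only if
 * its source and destination ranges do not overlap those of the move in flight. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct amm_params { uint64_t src, dst, len; };

static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

static bool conflicts(const struct amm_params *p, const struct amm_params *q)
{
    return ranges_overlap(p->src, p->len, q->dst, q->len) ||
           ranges_overlap(p->dst, p->len, q->src, q->len) ||
           ranges_overlap(p->dst, p->len, q->dst, q->len);
}

int main(void)
{
    struct amm_params first  = { 0x1000, 0x9000, 0x800 };   /* operation in flight */
    struct amm_params second = { 0x3000, 0xA000, 0x800 };   /* newly requested     */

    if (!conflicts(&first, &second))
        printf("launch second AMM operation concurrently\n");
    else
        printf("stall second AMM operation until the first completes\n");
    return 0;
}
```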
Abstract:
While an AMM operation is ongoing, a prefetch request for data from the source effective address or the destination effective address triggers a cache injection, by the asynchronous memory mover (or memory controller), of relevant data from the stream of data being moved in physical memory. The memory controller forwards the first prefetched line to the prefetch engine and the L1 cache. The memory controller also forwards the next cache lines in the sequence of data to the L2 cache and a subsequent set of cache lines to the L3 cache. The memory controller then forwards the remaining data to the destination memory location. Quick access to prefetched data is enabled by buffering the stream of data in the upper caches rather than placing all the moved data within memory. The memory controller also avoids overrunning the upper caches by placing moved data into only a subset of the available cache lines of the upper-level caches.
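How the streamed lines are spread across the hierarchy can be sketched as a classification of line indices; the split points below (one line to L1, a few to L2, a batch to L3, the rest to memory) are assumptions rather than the patent's exact counts.

```c
/* Illustrative sketch: classify each line of the moved stream by where it is
 * injected, so the upper caches receive only a bounded number of lines. */
#include <stdio.h>

enum level { TO_L1, TO_L2, TO_L3, TO_MEMORY };

static enum level inject_target(int line_index)
{
    if (line_index == 0)  return TO_L1;      /* first prefetched line */
    if (line_index < 4)   return TO_L2;      /* next cache lines      */
    if (line_index < 16)  return TO_L3;      /* subsequent set        */
    return TO_MEMORY;                        /* remaining data        */
}

int main(void)
{
    static const char *names[] = { "L1", "L2", "L3", "memory" };
    for (int i = 0; i < 20; i++)
        printf("line %2d -> %s\n", i, names[inject_target(i)]);
    return 0;
}
```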
Abstract:
In at least one embodiment, a processor detects during execution of program code whether a load instruction within the program code is associated with a hint. In response to detecting that the load instruction is not associated with a hint, the processor retrieves a full cache line of data from the memory hierarchy into the processor in response to the load instruction. In response to detecting that the load instruction is associated with a hint, the processor retrieves a partial cache line of data from the memory hierarchy into the processor in response to the load instruction.
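The hint simply selects how much of a line is fetched; a trivial C sketch follows, with the full and partial line sizes (128 and 32 bytes) assumed rather than taken from the abstract.

```c
/* Illustrative sketch: a hinted load fetches only a partial cache line,
 * an unhinted load fetches the full line. */
#include <stdbool.h>
#include <stdio.h>

#define FULL_LINE_BYTES    128
#define PARTIAL_LINE_BYTES  32

static int bytes_to_fetch(bool has_hint)
{
    return has_hint ? PARTIAL_LINE_BYTES : FULL_LINE_BYTES;
}

int main(void)
{
    printf("load without hint fetches %d bytes\n", bytes_to_fetch(false));
    printf("load with hint fetches %d bytes\n", bytes_to_fetch(true));
    return 0;
}
```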
Abstract:
An apparatus and program product utilize a multithreaded processor having at least one hardware thread among a plurality of hardware threads that is capable of being selectively activated and deactivated responsive to a control circuit. The control circuit additionally provides the capability of controlling how an inactive thread can be activated after the thread has been deactivated, e.g., by enabling or disabling reactivation in response to an interrupt.
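A minimal sketch of the control behavior follows, assuming a per-thread control word with an "active" bit and a "wake on interrupt" bit; the bit layout and names are invented for illustration.

```c
/* Illustrative sketch: a per-thread control word records whether the hardware
 * thread is active and whether an interrupt may reactivate it once deactivated. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CTRL_ACTIVE       (1u << 0)   /* thread currently active             */
#define CTRL_WAKE_ON_IRQ  (1u << 1)   /* interrupt may reactivate the thread */

static uint32_t thread_ctrl[4];

static void deactivate(int t, bool allow_irq_wake)
{
    thread_ctrl[t] &= ~CTRL_ACTIVE;
    if (allow_irq_wake) thread_ctrl[t] |= CTRL_WAKE_ON_IRQ;
    else                thread_ctrl[t] &= ~CTRL_WAKE_ON_IRQ;
}

static void on_interrupt(int t)
{
    if (thread_ctrl[t] & CTRL_WAKE_ON_IRQ)
        thread_ctrl[t] |= CTRL_ACTIVE;         /* reactivation enabled for this thread */
}

int main(void)
{
    deactivate(1, true);                       /* thread 1 may be woken by an interrupt */
    deactivate(2, false);                      /* thread 2 stays down on interrupts     */
    on_interrupt(1);
    on_interrupt(2);
    printf("thread1 active=%d thread2 active=%d\n",
           !!(thread_ctrl[1] & CTRL_ACTIVE), !!(thread_ctrl[2] & CTRL_ACTIVE));
    return 0;
}
```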