摘要:
Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).
摘要:
Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).
摘要:
A method for performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching. The method includes receiving a first instruction of an instruction sequence and a second instruction of the instruction sequence and determining if the first instruction and the second instruction can be optimized. In response to the determining that the first instruction and second instruction can be optimized, the method includes, preforming a pre-decode optimization on the instruction sequence and generating a new second instruction, wherein the new second instruction is not dependent on a target operand of the first instruction and storing a pre-decoded first instruction and a pre-decoded new second instruction in an instruction cache. In response to determining that the first instruction and second instruction can not be optimized, the method includes, storing the pre-decoded first instruction and a pre-decoded second instruction in the instruction cache.
摘要:
Two computer machine instructions are fetched for execution, but replaced by a single optimized instruction to be executed, wherein a temporary register used by the two instructions is identified as a last-use register, where a last-use register has a value that is not to be accessed by later instructions, whereby the two computer machine instructions are replaced by a single optimized internal instruction for execution, the single optimized instruction not including the last-use register.
摘要:
Two computer machine instructions are fetched for execution, but replaced by a single optimized instruction to be executed, wherein a temporary register used by the two instructions is identified as a last-use register, where a last-use register has a value that is not to be accessed by later instructions, whereby the two computer machine instructions are replaced by a single optimized internal instruction for execution, the single optimized instruction not including the last-use register.
摘要:
A pool of available physical registers are provided for architected registers, wherein operations are performed that activate and deactivate selected architected registers, such that the deactivated selected architected registers need not retain values, and physical registers can be deallocated to the pool, wherein deallocation of physical registers is performed after a last-use by a designated last-use instruction, wherein the last-use information is provided either by the last-use instruction or a prefix instruction, wherein reads to deallocated architecture registers return an architected default value.
摘要:
Mechanisms for performing a scattered load operation are provided. With these mechanisms, a gather instruction is receive in a logic unit of a processor, the gather instruction specifying a plurality of addresses in a memory from which data is to be loaded into a target vector register of the processor. A plurality of separate load instructions for loading the data from the plurality of addresses in the memory are automatically generated within the logic unit. The plurality of separate load instructions are sent, from the logic unit, to one or more load/store units of the processor. The data corresponding to the plurality of addresses is gathered in a buffer of the processor. The logic unit then writes data stored in the buffer to the target vector register.
摘要:
A monitor bit per hardware thread in a memory location may be allocated, in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bit corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.
摘要:
A system, method and computer program product for supporting thread level speculative execution in a computing environment having multiple processing units adapted for concurrent execution of threads in speculative and non-speculative modes. Each processing unit includes a cache memory hierarchy of caches operatively connected therewith. The apparatus includes an additional cache level local to each processing unit for use only in a thread level speculation mode, each additional cache for storing speculative results and status associated with its associated processor when handling speculative threads. The additional local cache level at each processing unit are interconnected so that speculative values and control data may be forwarded between parallel executing threads. A control implementation is provided that enables speculative coherence between speculative threads executing in the computing environment.
摘要:
A system for monitoring a large number of simultaneous events implements a hybrid counter array device having a first counter portion comprising counter devices, each counter device for receiving signals representing occurrences of events from an event source and providing a first count value corresponding to a lower order bits of the hybrid counter array. A second counter portion comprises a memory array device having addressable memory locations in correspondence with the counter devices, each addressable memory location for storing a second count value representing higher order bits. A control device monitors each of the counter devices and initiates updating a value of a corresponding second count value stored at the corresponding addressable memory location. The system includes interrupt pre-indication for providing fast interrupt trigger to a processor device when a count value related to an event equals a threshold value. A data transfer sub-system additionally enables one or more of: read access or write access to both the count values in the first and second counter portions over a narrow bus, the read/write access for purposes of initializing and determining status of the count values for a monitored event type in response to a processor device request.