摘要:
Emulation of instructions that include non-contiguous specifiers is facilitated. A non-contiguous specifier specifies a resource of an instruction, such as a register, using multiple fields of the instruction. For example, multiple fields of the instruction (e.g., two fields) include bits that together designate a particular register to be used by the instruction. Non-contiguous specifiers of instructions defined in one computer system architecture are transformed to contiguous specifiers usable by instructions defined in another computer system architecture. The instructions defined in the another computer system architecture emulate the instructions defined for the one computer system architecture.
摘要:
Embodiments of the invention relate to hybrid address translation. An aspect of the invention includes receiving a first address, the first address referencing a location in a first address space. The computer searches a segment lookaside buffer (SLB) for a SLB entry corresponding to the first address; the SLB entry comprising a type field and an address field and determines whether a value of the type field in the SLB entry indicates a hashed page table (HPT) search or a radix tree search. Based on determining that the value of the type field indicates the HPT search, a HPT is searched to determine a second address, the second address comprising a translation of the first address into a second address space; and based on determining that the value of the type field indicates the radix tree search, a radix tree is searched to determine the second address.
摘要:
Systems, methods and computer products for cross-thread scheduling. Exemplary embodiments include a cross thread scheduling method for compiling code, the method including scheduling a scheduling unit with a scheduler sub-operation in response to the scheduling unit being in a non-multithreaded part of the code and scheduling the scheduling unit with a cross-thread scheduler sub-operation in response to the scheduling unit being in a multithreaded part of the code.
摘要:
A technique is provided for generating stubs. A processing circuit receives a call to a called function. The processing circuit retrieves a called function property of the called function. The processing circuit generates a stub for the called function based on the called function property.
摘要:
Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.
摘要:
Embodiments relate to reducing a number of read ports for register pairs. An aspect includes maintaining an active pairing indicator that is configured to have a first value or a second value. The first value indicates that the wide operand is stored in a wide register. The second value indicates that the wide operand is not stored in the wide register. The operand is read from either the wide register or a pair of registers based on the active pairing indicator. The active pairing indicator and the values of the set of wide registers are stored to a storage based on a request to store a register pairing status. A saved pairing indicator and saved values of the set of wide registers is loaded from the storage respectively into an active pairing register and wide registers.
摘要:
A method for performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching. The method includes receiving a first instruction of an instruction sequence and a second instruction of the instruction sequence and determining if the first instruction and the second instruction can be optimized. In response to the determining that the first instruction and second instruction can be optimized, the method includes, preforming a pre-decode optimization on the instruction sequence and generating a new second instruction, wherein the new second instruction is not dependent on a target operand of the first instruction and storing a pre-decoded first instruction and a pre-decoded new second instruction in an instruction cache. In response to determining that the first instruction and second instruction can not be optimized, the method includes, storing the pre-decoded first instruction and a pre-decoded second instruction in the instruction cache.
摘要:
Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by address. An aspect of the invention includes reading sample-point addresses from a sample-point address array, and comparing, by a processor, the sample-point addresses to an address associated with an instruction from an instruction stream executing on the processor. A sample point is recognized upon execution of the instruction associated with the address matching one of the sample-point addresses. Run-time instrumentation information is obtained from the sample point. The run-time instrumentation information is stored in a run-time instrumentation program buffer as a reporting group.
摘要:
A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary is dynamically determined based on a specified type of boundary and one or more characteristics of the processor executing the instruction, such as cache line size or page size used by the processor.
摘要:
A technique is provided for performing a mixed precision estimate. A processing circuit receives an input of a first precision having a wide precision value. The processing circuit computes an output in an output exponent range corresponding to a narrow precision value based on the input having the wide precision value.