Abstract:
A method and system including transmitting data in an architectural format between execution units in a multi-type instruction set architecture and converting data received in the architectural format to an internal format and data output in the internal format to the architectural format based on an operation code and a data type of a microinstruction.
Abstract:
A method and apparatus for including in a processor instructions for performing select operations on packed or unpacked data. In one embodiment, a processor is coupled to a memory. The memory has stored therein first packed data in a source operand and a second packed data in a destination operand. The processor selects the first packed data if the control bit for the source operand is set to “1” and stores the data into the destination operand. Otherwise, the processor keeps the data in the destination operand. The final value of the destination operand is stored in memory.
Abstract:
A method and apparatus for executing partial-width packed data instructions are discussed. The processor may include a plurality of registers, a register renaming unit, a decoder, and a partial-width execution unit. The register renaming unit provides an architectural register file to store packed data operands each of which include a plurality of data elements. The decoder is to decode a first and second set of instructions that each specify one or more registers in the architectural register file. The first set of instructions specify operations to be performed on all of the data elements stored in the one or more specified registers. In contrast, the second set of instructions specify operations to be performed on only a subset of the data elements. The partial-width execution unit is to execute operations specified by either of the first or the second set of instructions.
Abstract:
A method of identifying instructions including accessing a plurality of instructions that comprise multiple branch instructions. For each branch instruction of the multiple branch instructions, a respective first mask is generated representing instructions that are executed if a branch is taken. A respective second mask is generated representing instructions that are executed if the branch is not taken. A prediction output is received that comprises a respective branch prediction for each branch instruction. For each branch instruction, the prediction output is used to select a respective resultant mask from among the respective first and second masks. For each branch instruction, a resultant mask of a subsequent branch is invalidated if a previous branch is predicted to branch over said subsequent branch. A logical operation is performed on all resultant masks to produce a final mask. The final mask is used to select a subset of instructions for execution.
Abstract:
Cache replacement policy. In accordance with a first embodiment of the present invention, an apparatus comprises a queue memory structure configured to queue cache requests that miss a second cache after missing a first cache. The apparatus comprises additional memory associated with the queue memory structure is configured to record an evict way of the cache requests for the cache. The apparatus may be further configured to lock the evict way recorded in the additional memory, for example, to prevent reuse of the evict way. The apparatus may be further configured to unlock the evict way responsive to a fill from the second cache to the cache. The additional memory may be a component of a higher level cache.
Abstract:
A method for managing a variable caching structure for managing storage for a processor. The method includes using a multi-way tag array to store a plurality of pointers for a corresponding plurality of different size groups of physical storage of a storage stack, wherein the pointers indicate guest addresses that have corresponding converted native addresses stored within the storage stack, and allocating a group of storage blocks of the storage stack, wherein the size of the allocation is in accordance with a corresponding size of one of the plurality of different size groups. Upon a hit on the tag, a corresponding entry is accessed to retrieve a pointer that indicates where in the storage stack a corresponding group of storage blocks of converted native instructions reside. The converted native instructions are then fetched from the storage stack for execution.
Abstract:
Systems, methods, processors, media, and other embodiments associated with integer rounding a floating point number in one micro-operation (uop) are described. One system embodiment includes a memory to store an integer rounding floating point instruction and a processor to perform the integer rounding floating point instruction. The processor may include a floating point unit that includes circuits and/or logics that integer round the floating point number.
Abstract:
Systems and methods for non-blocking implementation of cache flush instructions are disclosed. As a part of a method, data is accessed that is received in a write-back data holding buffer from a cache flushing operation, the data is flagged with a processor identifier and a serialization flag, and responsive to the flagging, the cache is notified that the cache flush is completed. Subsequent to the notifying, access is provided to data then present in the write-back data holding buffer to determine if data then present in the write-back data holding buffer is flagged.
Abstract:
A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache.
Abstract:
An apparatus and method for performing an insert-extract operation on packed data using computer-implemented steps is described. In one embodiment, a first data operand having a data element is accessed. A second packed data operand having at least two data elements is then accessed. The data element in the first data operand is inserted into any destination field of a destination register, or alternatively, a data element is extracted from any field of the source register.