Abstract:
A method for supporting a plurality of load accesses is disclosed. A plurality of requests to access a data cache is received, and in response, a tag memory that maintains a plurality of copies of the tags for each entry in the data cache is accessed. Tags that correspond to the individual requests are identified, and the data cache is accessed based on those tags. A plurality of requests to access the same block of the data cache causes an access arbitration that is executed in the same clock cycle as the access of the tag memory.
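A minimal behavioral sketch of this flow in Python (the class and function names such as TagMemory and service_loads, the port count, and the lower-port-wins arbitration policy are illustrative assumptions, not the claimed hardware): each load port consults its own copy of the tag store, and requests that resolve to the same data-cache block are arbitrated in the same pass as the tag lookup, with the losers replayed.

class TagMemory:
    """One logical tag store, physically replicated so each load port has its own copy."""
    def __init__(self, num_ports):
        self.copies = [dict() for _ in range(num_ports)]

    def update(self, set_index, tag, block):
        for copy in self.copies:                    # keep every copy identical
            copy[(set_index, tag)] = block

    def lookup(self, port, set_index, tag):
        return self.copies[port].get((set_index, tag))   # block id, or None on a miss

def service_loads(tag_mem, requests):
    """requests: one (set_index, tag) per load port. Returns (granted, replayed) ports."""
    lookups = [(port, tag_mem.lookup(port, s, t)) for port, (s, t) in enumerate(requests)]
    granted, replayed, claimed = [], [], set()
    for port, block in lookups:                     # arbitration in the same cycle as
        if block is None or block in claimed:       # the tag access (illustrative policy:
            replayed.append(port)                   # the lower-numbered port wins)
        else:
            claimed.add(block)
            granted.append(port)
    return granted, replayed

tm = TagMemory(num_ports=4)
tm.update(0, 0x12, block=3)
tm.update(1, 0x34, block=5)
print(service_loads(tm, [(0, 0x12), (0, 0x12), (1, 0x34), (2, 0x56)]))  # ([0, 2], [1, 3])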
Abstract:
Systems and methods for accessing a unified translation lookaside buffer (TLB) are disclosed. A method includes receiving an indicator of a level one translation lookaside buffer (L1TLB) miss corresponding to a request for a virtual address to physical address translation; searching a cache that stores the virtual addresses and page sizes of translation table entries (TTEs) that have been evicted from the L1TLB, whereby a page size is identified; and searching a second level TLB to identify a physical address contained therein. Access is then provided to the identified physical address.
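A hedged Python sketch of the lookup sequence (the page sizes, the 4 KiB-granularity tagging of the evicted-TTE cache, and the function name handle_l1_miss are assumptions made for illustration): on an L1TLB miss, the cache of evicted TTEs supplies the page size, which determines how the unified second-level TLB is indexed.

PAGE_SIZES = [4 << 10, 2 << 20]                      # illustrative supported page sizes

def handle_l1_miss(vaddr, evicted_tte_cache, l2_tlb):
    # 1. the cache of evicted TTEs is tagged by virtual address and records the page size
    page_size = evicted_tte_cache.get(vaddr >> 12)   # illustrative 4 KiB-granularity tag
    sizes_to_try = [page_size] if page_size else PAGE_SIZES   # otherwise probe every size
    # 2. search the unified second-level TLB using the identified page size
    for size in sizes_to_try:
        entry = l2_tlb.get((vaddr // size, size))    # keyed by (virtual page number, size)
        if entry is not None:
            return entry + vaddr % size              # physical address to be accessed
    return None                                      # true miss: fall back to a page walk

evicted = {0x00400: 2 << 20}                         # this virtual page was a 2 MiB TTE
l2 = {(0x2, 2 << 20): 0xDEAD0 << 20}                 # (VPN, page size) -> physical base
print(hex(handle_l1_miss(0x00400123, evicted, l2)))  # 0xdead000123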
Abstract:
A method for translating instructions for a processor. The method includes accessing a guest instruction and performing a first level translation of the guest instruction using a first level conversion table. The method further includes outputting a resulting native instruction when the first level translation proceeds to completion. A second level translation of the guest instruction is performed using a second level conversion table when the first level translation does not proceed to completion, wherein the second level translation further processes the guest instruction based upon a partial translation from the first level conversion table. The resulting native instruction is output when the second level translation proceeds to completion.
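The two-level flow can be sketched in Python as follows (the table contents, the opcode names, and the way a partial translation is represented are invented purely for illustration): the first-level table either completes the translation or hands a partial result to a second-level entry that finishes it.

FIRST_LEVEL = {                                   # illustrative first level conversion table
    "g_add":  ("n_add", None),                    # translates to completion
    "g_load": ("n_load_base", "L2_load"),         # partial: needs the second level
}
SECOND_LEVEL = {                                  # illustrative second level conversion table
    "L2_load": lambda partial, operands: f"{partial} {operands}",
}

def translate(guest_opcode, operands=""):
    native, second_key = FIRST_LEVEL[guest_opcode]
    if second_key is None:
        return native                             # first level proceeded to completion
    # second level further processes the guest instruction from the partial translation
    return SECOND_LEVEL[second_key](native, operands)

print(translate("g_add"))                         # n_add
print(translate("g_load", "r1, [r2]"))            # n_load_base r1, [r2]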
Abstract:
A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, and wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality of register file segments is coupled to the partitionable engines for providing data storage.
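A rough Python sketch of the front-end partitioning (the block size, the register names, and the encoding of an inheritance vector as a per-register last-writer map are illustrative assumptions): the incoming sequence is cut into code blocks, and each block receives a vector recording which earlier block last produced each architectural register, so the engines executing a block know where its inputs originate.

BLOCK_SIZE = 4                                    # illustrative code-block size
REGS = ["r0", "r1", "r2", "r3"]                   # illustrative architectural registers

def partition(instrs):
    """instrs: list of (dest_reg, src_regs). Returns (code_blocks, inheritance_vectors)."""
    blocks = [instrs[i:i + BLOCK_SIZE] for i in range(0, len(instrs), BLOCK_SIZE)]
    last_writer = {r: None for r in REGS}         # which block last produced each register
    vectors = []
    for block_id, block in enumerate(blocks):
        vectors.append(dict(last_writer))         # inheritance vector seen by this block
        for dest, _srcs in block:
            last_writer[dest] = block_id
    return blocks, vectors

program = [("r0", []), ("r1", ["r0"]), ("r2", ["r1"]), ("r0", ["r2"]),
           ("r3", ["r0"]), ("r1", ["r3"])]
for i, vec in enumerate(partition(program)[1]):
    print(f"code block {i} inherits from:", vec)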
Abstract:
A method for managing a variable caching structure that manages storage for a processor. The method includes using a multi-way tag array to store a plurality of pointers for a corresponding plurality of different size groups of physical storage of a storage stack, wherein the pointers indicate guest addresses that have corresponding converted native addresses stored within the storage stack, and allocating a group of storage blocks of the storage stack, wherein the size of the allocation is in accordance with the size of a corresponding one of the plurality of different size groups. Upon a tag hit, a corresponding entry is accessed to retrieve a pointer that indicates where in the storage stack the corresponding group of storage blocks of converted native instructions resides. The converted native instructions are then fetched from the storage stack for execution.
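A simplified Python sketch of the structure (the group sizes, the use of a plain dictionary in place of a multi-way tag array, and the class name ConversionCache are assumptions for illustration): the tag array maps a guest address to a pointer and a group size, and the pointer locates the group of converted native instructions within a single storage stack.

GROUP_SIZES = [4, 8, 16]                          # illustrative "different size groups"

class ConversionCache:
    def __init__(self):
        self.storage_stack = []                   # flat stack of converted native instructions
        self.tag_array = {}                       # guest address -> (pointer, group size);
                                                  # associativity of the real multi-way array elided

    def allocate(self, guest_addr, native_instrs):
        size = next(s for s in GROUP_SIZES if s >= len(native_instrs))
        pointer = len(self.storage_stack)         # group is placed at the top of the stack
        self.storage_stack.extend(native_instrs + [None] * (size - len(native_instrs)))
        self.tag_array[guest_addr] = (pointer, size)

    def fetch(self, guest_addr):
        entry = self.tag_array.get(guest_addr)
        if entry is None:
            return None                           # tag miss: the guest code must be converted
        pointer, size = entry                     # tag hit: follow the pointer into the stack
        group = self.storage_stack[pointer:pointer + size]
        return [ins for ins in group if ins is not None]

cc = ConversionCache()
cc.allocate(0x1000, ["n_mov", "n_add", "n_jmp"])
print(cc.fetch(0x1000))                           # ['n_mov', 'n_add', 'n_jmp']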
Abstract:
A method and apparatus are provided for executing packed data instructions. According to one aspect of the invention, a processor includes registers, a register renaming unit coupled to the registers, a decoder coupled to the register renaming unit, and a partial-width execution unit coupled to the decoder. The register renaming unit provides an architectural register file to store packed data operands that include data elements. The decoder is to decode a first and second set of instructions that each specify one or more registers in the architectural register file. Each of the instructions in the first set specifies operations to be performed on all of the data elements. In contrast, each of the instructions in the second set specifies operations to be performed on only a subset of the data elements. The partial-width execution unit is to execute operations specified by either the first or second set of instructions.
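A behavioral sketch in Python of the two instruction classes (the four-element packed registers, the add operation, and the pass-through of untouched elements are illustrative choices): first-set instructions operate on every data element, while second-set instructions operate on only a subset, which is what the partial-width execution unit supports.

def full_width_add(a, b):
    # first-set instruction: the operation is performed on all of the data elements
    return [x + y for x, y in zip(a, b)]

def partial_width_add(a, b, active=2):
    # second-set instruction: only the low `active` elements are operated on; the
    # partial-width execution unit passes the remaining elements of `a` through
    return [x + y for x, y in zip(a[:active], b[:active])] + a[active:]

src1, src2 = [1, 2, 3, 4], [10, 20, 30, 40]       # illustrative 4-element packed operands
print(full_width_add(src1, src2))                 # [11, 22, 33, 44]
print(partial_width_add(src1, src2))              # [11, 22, 3, 4]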
Abstract:
An apparatus and method for performing vertical parallel operations on packed data are described. A first set of data operands and a second set of data operands are accessed. Each of these sets of data represents graphical data stored in a first format. The first set of data operands is converted into a converted set and the second set of data operands is replicated to generate a replicated set. A vertical matrix multiplication is performed on the converted set and the replicated set to generate transformed graphical data.
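A Python sketch of the operation sequence (the 4x4 matrix, the row-major "first format", and the function names are illustrative assumptions): the matrix operands are converted to column order, each vertex component is replicated across a packed register, and the transformed data is accumulated with vertical, element-wise multiplies.

def convert_to_columns(matrix_rows):
    # first set of operands: convert from the row-major "first format" into columns
    return [list(col) for col in zip(*matrix_rows)]

def replicate(value, width=4):
    # second set of operands: replicate one vertex component across a packed register
    return [value] * width

def vertical_transform(matrix_rows, vertex):
    columns = convert_to_columns(matrix_rows)
    result = [0.0] * 4
    for column, component in zip(columns, vertex):
        rep = replicate(component)
        # vertical multiply-accumulate: corresponding elements only, no horizontal shuffles
        result = [acc + c * r for acc, c, r in zip(result, column, rep)]
    return result

M = [[1, 0, 0, 5],                                # illustrative 4x4 translation matrix
     [0, 1, 0, 6],
     [0, 0, 1, 7],
     [0, 0, 0, 1]]
print(vertical_transform(M, [2, 3, 4, 1]))        # [7.0, 9.0, 11.0, 1.0]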
Abstract:
A method and apparatus for updating the architectural state in a system implementing staggered execution with multiple micro-instructions. According to one aspect of the invention, a method is provided in which a macro-instruction is decoded into first and second micro-instructions. The macro-instruction designates an operation on a piece of data, and execution of the first and second micro-instructions separately causes the operation to be performed on different parts of the piece of data. The method also requires that the first micro-instruction be executed irrespective of the second micro-instruction (e.g., at a different time), and that it be detected that said second micro-instruction will not cause any non-recoverable exceptions. The results of the first micro-instruction are then used to update the architectural state in an earlier clock cycle than those of said second micro-instruction.
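A behavioral Python sketch of the retirement rule (splitting the operand into low and high halves, the increment operation, and the fault check are illustrative assumptions): the macro-instruction is cracked into two micro-instructions that act on different parts of the data, and the first micro-instruction's results update architectural state a cycle earlier once the second is known not to raise a non-recoverable exception.

def crack(macro_op, operand_lo, operand_hi):
    # one macro-instruction becomes two micro-instructions on different parts of the data
    return (macro_op, operand_lo, "lo"), (macro_op, operand_hi, "hi")

def will_fault(micro_op):
    op, data, _half = micro_op
    return op == "div" and 0 in data              # illustrative non-recoverable exception check

def staggered_execute(macro_op, operand_lo, operand_hi, arch_state):
    uop_lo, uop_hi = crack(macro_op, operand_lo, operand_hi)
    result_lo = [x + 1 for x in uop_lo[1]]        # execute the first micro-instruction
    if not will_fault(uop_hi):                    # the second micro-instruction is known not
        arch_state["lo"] = result_lo              # to fault, so the low half may update the
        print("cycle N:   low half retired")      # architectural state one cycle early
    arch_state["hi"] = [x + 1 for x in uop_hi[1]] # second micro-instruction, staggered later
    print("cycle N+1: high half retired")
    return arch_state

print(staggered_execute("inc", [1, 2], [3, 4], {}))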
Abstract:
A circuit for generating partial products for variable width multiplication operations is provided. According to an embodiment of the present invention, the circuit includes a plurality of partial product selector groups, each of which includes a plurality of partial product selector circuits. Each partial product selector circuit receives a portion of a multiplicand as an input and outputs a partial product. The circuit also includes a plurality of Booth encoders. At least one of the Booth encoders is coupled to each partial product selector group. Each Booth encoder receives as an input a portion of a wide multiplier and outputs a Booth encoded value to at least a portion of a partial product selector group. The circuit further includes an override circuit coupled to one or more of the partial product selector circuits. The override circuit is operable to control one or more of the partial product selector circuits to output a zero, thereby avoiding unwanted cross-products when performing multiple smaller multiplications using the same circuit.
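A Python model of the partial-product array (the radix-4 Booth digit set, the 8-bit lanes within a 16-bit array, and the restriction to lane values below 128 are simplifying assumptions): each Booth encoder turns a group of multiplier bits into a digit in {-2, -1, 0, 1, 2}, each selector multiplies one portion of the multiplicand by that digit, and the override zeroes selectors whose multiplier digit and multiplicand portion belong to different narrow operands, so the same array produces two independent 8x8 products.

def booth_digits(y, bits):
    """Radix-4 Booth encoding: returns digits d with y == sum(d[i] * 4**i)."""
    u = y & ((1 << bits) - 1)
    prev, digits = 0, []
    for i in range(0, bits, 2):
        b0, b1 = (u >> i) & 1, (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)        # one Booth encoder per digit
        prev = b1
    return digits

def dual_8x8_multiply(x_lo, x_hi, y_lo, y_hi):
    """Two independent 8x8 products from one 16x16 array (lane values kept below 128)."""
    multiplicand = x_lo | (x_hi << 8)             # packed wide multiplicand
    multiplier = y_lo | (y_hi << 8)               # packed wide multiplier
    total = 0
    for i, digit in enumerate(booth_digits(multiplier, 16)):
        digit_lane = 0 if i < 4 else 1            # digits 0-3 come from y_lo, 4-7 from y_hi
        for portion in (0, 1):                    # one selector per multiplicand byte
            part = (multiplicand >> (8 * portion)) & 0xFF
            if portion != digit_lane:
                part = 0                          # override circuit: zero the unwanted cross-product
            total += (digit * part) << (8 * portion + 2 * i)
    return total & 0xFFFF, total >> 16            # the two 8x8 products

print(dual_8x8_multiply(5, 100, 7, 3))            # (35, 300)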