Abstract:
A method for decentralized resource allocation in an integrated circuit. The method includes receiving a plurality of requests from a plurality of resource consumers of a plurality of partitionable engines to access a plurality resources, wherein the resources are spread across the plurality of engines and are accessed via a global interconnect structure. At each resource, a number of requests for access to said each resource are added. At said each resource, the number of requests are compared against a threshold limiter. At said each resource, a subsequent request that is received that exceeds the threshold limiter is canceled. Subsequently, requests that are not canceled within a current clock cycle are implemented.
Abstract:
A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache.
Abstract:
Systems and methods for flushing a cache with modified data are disclosed. Responsive to a request to flush data from a cache with modified data to a next level cache that does not include the cache with modified data, the cache with modified data is accessed using an index and a way and an address associated with the index and the way is secured. Using the address, the cache with modified data is accessed a second time and an entry that is associated with the address is retrieved from the cache with modified data. The entry is placed into a location of the next level cache.
Abstract:
A method for supporting a plurality of load accesses is disclosed. A plurality of requests to access a data cache is accessed, and in response, a tag memory is accessed that maintains a plurality of copies of tags for each entry in the data cache. Tags are identified that correspond to individual requests. The data cache is accessed based on the tags that correspond to the individual requests. A plurality of requests to access the same block of the plurality of blocks causes an access arbitration that is executed in the same clock cycle as is the access of the tag memory.
Abstract:
An apparatus and method for performing a shuffle operation on packed data using computer-implemented steps is described. In one embodiment, a first packed data operand having at least two data elements is accessed. A second packed data operand having at least two data elements is accessed. One of the data elements in the first packed data operand is shuffled into a lower destination field of a destination register, and one of the data elements in the second packed data operand is shuffled into an upper destination field of the destination register.
Abstract:
A method for supporting a plurality of load accesses is disclosed. A plurality of requests to access a data cache is accessed, and in response, a tag memory is accessed that maintains a plurality of copies of tags for each entry in the data cache. Tags are identified that correspond to individual requests. The data cache is accessed based on the tags that correspond to the individual requests. A plurality of requests to access the same block of the plurality of blocks causes an access arbitration that is executed in the same clock cycle as is the access of the tag memory.
Abstract:
Systems and methods for accessing a unified translation lookaside buffer (TLB) are disclosed. A method includes receiving an indicator of a level one translation lookaside buffer (L1TLB) miss corresponding to a request for a virtual address to physical address translation, searching a cache that includes virtual addresses and page sizes that correspond to translation table entries (TTEs) that have been evicted from the L1TLB, where a page size is identified, and searching a second level TLB and identifying a physical address that is contained in the second level TLB. Access is provided to the identified physical address.
Abstract:
A method for translating instructions for a processor. The method includes accessing a guest instruction and performing a first level translation of the guest instruction using a first level conversion table. The method further includes outputting a resulting native instruction when the first level translation proceeds to completion. A second level translation of the guest instruction is performed using a second level conversion table when the first level translation does not proceed to completion, wherein the second level translation further processes the guest instruction based upon a partial translation from the first level conversion table. The resulting native instruction is output when the second level translation proceeds to completion.
Abstract:
A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage.
Abstract:
A method for managing a variable caching structure for managing storage for a processor. The method includes using a multi-way tag array to store a plurality of pointers for a corresponding plurality of different size groups of physical storage of a storage stack, wherein the pointers indicate guest addresses that have corresponding converted native addresses stored within the storage stack, and allocating a group of storage blocks of the storage stack, wherein the size of the allocation is in accordance with a corresponding size of one of the plurality of different size groups. Upon a hit on the tag, a corresponding entry is accessed to retrieve a pointer that indicates where in the storage stack a corresponding group of storage blocks of converted native instructions reside. The converted native instructions are then fetched from the storage stack for execution.