Abstract:
In one embodiment, a cache memory includes: a plurality of data banks, each of the plurality of data banks having a plurality of entries each to store a portion of a cache line distributed across the plurality of data banks; and a plurality of tag banks decoupled from the plurality of data banks, wherein a tag for a cache line is to be assigned to one of the plurality of tag banks. Other embodiments are described and claimed.
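Below is a minimal C++ sketch of the decoupled tag/data banking this abstract describes: a line's data is striped across all data banks while its tag is assigned to a single tag bank. The bank counts, the modulo hash, the striping policy, and all names are illustrative assumptions, not the claimed implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

constexpr size_t kDataBanks    = 4;   // a 64B line is striped 16B per bank
constexpr size_t kTagBanks     = 8;   // decoupled from the data banks
constexpr size_t kLineBytes    = 64;
constexpr size_t kPortionBytes = kLineBytes / kDataBanks;
constexpr size_t kDataEntries  = 64;  // entries per data bank

struct TagEntry {
    uint64_t tag;        // line address
    size_t   dataIndex;  // shared entry index into every data bank
};

class DecoupledCache {
public:
    // A line's tag is assigned to exactly one tag bank by hashing the
    // line address; its data is distributed across all data banks.
    static size_t tagBank(uint64_t lineAddr) { return lineAddr % kTagBanks; }

    void install(uint64_t addr, const std::array<uint8_t, kLineBytes>& line) {
        uint64_t lineAddr = addr / kLineBytes;
        size_t idx = nextData_++ % kDataEntries;  // trivial ring allocator
        tags_[tagBank(lineAddr)].push_back({lineAddr, idx});
        for (size_t b = 0; b < kDataBanks; ++b)   // portion b -> data bank b
            for (size_t i = 0; i < kPortionBytes; ++i)
                data_[b][idx][i] = line[b * kPortionBytes + i];
    }

    bool lookup(uint64_t addr, std::array<uint8_t, kLineBytes>& out) const {
        uint64_t lineAddr = addr / kLineBytes;
        for (const TagEntry& e : tags_[tagBank(lineAddr)]) {
            if (e.tag != lineAddr) continue;
            for (size_t b = 0; b < kDataBanks; ++b)  // regather the line
                for (size_t i = 0; i < kPortionBytes; ++i)
                    out[b * kPortionBytes + i] = data_[b][e.dataIndex][i];
            return true;
        }
        return false;
    }

private:
    std::array<std::vector<TagEntry>, kTagBanks> tags_;
    uint8_t data_[kDataBanks][kDataEntries][kPortionBytes] = {};
    size_t  nextData_ = 0;
};

int main() {
    DecoupledCache c;
    std::array<uint8_t, kLineBytes> line{}, got{};
    for (size_t i = 0; i < kLineBytes; ++i) line[i] = uint8_t(i);
    c.install(0x1000, line);
    std::cout << (c.lookup(0x1000, got) && got == line ? "hit" : "miss") << "\n";
}
```

Because tag placement is hashed independently of data placement, the number of tag banks can be scaled for lookup bandwidth without changing how lines are striped for data bandwidth.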
Abstract:
In one embodiment, an apparatus includes: a plurality of execution lanes to perform parallel execution of instructions; and a unified symbolic store address buffer coupled to the plurality of execution lanes, the unified symbolic store address buffer comprising a plurality of entries each to store a symbolic store address for a store instruction to be executed by at least some of the plurality of execution lanes. Other embodiments are described and claimed.
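A rough C++ sketch of one way such a buffer could work: a single entry records a store's address symbolically (base register, displacement, per-lane stride) for all execution lanes at once, and a later load is checked for overlap against that symbolic form. The base-plus-stride encoding and the conservative alias rule are assumptions for illustration only.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// One entry describes the store addresses of all lanes symbolically
// (base register + displacement + per-lane stride) instead of holding
// a concrete address per lane. This encoding is an assumption.
struct SymbolicStoreAddr {
    int     baseReg;       // architectural register holding the base
    int64_t displacement;
    int     elemBytes;     // per-lane stride
    int     lanes;
};

class UnifiedSymbolicStoreBuffer {
public:
    void push(const SymbolicStoreAddr& e) { entries_.push_back(e); }

    // Conservative check for a younger load: when base registers match,
    // overlap is decidable without knowing the base's runtime value;
    // otherwise we must assume the accesses may alias.
    bool mayAlias(const SymbolicStoreAddr& load) const {
        for (const SymbolicStoreAddr& st : entries_) {
            if (st.baseReg != load.baseReg) return true;
            int64_t stLo = st.displacement;
            int64_t stHi = stLo + int64_t(st.elemBytes) * st.lanes;
            int64_t ldLo = load.displacement;
            int64_t ldHi = ldLo + int64_t(load.elemBytes) * load.lanes;
            if (stLo < ldHi && ldLo < stHi) return true;  // ranges overlap
        }
        return false;
    }

private:
    std::vector<SymbolicStoreAddr> entries_;
};

int main() {
    UnifiedSymbolicStoreBuffer buf;
    buf.push({/*baseReg=*/5, /*disp=*/0, /*elemBytes=*/4, /*lanes=*/16});
    std::cout << buf.mayAlias({5, 64, 4, 16}) << "\n";  // 0: disjoint
    std::cout << buf.mayAlias({5, 32, 4, 16}) << "\n";  // 1: overlap
}
```

The payoff of the unified entry is capacity: one symbolic record covers what would otherwise be one concrete address per lane.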
Abstract:
A processor and method are described for scheduling operations for execution within a reservation station. For example, a method in accordance with one embodiment of the invention includes the operations of: classifying a plurality of operations based on the execution ports usable to execute those operations; allocating the plurality of operations into groups within a reservation station based on the classification, wherein each group is serviced by one or more execution ports corresponding to the classification, and wherein two or more entries within a group share a common read port and a common write port; and dynamically scheduling two or more operations in a group for concurrent execution based on the ports capable of executing those operations and the relative age of the operations.
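The classify/group/schedule flow lends itself to a small C++ sketch. Below, operations are classified by a port bitmask, allocated into per-class groups kept in age order, and each cycle the oldest ready operations in a group are issued, capped at two per group to model the shared read/write port pair. The port classes and the two-op cap are illustrative assumptions.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

struct Op {
    uint64_t seq;       // allocation order: lower = older
    uint32_t portMask;  // bit i set => execution port i can run this op
    bool     ready;
};

class ReservationStation {
public:
    // Classification: ops with the same port mask land in one group,
    // kept in allocation (age) order.
    void allocate(const Op& op) { groups_[op.portMask].push_back(op); }

    // Each cycle, issue the oldest ready ops in every group, capped at
    // two per group to model the shared read and write ports.
    std::vector<Op> scheduleCycle() {
        std::vector<Op> issued;
        for (auto& kv : groups_) {
            std::vector<Op>& ops = kv.second;
            int granted = 0;
            for (size_t i = 0; i < ops.size() && granted < 2;) {
                if (ops[i].ready) {
                    issued.push_back(ops[i]);
                    ops.erase(ops.begin() + i);
                    ++granted;
                } else {
                    ++i;
                }
            }
        }
        return issued;
    }

private:
    std::map<uint32_t, std::vector<Op>> groups_;
};

int main() {
    ReservationStation rs;
    rs.allocate({0, 0b0011, true});  // ALU-class op, ports 0-1
    rs.allocate({1, 0b0011, true});
    rs.allocate({2, 0b0011, true});  // third ALU op waits a cycle
    rs.allocate({3, 0b0100, true});  // memory-class op, port 2
    for (const Op& op : rs.scheduleCycle())
        std::cout << "issued seq " << op.seq << "\n";  // 0, 1, then 3
}
```

Sharing read/write ports within a group is what makes the per-group issue cap necessary; in exchange, each group needs far fewer ports than a fully flat reservation station.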
Abstract:
In one embodiment, an apparatus includes: a plurality of registers; a first instruction queue to store first instructions; a second instruction queue to store second instructions; a program order queue having a plurality of portions each associated with one of the plurality of registers, each of the portions having entries to store a state of an instruction, the state comprising an encoding of a use of the register by the instruction and a source instruction queue for the instruction; and a dispatcher to dispatch for execution the first and second instructions from the first and second instruction queues based at least in part on information stored in the program order queue, to manage instruction dependencies between the first instructions and the second instructions. Other embodiments are described and claimed.
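A compact C++ sketch of per-register program-order tracking between two instruction queues: each register has its own ordered portion, and the dispatcher releases an instruction only when it heads every portion it touches. The strict serialization (even read-after-read) is a conservative simplification, and the encodings are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <iostream>
#include <utility>
#include <vector>

enum class Use { Read, Write };  // how the instruction uses the register
enum class Src { Q1, Q2 };       // which instruction queue it came from

struct UseEntry { Use use; Src src; uint64_t seq; };

class ProgramOrderQueue {
public:
    explicit ProgramOrderQueue(size_t numRegs) : portions_(numRegs) {}

    // Called at allocation time, in program order.
    void record(uint64_t seq, Src src,
                const std::vector<std::pair<int, Use>>& regUses) {
        for (auto [reg, use] : regUses)
            portions_[reg].push_back({use, src, seq});
    }

    // Dispatcher check: the instruction may go only when it heads the
    // portion of every register it touches. (A fuller model would let
    // consecutive reads from either queue dispatch together.)
    bool canDispatch(uint64_t seq, const std::vector<int>& regs) const {
        for (int reg : regs)
            if (portions_[reg].empty() || portions_[reg].front().seq != seq)
                return false;
        return true;
    }

    void complete(uint64_t seq, const std::vector<int>& regs) {
        for (int reg : regs)
            if (!portions_[reg].empty() && portions_[reg].front().seq == seq)
                portions_[reg].pop_front();
    }

private:
    std::vector<std::deque<UseEntry>> portions_;  // one portion per register
};

int main() {
    ProgramOrderQueue poq(8);
    poq.record(0, Src::Q1, {{3, Use::Write}});     // Q1 instruction writes r3
    poq.record(1, Src::Q2, {{3, Use::Read}});      // Q2 instruction reads r3
    std::cout << poq.canDispatch(1, {3}) << "\n";  // 0: blocked behind seq 0
    poq.complete(0, {3});
    std::cout << poq.canDispatch(1, {3}) << "\n";  // 1: dependency resolved
}
```

Splitting the order queue per register keeps cross-queue dependency checks local: only instructions touching the same register ever serialize against each other.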
Abstract:
Devices for computing the sum of multiple Vector-Vector Dot-Products (VVDP) or multiple partial sums of VVDP can include a resistive memory array and a reduction circuit. The reduction circuit can be configured to determine a sum of a selected one or more of a plurality of bit lines of the resistive memory array. A VVDP reduction can be determined from the sum of the selected one or more of the plurality of bit lines.
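A functional (not circuit-level) model of the computation: each bit line j of the crossbar carries sum_i v[i] * G[i][j], a vector-vector dot product of the inputs with one conductance column, and the reduction circuit sums a selected subset of bit lines into one partial result. Units and values are illustrative.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Each bit line j of the crossbar carries sum_i v[i] * G[i][j]: a
// vector-vector dot product of the inputs with one conductance column.
std::vector<double> bitLineSums(const Matrix& G, const std::vector<double>& v) {
    std::vector<double> out(G.front().size(), 0.0);
    for (size_t i = 0; i < G.size(); ++i)
        for (size_t j = 0; j < out.size(); ++j)
            out[j] += v[i] * G[i][j];
    return out;
}

// The reduction circuit: sum only the selected bit lines.
double reduce(const std::vector<double>& bitLines,
              const std::vector<size_t>& selected) {
    double sum = 0.0;
    for (size_t j : selected) sum += bitLines[j];
    return sum;
}

int main() {
    Matrix G = {{1, 2, 3}, {4, 5, 6}};   // conductances (stored weights)
    std::vector<double> v = {0.5, 1.0};  // word-line inputs
    std::vector<double> lines = bitLineSums(G, v);
    // Bit lines 0 and 2: (0.5*1 + 1*4) + (0.5*3 + 1*6) = 4.5 + 7.5 = 12
    std::cout << reduce(lines, {0, 2}) << "\n";
}
```

Selecting which bit lines feed the reduction is what yields either a full VVDP sum or a partial sum that can be combined with results from other arrays.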
Abstract:
A method and apparatus for post-retire transaction access tracking are described herein. Load and store buffers are capable of storing senior entries. In the load buffer, a first access is scheduled based on a load buffer entry, and tracking information associated with the load is stored in a filter field of that entry. Upon retirement, the load buffer entry is marked as a senior load entry. A scheduler then schedules a post-retire access to update transaction tracking information, unless the filter field indicates that the tracking information has already been updated during the pendency of the transaction. Before a line is evicted from a cache, the load buffer is snooped to ensure no load accessed the line to be evicted.
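A toy C++ model of the senior-load/filter mechanism: a retired load's entry stays in the buffer as a senior entry, a post-retire tracking update is issued only when the filter field is clear, and the cache snoops the buffer before evicting a line. Field names and the single-bit filter are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// A retired load stays in the buffer as a "senior" entry; the filter
// field records whether its transaction tracking information has
// already been updated during the current transaction.
struct LoadEntry {
    uint64_t lineAddr;
    bool senior = false;  // retired but still tracked
    bool filter = false;  // tracking already updated this transaction?
};

class LoadBuffer {
public:
    void retire(uint64_t lineAddr) {
        entries_.push_back({lineAddr, /*senior=*/true, /*filter=*/false});
    }

    // Returns true if a post-retire tracking update should be issued;
    // the filter suppresses redundant updates within one transaction.
    bool schedulePostRetire(LoadEntry& e) {
        if (e.filter) return false;  // already updated: skip the access
        e.filter = true;
        return true;
    }

    // The cache snoops the load buffer before evicting a line: a hit
    // means a tracked load touched the line slated for eviction.
    bool snoopBeforeEvict(uint64_t lineAddr) const {
        return std::any_of(entries_.begin(), entries_.end(),
            [&](const LoadEntry& e) { return e.lineAddr == lineAddr; });
    }

    std::vector<LoadEntry>& entries() { return entries_; }

private:
    std::vector<LoadEntry> entries_;
};

int main() {
    LoadBuffer lb;
    lb.retire(0x40);
    std::cout << lb.schedulePostRetire(lb.entries()[0]) << "\n";  // 1: update
    std::cout << lb.schedulePostRetire(lb.entries()[0]) << "\n";  // 0: filtered
    std::cout << lb.snoopBeforeEvict(0x40) << "\n";               // 1: hit
}
```

The filter field is the key to keeping post-retire traffic low: each line a transaction reads triggers at most one tracking update, no matter how many senior loads touched it.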