摘要:
In one embodiment, a processor comprises a core configured to execute instructions; a register file comprising a plurality of storage locations; and a window management unit. The window management unit is configured to operate the plurality of storage locations as a plurality of windows, wherein register addresses encoded into the instructions identify storage locations among a subset of the plurality of storage locations that are within a current window. Additionally, the window management unit is configured to allocate a second window in response to a predetermined event. One of the current window and the second window serves as a checkpoint of register state, and the other one of the current window and the second window is updated in response to instructions processed subsequent to the checkpoint. The checkpoint may be restored if the speculative execution results are discarded.
摘要:
A system and method for profiling runtime system events of a computer system may include associating a data source type with detected system events. The system events may be detected dependent on information included in a reply message received by a processor in response to a data request or other transaction request message. The reply message may include information characterizing a source type of a source of data included in the reply message. The source type information may indicate that the source is remote or local; that it is a shared or a private storage location; that the data is supplied via a cache-to-cache transfer; or that the data is sourced from a coherency domain other than that of the requesting process. Instructions, events, messages, and replies may be sampled, and extended address information corresponding to the samples may be stored in an event set database for performance analysis.
摘要:
A hardware implemented method for writing data to a cache is provided. In this hardware implemented method, a Block Initializing Store (BIS) instruction is received to write the data from a processor core to a memory block. The BIS instruction includes the data from the processor core. Thereafter, a dummy read request is sent to a memory controller and known data is received from the memory controller without accessing a main memory. The known data is then written to the cache and, after the known data is written, the data from the processor core is written to the cache. A system and processor for writing data to the cache also are described.
摘要:
A high memory capacity dual in-line memory module (DIMM) for use in a directory-based, distributed shared memory multiprocessor computer system including a data memory for storing data and a state memory for storing state or directory information corresponding to at least a portion of the data. The DIMM allows the data and the state information to be accessed independently. The DIMM can be configured in a plurality of storage capacities.
摘要:
A page migration controller is described. The page migration controller determines whether a memory page addressed by a memory access request should be migrated from a local processing node to a requester processing node. The page migration controller accesses an array to obtain a first count associated with the addressed memory page and the requester processing node, and a second count associated with the addressed memory page and the local processing node. The first count is incremented, and then the second count is subtracted from the incremented first count to obtain a difference between the second count and the incremented first count. A comparator determines whether the difference is greater than a migration threshold value. If the difference is greater than the migration threshold value, then a migration interrupt is issued.
摘要:
A multiprocessor system having a plurality of requestors, a memory and memory directory controller employing directory-based coherence. The system implements a method to detect dropping of clean-exclusive data. Only one intervention message is permitted to target an exclusive object held by a first requestor, wherein the intervention message is caused by a second requestor. The system detects whether the first requestor has an outstanding writeback for the object targeted by the intervention message, as well as whether the first requestor has a clean-exclusive, dirty-exclusive or invalid copy of the object targeted by the intervention message. A clean-exclusive copy of the object has been dropped when no outstanding writeback is detected and the first requestor has the object in the invalid state.
摘要:
Software-based cache coherence protocol. A processing unit may execute a memory request using a processor thread. In response to detecting a cache hit to shared or a cache miss associated with the memory request, a cache may provide both a trap signal and coherence information to the processor thread of the processing unit. After receiving the trap signal and the coherence information, the processor thread may perform a cache coherence operation for the memory request using at least the received coherence information. The processing unit may include a plurality of processor threads and a load balancer. The load balancer may receive coherence requests from one or more remote processing units and distribute the received coherence requests across the plurality of processor threads. The load balance may preferentially distribute the received coherence requests across the plurality of processor threads based on the operation state of the processor threads.
摘要:
A memory controller including a control unit for limiting the number of memory requests that are executed within a predetermined time period to regulate power consumption. The control unit may determine a memory request limit indicating the maximum number of memory requests that are allowed to be executed during the predetermined time period based on at least a carry-over limit and a new request limit. The carry-over limit may indicate the maximum number of carry-over memory requests that are allowed during the predetermined time period. The new request limit may indicate the maximum number of new memory requests that are allowed during the predetermined time period. The control unit may further control the number of memory requests that are executed in each of a sequence of predetermined time periods.
摘要:
A high memory capacity DIMM for use in a directory-based, distributed shared memory multiprocessor computer system includes a data memory for storing data and a state memory for storing state or directory information corresponding to at least a portion of the data. The DIMM allows the data and the state information to be accessed independently. The DIMM is configured for use in a DIMM pair. In the DIMM pair, a first DIMM includes a first data memory having first and second memory bank portions for storing data, and a first state memory configured to store state information corresponding to data stored in a first memory bank. A second DIMM includes a second data memory having third and fourth memory bank portions for storing data, and a second state memory configured to store state information corresponding to data stored in a second memory bank. The first memory bank is formed from the first memory bank portion and the third memory bank portion. The second memory bank is formed from the second memory bank portion and the fourth memory bank portion.
摘要:
Software-based cache coherence protocol. A processing unit may execute a memory request using a processor thread. In response to detecting a cache hit to shared or a cache miss associated with the memory request, a cache may provide both a trap signal and coherence information to the processor thread of the processing unit. After receiving the trap signal and the coherence information, the processor thread may perform a cache coherence operation for the memory request using at least the received coherence information. The processing unit may include a plurality of processor threads and a load balancer. The load balancer may receive coherence requests from one or more remote processing units and distribute the received coherence requests across the plurality of processor threads. The load balance may preferentially distribute the received coherence requests across the plurality of processor threads based on the operation state of the processor threads.