Abstract:
A method and system of expediting issuance of a second request of a pair of ordered requests into a distributed coherent communication fabric. The first request of the ordered pair is issued into the coherent communication fabric and directed to a first target. Issuance of the second request into the coherent communication fabric is stalled until the first target receives and orders the first request and transmits a response acknowledging the same.
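Below is a minimal C++ sketch of the ordering rule described above: the second request is not issued into the fabric until the first target's ordering acknowledgment arrives. The type and function names (Request, Fabric, issue_ordered_pair) are illustrative, not taken from the patent.

```cpp
#include <future>
#include <iostream>

// Hypothetical request and fabric types; the patent does not define these names.
struct Request { int id; int target; };

struct Fabric {
    // Issue a request; the returned future completes once the target has
    // received and ordered the request (the acknowledging response).
    std::future<void> issue(const Request& r) {
        return std::async(std::launch::async, [r] {
            std::cout << "target " << r.target << " ordered request " << r.id << "\n";
        });
    }
};

// The second request of the ordered pair is stalled until the first target
// acknowledges that it has ordered the first request.
void issue_ordered_pair(Fabric& fabric, const Request& first, const Request& second) {
    auto ack = fabric.issue(first);  // first request enters the fabric
    ack.wait();                      // stall here for the ordering acknowledgment
    fabric.issue(second);            // only then issue the second request
}

int main() {
    Fabric fabric;
    issue_ordered_pair(fabric, Request{1, 0}, Request{2, 1});
}
```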
Abstract:
A messaging scheme that conserves system memory bandwidth during a memory read operation in a multiprocessing computer system is described. A source processing node sends a memory read command to a target processing node to read data from a designated memory location in a system memory associated with the target processing node. The target node transmits a read response to the source node containing the requested data and concurrently transmits a probe command to one or more of the remaining nodes in the multiprocessing computer system. In response to the probe command, each remaining processing node checks whether it has a cached copy of the requested data. If a processing node, other than the source and the target nodes, finds a modified cached copy of the designated memory location, that processing node responds with a memory cancel response sent to the target node and a read response sent to the source node. The read response contains the modified cache block holding the requested data, and the memory cancel response causes the target node to abort further processing of the memory read command and to withhold its own read response if it has not yet transmitted it. The memory cancel message thus attempts to avoid relatively lengthy system memory accesses when the system memory holds stale data.
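The message flow can be illustrated with the short sketch below; the state names and the printed message labels are assumptions used only to show how the memory cancel path suppresses the stale memory data.

```cpp
#include <cstdio>

// Illustrative cache states; names are assumptions, not the patent's terms.
enum class CacheState { Invalid, Clean, Modified };

int main() {
    // Source -> target: memory read command. The target concurrently probes
    // the remaining nodes and starts its (relatively slow) memory access.
    std::puts("target: probe remaining nodes, begin system memory access");

    // A remaining node holding a modified copy answers on both paths.
    CacheState remaining_node_state = CacheState::Modified;
    bool target_sent_read_response = false;

    if (remaining_node_state == CacheState::Modified) {
        std::puts("remaining node -> source: read response (modified block)");
        std::puts("remaining node -> target: memory cancel response");
        // The cancel lets the target skip returning stale memory data.
        if (!target_sent_read_response) {
            std::puts("target: abort memory read, suppress read response");
        }
    }
    return 0;
}
```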
Abstract:
A computer system includes an external unit, governing a cache, that generates a set-dirty request as a function of the coherence state of a block in the cache that is to be modified. The external unit modifies the block in the cache only if an acknowledgment granting permission is received from a memory management system in response to the set-dirty request. The memory management system receives the set-dirty request, determines the acknowledgment based on the contents of the system's caches and main memory according to a cache protocol, and sends the acknowledgment to the external unit in response to the set-dirty request. The acknowledgment either grants or denies permission to set the block to the dirty state.
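As a rough illustration, the grant/deny decision might look like the following sketch; the state names and the grant_set_dirty rule are assumptions standing in for whatever cache protocol the memory management system actually implements.

```cpp
#include <cstdio>

// Illustrative coherence states and decision logic; the state names and the
// grant rule below are assumptions, not the patent's protocol.
enum class BlockState { CleanExclusive, CleanShared, DirtyExclusive };

// The memory management system decides, from the state of the block across
// the caches, whether the external unit may set the block to the dirty state.
bool grant_set_dirty(BlockState state, bool other_caches_hold_copy) {
    if (state == BlockState::DirtyExclusive) return true;  // already writable
    if (state == BlockState::CleanExclusive && !other_caches_hold_copy)
        return true;                                        // safe to dirty in place
    return false;  // shared elsewhere: deny until the block is re-acquired
}

int main() {
    bool ack = grant_set_dirty(BlockState::CleanShared, /*other_caches_hold_copy=*/true);
    std::printf("set-dirty %s\n", ack ? "granted" : "denied");
}
```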
Abstract:
A line predictor caches alignment information for instructions. In response to each fetch address, the line predictor provides alignment information for the instruction beginning at the fetch address, as well as for one or more additional instructions subsequent to that instruction. The line predictor may include a memory having multiple entries, each entry storing up to a predefined maximum number of instruction pointers and a fetch address corresponding to the instruction identified by the first of the instruction pointers. Additionally, each entry may include a link to another entry storing instruction pointers to the next instructions within the predicted instruction stream, and a next fetch address corresponding to the first instruction within that next entry. The next fetch address may be provided to the instruction cache to fetch the corresponding instruction bytes. If the terminating instruction within the entry is a branch instruction, the line predictor is trained with respect to the next fetch address (and the next index within the line predictor, which provides the link to the next entry). As line predictor entries are created, a set of branch predictors may be accessed to provide an initial next fetch address and index. The initial training is verified by accessing the branch predictors at each fetch of the line predictor entry and is updated as dictated by the state of the branch predictors at each fetch.
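A possible layout of one line predictor entry is sketched below; the field names and the limit of four instruction pointers per entry are assumptions chosen for illustration.

```cpp
#include <array>
#include <cstdint>

// Illustrative layout of one line predictor entry; field names and the limit
// of four instruction pointers per entry are assumptions.
constexpr int kMaxInstructionsPerEntry = 4;

struct LinePredictorEntry {
    std::uint64_t fetch_address;      // address of the first instruction in the entry
    std::array<std::uint8_t, kMaxInstructionsPerEntry>
        instruction_pointers;         // alignment info for up to four instructions
    std::uint32_t next_entry_index;   // link to the entry holding the next instructions
    std::uint64_t next_fetch_address; // first instruction of the linked entry;
                                      // sent to the instruction cache to fetch bytes
};

int main() {
    // One entry whose predicted successor begins at address 0x1010.
    LinePredictorEntry entry{0x1000, {0, 4, 7, 11}, 1, 0x1010};
    (void)entry;
    return 0;
}
```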
Abstract:
A scheduler issues instruction operations for execution but also retains them. If a particular instruction operation is subsequently found to be required to execute non-speculatively, that instruction operation is still stored in the scheduler. After determining that the particular instruction operation has become non-speculative (through the issuance and execution of the instruction operations prior to it), the particular instruction operation may be reissued from the scheduler. The penalty for incorrectly scheduling instruction operations that are to execute non-speculatively may thus be reduced as compared to purging the particular instruction operation and younger instruction operations from the pipeline and refetching the particular instruction operation. Additionally, the scheduler may maintain dependency indications for each instruction operation that has been issued. If the particular instruction operation is reissued, the instruction operations that depend on it (directly or indirectly) may be identified via the dependency indications, and the scheduler reissues those dependent instruction operations as well. Instruction operations that are subsequent to the particular instruction operation in program order but do not depend on it are not reissued. Accordingly, the penalty for incorrect scheduling of instruction operations that are to execute non-speculatively may be further decreased relative to purging the particular instruction operation and all younger instruction operations and refetching the particular instruction operation.
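The selective-reissue bookkeeping can be sketched as follows; the structure names and the per-operation dependency vector are assumptions used only to illustrate reissuing an operation together with its direct and indirect dependents while leaving independent younger operations alone.

```cpp
#include <cstddef>
#include <vector>

// Illustrative scheduler bookkeeping; names are assumptions. Issued ops stay
// resident in the scheduler so they can be reissued without refetching.
struct SchedulerEntry {
    bool issued = false;
    std::vector<bool> depends_on;  // depends_on[j] is true if this op depends on op j
};

struct Scheduler {
    std::vector<SchedulerEntry> ops;  // retained after issue, in program order

    // Reissue op i and, transitively, every issued op that depends on it.
    void reissue(std::size_t i) {
        ops[i].issued = false;  // op i will be selected for issue again
        for (std::size_t j = i + 1; j < ops.size(); ++j) {
            if (ops[j].issued && ops[j].depends_on[i]) {
                reissue(j);  // dependents (direct or indirect) reissue too
            }
        }
        // Younger ops with no dependence on op i keep their results.
    }
};

int main() {
    Scheduler s;
    s.ops.resize(3);
    for (auto& op : s.ops) { op.issued = true; op.depends_on.assign(3, false); }
    s.ops[1].depends_on[0] = true;  // op 1 depends on op 0
    s.ops[2].depends_on[1] = true;  // op 2 depends on op 1 (indirectly on op 0)
    s.reissue(0);                   // ops 0, 1, and 2 are all marked for reissue
}
```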
Abstract:
A computer system implementing a system and method for properly ordering write operations is presented. Properly ordering write operations aids in maintaining memory coherency within the computer system. The computer system includes multiple interconnected processing nodes. One or more of the processing nodes includes a central processing unit (CPU) and/or a cache memory, and one or more of the processing nodes includes a memory controller coupled to a memory. The CPU or cache generates a write command to store data within the memory. The memory controller receives the write command and responds by issuing a target done response to the CPU or cache after the memory controller: (i) properly orders the write command within the memory controller with respect to other commands pending within the memory controller, and (ii) determines that a coherency state with respect to the write command has been established within the computer system.
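A minimal sketch of the memory controller's behavior follows; the type names and the way the coherency condition is represented (a simple flag) are assumptions for illustration only.

```cpp
#include <cstdio>
#include <queue>

// Illustrative memory controller sketch; names are assumptions. The target
// done response is issued only after the write has been ordered against the
// other pending commands and its coherency state has been established.
struct WriteCommand { int id; };

struct MemoryController {
    std::queue<WriteCommand> pending;  // accepted commands, kept in order

    void receive_write(const WriteCommand& w, bool coherency_established) {
        pending.push(w);              // (i) ordered relative to pending commands
        if (coherency_established) {  // (ii) coherency state established system-wide
            std::printf("target done -> requester for write %d\n", w.id);
        }
    }
};

int main() {
    MemoryController mc;
    mc.receive_write(WriteCommand{42}, /*coherency_established=*/true);
}
```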
Abstract:
A computer system employs a distributed set of links between processing nodes (each processing node including at least one processor). Each link includes a clock signal that is transmitted with, and in the same direction as, the signals carrying information on the link. The line carrying the clock signal may be matched to the information lines, controlling skew and transport time differences to allow for high frequency operation. Because the clock signals at a transmitter and a receiver may not have a common source, a receive buffer may be employed. Data transmitted across the link is stored into the receive buffer responsive to the transmitter clock signal (e.g. by maintaining a load pointer controlled according to the transmitter clock) and is removed from the buffer responsive to the receiver clock signal (e.g. by maintaining an unload pointer controlled according to the receiver clock). The buffer includes sufficient entries to account for clock uncertainties (e.g. skew and jitter). Additionally, the receiver includes unload pointer adjust logic which monitors the transmitter clock signal and the receiver clock signal for differences (e.g. differences in frequency). The unload pointer adjust logic adjusts the unload pointer to account for the differences in the clock signals, and hence maintains the integrity of the transmitted data by preventing the load and unload pointers from overrunning each other in the buffer.
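The receive buffer with separate load and unload pointers can be sketched as below; the buffer depth and the adjust_for_drift interface are assumptions, not the patent's design.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Illustrative receive-buffer sketch: data is loaded under the transmitter's
// clock and unloaded under the receiver's clock. The depth and the
// adjust_for_drift interface are assumptions for this example.
constexpr unsigned kEntries = 8;  // enough slack to absorb skew and jitter

struct ReceiveBuffer {
    std::array<std::uint64_t, kEntries> data{};
    unsigned load_ptr = 0;    // advanced on each transmitter clock edge
    unsigned unload_ptr = 0;  // advanced on each receiver clock edge

    void load(std::uint64_t value) {          // transmitter-clock domain
        data[load_ptr] = value;
        load_ptr = (load_ptr + 1) % kEntries;
    }

    std::uint64_t unload() {                  // receiver-clock domain
        std::uint64_t value = data[unload_ptr];
        unload_ptr = (unload_ptr + 1) % kEntries;
        return value;
    }

    // If the two clocks drift apart in frequency, skip or repeat entries so
    // the load and unload pointers never overrun each other.
    void adjust_for_drift(int drift_entries) {
        unload_ptr = (unload_ptr + kEntries + drift_entries) % kEntries;
    }
};

int main() {
    ReceiveBuffer rb;
    rb.load(0xABCD);  // transmitter side
    std::printf("%llx\n", static_cast<unsigned long long>(rb.unload()));  // receiver side
}
```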
Abstract:
A multiprocessor system includes a plurality of processors, each processor having one or more caches local to the processor, and a memory controller connectable to the plurality of processors and a main memory. The memory controller manages the caches and the main memory of the multiprocessor system. A processor of the multiprocessor system is configurable to select and evict a block of data from its cache. The selected block may have a clean coherence state or a dirty coherence state. The processor communicates a notify signal indicating eviction of the selected block to the memory controller. In addition to sending a write victim notify signal if the selected block has a dirty coherence state, the processor sends a clean victim notify signal if the selected block has a clean coherence state.
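The eviction path reduces to a simple case split, sketched below; the message names mirror the abstract's terminology, but the code structure itself is an assumption.

```cpp
#include <cstdio>

// Illustrative eviction path; the function name and printed messages are
// assumptions mirroring the abstract's terminology.
enum class CoherenceState { Clean, Dirty };

// On eviction the processor always notifies the memory controller, choosing
// the message by the victim block's coherence state.
void evict_block(CoherenceState victim_state) {
    if (victim_state == CoherenceState::Dirty) {
        std::puts("send write victim notify (dirty data written back)");
    } else {
        std::puts("send clean victim notify (no data transfer needed)");
    }
}

int main() {
    evict_block(CoherenceState::Clean);
    evict_block(CoherenceState::Dirty);
}
```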
Abstract:
A processor of a multiprocessor system is configured to transmit a full probe to a cache associated with the processor to transfer data from the data stored in the cache. The data corresponding to the full probe is transferred during a time period. A first tag-only probe is also transmitted to the cache during the same time period to determine whether the data corresponding to the tag-only probe is part of the data stored in the cache. A stream of probes thus accesses the cache in two stages. The cache is composed of a tag structure and a data structure. In the first stage, a probe is designated a tag-only probe and accesses the tag structure, but not the data structure, to determine tag information indicating a hit or a miss. In the second stage, if the probe returns tag information indicating a cache hit, the probe is designated a full probe and accesses the data structure of the cache. If the probe returns tag information indicating a cache miss, the probe does not proceed to the second stage.
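A sketch of the two-stage probe handling follows; the Cache interface and the stand-in lookup logic are assumptions used only to show that the data structure is accessed exclusively on a tag hit.

```cpp
#include <cstdio>

// Illustrative two-stage probe pipeline; the Cache interface and the stand-in
// lookup logic are assumptions. Stage one touches only the tag structure;
// stage two touches the data structure and is entered only on a tag hit.
struct Cache {
    bool tag_lookup(unsigned address) const {       // stage 1: tag structure only
        return (address % 2) == 0;                   // stand-in for a real tag match
    }
    unsigned data_lookup(unsigned address) const {   // stage 2: data structure
        return address * 10;                         // stand-in for the cached data
    }
};

void handle_probe(const Cache& cache, unsigned address) {
    // Tag-only probe first: cheap, and it leaves the data structure free for
    // full probes that are transferring data in the same time period.
    if (!cache.tag_lookup(address)) {
        std::puts("miss: probe completes without accessing the data structure");
        return;
    }
    // Tag hit: the probe becomes a full probe and reads the data structure.
    std::printf("hit: data = %u\n", cache.data_lookup(address));
}

int main() {
    Cache cache;
    handle_probe(cache, 3);  // odd address: tag miss in this stand-in
    handle_probe(cache, 4);  // even address: tag hit, data transferred
}
```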