摘要:
When a processor within a computer system performs a synchronization operation, the system interface within the node delays subsequent transactions from the processor until outstanding coherency activity is completed. Therefore, the computer system may employ asynchronous operations. The synchronization operations may be used when needed to guarantee global completion of one or more prior asynchronous operations. In one embodiment, the synchronization operation is placed into a queue within the system interface. When the synchronization operation reaches the head of the queue, it may be initiated within the system interface. The system interface further includes a request agent comprising multiple control units, each of which may concurrently service coherency activity with respect to a different transaction. Furthermore, the system interface includes a synchronization control vector register which stores a bit for each control unit. Upon initiation of the synchronization operation within the system interface, bits corresponding to those control units which are performing coherency activity (i.e. those which are not idle) are set while other bits are cleared. As each control unit returns to the idle state, the corresponding bit is cleared as well. Once all the bits within the synchronization control vector register are cleared, the coherency activity which was outstanding when the synchronization operation was initiated is complete. The synchronization operation may then be completed.
摘要:
Protocol agents involved in the performance of global coherency activity detect errors with respect to the activity being performed. The errors are logged by a computer system such that diagnostic software may be executed to determine the error detected and to trace the error to the erring software or hardware. In particular, information regarding the first error to be detected is logged. Subsequent errors may receive more or less logging depending upon programmable configuration values. Additionally, those errors which receive full logging may be programmably selected via error masks. The protocol agents each comprise multiple independent state machines which independently process requests. If the request which a particular state machine is processing results in an error, the particular state machine may enter a freeze state. Information regarding the request which is collected by the state machine may thereby be saved for later access. A state machine freezes upon detection of the error if a maximum number of the multiple state machines are not already frozen and the aforementioned error mask indicates that full error logging is employed for the detected error. Therefore, at least a minimum number of the multiple state machines remain functioning even in the presence of a large number of errors. Still further, prior to entering the freeze state, the protocol state machines may transition through a recovery state in which resources not used for error logging purposes are freed from the erring request.
摘要:
A massively-parallel computer includes a plurality of processing nodes and at least one control node interconnected by a network. The network faciliates the transfer of data among the processing nodes and of commands from the control node to the processing nodes. Each processing node includes an interface for transmitting data over, and receiving data and commands from, the network, at least one memory module for storing data, a node processor and an auxiliary processor. The node processor receives commands received by the interface and processes data in response thereto, in the process generating memory access requests for facilitating the retrieval of data from or storage of data in the memory module. The node processor further controlling the transfer of data over the network by the interface. The auxiliary processor is connected to the memory module and the node processor. In response to memory access requests from the node processor, the auxiliary processor performs a memory access operation to store data received from the node processor in the memory module, or to retrieve data from the memory module for transfer to the node processor. In response to auxiliary processing instructions from the node processor, the auxiliary processor performs data processing operations in connection with data in the memory module.
摘要:
A massively-parallel computer includes a plurality of processing nodes and at least one control node interconnected by a network. The network faciliates the transfer of data among the processing nodes and of commands from the control node to the processing nodes. Each each processing node includes an interface for transmitting data over, and receiving data and commands from, the network, at least one memory module for storing data, a node processor and an auxiliary processor. The node processor receives commands received by the interface and processes data in response thereto, in the process generating memory access requests for facilitating the retrieval of data from or storage of data in the memory module. The node processor further controlling the transfer of data over the network by the interface. The auxiliary processor is connected to the memory module and the node processor. In response to memory access requests from the node processor, the auxiliary processor performs a memory access operation to store data received from the node processor in the memory module, or to retrieve data from the memory module for transfer to the node processor. In response to auxiliary processing instructions from the node processor, the auxiliary processor performs data processing operations in connection with data in the memory module.
摘要:
A directory system directs cache line access requests from processors in a multi-processor system with a shared memory system through a cache line states directory. The cache line states directory stores a state value that identifies a cache line shared states word. The cache line shared states word identifies the processor that owns the cache line and the state of access of each processor that shares access to the cache line. A state value encoder encodes a cache line shared state word into a state value and loads the state value into the cache line states directory. A state value decoder decodes the state value into a cache line shared state word for use by the cache line directory system in retrieving the cache line. A plurality of cache line tables are used with each cache line assigned to one of the tables. The cache line table stores a state value for each cache line shared states word stored in the table. The encoder and decoder perform a table look-up to convert between a cache line shared state word and a state value. Each of said cache line tables stores an ordered list of cache line shared state words and their corresponding state values. The ordered list is a list of cache line shared state words that have the most significance to the multi-processor system.
摘要:
A method and apparatus for controlling a DRAM refresh rate. In one embodiment, a computer system includes a memory subsystem having a memory controller and one or more DRAM (dynamic random access memory) devices. The memory controller is configured to periodically initiate a refresh cycle to the one or more DRAM devices. The memory controller is also configured to monitor the temperature of the one or more DRAM devices. If the temperature exceeds a preset threshold, the memory controller is configured to increase the rate at which the periodic refresh cycle is performed.
摘要:
The present invention provides a hybrid Non-Uniform Memory Architecture (NUMA) and Cache-Only Memory Architecture (COMA) caching architecture together with a cache-coherent protocol for a computer system having a plurality of sub-systems coupled to each other via a system interconnect. In one implementation, each sub-system includes at least one processor, a page-oriented COMA cache and a line-oriented hybrid NUMA/COMA cache. Such a hybrid system provides flexibility and efficiency in caching both large and small, and/or sparse and packed data structures. Each sub-system is able to independently store data in COMA mode or in NUMA mode. When caching in COMA mode, a sub-system allocates a page of memory space and then stores the data within the allocated page in its COMA cache. Depending on the implementation, while caching in COMA mode, the sub-system may also store the same data in its hybrid cache for faster access. Conversely, when caching in NUMA mode, the sub-system stores the data, typically a line of data, in its hybrid cache.
摘要:
An apparatus and method for resending a request in a computer system using a delay value is provided. In response to receiving a request, a target device in a computer system may detect that it is temporarily unable to process the request. The target device can send a response to the sending device to indicate that it is temporarily unavailable. The response can include a delay value that can provide a hint to the sending device as to when to resend the request. The target device may generate the delay value according to the type of condition that is causing it to be temporarily unavailable. The delay value may be generated according to a static heuristic or a dynamic algorithm based on previous temporarily unavailable conditions. The delay value may also be used by an error recovery mechanism where a sending device exceeds a retry limit for a particular request.
摘要:
A computer comprising a plurality of processing nodes, a control node and a request distribution network. Each processing node receives processing requests and generates in response processed data. The control node generates processing requests for transfer to selected ones of the processing nodes as identified by associated request address information, and receives processed data in response, the request address information identifying selected ones of the processing nodes to receive a processing request in parallel. The request distribution network distributes the processing requests to the processing nodes and returns processed data to the control node. The network includes a plurality of request distribution nodes connected in a plurality of levels to form a tree-structure, including an upper root level and a lower leaf level. Each request distribution node is connected to receive processing requests from, and to couple processed data to, a parent, the parent of the request distribution node of the root level comprising the control node, and each request distribution node being further connected to couple processing requests to and receive processed data from, selected children, the children of the request distribution nodes of the leaf level comprising the processing nodes. Each request distribution node, in response to request address information received from its parent, identifies selected ones of its children and thereafter couples further request address information which it receives and processing requests in parallel to its children.
摘要:
A method and mechanism for annotating a transaction stream. A processing unit is configured to generate annotation transactions which are inserted into a transaction stream. The transaction stream, including the annotations, are subsequently observed by a trace unit for debug or other analysis. In one embodiment, a processing unit includes a trace address register and an annotation enable bit. The trace address register is configured to store an address corresponding to a trace unit and the enable bit is configured to indicate whether annotation transactions are to be generated. Annotation instructions are added to operating system or user code at locations where annotations are desired. In one embodiment, annotation transactions correspond to transaction types which are not unique to annotation transactions. In one embodiment, an annotation instruction includes a reference to the trace address register which contains the address of the trace unit. Upon detecting the annotation instruction, and detecting annotations are enabled, the processing unit generates an annotation transaction addressed to the trace unit. In one embodiment, annotation transactions may be used to indicate context switches, processor mode changes, timestamps, or address translation information.