摘要:
A computer system has a communication link that includes a control signal and data lines. A first control packet having a-plurality of bytes is transferred over the data lines from a first to a second node on the communication link. The control line is asserted to indicate transfer of a control packet. After transfer of the first control packet, a first portion of a multi-byte data packet associated with the first control packet is transferred with the control line deasserted. During transfer of the data packet the control line is asserted and transfer of the data packet is suspended. A second control packet is then transferred over the data lines. Subsequent to transferring the second control packet, the remainder of the data packet is transferred with the control line deasserted.
摘要:
A computer system is presented which implements a system and method for conveying packets between a coherent processing subsystem and a non-coherent input/output (I/O) subsystem. The processing subsystem includes a first processing node coupled to a second processing node via a coherent communication link. The first processing node includes a host bridge which translates packets moving between the processing subsystem and the I/O subsystem. The I/O subsystem includes an I/O node coupled to the first processing node via a non-coherent communication link. The I/O node may embody one or more I/O functions (e.g., modem, sound card, etc.). The coherent and non-coherent communication links are physically identical. For example, the coherent and non-coherent communication links may have the same electrical interface and the same signal definition. The host bridge translates non-coherent packets from the I/O node to coherent packets, and transmits the coherent packets to the second processing node. The host bridge also translates coherent packets from the second processing node to non-coherent packets, and transmits the non-coherent packets to the I/O node. The coherent and non-coherent packets have identically located command fields. The translating process includes copying the contents of the command field of one packet type to the command field of the other packet type.
摘要:
An adaptive retry mechanism may record latencies of recent transactions (e.g. the first data transfer latency), and may select a retry latency from two or more retry latencies. The retry latency may be used for a transaction, and may specify a point in time during the transaction at which the transaction is retried if the first data transfer has not yet occurred. In one implementation, the set of retry latencies includes a minimum retry latency, a nominal retry latency, and a maximum retry latency. The nominal retry latency may be set slightly greater than the expected latency of transactions in the system. The minimum retry latency may be less than the nominal retry latency and the maximum retry latency may be greater than the nominal retry latency. If latencies greater than the nominal retry latency but less than the maximum retry latency are being experienced, the maximum retry latency may be selected. On the other hand, if latencies greater than the maximum retry latency are being experienced, the minimum retry latency may be selected.
摘要:
A computer system may include multiple processing nodes, one or more of which may be coupled to separate memories which may form a distributed memory system. The processing nodes may include caches, and the computer system may maintain coherency between the caches and the distributed memory system. Particularly, the computer system may implement a flexible probe command/response routing scheme. The scheme may employ an indication within the probe command which identifies a receiving node to receive the probe responses. For example, probe commands indicating that the target or the source of transaction should receive probe responses corresponding to the transaction may be included. Probe commands may specify the source of the transaction as the receiving node for read transactions (such that dirty data is delivered to the source node from the node storing the dirty data). On the other hand, for write transactions (in which data is being updated in memory at the target node of the transaction), the probe commands may specify the target of the transaction as the receiving node. In this manner, the target may determine when to commit the write data to memory and may receive any dirty data to be merged with the write data.
摘要:
A messaging scheme that accomplishes cache-coherent data transfers during a memory read operation in a multiprocessing computer system is described. A source processing node sends a read command to a target processing node to read data from a designated memory location in a system memory associated with the target processing node. In response to the read command, the target processing node transmits a probe command to all the remaining processing nodes in the computer system regardless of whether one or more of the remaining nodes have a copy of the data cached in their respective cache memories. Probe command causes each node to maintain cache coherency by appropriately changing the state of the cache block containing the requested data and by causing the node having an updated copy of the cache block to send the cache block to the source node. Each processing node that receives a probe command sends, in return, a probe response indicating whether that processing node has a cached copy of the data and the state of the cached copy if the responding node has the cached copy. The target node sends a read response including the requested data to the source node. The source node waits for responses from the target node and from each of the remaining node in the system and acknowledges the receipt of requested data by sending a source done response to the target node.
摘要:
A messaging scheme to synchronize processes within a distributed memory multiprocessing computer system having two or more processing nodes interconnected using an interconnect structure of dual-unidirectional links. The microcode within the lock requesting node transmits a write command to write corresponding node identification data into a lock register in the arbitrating node. The lock requesting node iteratively reads the lock register until it finds its node identification data stored therein with a valid bit set. The lock requesting node then informs all remaining processing nodes to release shared system resources. This is accomplished through a release request bit and a release response bit in each processing node. After completion of lock operations, the lock requesting node sends a message to the arbitrating node to reset the valid bit in the lock register, and a broadcast message to each remaining node to reset the release request bit. In an alternate embodiment, each processing node includes a lock resource register instead of a release response bit. Each remaining processing node writes its node identification data into the lock resource register within the lock requesting processing node when ready to release shared system resources to allow the lock requesting node to establish lock ownership. This informs the lock requesting node to commence lock operations. Lock operations are performed without contention for system resources and without deadlocks among various processing nodes.
摘要:
A register renaming apparatus includes one or more physical registers which may be assigned to store a floating point value, a multimedia value, an integer value and corresponding condition codes, or condition codes only. The classification of the instruction (e.g. floating point, multimedia, integer, flags-only) defines which lookahead register state is updated (e.g. floating point, integer, flags, etc.), but the physical register can be selected from the one or more physical registers for any of the instruction types. Determining if enough physical registers are free for assignment to the instructions being selected for dispatch includes considering the number of instructions selected for dispatch and the number of free physical registers, but excludes the data type of the instruction. When a code sequence includes predominately instructions of a particular data type, many of the physical registers may be assigned to that data type (efficiently using the physical register resource). By contrast, if different sets of physical registers are provided for different data types, only the physical registers used for the particular data type may be used for the aforementioned code sequence. Additional efficiencies may be realized in embodiments in which an integer register and condition codes are both updated by many instructions. One physical register may concurrently represent the architected state of both the flags register and the integer register. Accordingly, a given functional unit may forward a single physical register number for both results.
摘要:
A data caching system and method includes a data store for caching data from a main memory, a primary tag array for holding tags associated with data cached in the data store, and a duplicate tag array which holds copies of the tags held in the primary tag array. The duplicate tag array is accessible by functions, such as external memory cache probes, such that the primary tag remains available to the processor core. An address translator maps virtual page addresses to physical page address. In order to allow a data caching system which is larger than a page size, a portion of the virtual page address is used to index the tag arrays and data store. However, because of the virtual to physical mapping, the data may reside in any of a number of physical locations. During an internally-generated memory access, the virtual address is used to look up the cache. If there is a miss, other combinations of values are substituted for the virtual bits of the tag array index. For external probes which provide physical addresses to the duplicate tag array, combinations of values are appended to the index portion of the physical address. Tag array lookups can be performed either sequentially, or in parallel.
摘要:
A computer method and apparatus causes the load-store instruction grouping in a microprocessor instruction pipeline to be disrupted at appropriate times. The computer method and apparatus employs a memory access member which periodically stalls the issuance of store instructions when there are prior store instructions pending in the store queue. The periodic stalls bias the issue stage to issue load groups and store instruction groups. In the latter case, the store queue is free to update the data cache with the data from previous store instructions. Thus, the invention memory access member biases issuance of store instructions in a manner that prevents the store queue from becoming full, and as such enables the store queue to write to the data cache before the store queue becomes full.
摘要:
A system and method for controlling a shared memory bus in a computer of a multi-processor system prevents collisions on the shared bus and ensures that the bus is full at system start-up. Steady state operations are maintained without the need for a queuing mechanism in the system's memory controller and in view of the memory modules of the shared memory having different read access times, with the system and method being implemented in a system that includes a central unit and multiple uni-directional buses that are disposed between a shared memory and a plurality of processors, with the central unit controlling access to, and use of, the shared buses of the system.