摘要:
In a multiprocessing computer system, a cache-coherent data transfer scheme that also conserves the system memory bandwidth during a memory read operation is described. A source processing node sends a read command to a target processing node to read data from a designated memory location in a system memory associated with the target processing node. In response to the read command, the target processing node transmits a probe command to all the remaining processing nodes in the computer system regardless of whether one or more of the remaining nodes have a copy of the data cached in their respective cache memories. Probe command causes each node to maintain cache coherency by appropriately changing the state of the cache block containing the requested data and sending respective probe responses to the source node. Probe command also causes the node having an updated copy of the cache block to send the cache block to the source node through a read response. The target node, concurrently with the probe command, initiates a read response transmission to send the requested data to the source node. The node having the modified cached copy containing the requested data transmits a memory cancel response to the target node concurrently with the updated copy of the cache block to the source node. The memory cancel response attempts to prevent the target node from sending to the source node the stale data from the system memory. The memory cancel response also causes the target node to send a target done response to the source node. The source node waits for probe responses, read responses and the target done message prior to sending a source done message to the target node.
摘要:
A computing apparatus has a mode selector configured to select one of a long-bus mode corresponding to a first memory size and a short-bus mode corresponding to a second memory size which is less than the first memory size. An address bus of the computing apparatus is configured to transmit an address consisting of address bits defining the first memory size and a subset of the address bits defining the second memory size. The address bus has N communication lines each configured to transmit one of a first number of bits of the address bits defining the first memory size in the long-bus mode and M of the N communication lines each configured to transmit one of a second number of bits of the address bits defining the second memory size in the short-bus mode, where M is less than N.
摘要:
A cache memory unit is disclosed in which, in response to the application of a write command, the write operation is performed in two system clock cycles. During the first clock cycle, the data signal group is stored in a temporary storage unit while a determination is made if the address signal group associated with the data signal group is present in the cache memory unit. When the address signal group is present, the data signal group is stored in the cache memory unit during the next application of a write command to the cache memory unit. If a read command is applied to the cache memory unit involving the data signal group stored in the temporary storage unit, then this data signal group is transferred to the central processing unit in response to the read command. Instead of performing the storage into the cache memory unit as a result of the next write command, the storage of the data signal in the cache memory unit can occur during any free cycle.
摘要:
A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction is eligible for a zero cycle move operation. If so, control logic assigns a physical register identifier associated with a source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction.
摘要:
A system and method for efficiently reducing the latency of initializing registers. A register rename unit within a processor determines whether prior to an execution pipeline stage it is known a decoded given instruction writes a particular numerical value in a destination operand. An example is a move immediate instruction that writes a value of 0 in its destination operand. Other examples may also qualify. If the determination is made, a given physical register identifier is assigned to the destination operand, wherein the given physical register identifier is associated with the particular numerical value, but it is not associated with an actual physical register in a physical register file. The given instruction is marked to prevent it from proceeding to an execution pipeline stage. When the given physical register identifier is used to read the physical register file, no actual physical register is accessed.
摘要:
In an embodiment, a processor may be configured to fetch N instruction bytes from an instruction cache (a “fetch group”), even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, branch prediction values from multiple entries in each table may be read and respective branch prediction values may be combined to form branch predictions for up to M branches in the fetch group.
摘要:
An interface unit may comprise a buffer configured to store requests that are to be transmitted on an interconnect and a control unit coupled to the buffer. In one embodiment, the control unit is coupled to receive a retry response from the interconnect during a response phase of a first transaction for a first request stored in the buffer. The control unit is configured to record an identifier supplied on the interconnect with the retry response that identifies a second transaction that is in progress on the interconnect. The control unit is configured to inhibit reinitiation of the first transaction at least until detecting a second transmission of the identifier. In another embodiment, the control unit is configured to assert a retry response during a response phase of a first transaction responsive to a snoop hit of the first transaction on a first request stored in the buffer for which a second transaction is in progress on the interconnect. The control unit is further configured to provide an identifier of the second transaction with the retry response.
摘要:
In an embodiment, a non-transparent memory unit is provided which includes a non-transparent memory and a control circuit. The control circuit may manage the non-transparent memory as a set of non-transparent memory blocks. Software executing on one or more processors may request a non-transparent memory block in which to process data. The control circuit may allocate a first block, and may return an address (or other indication) of the allocated block so that the software can access the block. The control circuit may also provide automatic data movement between the non-transparent memory and a main memory system to which the non-transparent memory unit is coupled. For example, the automatic data movement may include filling data from the main memory system to the allocated block, or flushing the data in the allocated block to the main memory system after the processing of the allocated block is complete.
摘要:
In one embodiment, a processor comprises a data cache configured to store a plurality of cache blocks and a control unit coupled to the data cache. The control unit is configured to flush the plurality of cache blocks from the data cache responsive to an indication that the processor is to transition to a low power state in which one or more clocks for the processor are inhibited.
摘要:
In one embodiment, an apparatus comprises an interconnect; at least one processor coupled to the interconnect; and at least one memory controller coupled to the interconnect. The memory controller is programmable by the processor into a loopback test mode of operation and, in the loopback test mode, the memory controller is configured to receive a first write operation from the processor over the interconnect. The memory controller is configured to route write data from the first write operation through a plurality of drivers and receivers connected to a plurality of data pins that are capable of connection to one or more memory modules. The memory controller is further configured to return the write data as read data on the interconnect for a first read operation received from the processor on the interconnect.