摘要:
A computer system employs virtual channels and allocates different resources to the virtual channels. Packets which do not have logical/protocol-related conflicts are grouped into a virtual channel. Accordingly, logical conflicts occur between packets in separate virtual channels. The packets within a virtual channel may share resources (and hence experience resource conflicts), but the packets within different virtual channels may not share resources. Since packets which may experience resource conflicts do not experience logical conflicts, and since packets which may experience logical conflicts do not experience resource conflicts, deadlock-free operation may be achieved. Additionally, nodes within the computer system may be configured to preallocate resources to process response packets. Some response packets may have logical conflicts with other response packets, and hence would normally not be allocable to the same virtual channel. However, by preallocating response-processing resources, response packets are accepted by the destination node. Thus, any resource conflicts which may occur are temporary (as the response packets which make forward progress are processable). Viewed in another way, response packets may be logically independent if the destination node is capable of processing the response packets upon receipt. Accordingly, a response virtual channel is formed to which each response packet belongs.
摘要:
A memory controller provides programmable flexibility, via one or more configuration registers, for the configuration of the memory. The memory may be optimized for a given application by programming the configuration registers. For example, in one embodiment, the portion of the address of a memory transaction used to select a storage location for access in response to the memory transaction may be programmable. In an implementation designed for DRAM, a first portion may be programmably selected to form the row address and a second portion may be programmable selected to form the column address. Additional embodiments may further include programmable selection of the portion of the address used to select a bank. Still further, interleave modes among memory sections assigned to different chip selects and among two or more channels to memory may be programmable, in some implementations. Furthermore, the portion of the address used to select between interleaved memory sections or interleaved channels may be programmable. One particular implementation may include all of the above programmable features, which may provide a high degree of flexibility in optimizing the memory system.
摘要:
A computer system is presented which implements a system and method for tracking the progress of posted write transactions. In one embodiment, the computer system includes a processing subsystem and an input/output (I/O) subsystem. The processing subsystem includes multiple processing nodes interconnected via coherent communication links. Each processing node may include a processor preferably executing software instructions. The I/O subsystem includes one or more I/O nodes. Each I/O node may embody one or more I/O functions (e.g., modem, sound card, etc.). The multiple processing nodes may include a first processing node and a second processing node, wherein the first processing node includes a host bridge, and wherein a memory is coupled to the second processing node. An I/O node may generate a non-coherent write transaction to store data within the second processing node's memory, wherein the non-coherent write transaction is a posted write transaction. The I/O node may dispatch the non-coherent write transaction directed to the host bridge. The host bridge may respond to the non-coherent write transaction by translating the non-coherent write transaction to a coherent write transaction, and dispatching the coherent write transaction to the second processing node. The second processing node may respond to the coherent write transaction by dispatching a target done response directed to the host bridge.
摘要翻译:提出了一种实现用于跟踪已发布的写入事务进度的系统和方法的计算机系统。 在一个实施例中,计算机系统包括处理子系统和输入/输出(I / O)子系统。 处理子系统包括通过相干通信链路互连的多个处理节点。 每个处理节点可以包括优选执行软件指令的处理器。 I / O子系统包括一个或多个I / O节点。 每个I / O节点可以体现一个或多个I / O功能(例如,调制解调器,声卡等)。 多个处理节点可以包括第一处理节点和第二处理节点,其中第一处理节点包括主机桥,并且其中存储器耦合到第二处理节点。 I / O节点可以生成非相干写事务以在第二处理节点的存储器内存储数据,其中非相干写事务是已发布的写事务。 I / O节点可以调度定向到主桥的非相干写入事务。 主桥可以通过将非相干写事务转换为相干写事务来响应非相干写事务,并将相干写事务分派到第二处理节点。 第二处理节点可以通过调度定向到主桥的目标完成响应来响应相干写事务。
摘要:
A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC. A second table maintains a corresponding portion of the store PCs of recently dispatched stores, along with tags identifying the recently dispatched stores. In another implementation, the STLF predictor records a difference between the tags assigned to a load and a store which interferes with the load in a first table indexed by the load PC. The PC of the dispatching load is used to select a difference from the table, and the difference is added to the tag assigned to the load.
摘要:
A cache is coupled to receive an input address and a corresponding way prediction. The cache provides output bytes in response to the predicted way (instead of, performing tag comparisons to select the output bytes). Furthermore, a tag may be read from the predicted way and only partial tags are read from the non-predicted ways. The tag is compared to the tag portion of the input address, and the partial tags are compared to a corresponding partial tag portion of the input address. If the tag matches the tag portion of the input address, a hit in the predicted way is detected and the bytes provided in response to the predicted way are correct. If the tag does not match the tag portion of the input address, a miss in the predicted way is detected. If none of the partial tags match the corresponding partial tag portion of the input address, a miss in the cache is determined. On the other hand, if one or more of the partial tags match the corresponding partial tags portion of the input address, the cache searches the corresponding ways to determine whether or not the input address hits or misses in the cache.
摘要:
A memory controller provides programmable flexibility, via one or more configuration registers, for the configuration of the memory. The memory may be optimized for a given application by programming the configuration registers. For example, in one embodiment, the portion of the address of a memory transaction used to select a storage location for access in response to the memory transaction may be programmable. In an implementation designed for DRAM, a first portion may be programmably selected to form the row address and a second portion may be programmable selected to form the column address. Additional embodiments may further include programmable selection of the portion of the address used to select a bank. Still further, interleave modes among memory sections assigned to different chip selects and among two or more channels to memory may be programmable, in some implementations. Furthermore, the portion of the address used to select between interleaved memory sections or interleaved channels may be programmable. One particular implementation may include all of the above programmable features, which may provide a high degree of flexibility in optimizing the memory system.
摘要:
A processor supports an operating mode in which the default address size is greater than 32 bits and the default operand size is 32 bits. The default address size may be nominally indicated as 64 bits, although various embodiments of the processor may implement any address size which exceeds 32 bits, up to and including 64 bits, in the operating mode. The operating mode may be established by placing an enable indication in a control register into an enabled state and by setting a first operating mode indication and a second operating mode indication in a segment descriptor to predefined states. Additionally, a first instruction prefix may be coded into an instruction to override the default operand size to a first non-default operand size (e.g. 64 bits). Furthermore, a second instruction prefix may be coded into an instruction in addition to the first instruction prefix to override the default operand size to a second non-default operand size (e.g. 16 bits). Thus operand sizes of 64, 32, and 16 bits may be used when desired.
摘要:
A line predictor caches alignment information for instructions. In response to each fetch address, the line predictor provides alignment information for the instruction beginning at the fetch address, as well as one or more additional instructions subsequent to that instruction. The alignment information may be, for example, instruction pointers, each of which directly locates a corresponding instruction within a plurality of instruction bytes fetched in response to the fetch address. The line predictor may include a memory having multiple entries, each entry storing up to a predefined maximum number of instruction pointers and a fetch address corresponding to the instruction identified by a first one of the instruction pointers. Furthermore, each entry may store additional information regarding the terminating instruction within the entry. In one embodiment, the additional information includes an indication of the branch displacement when the terminating instruction is a branch instruction. In another embodiment, the additional information includes the entry point for a microcode instruction when the terminating instruction is a microcode instruction. Furthermore, the microcode instruction may be identified by an instruction pointer corresponding to a particular decode unit which is coupled to the microcode unit.
摘要:
A processor includes an instruction cache and a predecode cache which is not actively maintained coherent with the instruction cache. The processor fetches instruction bytes from the instruction cache and predecode information from the predecode cache. Instructions are provided to a plurality of decode units based on the predecode information, and the decode units decode the instructions and verify that the predecode information corresponds to the instructions. More particularly, each decode unit may verify that a valid instruction was decoded, and that the instruction succeeds a preceding instruction decoded by another decode unit. Additionally, other units involved in the instruction processing pipeline stages prior to decode may verify portions of the predecode information. If the predecode information does not correspond to the fetched instructions, the predecode information may be corrected (either by predecoding the instruction bytes or by updating the predecode information, if the update may be determined without predecoding the instruction bytes). In one particular embodiment, the predecode cache may be a line predictor which stores instruction pointers indexed by a portion of the fetch address. The line predictor may thus experience address aliasing, and predecode information may therefore not correspond to the instruction bytes. However, power may be conserved by not storing and comparing the entire fetch address.
摘要:
A circuit and method is disclosed for preserving the order for memory requests originating from I/O devices coupled to a multiprocessor computer system. The multiprocessor computer system includes a plurality of circuit nodes and a plurality of memories. Each circuit node includes at least one microprocessor coupled to a memory controller which in turn is coupled to one of the plurality of memories. The circuit nodes are in data communication with each other, each circuit node being uniquely identified by a node number. At least one of the circuit nodes is coupled to an I/O bridge which in turn is coupled directly or indirectly to one or more I/O devices. The I/O bridge generates non-coherent memory access transactions in response to memory access requests originating with one of the I/O devices. The circuit node coupled to the I/O bridge, receives the non-coherent memory access transactions. For example, the circuit node coupled to the I/O bridge receives first and second non-coherent memory access transactions. The first and second non-coherent memory access transactions include first and second memory addresses, respectively. The first and second non-coherent memory access transactions further include first and second pipe identifications, respectively. The node circuit maps the first and second memory addresses to first and second node numbers, respectively. The first and second pipe identifications are compared. If the first and second pipe identifications compare equally, then the first and second node numbers are compared. First and second coherent memory access transactions are generated by the node coupled to the I/O bridge wherein the first and second coherent memory access transactions correspond to the first and second non-coherent memory access transactions, respectively. The first coherent memory access transaction is transmitted to one of the nodes of the multiprocessor computer system. However, the second coherent memory access transaction is not transmitted unless the first and second pipe identifications do not compare equally or if the first and second node numbers compare equally.