Abstract:
A processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC. A second table maintains a corresponding portion of the store PCs of recently dispatched stores, along with tags identifying the recently dispatched stores. In another implementation, the STLF predictor records a difference between the tags assigned to a load and a store which interferes with the load in a first table indexed by the load PC. The PC of the dispatching load is used to select a difference from the table, and the difference is added to the tag assigned to the load.
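The first implementation described above (a load-PC-indexed table plus a table of recently dispatched stores) can be sketched roughly as follows. This is a minimal illustration, not the patented design: the class name `StlfPredictor`, the use of Python dictionaries in place of fixed-size hardware tables, and the use of full PCs rather than a portion of the PC are all assumptions made for readability.

```python
class StlfPredictor:
    """Toy model of the PC-indexed STLF predictor (names and sizes are hypothetical)."""

    def __init__(self):
        # First table: load PC -> PC of a store that previously interfered.
        self.load_table = {}
        # Second table: store PC -> tag of the most recently dispatched store
        # with that PC.
        self.recent_stores = {}

    def dispatch_store(self, store_pc, tag):
        """Record a dispatching store so later loads can find its tag."""
        self.recent_stores[store_pc] = tag

    def dispatch_load(self, load_pc):
        """Return the tag of the store this load should wait for, or None."""
        store_pc = self.load_table.get(load_pc)
        if store_pc is None:
            return None  # no dependency predicted
        return self.recent_stores.get(store_pc)

    def train(self, load_pc, store_pc):
        """Called when the store is detected interfering with the load."""
        self.load_table[load_pc] = store_pc

    def untrain(self, load_pc):
        """Called when a predicted dependency did not actually occur."""
        self.load_table.pop(load_pc, None)
```

In this sketch a scheduler would hold the load until the store with the returned tag has executed; a return value of `None` means the load is free to schedule.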
Abstract:
A computer system is presented which implements a system and method for ordering input/output (I/O) memory operations. In one embodiment, the computer system includes a processing subsystem and an I/O subsystem. The processing subsystem includes multiple processing nodes interconnected via coherent communication links. Each processing node may include a processor executing software instructions. The I/O subsystem includes one or more I/O nodes serially coupled via non-coherent communication links. Each I/O node may embody one or more I/O functions (e.g., modem, sound card, etc.). One of the processing nodes includes a host bridge which translates packets moving between the processing subsystem and the I/O subsystem. One of the I/O nodes is coupled to the processing node including the host bridge. The I/O node coupled to the processing node produces and/or provides transactions having destinations or targets within the processing subsystem to the processing node including the host bridge. The I/O node may, for example, produce and/or provide a first transaction followed by a second transaction. The host bridge may dispatch the second transaction with respect to the first transaction according to a predetermined set of ordering rules. For example, the host bridge may: (i) receive the first and second transactions, (ii) dispatch the first transaction within the processing subsystem, and (iii) dispatch the second transaction within the processing subsystem dependent upon progress of the first transaction within the processing subsystem and the predetermined set of ordering rules.
Abstract:
A computer system is presented which implements a “flush” operation providing a response to a source which signifies that all posted write operations previously issued by the source have been properly ordered within their targets with respect to other pending operations. The computer system includes multiple processing nodes within a processing subsystem and at least one input/output (I/O) node coupled to a processing node including a host bridge. The host bridge receives non-coherent posted write commands from the I/O node and responsively generates corresponding coherent posted write commands within the processing subsystem. Each posted write command has a target within the processing subsystem. The host bridge includes a data buffer for storing data used to track the status of non-coherent posted write commands. The I/O node issues a flush command to ensure that all previously issued non-coherent posted write commands have at least reached points of coherency within the processing subsystem. The host bridge issues a non-coherent target done response to the I/O node in response to: (i) the flush command, and (ii) coherent target done responses received from all targets of posted write commands previously issued by the I/O node. Coherent target done responses signify write commands have at least reached points of coherency within the processing subsystem. The non-coherent target done response signals the I/O node that all non-coherent posted write commands previously issued by the I/O node have at least reached points of coherency within the processing subsystem.
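The flush bookkeeping described above can be illustrated with a small sketch. It assumes a single I/O node as the source and models the host bridge's data buffer as a set of write identifiers; the class name `FlushTracker` and its method names are hypothetical and do not come from the abstract.

```python
class FlushTracker:
    """Toy model of host-bridge flush tracking for one I/O node (hypothetical names)."""

    def __init__(self):
        self.pending = set()        # posted writes awaiting a coherent target done
        self.flush_waits_on = None  # writes an outstanding flush depends on, or None
        self.next_id = 0

    def posted_write(self):
        """A non-coherent posted write arrives; a coherent posted write is issued."""
        write_id = self.next_id
        self.next_id += 1
        self.pending.add(write_id)
        return write_id

    def flush(self):
        """A flush arrives. Returns True if the non-coherent target done
        response can be sent immediately."""
        # Only writes issued before the flush are waited on.
        self.flush_waits_on = set(self.pending)
        return self._maybe_respond()

    def coherent_target_done(self, write_id):
        """A target reports that the write reached a point of coherency.
        Returns True if this completes an outstanding flush."""
        self.pending.discard(write_id)
        if self.flush_waits_on is not None:
            self.flush_waits_on.discard(write_id)
            return self._maybe_respond()
        return False

    def _maybe_respond(self):
        if self.flush_waits_on is not None and not self.flush_waits_on:
            self.flush_waits_on = None
            return True
        return False
```

A `True` return value stands for the host bridge issuing the non-coherent target done response to the I/O node; writes posted after the flush do not delay that response in this sketch.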
Abstract:
According to the present invention a cache within a multiprocessor system is speculatively filled. To speculatively fill a designated cache, the present invention first determines an address which identifies information located in a main memory. The address may also identify one or more other versions of the information located in one or more caches. The process of filling the designated cache with the information is started by locating the information in the main memory and locating other versions of the information identified by the address in the caches. The validity of the information located in the main memory is determined after locating the other versions of the information. The process of filling the designated cache with the information located in the main memory is initiated before determining the validity of the information located in main memory. Thus, the memory reference is speculative.
Abstract:
An architecture which splits primary and secondary cache memory buses and maintains cache hierarchy consistency without performing an explicit invalidation of the secondary cache tag. Two explicit rules are used to determine the status of a block read from the primary cache. First, if any memory reference subset matches a block in the primary cache, the associated secondary cache block is ignored. Second, if any memory reference subset matches a block in the miss address file, the associated secondary cache block is ignored. Therefore, any further references which subset match the first reference are not allowed to proceed until the fill back to main memory has been completed and the associated miss address file entry has been retired. This ensures that no agent in the host processor or an external agent can illegally use the stale secondary cache data.
Abstract:
A memory management system couples processors to each other and to a main memory. Each processor may have one or more associated caches local to that processor. A system port of the memory management system receives a request from a source processor of the processors to access a block of data from the main memory. A memory manager of the memory management system then converts the request into a probe command having a data movement part identifying a condition for movement of the block out of a cache of a target processor and a next coherence state part indicating a next state of the block in the cache of the target processor.
Abstract:
A register renaming apparatus includes one or more physical registers which may be assigned to store a floating point value, a multimedia value, an integer value and corresponding condition codes, or condition codes only. The classification of the instruction (e.g. floating point, multimedia, integer, flags-only) defines which lookahead register state is updated (e.g. floating point, integer, flags, etc.), but the physical register can be selected from the one or more physical registers for any of the instruction types. Determining if enough physical registers are free for assignment to the instructions being selected for dispatch includes considering the number of instructions selected for dispatch and the number of free physical registers, but excludes the data type of the instruction. When a code sequence includes predominantly instructions of a particular data type, many of the physical registers may be assigned to that data type (efficiently using the physical register resource). By contrast, if different sets of physical registers are provided for different data types, only the physical registers used for the particular data type may be used for the aforementioned code sequence. Additional efficiencies may be realized in embodiments in which an integer register and condition codes are both updated by many instructions. One physical register may concurrently represent the architected state of both the flags register and the integer register. Accordingly, a given functional unit may forward a single physical register number for both results.
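As a rough illustration of the shared physical register pool described above, the sketch below keeps one free list for all data types and separate lookahead maps per type. It assumes one destination per instruction; the class name `UnifiedRenamer`, the type labels, and the `sets_flags` parameter are hypothetical and chosen only for this example.

```python
class UnifiedRenamer:
    """Toy model of type-agnostic register renaming (hypothetical names)."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # one pool shared by all types
        # Separate lookahead state per instruction classification.
        self.lookahead = {"fp": {}, "mm": {}, "int": {}, "flags": {}}

    def can_dispatch(self, instructions):
        """Only the instruction count and free-register count matter,
        not the data type of each instruction."""
        return len(instructions) <= len(self.free)

    def rename(self, kind, dest, sets_flags=True):
        """Assign a physical register to `dest` and update the lookahead
        state selected by `kind`."""
        preg = self.free.pop()
        if kind == "flags":
            self.lookahead["flags"]["flags"] = preg
            return preg
        self.lookahead[kind][dest] = preg
        if kind == "int" and sets_flags:
            # One physical register holds both the integer result and
            # the condition codes.
            self.lookahead["flags"]["flags"] = preg
        return preg
```

Because the pool is shared, a sequence made up entirely of floating point instructions can consume every physical register in this sketch, which is the efficiency the abstract points to.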
Abstract:
A circuit and method are disclosed for preserving the order of memory requests originating from I/O devices coupled to a multiprocessor computer system. The multiprocessor computer system includes a plurality of circuit nodes and a plurality of memories. Each circuit node includes at least one microprocessor coupled to a memory controller which in turn is coupled to one of the plurality of memories. The circuit nodes are in data communication with each other, each circuit node being uniquely identified by a node number. At least one of the circuit nodes is coupled to an I/O bridge which in turn is coupled directly or indirectly to one or more I/O devices. The I/O bridge generates non-coherent memory access transactions in response to memory access requests originating with one of the I/O devices. The circuit node coupled to the I/O bridge receives the non-coherent memory access transactions. For example, the circuit node coupled to the I/O bridge receives first and second non-coherent memory access transactions. The first and second non-coherent memory access transactions include first and second memory addresses, respectively. The first and second non-coherent memory access transactions further include first and second pipe identifications, respectively. The circuit node maps the first and second memory addresses to first and second node numbers, respectively. The first and second pipe identifications are compared. If the first and second pipe identifications compare equally, then the first and second node numbers are compared. First and second coherent memory access transactions are generated by the node coupled to the I/O bridge wherein the first and second coherent memory access transactions correspond to the first and second non-coherent memory access transactions, respectively. The first coherent memory access transaction is transmitted to one of the nodes of the multiprocessor computer system. However, the second coherent memory access transaction is not transmitted unless the first and second pipe identifications do not compare equally or the first and second node numbers compare equally.
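The transmit condition for the second transaction reduces to a short predicate, sketched below. The address-to-node mapping is an assumption for illustration only: the abstract does not say how addresses map to node numbers, so `NODE_SPAN` and `node_of` are hypothetical.

```python
NODE_SPAN = 0x1000_0000  # hypothetical: each node owns a contiguous 256 MiB range


def node_of(address):
    """Map a memory address to a node number (illustrative mapping only)."""
    return address // NODE_SPAN


def may_transmit_second(first, second):
    """Decide whether the second coherent transaction may be transmitted.

    `first` and `second` are (pipe_id, address) tuples.
    """
    pipe1, addr1 = first
    pipe2, addr2 = second
    if pipe1 != pipe2:
        return True  # different pipes: no ordering constraint applies
    # Same pipe: transmit only if both target the same node.
    return node_of(addr1) == node_of(addr2)
```

For example, two transactions in the same pipe that target different nodes return `False`, meaning the second is held back in this sketch.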
Abstract:
The invention is directed to a method and circuit for performing an addition operation in successive pipelined instructions which utilize a sliced ALU. Successive microinstructions are monitored to determine if both microinstructions are add operations. Further, it is determined whether the destination of the first microinstruction is a source for the add operation in the second microinstruction. If both microinstructions are add operations, the destination of the first microinstruction is used as the source for the second microinstruction, and one of the addends of the second microinstruction is a small addend, then the circuit detects whether a carry-out occurred in the least significant slice of the second microinstruction. If there is no carry-out, the result for the more significant slice of the second microinstruction is the sum without the increment. However, if a carry-out was detected, then the result for the second microinstruction's more significant slice is the sum+1 of the second microinstruction.
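One reading of the carry-select step above is sketched below for a 32-bit value split into two 16-bit slices. The slice width, the function name `back_to_back_add`, and the interpretation that the more significant slice of the first result is either passed through or incremented by one are assumptions, not details taken from the abstract.

```python
SLICE_BITS = 16  # assumed slice width
MASK = (1 << SLICE_BITS) - 1


def back_to_back_add(first_result, small_addend):
    """Add a small addend to the result of a preceding add, slice by slice.

    `small_addend` must fit in the least significant slice, so the more
    significant slice changes only through a carry.
    """
    if not 0 <= small_addend <= MASK:
        raise ValueError("addend does not fit in the least significant slice")

    low = (first_result & MASK) + small_addend
    carry_out = low >> SLICE_BITS  # 1 if the low slice overflowed

    high = (first_result >> SLICE_BITS) & MASK
    if carry_out:
        high = (high + 1) & MASK  # carry detected: select sum + 1
    # Otherwise the high slice is used unchanged.

    return (high << SLICE_BITS) | (low & MASK)
```

Under these assumptions the result matches an ordinary 32-bit wrapping add, for instance `back_to_back_add(0x0001FFFF, 1)` yields `0x00020000`.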
Abstract:
In a data processing system in which access to a second unit by a first unit through a system bus is determined by an arbitration unit, when a requesting unit that receives access to the system bus is unable to use that access for interaction with the second unit, a busy signal is provided to the arbitration unit and to the units. The busy signal causes the units to reinstitute a request for access to the system bus when the subsystem had an aborted transaction. The busy signal enforces a delay in the next arbitration for the system bus until a unit, with an aborted transaction as a result of the busy signal, can reassert the request for access signal. Moreover, apparatus can be included with the arbitration unit that permits rearbitrating access to the bus using the priority conditions in effect at the time of the original arbitration.