摘要:
A circuit and method is provided for selectively stalling interrupt requests originating devices coupled to a multiprocessor system. The multiprocessor system includes a plurality of circuit nodes each one of which is coupled to an individual memory. An I/O bridge coupled to a first circuit node is configured to generate non-coherent memory access command packets and non-coherent interrupt command packets. The first circuit node also generates a coherent interrupt command packet in response to receiving the non-coherent interrupt command packet. The first circuit node transmits the coherent interrupt command packet to another circuit node, possibly the second circuit node. However, the transmission of the coherent interrupt command packet may be delayed. Any delay in transmission is based on a comparison of the pipe identifications of the non-coherent command packets.
摘要:
A bypass mechanism is disclosed for a computer system that executes load and store instructions out of order. The bypass mechanism compares the address of each issuing load instruction with a set of recent store instructions that have not yet updated memory. A match of the recent stores provides the load data instead of having to retrieve the data from memory. A store queue holds the recently issued stores. Each store queue entry and the issuing load includes a data size indicator. Subsequent to a data bypass, the data size indicator of the issuing load is compared against the data size indicator of the matching store queue entry. A trap is signaled when the data size indicator of the issuing load differs from the data size indicator of the matching store queue entry. The trap signal indicates that the data provided by the bypass mechanism was insufficient to satisfy the requirements of the load instruction. The bypass mechanism also operates in cases in which multiple prior stores to the same address are pending when a load that needs to read that address issues.
摘要:
A data caching system comprises a hashing function, a data store, a tag array, a page translator, a comparator and a duplicate tag array. The hashing function combines an index portion of a virtual address with a virtual page portion of the virtual address to form a cache index. The data store comprises a plurality of data blocks for holding data. The tag array comprises a plurality of tag entries corresponding to the data blocks, and both the data store and tag array are addressed with the cache index. The tag array provides a plurality of physical address tags corresponding to physical addresses of data resident within corresponding data blocks in the data store addressed by the cache index. The page translator translates a tag portion of the virtual address to a corresponding physical address tag. The comparator verifies a match between the physical address tag from the page translator and the plurality of physical address tags from the tag array, a match indicating that data addressed by the virtual address is resident within the data store. Finally, the duplicate tag array resolves synonym issues caused by hashing. The hashing function is such that addresses which are equivalent mod 213 are pseudo-randomly displaced within the cache. The preferred hashing function maps VA to bits of the cache index.
摘要:
A technique for verification of a complex integrated circuit design, such as a microprocessor, using a randomly generated test program to simulate internal events and to determine the timing of external events. The simulation proceeds in two passes. During a first pass, the randomly generated test program and data vectors are applied to a simulation model of the design being verified. During this first pass, an internal agent collects profile data about internal events such as addresses and program counter contents as they occur. During a second pass of the process, the profile data is used to generate directed external events based upon the data observed during the first pass. In this manner, the advantages of rapid test vector generation provided through random schemes is achieved at the same time that a more directed external event correlation is accomplished.
摘要:
In a data processing system in which a plurality of data processing units or subsystems exchange logic signal groups by means of a system bus, apparatus is provided to allow sufficient time to permit transients on the system bus to decay, thereby increasing the integrity of the data. When the logic signal groups are applied to the system bus via conducting and nonconducting transistors, the presence of a logic signal on the system bus immediately prior to the application of a set of logic signals from a different data processing unit can delay the on-set of conduction of the most recently activated transistors, thereby resulting in transients of long duration. To accommodate these long transient conditions, the application of the new set of logic signals can be delayed until the transients on the system bus have been attenuated. Apparatus is disclosed for prohibiting access to the system bus by any subsystem during the system clock cycle following a subsystem access or by preventing access to the system bus by subsystems determined by the subsystem having access during the prior system clock cycle.
摘要:
In an embodiment, a processor may be configured to fetch N instruction bytes from an instruction cache (a “fetch group”), even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, a branch direction predictor may be updated responsive to a misprediction and also responsive to the branch prediction being within a threshold of transitioning between predictions. To avoid a lookup to determine if the threshold update is to be performed, the branch predictor may detect the threshold update during prediction, and may transmit an indication with the branch.
摘要:
In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication is asserted that corresponds to an identified replay case of the subset.
摘要:
In one embodiment, an apparatus comprises a first clocked storage device operable in a first clock domain corresponding to a first clock signal. The first clocked storage device has an input coupled to receive one or more bits transmitted on the input from a second clock domain corresponding to a second clock signal. The apparatus further comprises control circuitry configured to ensure that a change in a value of the one or more bits transmitted on the input meets setup and hold time requirements of the first clocked storage device. The control circuitry is responsive to a sample history of one of the first clock signal or the second clock signal to detect a phase relationship between the first clock signal and the second clock signal on each clock cycle to ensure the change meets the setup and hold time requirements.
摘要:
In one embodiment, a processor comprises a data cache configured to store a plurality of cache blocks and a control unit coupled to the data cache. The control unit is configured to flush the plurality of cache blocks from the data cache responsive to an indication that the processor is to transition to a low power state in which one or more clocks for the processor are inhibited.
摘要:
In one embodiment, a switch is configured to be coupled to an interconnect. The switch comprises a plurality of storage locations and an arbiter control circuit coupled to the plurality of storage locations. The plurality of storage locations are configured to store a plurality of requests transmitted by a plurality of agents. The arbiter control circuit is configured to arbitrate among the plurality of requests stored in the plurality of storage locations. A selected request is the winner of the arbitration, and the switch is configured to transmit the selected request from one of the plurality of storage locations onto the interconnect. In another embodiment, a system comprises a plurality of agents, an interconnect, and the switch coupled to the plurality of agents and the interconnect. In another embodiment, a method is contemplated.