摘要:
A circuit and method is provided for selectively stalling interrupt requests originating devices coupled to a multiprocessor system. The multiprocessor system includes a plurality of circuit nodes each one of which is coupled to an individual memory. An I/O bridge coupled to a first circuit node is configured to generate non-coherent memory access command packets and non-coherent interrupt command packets. The first circuit node also generates a coherent interrupt command packet in response to receiving the non-coherent interrupt command packet. The first circuit node transmits the coherent interrupt command packet to another circuit node, possibly the second circuit node. However, the transmission of the coherent interrupt command packet may be delayed. Any delay in transmission is based on a comparison of the pipe identifications of the non-coherent command packets.
摘要:
A bypass mechanism is disclosed for a computer system that executes load and store instructions out of order. The bypass mechanism compares the address of each issuing load instruction with a set of recent store instructions that have not yet updated memory. A match of the recent stores provides the load data instead of having to retrieve the data from memory. A store queue holds the recently issued stores. Each store queue entry and the issuing load includes a data size indicator. Subsequent to a data bypass, the data size indicator of the issuing load is compared against the data size indicator of the matching store queue entry. A trap is signaled when the data size indicator of the issuing load differs from the data size indicator of the matching store queue entry. The trap signal indicates that the data provided by the bypass mechanism was insufficient to satisfy the requirements of the load instruction. The bypass mechanism also operates in cases in which multiple prior stores to the same address are pending when a load that needs to read that address issues.
摘要:
A data caching system comprises a hashing function, a data store, a tag array, a page translator, a comparator and a duplicate tag array. The hashing function combines an index portion of a virtual address with a virtual page portion of the virtual address to form a cache index. The data store comprises a plurality of data blocks for holding data. The tag array comprises a plurality of tag entries corresponding to the data blocks, and both the data store and tag array are addressed with the cache index. The tag array provides a plurality of physical address tags corresponding to physical addresses of data resident within corresponding data blocks in the data store addressed by the cache index. The page translator translates a tag portion of the virtual address to a corresponding physical address tag. The comparator verifies a match between the physical address tag from the page translator and the plurality of physical address tags from the tag array, a match indicating that data addressed by the virtual address is resident within the data store. Finally, the duplicate tag array resolves synonym issues caused by hashing. The hashing function is such that addresses which are equivalent mod 213 are pseudo-randomly displaced within the cache. The preferred hashing function maps VA to bits of the cache index.
摘要:
A technique for verification of a complex integrated circuit design, such as a microprocessor, using a randomly generated test program to simulate internal events and to determine the timing of external events. The simulation proceeds in two passes. During a first pass, the randomly generated test program and data vectors are applied to a simulation model of the design being verified. During this first pass, an internal agent collects profile data about internal events such as addresses and program counter contents as they occur. During a second pass of the process, the profile data is used to generate directed external events based upon the data observed during the first pass. In this manner, the advantages of rapid test vector generation provided through random schemes is achieved at the same time that a more directed external event correlation is accomplished.
摘要:
In a data processing system in which a plurality of data processing units or subsystems exchange logic signal groups by means of a system bus, apparatus is provided to allow sufficient time to permit transients on the system bus to decay, thereby increasing the integrity of the data. When the logic signal groups are applied to the system bus via conducting and nonconducting transistors, the presence of a logic signal on the system bus immediately prior to the application of a set of logic signals from a different data processing unit can delay the on-set of conduction of the most recently activated transistors, thereby resulting in transients of long duration. To accommodate these long transient conditions, the application of the new set of logic signals can be delayed until the transients on the system bus have been attenuated. Apparatus is disclosed for prohibiting access to the system bus by any subsystem during the system clock cycle following a subsystem access or by preventing access to the system bus by subsystems determined by the subsystem having access during the prior system clock cycle.
摘要:
In an embodiment, a processor may be configured to fetch N instruction bytes from an instruction cache (a “fetch group”), even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, a branch direction predictor may be updated responsive to a misprediction and also responsive to the branch prediction being within a threshold of transitioning between predictions. To avoid a lookup to determine if the threshold update is to be performed, the branch predictor may detect the threshold update during prediction, and may transmit an indication with the branch.
摘要:
In an embodiment, one or more fabric control circuits may be inserted in a communication fabric to control various aspects of the communications by components in the system. The fabric control circuits may be included on the interface of the components to the communication fabric, in some embodiments. In other embodiments that include a hierarchical communication fabric, fabric control circuits may alternatively or additionally be included. The fabric control circuits may be programmable, and thus may provide the ability to tune the communication fabric to meet performance and/or functionality goals.
摘要:
In one embodiment, an apparatus comprises a first clocked storage device operable in a first clock domain corresponding to a first clock signal. The first clocked storage device has an input coupled to receive one or more bits transmitted on the input from a second clock domain corresponding to a second clock signal. The apparatus further comprises control circuitry configured to ensure that a change in a value of the one or more bits transmitted on the input meets setup and hold time requirements of the first clocked storage device. The control circuitry is responsive to a sample history of one of the first clock signal or the second clock signal to detect a phase relationship between the first clock signal and the second clock signal on each clock cycle to ensure the change meets the setup and hold time requirements.
摘要:
A processor supports a processing mode in which the address size is greater than 32 bits and the operand size may be 32 or 64 bits. The address size may be nominally indicated as 64 bits, although various embodiments of the processor may implement any address size which exceeds 32 bits, up to and including 64 bits, in the processing mode. The processing mode may be established by placing an enable indication in a control register into an enabled state and by setting a first operating mode indication and a second operating mode indication in a segment descriptor to predefined states. Other combinations of the first operating mode indication and the second operating mode indication may be used to provide compatibility modes for 32 bit and 16 bit processing compatible with the x86 processor architecture (with the enable indication remaining in the enabled state).
摘要:
A high speed bus system for use in a shared memory system that allows for the high speed transmissions of commands and data between a number of processors and a memory array of a multi-processor, shared memory system, with the high speed bus system including a central unit and a series of uni-directional buses that connect between the plurality of processors and shared memory, with the central unit including arbitration logic and a series of multiplexers to determine which CPUs are granted access to shared buses, scheduling logic that works with the arbitration logic and multiplexers to determine which CPUs are granted access to the shared buses, and port logic for combining the CPU transmissions and determining if such transmissions are valid.