Abstract:
Coherency directory updating is provided in a multiprocessor computing system. A plurality of memory resources have a directory, and are operably connected to an interconnect fabric. A cell is operably connected to the interconnect fabric. The cell has a cache including an entry for each of a plurality of coherency units, each coherency unit included in a memory block representing a contiguous portion of the plurality of memory resources. A controller is operably connected to the interconnect fabric. The controller is configured to control a portion of the plurality of memory resources, and has a comparator configured to identify whether a memory block is local. If the memory block is local, the controller is configured to set a state of the directory to exclusive for a write transaction. If the memory block is not local, the controller is configured to set the state to invalid for a write transaction.
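The following C sketch is a rough, hypothetical model of the write-transaction behavior described above: a comparator tests whether the block falls in the address range the controller owns, and the directory state is set to exclusive for a local block and invalid otherwise. The type names, the address-range locality test, and the extra shared state are illustrative assumptions, not the patent's implementation.

/* Hypothetical sketch of the write-transaction directory update described
 * above; the locality test and all names are assumptions. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { DIR_INVALID, DIR_SHARED, DIR_EXCLUSIVE } dir_state_t;

typedef struct {
    unsigned long base;   /* first address this controller owns */
    unsigned long limit;  /* one past the last address it owns */
} controller_t;

/* Comparator: a memory block is "local" if it falls in the range of
 * memory resources this controller controls. */
static bool block_is_local(const controller_t *c, unsigned long block_addr)
{
    return block_addr >= c->base && block_addr < c->limit;
}

/* On a write transaction, set the directory state for the coherency unit:
 * exclusive when the block is local, invalid otherwise. */
static dir_state_t update_directory_on_write(const controller_t *c,
                                             unsigned long block_addr)
{
    return block_is_local(c, block_addr) ? DIR_EXCLUSIVE : DIR_INVALID;
}

int main(void)
{
    controller_t ctrl = { 0x0000, 0x8000 };
    printf("local write  -> state %d\n", update_directory_on_write(&ctrl, 0x1000));
    printf("remote write -> state %d\n", update_directory_on_write(&ctrl, 0x9000));
    return 0;
}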
Abstract:
An event calling for a migration of a workload from a source processor set of processing units to a target processor set of processing units is detected. Processes of the workload are allocated to a second processor set of processing units so that some workload processes are executed on the source processor set and some workload processes are executed on the second processor set. Then, further workload processes are allocated to the second processor set so that no workload process is executing on the source processor set and at least some of said processes are executing on the second processor set. The second processor set can be the target processor set or an intermediate processor set from which the workload is migrated to the target processor set.
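As a hedged illustration of the staged migration described above, the C sketch below moves half of the workload's processes to a second processor set, then the rest, and finally hops to the target set when the second set is only an intermediate stop. The process count, the half-and-half split, and all names are assumptions.

/* Minimal sketch, under assumed names, of the staged migration: processes
 * leave the source set in two steps, and the second set may itself be an
 * intermediate hop toward the target set. */
#include <stdio.h>

#define NPROC 4

typedef enum { SET_SOURCE, SET_SECOND, SET_TARGET } pset_t;

static void migrate_workload(pset_t assignment[NPROC], int second_is_target)
{
    /* Step 1: move some (here, half) of the processes to the second set,
     * so the workload runs on both the source and the second set. */
    for (int i = 0; i < NPROC / 2; i++)
        assignment[i] = SET_SECOND;

    /* Step 2: move the remaining processes, so none execute on the source. */
    for (int i = NPROC / 2; i < NPROC; i++)
        assignment[i] = SET_SECOND;

    /* If the second set was only an intermediate hop, finish on the target. */
    if (!second_is_target)
        for (int i = 0; i < NPROC; i++)
            assignment[i] = SET_TARGET;
}

int main(void)
{
    pset_t a[NPROC] = { SET_SOURCE, SET_SOURCE, SET_SOURCE, SET_SOURCE };
    migrate_workload(a, /*second_is_target=*/0);
    for (int i = 0; i < NPROC; i++)
        printf("process %d -> set %d\n", i, a[i]);
    return 0;
}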
Abstract:
A cache is provided for operatively coupling a processor with a main memory. The cache includes a cache memory and a cache controller operatively coupled with the cache memory. The cache controller is configured to receive memory requests to be satisfied by the cache memory or the main memory. In addition, the cache controller is configured to process cache activity information to cause at least one of the memory requests to bypass the cache memory.
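The sketch below is a minimal, assumed model of the bypass decision described above: the cache controller consults a simple activity metric (here, a count of outstanding fills, which is an assumption) to decide whether a request is satisfied by the cache memory or routed directly to main memory.

/* Illustrative sketch (names and metric assumed) of a cache controller that
 * uses recent activity information to route a request around the cache. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned outstanding_fills;   /* tracked cache activity (assumed metric) */
    unsigned fill_threshold;      /* above this, bypass to main memory */
} cache_ctrl_t;

/* Decide, per request, whether the cache memory or main memory serves it. */
static bool should_bypass(const cache_ctrl_t *cc)
{
    return cc->outstanding_fills > cc->fill_threshold;
}

static void handle_request(const cache_ctrl_t *cc, unsigned long addr)
{
    if (should_bypass(cc))
        printf("request %#lx: bypass cache, go to main memory\n", addr);
    else
        printf("request %#lx: satisfied by cache memory\n", addr);
}

int main(void)
{
    cache_ctrl_t cc = { .outstanding_fills = 5, .fill_threshold = 3 };
    handle_request(&cc, 0x1000);
    cc.outstanding_fills = 1;
    handle_request(&cc, 0x2000);
    return 0;
}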
Abstract:
An apparatus and method for efficiently generating arithmetic flags in a computer system. The system includes an eflags register to store partially computed flags produced by an arithmetic logic unit. The partial flags are computed and stored in one cycle. The stored flags are then decoded by one of two consuming instructions, PRODF or TBIT, in a second cycle.
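Since the abstract does not specify how the partial flags are encoded or what PRODF and TBIT compute, the C sketch below only illustrates the two-cycle split: an ALU step stores partially computed flag information in one step, and a separate consuming step decodes it into final zero, sign, and carry flags. All names and the chosen encoding are assumptions.

/* A minimal sketch, assuming one possible encoding of "partially computed"
 * flags: the ALU captures raw facts about the result, and a later consumer
 * (standing in for PRODF/TBIT) decodes them into final flags. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t result;   /* raw ALU result, captured in cycle one */
    bool     carry;    /* carry-out captured in cycle one */
} partial_flags_t;

/* Cycle one: store partial flags alongside the arithmetic result. */
static partial_flags_t alu_add(uint32_t a, uint32_t b)
{
    partial_flags_t pf;
    uint64_t wide = (uint64_t)a + b;
    pf.result = (uint32_t)wide;
    pf.carry  = (wide >> 32) != 0;
    return pf;
}

/* Cycle two: a consuming step decodes the stored partial flags. */
static void decode_flags(partial_flags_t pf, bool *zf, bool *sf, bool *cf)
{
    *zf = pf.result == 0;
    *sf = (pf.result >> 31) & 1;
    *cf = pf.carry;
}

int main(void)
{
    bool zf, sf, cf;
    decode_flags(alu_add(0xFFFFFFFFu, 1u), &zf, &sf, &cf);
    printf("ZF=%d SF=%d CF=%d\n", zf, sf, cf);
    return 0;
}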
Abstract:
A method and an apparatus check the fine-grain correctness of a microcode-machine central processing unit (CPU) behavioral model. Macroinstructions are decomposed into microinstructions and each microinstruction is executed sequentially. A sequence of microinstructions is determined by an emulated microinstruction sequencer, using dynamic execution information, including information from execution of prior microinstructions in the sequence. At the end of execution of each microinstruction, a reference state is compared to a corresponding state of the behavioral model, and any differences are noted. After execution of all microinstructions in the microinstruction sequence, a reference state is again compared to a corresponding state of the behavioral model, and any differences are noted.
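The C sketch below illustrates the checking structure described above under assumed names: a reference implementation and the behavioral model execute the same microinstruction sequence, and their states are compared after every microinstruction and again after the whole sequence. The emulated microinstruction sequencer and its dynamic execution information are elided; the sequence here is fixed.

/* Illustrative sketch (structure and names assumed) of the fine-grain
 * per-microinstruction check. */
#include <stdio.h>
#include <string.h>

typedef struct { unsigned regs[4]; } cpu_state_t;
typedef void (*uop_fn)(cpu_state_t *);

typedef struct {
    uop_fn ref;    /* reference execution of this microinstruction */
    uop_fn model;  /* behavioral model's execution of the same microinstruction */
} uop_t;

/* Compare the reference state to the model state and note any difference. */
static void check(const cpu_state_t *ref, const cpu_state_t *model, int step)
{
    if (memcmp(ref, model, sizeof *ref) != 0)
        printf("state mismatch noted after microinstruction %d\n", step);
}

static void run_sequence(const uop_t *seq, int n,
                         cpu_state_t *ref, cpu_state_t *model)
{
    for (int i = 0; i < n; i++) {
        seq[i].ref(ref);
        seq[i].model(model);
        check(ref, model, i);   /* fine-grain check after each microinstruction */
    }
    check(ref, model, n);       /* final check after the whole sequence */
}

/* Example microinstruction: increment register 0. */
static void inc_r0(cpu_state_t *s) { s->regs[0]++; }

int main(void)
{
    cpu_state_t ref = { {0} }, model = { {0} };
    uop_t seq[] = { { inc_r0, inc_r0 }, { inc_r0, inc_r0 } };
    run_sequence(seq, 2, &ref, &model);
    printf("done: ref r0=%u, model r0=%u\n", ref.regs[0], model.regs[0]);
    return 0;
}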
Abstract:
Circuitry for providing external access to signals that are internal to an integrated circuit chip package. A plurality of N:1 multiplexers are physically distributed throughout the integrated circuit die. Each of the multiplexers has its N inputs coupled to a nearby set of N nodes within the integrated circuit, and each of the multiplexers is coupled to a source of select information operable to select one node from the set of N nodes for external access. Each of the multiplexers has its output coupled to an externally-accessible chip pad. The integrated circuit is a microprocessor, and the source of select information may include a storage element. If so, additional circuitry is provided for writing data from a register of the microprocessor to the storage element using one or more microprocessor instructions. Each multiplexer may be coupled to a different source of select information, or all multiplexers may be coupled to the same source of select information. Moreover, a fixed set of interconnect traces may be provided to couple a fixed set of nodes to an additional set of externally-accessible chip pads. One or more M:1 multiplexers may also be provided, having their M inputs coupled to M different outputs of the N:1 multiplexers. Each of the M:1 multiplexers may be coupled to a second source of select information. Preferably, the outputs of the M:1 multiplexers will be coupled to circuitry for facilitating debug and performance monitoring of the integrated circuit.
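As a purely behavioral stand-in for the circuitry described above (the patent describes hardware, not software), the C sketch below models distributed N:1 multiplexers whose select values come from writable storage elements, with an M:1 stage choosing which first-level output reaches an externally-accessible pad. The sizes and node values are made up for illustration.

/* Behavioral sketch in C of the two-level selection: distributed N:1
 * multiplexers each pick one nearby internal node, and an M:1 multiplexer
 * picks among their outputs; all values here are illustrative. */
#include <stdio.h>

#define N 4   /* inputs per distributed multiplexer */
#define M 2   /* number of distributed multiplexers feeding the M:1 stage */

typedef struct {
    int nodes[N];   /* nearby internal signals observed by this multiplexer */
    int select;     /* storage element written from a processor register */
} mux_n_t;

static int mux_n_out(const mux_n_t *m) { return m->nodes[m->select]; }

/* Second-level M:1 multiplexer routing one first-level output to a pad. */
static int observe_pad(const mux_n_t muxes[M], int m_select)
{
    return mux_n_out(&muxes[m_select]);
}

int main(void)
{
    mux_n_t muxes[M] = {
        { { 10, 11, 12, 13 }, 2 },   /* selects the node carrying value 12 */
        { { 20, 21, 22, 23 }, 0 },   /* selects the node carrying value 20 */
    };
    printf("externally visible value: %d\n", observe_pad(muxes, 1));
    return 0;
}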
Abstract:
Pipeline structure that is arranged to allow a 1.5 cycle access time for both the data and instruction caches without imposing additional instruction-step delays beyond those imposed by data and instruction caches that have a 1 cycle access time. Half-cycle pulses are produced to allow execution of various instructions in 0.5 cycles. A bypass signal is generated to allow data from a current load instruction to be available to the second subsequent instruction even though the access time for the data cache is 1.5 cycles. Additionally, a branch address is available to the third subsequent instruction even though the instruction cache access time is 1.5 cycles. The present invention provides the initiation of an instruction step each cycle with a 1.5 cycle access time for cache memory. The present invention can also be implemented by initiating an instruction every 2 cycles and providing a 3 cycle access time for cache memory.
Abstract:
A method and apparatus are described that utilize a simple test-and-flush mechanism to implement branch instructions of one Instruction Set Architecture (ISA) using instructions of another ISA. During the decoding and sequencing of microinstructions to implement a branch instruction, a fix-up address, which represents the remedial branch target in the event of a mispredicted target or branch condition, is determined and stored. A test condition is set to determine whether the prediction or the branch condition was correct. When the test condition fails, the instruction execution pipeline is immediately flushed to avoid executing any instruction remaining in the pipeline following the branch instruction. The flushing of the pipeline signals the instruction fetch control mechanism to redirect the instruction flow to the instruction corresponding to the fix-up address. A method and apparatus according to the present invention further allow flushing of the pipeline when conditions other than those involved in branch instructions occur, e.g., to flush stale instructions.
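The C sketch below is a hedged, software-level illustration of the test-and-flush flow: a fix-up address recorded during decode, a test condition evaluated for the emulated branch, and a failing test that flushes the pipeline and redirects fetch to the fix-up address. The structure fields and values are assumptions.

/* Hedged sketch (all names assumed) of the test-and-flush flow. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned long fetch_pc;
    unsigned long fixup_addr;   /* remedial target stored during decode */
    int           valid_uops;   /* microinstructions still in the pipeline */
} pipeline_t;

/* Flush any remaining instructions and redirect the fetch stream. */
static void flush_and_redirect(pipeline_t *p)
{
    p->valid_uops = 0;
    p->fetch_pc   = p->fixup_addr;
}

/* Evaluate the stored test condition for an emulated branch; a failing
 * test means the predicted target or branch condition was wrong. */
static void resolve_branch(pipeline_t *p, bool test_passed)
{
    if (!test_passed)
        flush_and_redirect(p);
}

int main(void)
{
    pipeline_t p = { .fetch_pc = 0x100, .fixup_addr = 0x2000, .valid_uops = 7 };
    resolve_branch(&p, /*test_passed=*/false);
    printf("fetch redirected to %#lx, uops left in pipeline: %d\n",
           p.fetch_pc, p.valid_uops);
    return 0;
}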
Abstract:
Methods for emulating an instruction set extension, comprising providing data to be operated upon, executing a first instruction with respect to a first portion of the data without committing the results of the first executed instruction, if no unmasked exceptions occur with respect to the first portion of the data, executing a second instruction with respect to a second portion of the data, and if no unmasked exceptions occur with respect to the second portion of the data, committing the results of the second executed instruction and again executing the first instruction with respect to the first portion of the data. If the first instruction is executed again, its results are committed. A handler is invoked if an unmasked exception occurs.
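The control flow described above can be sketched in C as follows; the stand-in operation, the exception test, and all names are assumptions chosen only to show the ordering of execution, exception checks, commits, and the handler.

/* Minimal sketch (control flow only, names assumed) of the emulation order:
 * probe the first half without committing, execute and commit the second
 * half, then re-execute and commit the first half; an unmasked exception
 * diverts to a handler instead. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { double lo, hi; } packed_t;   /* the two data portions */

/* Stand-in test: treat a NaN result as an unmasked exception. */
static bool unmasked_exception(double x) { return x != x; }

static void handler(const char *which) { printf("handler invoked on %s portion\n", which); }

static void emulate_extension(packed_t *dst, packed_t src)
{
    /* Execute the first instruction on the first portion; hold the result back. */
    double first = src.lo * 2.0;               /* stand-in operation */
    if (unmasked_exception(first)) { handler("first"); return; }

    /* Execute the second instruction on the second portion; commit only if
     * no unmasked exception occurs. */
    double second = src.hi * 2.0;
    if (unmasked_exception(second)) { handler("second"); return; }
    dst->hi = second;

    /* Re-execute the first instruction and commit its result. */
    dst->lo = src.lo * 2.0;
}

int main(void)
{
    packed_t in = { 1.5, 2.5 }, out = { 0.0, 0.0 };
    emulate_extension(&out, in);
    printf("committed: lo=%g hi=%g\n", out.lo, out.hi);
    return 0;
}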