Abstract:
A multi-level instruction cache memory system for a computer processor. A relatively large cache has both instructions and data. The large cache is the primary source of data for the processor. A smaller cache dedicated to instructions is also provided. The smaller cache is the primary source of instructions for the processor. Instructions are copied from the larger cache to the smaller cache during times when the processor is not accessing data in the larger cache. A prefetch buffer transfers instructions from the larger cache to the smaller cache. If a cache miss occurs for the smaller cache, and the instruction is in the prefetch buffer, the system provides the instruction with no delay relative to a fetch from the smaller instruction cache. If a cache miss occurs for the smaller cache, and the instruction is being fetched from the larger cache, or available in the larger cache, the system provides the instruction with minimal delay relative to a fetch from the smaller instruction cache.
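The fetch priority described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name, method names, and the delay values (0 cycles for the small cache and the prefetch buffer, 1 cycle for the large cache) are hypothetical and do not come from the abstract, and main memory is not modeled.

```python
class InstructionFetcher:
    """Toy model: small instruction cache, prefetch buffer, large unified cache."""

    def __init__(self, large_cache):
        self.large_cache = dict(large_cache)   # address -> instruction
        self.icache = {}                       # small, instruction-only cache
        self.prefetch_buffer = {}              # staging area between the caches

    def prefetch(self, addr):
        # Meant to be called when the processor is not using the large cache for data.
        if addr in self.large_cache and addr not in self.icache:
            self.prefetch_buffer[addr] = self.large_cache[addr]

    def drain(self):
        # Move buffered instructions into the small instruction cache.
        self.icache.update(self.prefetch_buffer)
        self.prefetch_buffer.clear()

    def fetch(self, addr):
        # Returns (instruction, source, extra_delay); delay values are assumed.
        if addr in self.icache:
            return self.icache[addr], "icache", 0
        if addr in self.prefetch_buffer:
            # Small-cache miss served from the buffer with no added delay.
            return self.prefetch_buffer[addr], "prefetch_buffer", 0
        # Small-cache miss served from the large cache with a minimal delay.
        instr = self.large_cache[addr]         # KeyError if absent (not modeled)
        self.icache[addr] = instr
        return instr, "large_cache", 1
```

A fetch that misses the small cache but hits the prefetch buffer reports the same delay as a small-cache hit, which is the property the abstract emphasizes.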
Abstract:
The present invention is a method for implementing two architectures on a single chip. The method uses a fetch engine to retrieve instructions. If the instructions are macroinstructions, then it decodes the macroinstructions into microinstructions, and then bundles those microinstructions using a bundler, within an emulation engine. The bundles are issued in parallel and dispatched to the execution engine and contain pre-decode bits so that the execution engine treats them as microinstructions. Before being transferred to the execution engine, the instructions may be held in a buffer. The method also selects between bundled microinstructions from the emulation engine and native microinstructions coming directly from the fetch engine, by using a multiplexer or other means. Both native microinstructions and bundled microinstructions may be held in the buffer. The method also sends additional information to the execution engine.
Abstract:
A method of improving the performance of a computer processor by recognizing that two consecutive register instructions can be executed simultaneously and executing the two instructions simultaneously while generating a single data address and while performing exception checking on a single data address. During an instruction fetch process, two consecutive instructions are tested to determine if both are either register load instructions or register save instructions. If both instructions are load or save register instructions, the corresponding data addresses are tested to see if both data addresses are in the same double word. If both data addresses are in the same double word, then the instructions are executed simultaneously. Only one data address generation is required and exception processing is performed on only one data address. In one example embodiment, a simplified test rapidly ensures that both data addresses are in the same double word, but also requires the base addresses to be at an even word boundary. In a second embodiment, where the processor includes an alignment test as a separate test, an even simpler test rapidly ensures that both data addresses are in the same double word without checking alignment.
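The pairing condition above can be sketched as a pair of small predicates. This is an illustration only: the 4-byte word and 8-byte double word sizes, the function names, and the string opcodes are assumptions, and the sketch uses a plain address comparison rather than the simplified tests of either embodiment.

```python
WORD_BYTES = 4                      # assumed word size
DOUBLE_WORD_BYTES = 2 * WORD_BYTES  # assumed double word size


def same_double_word(addr_a, addr_b):
    # Two byte addresses share a double word when they fall in the same
    # aligned 8-byte block.
    return addr_a // DOUBLE_WORD_BYTES == addr_b // DOUBLE_WORD_BYTES


def can_pair(op_a, addr_a, op_b, addr_b):
    # Both instructions must be register loads, or both register saves,
    # and their data addresses must lie in the same double word.
    both_loads = op_a == "load" and op_b == "load"
    both_saves = op_a == "save" and op_b == "save"
    return (both_loads or both_saves) and same_double_word(addr_a, addr_b)
```

For example, loads at 0x1000 and 0x1004 fall in the same double word and may be paired, while loads at 0x1004 and 0x1008 straddle a double word boundary and may not.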
Abstract:
An event calling for a migration of a workload from a source processor set of processing units to a target processor set of processing units is detected. Processes of the workload are allocated to a second processor set of processing units so that some workload processes are executed on the source processor set and some workload processes are executed on the second processor set. Then, some workload processes are allocated to the second processor set so that no workload process is executing on the source processor set and at least some of said processes are executing on the second processor set. The second processor set can be the target processor set or an intermediate processor set from which the workload is migrated to the target processor set.
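The two-step reallocation can be sketched as follows. This is a hypothetical illustration: the function name, the dictionary representation of process placement, and the choice to move half of the processes in the first step are all assumptions, since the abstract does not say how the first batch is chosen.

```python
def migrate_in_phases(placement, source, second):
    """Return placements after phase 1 (workload split) and phase 2 (source empty).

    placement maps a process name to the name of its processor set.
    """
    on_source = sorted(p for p, s in placement.items() if s == source)
    # Phase 1: move an assumed first batch (half) so the workload runs on both sets.
    first_batch = on_source[: len(on_source) // 2]
    phase1 = dict(placement)
    for process in first_batch:
        phase1[process] = second
    # Phase 2: move everything still on the source set to the second set.
    phase2 = {p: (second if s == source else s) for p, s in phase1.items()}
    return phase1, phase2
```

After phase 1 the workload runs on both sets at once; after phase 2 nothing remains on the source set, which can then be released or reconfigured.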
Abstract:
A cache is provided for operatively coupling a processor with a main memory. The cache includes a cache memory and a cache controller operatively coupled with the cache memory. The cache controller is configured to receive memory requests to be satisfied by the cache memory or the main memory. In addition, the cache controller is configured to process cache activity information to cause at least one of the memory requests to bypass the cache memory.
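One way a controller might use cache activity information to bypass the cache memory is sketched below. The policy shown is purely an assumption for illustration: the abstract does not say what activity information is used, so this sketch tracks recent misses in a sliding window and bypasses when the miss fraction reaches a threshold. All names and parameter values are hypothetical.

```python
from collections import deque


class CacheController:
    def __init__(self, main_memory, window=4, miss_threshold=0.75):
        self.main_memory = main_memory
        self.cache = {}
        self.recent = deque(maxlen=window)   # activity record: True means miss
        self.miss_threshold = miss_threshold

    def should_bypass(self):
        # Only decide once the activity window is full.
        if len(self.recent) < self.recent.maxlen:
            return False
        return sum(self.recent) / len(self.recent) >= self.miss_threshold

    def read(self, addr):
        if self.should_bypass():
            # Serve directly from main memory without allocating a cache line.
            # Dropping the oldest record forces the next request through the cache.
            self.recent.popleft()
            return self.main_memory[addr], "bypass"
        if addr in self.cache:
            self.recent.append(False)
            return self.cache[addr], "hit"
        self.recent.append(True)
        self.cache[addr] = self.main_memory[addr]
        return self.cache[addr], "miss"
```

With the assumed defaults, four consecutive misses fill the window, so the fifth request goes straight to main memory and leaves the cache contents untouched.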
Abstract:
An apparatus and method for efficiently generating arithmetic flags in a computer system. The system includes an eflags register to store partially computed flags computed by an arithmetic logic unit. The stored partial flags are computed in one cycle. The stored flags are decoded by one of two consuming instructions, PRODF or TBIT, in a second cycle.
Abstract:
A method and an apparatus check the fine-grain correctness of a microcode machine central processor unit (CPU) behavioral model. Macroinstructions are decomposed into microinstructions and each microinstruction is executed sequentially. A sequence of microinstructions is determined by an emulated microinstruction sequencer, using dynamic execution information, including information from execution of prior microinstructions in the sequence of microinstructions. At the end of execution of each microinstruction, a reference state is compared to a corresponding state of the behavioral model, and any differences are noted. After execution of all microinstructions in the microinstruction sequence, a reference state is compared to a corresponding state of the behavioral model, and any differences are noted.
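The per-microinstruction and end-of-sequence comparisons can be sketched as a small checking loop. This is a simplified illustration under stated assumptions: machine state is a dictionary of register values, the microinstruction sequence is supplied as a fixed list rather than produced by an emulated sequencer, and every function name here (including the deliberately faulty `buggy_add` model) is hypothetical.

```python
def state_diff(reference, model):
    # Map each differing key to (reference value, model value).
    keys = sorted(reference.keys() | model.keys())
    return {k: (reference.get(k), model.get(k))
            for k in keys if reference.get(k) != model.get(k)}


def check_sequence(micro_ops, reference_step, model_step, initial_state):
    reference = dict(initial_state)
    model = dict(initial_state)
    per_step = []
    for index, op in enumerate(micro_ops):
        reference_step(reference, op)
        model_step(model, op)
        diff = state_diff(reference, model)   # compare after each microinstruction
        if diff:
            per_step.append((index, diff))
    return per_step, state_diff(reference, model)   # final comparison


def reference_add(state, op):
    register, value = op
    state[register] += value


def buggy_add(state, op):
    # Hypothetical faulty model: adds one too many when writing r1.
    register, value = op
    state[register] += value + (1 if register == "r1" else 0)
```

Calling `check_sequence([("r0", 1), ("r1", 2)], reference_add, buggy_add, {"r0": 0, "r1": 0})` notes a difference at microinstruction index 1 and again in the final comparison, which localizes the fault to a single microinstruction.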
Abstract:
Circuitry for providing external access to signals that are internal to an integrated circuit chip package. A plurality of N:1 multiplexers are physically distributed throughout the integrated circuit die. Each of the multiplexers has its N inputs coupled to a nearby set of N nodes within the integrated circuit, and each of the multiplexers is coupled to a source of select information operable to select one node from the set of N nodes for external access. Each of the multiplexers has its output coupled to an externally-accessible chip pad. The integrated circuit is a microprocessor, and the source of select information may include a storage element. If so, additional circuitry is provided for writing data from a register of the microprocessor to the storage element using one or more microprocessor instructions. Each multiplexer may be coupled to a different source of select information, or all multiplexers may be coupled to the same select information. Moreover, a fixed set of interconnect traces may be provided to couple a fixed set of nodes to an additional set of externally-accessible chip pads. One or more M:1 multiplexers may also be provided, having their M inputs coupled to M different outputs of the N:1 multiplexers. Each of the M:1 multiplexers may be coupled to a second source of select information. Preferably, the outputs of the M:1 multiplexers will be coupled to circuitry for facilitating debug and performance monitoring of the integrated circuit.
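The two-stage selection described above can be modeled in a few lines. This is a behavioral sketch only, not a hardware description: the function names are hypothetical, node values are plain integers, and a single M:1 multiplexer is assumed to take all N:1 outputs as its inputs.

```python
def mux(inputs, select):
    # Behavioral N:1 multiplexer: pass through the selected input.
    return inputs[select]


def observe(node_groups, selects, second_select):
    """Return (pad_values, monitor_value).

    node_groups: one list of N nearby node values per N:1 multiplexer.
    selects: one select value per N:1 multiplexer (e.g. read from a storage element).
    second_select: select value for the assumed single M:1 multiplexer.
    """
    pad_values = [mux(group, sel) for group, sel in zip(node_groups, selects)]
    monitor_value = mux(pad_values, second_select)
    return pad_values, monitor_value
```

Here `pad_values` stands for the signals driven to the externally-accessible pads, and `monitor_value` for the signal routed to the debug and performance monitoring circuitry.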