摘要:
A method of improving the performance of a computer processor by recognizing that two consecutive register instructions can be executed simultaneously and executing the two instructions simultaneously while generating a single data address and while performing exception checking on a single data address. During an instruction fetch process, two consecutive instructions are tested to determine if both are either register load instructions or register save instructions. If both instructions are load or save register instructions, the corresponding data addresses are tested to see if both data addresses are in the same double word. If both data addresses are in the same double word, then the instructions are executed simultaneously. Only one data address generation is required and exception processing is performed on only one data address. In one example embodiment, a simplified test rapidly ensures that both data addresses are in the same double word, but also requires the base addresses to be at an even word boundary. In a second embodiment, where the processor includes an alignment test as a separate test, an even more simple test rapidly ensures that both data address are in the same double word without checking alignment.
摘要:
The present invention is a method for implementing two architectures on a single chip. The method uses a fetch engine to retrieve instructions. If the instructions are macroinstructions, then it decodes the macroinstructions into microinstructions, and then bundles those microinstructions using a bundler, within an emulation engine. The bundles are issued in parallel and dispatched to the execution engine and contain pre-decode bits so that the execution engine treats them as microinstructions. Before being transferred to the execution engine, the instructions may be held in a buffer. The method also selects between bundled microinstructions from the emulation engine and native microinstructions coming directly from the fetch engine, by using a multiplexer or other means. Both native microinstructions and bundled microinstructions may be held in the buffer. The method also sends additional information to the execution engine.
摘要:
The present invention is a method for implementing two architectures on a single chip. The method uses a fetch engine to retrieve instructions. If the instructions are macroinstructions, then it decodes the macroinstructions into microinstructions, and then bundles those microinstructions using a bundler, within an emulation engine. The bundles are issued in parallel and dispatched to the execution engine and contain pre-decode bits so that the execution engine treats them as microinstructions. Before being transferred to the execution engine, the instructions may be held in a buffer. The method also selects between bundled microinstructions from the emulation engine and native microinstructions coming directly from the fetch engine, by using a multiplexor or other means. Both native microinstructions and bundled microinstructions may be held in the buffer. The method also sends additional information to the execution engine.
摘要:
A multi-level instruction cache memory system for a computer processor. A relatively large cache has both instructions and data. The large cache is the primary source of data for the processor. A smaller cache dedicated to instructions is also provided. The smaller cache is the primary source of instructions for the processor. Instructions are copied from the larger cache to the smaller cache during times when the processor is not accessing data in the larger cache. A prefetch buffer transfers instructions from the larger cache to the smaller cache. If a cache miss occurs for the smaller cache, and the instruction is in the prefetch buffer, the system provides the instruction with no delay relative to a fetch from the smaller instruction cache. If a cache miss occurs for the smaller cache, and the instruction is being fetched from the larger cache, or available in the larger cache, the system provides the instruction with minimal delay relative to a fetch from the smaller instruction cache.
摘要:
An apparatus and methods for optimizing prefetch performance. Logical ones are shifted into the bits of a shift register from the left for each instruction address prefetched. As instruction addresses are fetched by the processor, logical zeros are shifted into the bit positions of the shift register from the right. Once initiated, prefetching continues until a logical one is stored in the nth-bit of the shift register. Detection of this logical one in the n-th bit causes prefetching to cease until a prefetched instruction address is removed from the prefetched instruction address register and a logical zero is shifted back into the n-th bit of the shift register. Thus, autonomous prefetch agents are prevented from prefetching too far ahead of the current instruction pointer resulting in wasted memory bandwidth and the replacement of useful instruction in the instruction cache.
摘要:
Disclosed herein are methods and apparatus which provide a processor with raw, uncorrected data. The uncorrected data (or pre-corrected data) is retrieved from memory and then "bypassed" to a processing unit before its error status is known. Concurrently, error correction hardware determines the data's error status. Since the correct/incorrect indication is the first result available from error correction hardware, this result may be used to gate the actions of a processing unit prior to its taking an irrevocable action with possibly incorrect data. If bypassed data is incorrect, processing unit control logic may flag it as such and read corrected data from the output of error correction hardware. If bypassed data is correct (as will usually be the case), bypassed data may be consumed by a processing unit in due course.
摘要:
An associative cache memory for a computer with improved cache hit times. All possible data items are presented to bus driver circuits, thereby deferring data selection as long as possible. Driving and multiplexing are combined. The output of tag comparison directly selects at most one set of driver circuits. As a result, the only processing time in series with tag comparison is driver circuit selection. Since the data selection delay in series with tag comparison delay is reduced, the time delay is reduced for a clock edge for data driving after tag comparison, thereby enabling a faster clock.
摘要:
Presented is an internal integrated debug trigger apparatus for use in debugging functional and electrical failures of an integrated circuit chip. The debug trigger apparatus includes a plurality of software programmable trigger registers and a plurality of software programmable trigger function blocks. Each trigger register monitors a plurality of integrated circuit signals which may include signals sent to the external pins of the integrated circuit and signals present internal to the chip. If the value of the monitored signals matches the programmed trigger condition, the trigger register produces a trigger match signal. Each trigger function block receives a combination of the trigger match signals generated by the trigger registers and each computes its programmed boolean minterm function on its inputs. Each trigger function block produces a trigger capture signal which may be true or false according to the computed function of the inputs. The debug trigger may also include a programmable iteration counter which allows for repetition of a trigger condition before it is provided external to the chip. The output of the iteration counter may be connected as input to one or more of the other trigger registers to allow for different start and end conditions.
摘要:
A method of, and apparatus for, interfacing the hardware of a processor capable of processing instructions from more than one type of instruction set. More particularly, an engine responsible for fetching native instructions from a memory subsystem (such as an EM fetch engine) is interfaced with an engine that processes emulated instructions (such as an x86 engine). This is achieved using a handshake protocol, whereby the x86 engine sends an explicit fetch request signal to the EM fetch engine along with a fetch address. The EM fetch engine then accesses the memory subsystem and retrieves a line of instructions for subsequent decode and execution. The EM fetch engine sends this line of instructions to the x86 engine along with an explicit fetch complete signal. The EM fetch engine also includes a fetch address queue capable of holding the fetch addresses before they are processed by the EM fetch engine. The fetch requests are processed such that more than one fetch request may be pending at the same time. If a pending fetch request is canceled due to a pipeline flush, then the fetch address queue is cleared and the pending fetch requests are canceled. The system also prevents macroinstruction (MIQ)-related stalls by using a speculative write pointer to control the issuance of fetch requests, thereby preventing the MIQ from becoming oversubscribed.
摘要:
Automatic manufacturing test case generation mechanism receives a plurality of design verification test cases (DVTCs) from one or more sources and based thereon automatically generates manufacturing test cases (MTCs). Each of the manufacturing test cases (MTCs) generated by the automatic manufacturing test includes at least a first design verification test case and a second design verification test case.