摘要:
A superscalar microprocessor implements a microcode instruction unit that patches existing microcode instructions with substitute microcode instructions. A flag bit is associated with each line of microcode in the microcode instruction unit. If the flag bit is asserted, the microcode instruction unit branches to a patch microcode routine that causes a substitute microcode instruction stored in external RAM to be loaded into patch data registers. The transfer of the substitute microcode instruction to the patch data registers is accomplished using data transfer procedures. The microcode instruction unit then dispatches the substitute instructions stored in the patch data registers and the substitute instruction is executed by a functional unit in place of the existing microcode instruction.
摘要:
A superscalar microprocessor implements a microcode instruction unit with sequence control fields appended to each microcode line. The sequence control fields indicate whether a subsequent line contains a branch instruction, whether a subsequent line is the last line in a microcode sequence, and other sequence control information. The sequence control information is accessed one cycle before the microcode line. This allows the next address to be calculated in parallel with the accessing of the microcode instruction line. By generating the next address in parallel with accessing the microcode line, the time delay from accessing one microcode line to accessing the next microcode line is reduced. The sequence control field additionally indicates how many microcode instructions are in the last line of the microcode sequence. If the last microcode line of the microcode sequence contains less instructions than the number of issue positions available, fastpath, or directly decodable instructions, can be issued with the final microcode line.
摘要:
An instruction dispatch apparatus is provided in which a directly-decoded instruction and a microcode instruction are concurrently dispatched ("packed"). The instruction which is second in program order is retained until the succeeding clock cycle. During the succeeding clock cycle, a microcode unit determines if the microcode instruction and the directly-decoded instruction, when taken together, occupy less than or equal to the total number of issue positions available in the microprocessor. If the microcode unit determines that less than or equal to the total number of issue positions are occupied, then the packing is successful. If the microcode unit determines that greater than the total number of issue positions are occupied, then the packing is unsuccessful and the retained instruction is redispatched. Additionally, instruction dispatch selection is performed in two phases. First, a number of instructions are selected as potentially dispatchable instructions. From the potentially dispatchable instructions, a set of actually dispatched instructions may be selected based upon the success or failure of instruction packing during the previous clock cycle and whether or not packing was performed. If instruction packing was not performed during the previous clock cycle or was performed unsuccessfully, then the instructions which are foremost in program order within the potentially dispatchable instructions are selected. However, if instruction packing was successfully performed in the previous clock cycle, then the retained instruction is not selected for dispatch.
摘要:
A Carry-Save Adder circuit having differential signal response and output is provided. The circuit includes a pair of cross-coupled transistors powered by an upper voltage rail. The output of a first transistor of the pair of cross-coupled transistors is connected to the output of a first precharge transistor that is powered by the upper rail and controlled by a clock. The output of a second transistor of the pair of cross-coupled transistors is connected to the output of a second precharge transistor that is powered by the upper rail and controlled by the clock. A logic circuit is wired to perform a logical function, either a Sum or a Carry function, and has a plurality of inputs, an output, and a complementary output. The output of the logic circuit is connected to the output of the first transistor of the pair of cross-coupled transistors, and the complementary output is connected to the output of the second transistor of the pair of cross-coupled transistors. An enable transistor having a first terminal connected to a lower voltage rail, and being controlled by the complement of the clock, has a second terminal connected to the logic circuit such that the logic circuit is connected to the lower voltage rail through the enable transistor.
摘要:
A hardware description language representation of an original circuit block containing one or more hierarchies may be obtained. Some, or all of the hierarchies may be dissolved to access each circuit component within the original circuit block at a same level of hierarchy. Designated circuit components may then be grouped together to create new circuit blocks at a new level of hierarchy. Components and signals within each new circuit block may be renamed to match logically corresponding components and signals within each other new circuit block. Missing pins may be added for each new circuit block, and connected to respective associated signals within the new circuit block, and logically equivalent pins may be given the same name to ensure the new circuit blocks are logically equivalent to each other and have identical interfaces. One of the new circuit blocks may be selected for physical build to obtain one or more physical instances corresponding to the selected new circuit block, and a top-level build may link each new circuit block instance to one of those one or more physical instances.
摘要:
A microprocessor configured to decode a plurality of instruction bytes in parallel is disclosed. The microprocessor may comprise a plurality of single-byte decoder/execution units that are configured to receive instruction bytes and cross-talk to determine instruction boundaries and instruction field boundaries. Once and instruction has been identified, a determination is made as to whether or not the instruction is a simple instruction. Simple instructions are executed within the decoder/execution units, while complex instructions are forwarded to full-fledged functional units. A computer system and method for predecoding instructions are also disclosed.
摘要:
A superscalar microprocessor includes a combination floating point and multimedia unit. The floating point and multimedia unit includes one set of registers. The multimedia core and floating point core share the one set of registers. Each register as a type field associated with the register. The type field identifies whether the associated register contains valid data and whether the data is of multimedia type or floating point type. If the register stores floating point type data, the type field further indicates which of a plurality of floating point types the register stores such as: zero, infinity and normal. The floating point core relies on the type field to identify special floating point numbers such as zero and infinity. To ensure predictable results when a floating point instruction is executed subsequent to a multimedia instruction, a retyping algorithm retypes registers typed as multimedia type when the first floating point instruction subsequent to a multimedia instruction is executed. The retyping algorithm reads each register and reclassifies the registers classified as multimedia type. The reclassification algorithm classifies the contents of the register interpreted as floating point data.
摘要:
A hardware description language representation of an original circuit block containing one or more hierarchies may be obtained. Some, or all of the hierarchies may be dissolved to access each circuit component within the original circuit block at a same level of hierarchy. Designated circuit components may then be grouped together to create new circuit blocks at a new level of hierarchy. Components and signals within each new circuit block may be renamed to match logically corresponding components and signals within each other new circuit block. Missing pins may be added for each new circuit block, and connected to respective associated signals within the new circuit block, and logically equivalent pins may be given the same name to ensure the new circuit blocks are logically equivalent to each other and have identical interfaces. One of the new circuit blocks may be selected for physical build to obtain one or more physical instances corresponding to the selected new circuit block, and a top-level build may link each new circuit block instance to one of those one or more physical instances.
摘要:
A microprocessor configured to predecode variable length instructions in a massively parallel fashion is disclosed. The microprocessor may comprise a prefetch fetch unit configured to read instruction bytes from memory and a plurality of predecode unit configured to receive and predecode the instruction bytes. The predecode units are configured to operate separately and in parallel to generate one or more predecode bits per instruction byte. The microprocessor may further include a predecode bit correction unit configured to receive, verify, and correct the predecode bits from the parallel predecode units. A computer system and method for predecoding instructions are also disclosed.
摘要:
A microprocessor configured to cache basic blocks of instructions is disclosed. The microprocessor may comprise decoding logic, a basic block cache, and a branch prediction unit. The decoding logic is coupled to receive and decode variable-length instructions into padded instructions that have one of a predetermined number of predetermined lengths. The decoding logic is further configured to form basic blocks of instructions from the padded and decoded instructions. Basic blocks are natural divisions in instruction streams resulting from branch instructions. The start of a basic block is a target of a branch, and the end is another branch instruction. The basic block cache is configured to store the basic blocks in a plurality of storage locations, wherein each storage location is configured to store an address tag, a link bit, and at least a portion of one basic block. The link bit indicates whether the basic block stored in said storage location extends into another storage location. The branch prediction unit has a branch prediction array storing branch prediction information corresponding to each storage location within the basic block cache. A computer system and method for operating are also disclosed.