Abstract:
A method and system for fast calculation of the sticky bit and a function of the guard bit is disclosed. A first aspect of the method and system provides a fast calculation of the sticky bit. A second aspect provides a fast calculation of a function of the guard bit. Both aspects comprise means for providing an intermediate result of a floating point mathematical operation involving at least a first and a second operand and means for providing a mask indicating a position of a leading one in a mantissa of the intermediate result. In the first aspect, means for aligning a first bit of the mask to an (n+2)nd bit of the intermediate result, where n is the number of bits in a mantissa of the first or second operand, are coupled to the intermediate result providing means. In the second aspect, means for aligning a first bit of the mask to an (n+1)st bit of the intermediate result are coupled to the intermediate result providing means. In both aspects, means for providing an output are coupled to the aligning means and intermediate result providing means. The output of the first aspect comprises the sticky bit. The output of the second aspect comprises a function of the guard bit. Thus, the method and system allow the sticky bit and a function of the guard bit to be calculated substantially simultaneously with normalization. Because the method and system allow fast determination of the sticky bit and a function of the guard bit, the overall speed of the calculation is increased and system performance is improved.
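The bit positions named in this abstract can be illustrated with a small software model. This is a minimal sketch, not the disclosed circuit: it assumes the intermediate result is a non-negative integer, counts the leading one as the 1st bit, and returns the guard bit itself as the simplest "function of the guard bit". The name `guard_and_sticky` is hypothetical.

```python
def guard_and_sticky(intermediate: int, n: int) -> tuple[int, int]:
    """Return (guard, sticky) for a non-negative intermediate mantissa.

    Counting from the leading one: bits 1..n are the kept mantissa,
    bit n+1 is the guard bit, and bits n+2 onward feed the sticky bit.
    """
    lead = intermediate.bit_length() - 1   # index of the leading one (LSB = 0)
    guard_index = lead - n                 # index of the (n+1)st bit
    if guard_index < 0:                    # nothing lies below the kept mantissa
        return 0, 0
    guard = (intermediate >> guard_index) & 1
    sticky_mask = (1 << guard_index) - 1   # selects the (n+2)nd bit and below
    sticky = int((intermediate & sticky_mask) != 0)
    return guard, sticky
```

Because the mask is derived from the leading-one position alone, both bits are read from the unnormalized value, which mirrors the abstract's point that they can be obtained without waiting for normalization to finish.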
Abstract:
A system and method for calculating a floating point add/subtract of a plurality of floating point operands is disclosed. The system comprises at least one pair of data paths. Each pair of data paths comprises a first data path and a second data path. The first data path includes a first aligner, a first adder coupled to the first aligner, and a first normalizer coupled to the first adder. The first normalizer is capable of shifting a mantissa by a substantially smaller number of digits than the first aligner. The second data path comprises control logic, a second aligner coupled to the control logic, a second adder coupled to the second aligner, and a second normalizer coupled to the second adder. The control logic provides a control signal that is responsive to a first predetermined number of digits of each exponent of a pair of exponents. The pair of exponents are the exponents for a pair of inputs to the second data path. The second aligner is responsive to the control signal provided by the control logic. In addition, the second normalizer is capable of shifting a mantissa by a substantially larger number of digits than the second aligner.
Abstract:
A method and system for an infinite precision split multiply and add operation with increased speed are disclosed. The method and system for providing a split multiply and add of a plurality of operands include a multiplier and an adder means. The multiplier multiplies a first portion of the plurality of operands, thereby providing a product. The adder, which combines the remaining operands and the product, comprises at least one pair of data paths. Each pair of data paths comprises a first data path and a second data path. The first data path comprises a first aligner, a first adder, and a first normalizer capable of shifting a mantissa by a substantially smaller number of digits than the aligner. The second data path comprises a second aligner, a second adder, and a second normalizer capable of shifting a mantissa by a substantially larger number of digits than the aligner. Accordingly, the present invention includes split multiply and add data paths which, individually, are faster than a fused multiply and add. In addition, the split multiply and add data paths can preserve the appearance of infinite precision. Consequently, overall system performance is increased.
Abstract:
A method and system for reducing the dispatch latency of instructions of a processor provide for reordering the instructions in a predetermined format before the instructions enter the cache. The method and system also store information in the cache relating to the reordering of the instructions. The reordered instructions are then provided to the appropriate execution units based upon the predetermined format. With this system, a dispatch buffer is not required when sending the instructions to the cache.
Abstract:
A system and method for improving arbitration of a plurality of events that may require access to a cache is disclosed. In a first aspect, the method and system provide dynamic arbitration. The first aspect comprises first logic for determining whether at least one of the plurality of events requires access to the cache and for outputting at least one signal in response thereto. Second logic coupled to the first logic determines the priority of each of the plurality of events in response to the at least one signal and outputs a second signal specifying the priority of each event. Third logic coupled to the second logic grants access to the cache in response to the second signal. A second aspect of the method and system provides user programmable arbitration. The second aspect comprises a storage unit which allows the user to input information indicating the priority of at least one of the plurality of events and outputs a first signal in response to the information. In the second aspect, first logic coupled to the storage unit determines the priority of each of the plurality of events in response to the first signal and outputs a second signal indicating the priority of each event. Second logic coupled to the first logic grants access to the cache in response to the second signal.
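The three stages of the arbitration described here (detect requests, rank them, grant access) can be sketched in software. This is an illustrative model only: the function `arbitrate`, the event names, and the convention that a lower number means higher priority are all assumptions, and the `user_priority` table merely stands in for the user-programmable storage unit.

```python
def arbitrate(requests, priority):
    """Grant the cache to the highest-priority requesting event, or None."""
    # First logic: find the events that currently require cache access.
    pending = [event for event, wants in requests.items() if wants]
    if not pending:
        return None
    # Second logic: rank pending events (assumed: lower number = higher priority).
    ranked = sorted(pending, key=lambda event: priority[event])
    # Third logic: grant access to the top-ranked event.
    return ranked[0]


# Hypothetical events and priorities, standing in for the storage unit contents.
user_priority = {"snoop": 0, "load": 1, "store": 2, "reload": 3}
```

Rewriting the entries of `user_priority` changes which event wins without touching the arbitration logic, which is the behavior the second aspect attributes to the storage unit.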
Abstract:
A floating point arithmetic unit performs a multiply-add function B+(A*C) in which an alignment shifter is responsive to an input signal representative of the B mantissa. The shifter includes a sequential stack of multiplexers, typically three (3), for shifting the B mantissa to align it with the A*C product, and a complementer contained between two of the multiplexers to invert the signals when B is a negative number. A shift amount generator responsive to the A, B and C exponents produces control signals for the multiplexers. The shift amount generator includes a multiple input adder utilizing carry save adder and carry lookahead adder techniques to minimize delay, and separate decoders for each multiplexer or group of multiplexers. The generator also includes a Leading Zeros Anticipator (LZA) circuit for the most significant bits to limit shift amount signals that are within the shifting range of the shifter, which reduces the delay attributed to the carry lookahead adder. The multiplexers are arranged in a sequence such that the control signals for the first multiplexers are dependent only on the least significant bits and thus can be generated earliest, and therefore the delay of these multiplexers and the delay of the complementer is in parallel with the delay for producing the control signals to the last multiplexers.
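The staged multiplexer arrangement can be modeled as a shift that is applied one bit field at a time, least significant field first. The sketch below is an assumption-laden illustration: the function name `staged_right_shift` and the field widths `(2, 2, 3)` are hypothetical, the shift amount is taken as already computed from the exponents, and the complementer and the LZA range limiting are left out.

```python
def staged_right_shift(mantissa: int, shift: int, stage_bits=(2, 2, 3)) -> int:
    """Right-shift `mantissa` by `shift`, one multiplexer stage per bit field.

    The shift amount is split into fields, least significant first, so the
    first stage depends only on the low-order bits of the shift amount.
    """
    if not 0 <= shift < (1 << sum(stage_bits)):
        raise ValueError("shift amount outside the shifter's range")
    offset = 0
    for width in stage_bits:
        field = (shift >> offset) & ((1 << width) - 1)  # this stage's control bits
        mantissa >>= field << offset                    # shift by field * 2**offset
        offset += width
    return mantissa
```

Since the first stage needs only the lowest bits of the shift amount, a hardware version can begin shifting while the higher-order bits are still being resolved by the adder, which is the overlap of delays the abstract describes.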
Abstract:
A processor including a register, an execution unit, a temporary result buffer, and a commit function circuit. The register includes at least one register bit and may include one or more sticky bits. The execution unit is suitable for executing a set of computer instructions. The temporary result buffer is configured to receive, from the execution unit, register bit modification information provided by the instructions. The temporary result buffer is suitable for storing the modification information in set/clear pairs of bits corresponding to respective register bits of the register. The commit function circuit is configured to receive the set/clear pairs of bits from the temporary result buffer when the instruction is committed. The commit function circuit is suitable for generating an updated bit in response to receiving the set/clear pairs of bits. The updated bit is then committed to the corresponding register bit of the register.
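One way to picture the commit function is as a bitwise update driven by the set/clear pairs. The sketch below is a minimal model under stated assumptions: the name `commit_bits` is hypothetical, the pairs are packed into two integer masks, and the case where both set and clear are asserted for the same bit (which the abstract does not address) is resolved in favor of set.

```python
def commit_bits(register: int, set_mask: int, clear_mask: int) -> int:
    """Apply buffered set/clear pairs to a register value at commit time.

    Per bit: set=1 forces the bit to 1, clear=1 forces it to 0, and
    neither leaves the old value in place.
    Assumption: when both are asserted for a bit, set wins.
    """
    return (register & ~clear_mask) | set_mask
```

Under this model a sticky bit is simply one for which an instruction asserts set but never clear, so it keeps its value across commits until some later instruction explicitly clears it.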
Abstract:
A system and method for processing count and link branch instructions allow multiple branches to be outstanding at the same time without being limited to the number of rename registers allocated to the count and link registers. The method and system comprise an architected count register and an architected link register that are each connected to a look-ahead register. Information in the architected count or link register is copied into the look-ahead register when a branch instruction is encountered that will alter the contents of the count or link registers. Information in the look-ahead register is saved in a shadow register when an unresolved branch is encountered, and restored from the shadow register if the outcome of the unresolved branch is mispredicted.
Abstract:
An apparatus and method reduce the number of rename registers for a floating point status and control register (FPSCR) in a superscalar microprocessor executing out of order/speculative instructions. A floating point queue (FPQ) receives speculative instructions and issues out-of-order instructions to FPQ execution units, each instruction containing a group identifier tag (GID) and a target identifier tag (TID). The GID tag indicates a set of instructions bounded by interruptible or branch instructions. The TID indicates a targeted architected facility and the program order of the instruction. The FPSCR contains status and control bits for each instruction and is updated when an instruction is executed and committed. An FPSCR renaming mechanism assigns an FPSCR rename to selected FPSCR bits during instruction dispatch from an instruction fetch unit (IFU) to the FPQ when an arithmetic instruction is dispatched that has a GID which has not been committed by the instruction dispatch unit (IDU) and does not already have an FPSCR rename assigned, as determined by the FPQ. The FPSCR rename mechanism utilizes the TID upon the presence of selected bits in the FPSCR. The bits in the FPSCR rename are updated as a new arithmetic instruction enters a write-back stage in the FPU. The resulting FPSCR updates of all instructions in a given GID are merged into one FPSCR rename register. An FPSCR rename register exists for each GID rather than an FPSCR rename register for each FPR rename register as in the prior art.
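The per-GID bookkeeping can be sketched as a table keyed by group identifier. This is a toy model, not the disclosed mechanism: the class `FpscrRenamePool` and its method names are hypothetical, the TID ordering logic is omitted, and merging by bitwise OR is an assumption that fits sticky exception flags but not fields that a later instruction would overwrite.

```python
class FpscrRenamePool:
    """Toy model: one FPSCR rename entry per group identifier (GID)."""

    def __init__(self):
        self.renames = {}  # GID -> merged status bits for that group

    def dispatch(self, gid: int) -> None:
        """Allocate a rename only if this GID does not already have one."""
        self.renames.setdefault(gid, 0)

    def write_back(self, gid: int, status_bits: int) -> None:
        """Merge one instruction's status bits into its group's rename.

        Assumption: updates are merged with bitwise OR.
        """
        self.renames[gid] |= status_bits

    def commit(self, gid: int, fpscr: int) -> int:
        """Fold the group's merged bits into the FPSCR and free the rename."""
        return fpscr | self.renames.pop(gid)
```

However many arithmetic instructions share a GID, the pool holds a single entry for that group, which is the saving over one rename per FPR rename register.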
Abstract:
A system and method are disclosed for minimizing the delay associated with executing a register dependent instruction whose execution depends on an operand of a preceding instruction. In a branch unit for executing register dependent instructions, functional units are connected via a rename bus, and the functional units are connected to a general purpose register (GPR) via a GPR bus. The system and method route the rename bus and the GPR bus directly to an instruction fetch address register, thereby enabling the branch unit to execute a register dependent instruction during the same cycle as the preceding instruction.