摘要:
A logic circuit including at least one evaluate circuit coupled to a static output logic circuit. In one example, the evaluate circuit includes a dynamic node, a full keeper, an evaluate device, and a logic tree. In some examples, the output logic circuit is a sampled static output logic circuit and includes a sample device. In some examples, the logic circuit includes multiple evaluate circuits, each with a dynamic node coupled to a control gate of a transistor of the output logic circuit. Some examples may include a delay in a clock signal to increase the internal race margin.
摘要:
A memory cache (14) has a plurality of cache lines (50) for storing a series of contiguous memory elements. Each series of memory elements are interlaced within the corresponding cache line on a element-by-element basis and on a bit-by-bit basis. This storage strategy allows the memory cache to output a subset memory elements within a cache line quickly and in the original contiguous order. The invention may be advantageously incorporated in an instruction cache of superscalar data processor to provide a series of sequential instructions for execution.
摘要:
A method and apparatus for performing integer and floating-point divide operations using a single modified SRT divider in a data processor. The floating-point and integer division is performed using SRT division on normalized positive mantissas (dividend and divisor). Integer division shares portions of the floating point circuitry, however, the sequence of operations is modified during the performance of an integer divide operation. The SRT divider performs a sequence of operations before and after an iteration loop to re-configure an integer divisor and dividend into a data path representation which the SRT algorithm requires for floating-point mantissas. During the iteration loop, quotient bits are selected and used to generate intermediate partial remainders. The quotient bits are also input to quotient registers which accumulate the final quotient mantissa. A full mantissa adder is used to generate a final remainder.
摘要:
In a data processor an Sweeney-Robertson-Tocher (SRT) divider is provided having a negative divisor sticky detection circuit. The negative divisor sticky detection circuit allows negative sticky correction to occur in the SRT divider without requiring additional iteration cycles. At the conclusion of iterative cycles of a divide operation, a final remainder is formed and stored in a latch in the SRT divider. The negative divisor sticky detection circuit determines whether a negative final remainder is equal in magnitude to a divisor value by bit-wise XORing the final remainder with the two's complement value of a divisor, immediately before sticky logic detects the negative sticky bit. The final sticky value is obtained by logically combining the negative sticky bit with a positive sticky bit computed by a positive divisor sticky detection circuit.
摘要:
A CAM/SRAM structure (42) performs address translations of variable length blocks, a "block address translator." Each address translation is stored in a register broken into an upper half and a lower half. The upper half contains CAM bit cells (56) which match an input effective address to a stored tag (BEPI) alternating with SRAM bit cells which store a block length tag (BL). The block length tag defines the length of the translated block and, hence, the number of bits which must match between the input effective address and the stored tag. The lower half contains SRAM bit cells which store a real address associated with the tag (BRPN) alternating with multiplexer circuits. In the event of a CAM match, each multiplexer circuit outputs either a real address bit or an input effective address bit, depending upon the block length tag. The two halves of each register are fabricated adjacent to each other in parallel rows to minimize routing requirements and reduce overall circuit capacitance.
摘要:
A precharge device (28) has a first (30) and a second node (32), a transistor tree (29), a screening transistor (Q20) and clocking circuitry (Q17, Q18, Q19). The transistor tree (29) couples the first (30) and the second (32) node and is operable to electrically short-circuit the nodes according to input signals (A.sub.1, A.sub.2, A.sub.3). The screening transistor (Q20) has a first and a second [source-drain region] current electrode and a [gate] control electrode. The first [source-drain region] current electrode is coupled to a third node (34), the second [source-drain region] current electrode is coupled to the second node (32) and the [gate] control electrode is coupled to the first node (30). The clocking circuitry alternately precharges the first (30) and third nodes (34) to a first known voltage level and evaluates the voltage on the first node (30) to output a logic level.
摘要:
When a request to branch to an address stored in a return memory location (440) occurs, a busy bit is used to determine whether the return memory location (440) contains updated information. When the information is not updated, a predicted address is provided to the prediction verifier (460) by the link stack (410). Once the busy bit is valid, the prediction verifier (460) determines if a proper prediction was made. When an improper prediction was made, the update portion (415) of the link stack (410) based on information from the comparator (425) determines if a value stored in the link stack (410) matches the value stored in the return memory location (440). The link stack (410) is synchronized based upon a favorable comparison indicating the return memory location value matches a value in the link stack. If a match is not found, the predicted address is placed back on the link stack or alternatively the link stack is cleared.
摘要:
A data processor (12) has a branch prediction unit (28) that predicts conditional branch instructions and a control unit (70) therein that monitors the number of unresolved branch instructions. This control unit selectively allows the data processor to fetch the instructions indicated by the branch prediction unit from an external memory system depending upon the number of unresolved branch instructions. The particular threshold number of unresolved branch instructions is user programmable. The data processor thereby limits its bus accesses to those occasions when it is reasonably sure that it will need the indicated instructions.
摘要:
A memory cache (14) has a semi-associative cache array (50), a cache reload buffer (40), and a cache reload buffer driver (42). The memory cache writes received data to the cache reload buffer and waits until the data is requested again before it invalidates any cache aliased entries in the semi-associative cache array. This invalidation step requires no dedicated cycle but instead is a result of the memory cache being able to simultaneously read from the semi-associative cache array and the cache reload buffer.