Abstract:
A method and apparatus to increase the performance of a floating point multiplier accumulator (FMAC). The method comprises receiving three floating point numbers and computing a product of the first floating point number and the second floating point number and adding a third floating point number to produce a sum value and a carry value. A propagate value, a kill value and a generate value are then computed based on the sum value and the carry value. Simultaneously the sum value is added to the carry value to create a first result, the sum value is added to the carry value and incremented by one to create a second result, the sum value is added to the carry value and incremented by two to create a third result, and a decimal point position is determined. One of the first result, the second result and the third result is then selected responsive to a rounding mode and the decimal point position. The selected result is normalized based on the decimal point position. The apparatus comprises a multiplier with a propagate, kill, generate generator (PKG generator) coupled to it. An adder, a plus-oner, a plus-two-er and a leading zero anticipator (LZA) are each coupled to the PKG generator in parallel. A rounding control unit is coupled to the LZA and coupled to a multiplexor that outputs a result from one of the adder, the plus-oner, and the plus-two-er responsive to the rounding control unit. A normalization shifter is coupled to the multiplexor and the LZA.
Abstract:
A circuit includes first and second pull-up transistors having first and second drains, respectively, each coupled to separate voltage clamps. The gates of each of the two pull-up transistors are coupled to a clock signal line. The circuit further includes a shared pull-down transistor, the gate of which is coupled to the clock signal line. The drain of the shared pull-down transistor is coupled to the first drain via at least one pull-down transistor in series with the shared pull-down transistor. The drain of the shared pull-down transistor is also coupled to the second drain via at least one pull-down transistor in series with the shared pull-down transistor. This circuit may be found useful in multiplexing applications.
Abstract:
A broken stack domino priority encoder to provide a set of voltages to uniquely identify the position of a leading one or leading zero in a binary word, the domino priority encoder comprising a by-pass stack of nMOSFETs and a broken stack of nMOSFETs to discharge various nodes. The stack depth of nMOSFETs between each node and ground is minimized in order to maximize switching speed of the priority encoder.
Abstract:
A method and apparatus for partitioning a cache includes determining an allocation of a subcache out of a plurality of subcaches within the cache for association with a compute unit out of a plurality of compute units. Data is processed by the compute unit, and the compute unit evicts a line. The evicted line is written to the subcache associated with the compute unit.
Abstract:
A method and apparatus for partitioning a cache includes determining an allocation of a subcache out of a plurality of subcaches within the cache for association with a compute unit out of a plurality of compute units. Data is processed by the compute unit, and the compute unit evicts a line. The evicted line is written to the subcache associated with the compute unit.