Abstract:
A method and apparatus for partitioning a cache includes determining an allocation of a subcache out of a plurality of subcaches within the cache for association with a compute unit out of a plurality of compute units. Data is processed by the compute unit, and the compute unit evicts a line. The evicted line is written to the subcache associated with the compute unit.
Abstract:
A method and apparatus for partitioning a cache includes determining an allocation of a subcache out of a plurality of subcaches within the cache for association with a compute unit out of a plurality of compute units. Data is processed by the compute unit, and the compute unit evicts a line. The evicted line is written to the subcache associated with the compute unit.
Abstract:
A method and apparatus to increase the performance of a floating point multiplier accumulator (FMAC). The method comprises receiving three floating point numbers and computing a product of the first floating point number and the second floating point number and adding a third floating point number to produce a sum value and a carry value. A propagate value, a kill value and a generate value are then computed based on the sum value and the carry value. Simultaneously the sum value is added to the carry value to create a first result, the sum value is added to the carry value and incremented by one to create a second result, the sum value is added to the carry value and incremented by two to create a third result, and a decimal point position is determined. One of the first result, the second result and the third result is then selected responsive to a rounding mode and the decimal point position. The selected result is normalized based on the decimal point position. The apparatus comprises a multiplier with a propagate, kill, generate generator (PKG generator) coupled to it. An adder, a plus-oner, a plus-two-er and a leading zero anticipator (LZA) are each coupled to the PKG generator in parallel. A rounding control unit is coupled to the LZA and coupled to a multiplexor that outputs a result from one of the adder, the plus-oner, and the plus-two-er responsive to the rounding control unit. A normalization shifter is coupled to the multiplexor and the LZA.
Abstract:
A circuit includes first and second pull-up transistors having first and second drains, respectively, each coupled to separate voltage clamps. The gates of each of the two pull-up transistors are coupled to a clock signal line. The circuit further includes a shared pull-down transistor, the gate of which is coupled to the clock signal line. The drain of the shared pull-down transistor is coupled to the first drain via at least one pull-down transistor in series with the shared pull-down transistor. The drain of the shared pull-down transistor is also coupled to the second drain via at least one pull-down transistor in series with the shared pull-down transistor. This circuit may be found useful in multiplexing applications.
Abstract:
A broken stack domino priority encoder to provide a set of voltages to uniquely identify the position of a leading one or leading zero in a binary word, the domino priority encoder comprising a by-pass stack of nMOSFETs and a broken stack of nMOSFETs to discharge various nodes. The stack depth of nMOSFETs between each node and ground is minimized in order to maximize switching speed of the priority encoder.