摘要:
A method and system for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one operand of multiple operands is a zero operand, prior to the operand being forwarded to an execution component for completing a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers a gating of a clock signal. The gating of the clock signal disables one or more processing stages and/or devices, which perform the mathematical operation. Disabling the stages and/or devices enables computing the correct result of the mathematical operation on a reduced data path. When a device(s) is disabled, the device may be powered off until the device is again required by subsequent operations.
摘要:
A method and system for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one operand of multiple operands is a zero operand, prior to the operand being forwarded to an execution component for completing a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers a gating of a clock signal. The gating of the clock signal disables one or more processing stages and/or devices, which perform the mathematical operation. Disabling the stages and/or devices enables computing the correct result of the mathematical operation on a reduced data path. When a device(s) is disabled, the device may be powered off until the device is again required by subsequent operations.
摘要:
A method, system and computer program product for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one operand of multiple operands is a zero operand, prior to the operand being forwarded to an execution component for completing a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers a gating of a clock signal. The gating of the clock signal disables one or more processing stages and/or devices, which perform the mathematical operation. Disabling the stages and/or devices enables computing the correct result of the mathematical operation on a reduced data path. When a device(s) is disabled, the device may be powered off until the device is again required by subsequent operations.
摘要:
A method for implementing sign extension within a multi-precision multiplier is described. The method includes receiving a first input within a multiplier core of the multiplier, receiving a second input within the multiplier core, and creating partial products in the multiplier core using the first and second inputs. The method also includes summing up the partial products in a partial product reduction tree in the multiplier core. The method also includes performing sign extension within the partial product reduction tree of the multiplier core by adding a value to a partial product of the partial product reduction tree. The method further includes computing an output from the partial product reduction tree, the output including a final product of the first and second inputs signed extended to a desired width.
摘要:
A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit.
摘要:
A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit.
摘要:
A technique for manufacturing a three-dimensional integrated circuit includes stacking a memory unit on a first die that includes a first computational unit. In this case, the memory unit is included in a second die. A second computational unit that is included in a third die is stacked on the second die. Sets of vertical vias that extend through the first, second, and third dies are connected to connect components of the first and second computational units and the memory unit. Multiplexers of the first and second computational units are configured to selectively couple the components to different ones of the sets of vertical vias responsive to respective control words for each of the first and third dies.
摘要:
A method for reducing leakage current within a register file of a processor is disclosed. The register file within the processor is partitioned into at least two power domains, and each of the two power domains can be powered independently. At least one of the two power domains includes at least as many physical registers as there are architected registers defined in an instruction set architecture of the processor. In response to an occurrence of an idle condition within the processor, all architected register file entries are consolidated into one of power domains that will not be powered off, and the power domains that does not contain any architected register file entries after consolidating are powered off. Afterwards, in response to a detection of an end of the idle condition, all of the power domains are powered back on.
摘要:
A technique for manufacturing a three-dimensional integrated circuit includes stacking a memory unit on a first die that includes a first computational unit. In this case, the memory unit is included in a second die. A second computational unit that is included in a third die is stacked on the second die. Sets of vertical vias that extend through the first, second, and third dies are connected to connect components of the first and second computational units and the memory unit. Multiplexers of the first and second computational units are configured to selectively couple the components to different ones of the sets of vertical vias responsive to respective control words for each of the first and third dies.