Patent search ap:("NVIDIA CORPORATION") AND inv:"David Conrad TANNENBAUM" Page 1

1.

Patent application
MATH PROCESSING BY DETECTION OF ELEMENTARY VALUED OPERANDS (Status: granted)

Publication number: US20150095394A1

Publication date: 2015-04-02

Application number: US14040370

Application date: 2013-09-27

Applicant: NVIDIA CORPORATION

Inventor: Daniel FINCHELSTEIN, David Conrad TANNENBAUM, Srinivasan (Vasu) IYER

IPC: G06F7/52, G06F7/50

CPC classification number: G06F7/50, G06F7/5443, G06F2207/3884

Abstract: One embodiment of the present invention includes a method for simplifying arithmetic operations by detecting operands with elementary values such as zero or 1.0. Computer and graphics processing systems perform a great number of multiply-add operations. In a significant portion of these operations, the values of one or more of the operands are zero or 1.0. By detecting the occurrence of these elementary values, math operations can be greatly simplified, for example by eliminating multiply operations when one multiplicand is zero or 1.0 or eliminating add operations when one addend is zero. The simplified math operations resulting from detecting elementary valued operands provide significant savings in overhead power, dynamic processing power, and cycle time.
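
The idea in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: the function name multiply_add and the path labels are hypothetical, and Python's a * b + c rounds twice rather than once as a true fused unit would. It only shows which operations become unnecessary when an operand is zero or 1.0.

```python
import math


def multiply_add(a: float, b: float, c: float) -> tuple[float, str]:
    """Compute a * b + c, skipping work when an operand is 0.0 or 1.0.

    Returns the result and a label naming the path taken. The labels are
    illustrative only; signed-zero subtleties are ignored in this sketch.
    """
    # NaN and infinities take the full path: 0 * inf is NaN, not 0.
    if not all(math.isfinite(x) for x in (a, b, c)):
        return a * b + c, "full"
    if a == 0.0 or b == 0.0:
        # The product is zero, so neither the multiply nor the add is needed.
        return c, "pass-through"
    if a == 1.0:
        # The product equals b, so only the add (if any) remains.
        return (b, "pass-through") if c == 0.0 else (b + c, "add-only")
    if b == 1.0:
        return (a, "pass-through") if c == 0.0 else (a + c, "add-only")
    if c == 0.0:
        # The addend is zero, so only the multiply remains.
        return a * b, "multiply-only"
    return a * b + c, "full"


if __name__ == "__main__":
    for operands in [(0.0, 5.0, 3.0), (1.0, 2.5, 0.5), (2.0, 3.0, 0.0), (2.0, 3.0, 4.0)]:
        print(operands, multiply_add(*operands))
```

In hardware the saving would come from not toggling the skipped logic; here the labels merely make the chosen path visible.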

2.

Patent application
TECHNIQUE FOR PERFORMING ARBITRARY WIDTH INTEGER ARITHMETIC OPERATIONS USING FIXED WIDTH ELEMENTS (Status: granted)

Publication number: US20150081753A1

Publication date: 2015-03-19

Application number: US14026829

Application date: 2013-09-13

Applicant: NVIDIA CORPORATION

Inventor: Srinivasan (Vasu) IYER, Michael Alan FETTERMAN, David Conrad TANNENBAUM

IPC: G06F7/57

CPC classification number: G06F7/525, G06F2207/3824

Abstract: One embodiment of the present invention includes a method for performing arithmetic operations on arbitrary width integers using fixed width elements. The method includes receiving a plurality of input operands, segmenting each input operand into multiple sectors, performing a plurality of multiply-add operations based on the multiple sectors to generate a plurality of multiply-add operation results, and combining the multiply-add operation results to generate a final result. One advantage of the disclosed embodiments is that, by using a common fused floating point multiply-add unit to perform arithmetic operations on integers of arbitrary width, the method avoids the area and power penalty of having additional dedicated integer units.
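
The segment, multiply-add, and combine steps in this abstract can be sketched as schoolbook multiplication over fixed-width sectors. This is an illustration under stated assumptions, not the claimed method: the 16-bit sector width, the names split and multiply, and the use of Python floats to stand in for a floating-point multiply-add unit are all hypothetical choices. With 16-bit sectors every intermediate value stays below 2**53, so a double holds it exactly.

```python
SECTOR_BITS = 16                    # assumed sector width, chosen for this sketch
MASK = (1 << SECTOR_BITS) - 1


def split(value: int) -> list[int]:
    """Segment a non-negative integer into sectors, least significant first."""
    sectors = [value & MASK]
    value >>= SECTOR_BITS
    while value:
        sectors.append(value & MASK)
        value >>= SECTOR_BITS
    return sectors


def multiply(x: int, y: int) -> int:
    """Multiply two non-negative integers of arbitrary width, sector by sector."""
    xs, ys = split(x), split(y)
    result = [0] * (len(xs) + len(ys))
    for i, xi in enumerate(xs):
        carry = 0
        for j, yj in enumerate(ys):
            # One multiply-add per sector pair, performed in floating point.
            # The total is below 2**32, so the float arithmetic is exact.
            total = int(float(xi) * float(yj) + float(result[i + j] + carry))
            result[i + j] = total & MASK
            carry = total >> SECTOR_BITS
        result[i + len(ys)] = carry
    # Combine the per-sector results into the final integer.
    return sum(s << (SECTOR_BITS * k) for k, s in enumerate(result))


if __name__ == "__main__":
    x = 0x123456789ABCDEF0123456789
    y = 0xFEDCBA9876543210FEDCBA
    print(multiply(x, y) == x * y)
```

The sector width is the main design choice: it has to be narrow enough that a sector product plus the running sum and carry fits in the floating-point significand without rounding.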

3.

Patent application
EFFICIENCY IN A FUSED FLOATING-POINT MULTIPLY-ADD UNIT (Status: under examination, published)

Publication number: US20150193203A1

Publication date: 2015-07-09

Application number: US14149647

Application date: 2014-01-07

Applicant: NVIDIA CORPORATION

Inventor: Srinivasan (Vasu) IYER, David Conrad TANNENBAUM, Stuart F. OBERMAN

IPC: G06F7/544, G06F7/50

CPC classification number: G06F7/5443, G06F7/483, G06F7/5336

Abstract: A four cycle fused floating point multiply-add unit includes a radix 8 Booth encoder multiplier that is partitioned over two stages with the compression element allocated to the second stage. The unit further includes an improved shifter design. Processing logic analyzes the input operands, detects values of zero and one, and inhibits portions of the processing logic accordingly. When one of the multiplicand inputs has a value of zero or one, the required multiplication becomes trivial, and the unit inhibits the associated coding logic and data transfer to reduce power consumption. The unit then performs an add-only operation. When the addend input has a value of zero, the addition becomes trivial, and the unit inhibits the improved shifter and data transfer to further reduce power consumption. The unit then performs a multiply-only operation.
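
For readers unfamiliar with the radix 8 Booth encoding mentioned in this abstract, the following sketch shows the textbook recoding in software. It is not the patented pipeline partitioning: the function names and the 24-bit default width (the single-precision significand width) are assumptions made for illustration. Each digit covers three multiplier bits plus one overlap bit and takes a value from -4 to 4, so a 24-bit multiplier needs only nine partial products.

```python
def booth_radix8_digits(multiplier: int, bits: int = 24) -> list[int]:
    """Recode an unsigned `bits`-wide multiplier into radix-8 Booth digits.

    Digit i is -4*b[3i+2] + 2*b[3i+1] + b[3i] + b[3i-1], with b[-1] = 0,
    so every digit lies in the range -4..4.
    """
    if not 0 <= multiplier < (1 << bits):
        raise ValueError("multiplier does not fit in the given width")
    # One extra group guarantees the highest examined bit is zero.
    n_digits = bits // 3 + 1
    digits = []
    overlap = 0  # b[3i-1], the top bit of the previous group
    for i in range(n_digits):
        group = (multiplier >> (3 * i)) & 0b111
        b0, b1, b2 = group & 1, (group >> 1) & 1, group >> 2
        digits.append(-4 * b2 + 2 * b1 + b0 + overlap)
        overlap = b2
    return digits


def booth_multiply(multiplicand: int, multiplier: int, bits: int = 24) -> int:
    """Sum one partial product per Booth digit, each weighted by 8**i."""
    digits = booth_radix8_digits(multiplier, bits)
    return sum((d * multiplicand) << (3 * i) for i, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix8_digits(7, 3))
    print(booth_multiply(12345, 0xABCDEF) == 12345 * 0xABCDEF)
```

The cost of radix 8 over radix 4 is the odd multiple 3x of the multiplicand, which cannot be formed by shifting alone; in this sketch that cost is hidden inside the ordinary integer multiply d * multiplicand.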

4.

Patent application
APPROACH FOR EFFICIENT ARITHMETIC OPERATIONS (Status: under examination, published)

Publication number: US20140129807A1

Publication date: 2014-05-08

Application number: US13671485

Application date: 2012-11-07

Applicant: NVIDIA CORPORATION

Inventor: David Conrad TANNENBAUM, Ming Y. SIU, Stuart F. OBERMAN, Colin SPRINKLE, Srinivasan IYER, Ian Chi Yan KWONG

IPC: G06F9/302, G06F9/30

Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.

5.

Patent application
EFFICIENCY THROUGH A DISTRIBUTED INSTRUCTION SET ARCHITECTURE (Status: under examination, published)

Publication number: US20150113254A1

Publication date: 2015-04-23

Application number: US14061666

Application date: 2013-10-23

Applicant: NVIDIA CORPORATION

Inventor: David Conrad TANNENBAUM, Srinivasan (Vasu) IYER, Stuart F. OBERMAN, Ming Y. SIU, Michael Alan FETTERMAN, John Matthew BURGESS, Shirish GADRE

IPC: G06F9/38

CPC classification number: G06F9/3836

Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.
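
The selection step described in this abstract can be sketched as a small dispatch function. Everything beyond the abstract is an assumption: the Pipeline class, the opcode names other than FFMA, the per-pipeline energy_cost figure, and the rule "pick the cheapest available pipeline that supports the opcode" are hypothetical stand-ins for the criteria the abstract leaves open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pipeline:
    name: str
    supported: frozenset[str]   # opcodes this pipeline can execute
    energy_cost: float          # hypothetical relative energy per instruction
    busy: bool = False          # whether its execution units are occupied


def select_pipeline(opcode: str, pipelines: list[Pipeline]) -> Optional[Pipeline]:
    """Pick the available pipeline that supports `opcode` at the lowest energy cost.

    Returns None when no pipeline can take the instruction right now.
    """
    candidates = [p for p in pipelines if opcode in p.supported and not p.busy]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.energy_cost)


if __name__ == "__main__":
    # Both pipelines support FFMA; the other opcodes are made up for the example.
    primary = Pipeline("primary", frozenset({"FFMA", "FADD"}), energy_cost=1.0)
    secondary = Pipeline("secondary", frozenset({"FFMA", "RSQRT"}), energy_cost=1.5)
    print(select_pipeline("FFMA", [primary, secondary]).name)
    primary.busy = True
    print(select_pipeline("FFMA", [primary, secondary]).name)
```

Because both pipelines accept FFMA, the scheduler has a real choice for that instruction; for opcodes supported by only one pipeline the decision reduces to an availability check.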

6.

Patent application
FFMA OPERATIONS USING A MULTI-STEP APPROACH TO DATA SHIFTING (Status: granted)

Publication number: US20150039662A1

Publication date: 2015-02-05

Application number: US13959397

Application date: 2013-08-05

Applicant: NVIDIA CORPORATION

Inventor: Srinivasan IYER, David Conrad TANNENBAUM, Stuart F. OBERMAN, Ming (Michael) Y. SIU

IPC: G06F5/01

CPC classification number: G06F5/012, G06F7/483, G06F7/5443

Abstract: A fused floating-point multiply-add element includes a multiplier that generates a product, and a shifter that shifts an addend within a narrow range. Interpreting logic analyzes the magnitude of the addend relative to the product and then causes logic arrays to position the shifted addend within the left, center, or right portions of a composite register depending on the magnitude of the addend relative to the product. The interpreting logic also forces other portions of the composite register to zero. When the addend is zero, the interpreting logic forces all portions of the composite register to zero. Final combining logic then adds the contents of the composite register to the product.

7.

Patent application
APPROACH TO POWER REDUCTION IN FLOATING-POINT OPERATIONS (Status: granted)

Publication number: US20140143564A1

Publication date: 2014-05-22

Application number: US13683362

Application date: 2012-11-21

Applicant: NVIDIA CORPORATION

Inventor: David Conrad TANNENBAUM, Colin SPRINKLE, Stuart F. OBERMAN, Ming Y. SIU, Srinivasan IYER, Ian-Chi Yan KWONG

IPC: G06F1/32, G06F7/483

CPC classification number: G06F1/3234, G06F1/3237, G06F1/3243, G06F1/3287, G06F7/483, G06F7/4876, G06F9/30014, G06F9/30189, Y02D10/128, Y02D10/152, Y02D10/171

Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.
