Patent search: ap:("NVIDIA Corporation") AND inv:"Stuart F. OBERMAN"

1.

Patent application
PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (in force)

Publication No.: US20160300319A9

Publication date: 2016-10-13

Application No.: US13850175

Application date: 2013-03-25

Applicant: NVIDIA CORPORATION

Inventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH

IPC: G06T1/20

CPC classification numbers: G06T1/20, G06F9/38, G06F9/3851

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
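The data flow in this abstract (a pool of pipelines, each with separate pixel and vertex input and output sections, with processed vertices fed back as pixel inputs after rasterization) can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not NVIDIA's design: the class and function names, the vertex-first scheduling policy, the round-robin distribution, and the `rasterize` stand-in that turns one vertex into two pixels are all hypothetical.

```python
from collections import deque

class ExecutionPipeline:
    """Toy model of one execution pipeline wired to two input and two output sections."""

    def __init__(self):
        self.pixel_in = deque()   # first input section: input data for pixel processing
        self.vertex_in = deque()  # second input section: input data for vertex processing
        self.pixel_out = []       # first output section: processed pixel data
        self.vertex_out = []      # second output section: processed vertex data

    def step(self, vertex_program, pixel_program):
        """Process one item, serving vertex work first (an assumed policy). Returns True if busy."""
        if self.vertex_in:
            self.vertex_out.append(vertex_program(self.vertex_in.popleft()))
            return True
        if self.pixel_in:
            self.pixel_out.append(pixel_program(self.pixel_in.popleft()))
            return True
        return False

def rasterize(vertex):
    """Hypothetical stand-in for rasterization and scan conversion: one vertex -> two pixels."""
    x, y = vertex
    return [(x, y), (x + 1, y)]

def run_frame(pipelines, vertices, vertex_program, pixel_program):
    """Drive all pipelines until idle and return what reaches the raster analyzer."""
    for i, v in enumerate(vertices):
        pipelines[i % len(pipelines)].vertex_in.append(v)  # round-robin vertex distribution
    raster_analyzer = []
    busy = True
    while busy:
        busy = False
        for p in pipelines:
            if p.step(vertex_program, pixel_program):
                busy = True
        for p in pipelines:
            # Processed vertex data is rasterized into pixel inputs for any pipeline.
            while p.vertex_out:
                for j, pixel in enumerate(rasterize(p.vertex_out.pop(0))):
                    pipelines[j % len(pipelines)].pixel_in.append(pixel)
                busy = True
            # Processed pixel data leaves through the first output section.
            raster_analyzer.extend(p.pixel_out)
            p.pixel_out.clear()
    return raster_analyzer
```

In the sketch, any pipeline can pick up pixel work generated from any other pipeline's vertices, which is what wiring every pipeline to both input sections makes possible.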

2.

Patent application
PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (pending, published)

Publication No.: US20170256022A1

Publication date: 2017-09-07

Application No.: US15603294

Application date: 2017-05-23

Applicant: NVIDIA Corporation

Inventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH

IPC: G06T1/20, G06F9/38

CPC classification numbers: G06T1/20, G06F9/38, G06F9/3851

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

3.

Patent application
EFFICIENCY IN A FUSED FLOATING-POINT MULTIPLY-ADD UNIT (pending, published)

Publication No.: US20150193203A1

Publication date: 2015-07-09

Application No.: US14149647

Application date: 2014-01-07

Applicant: NVIDIA CORPORATION

Inventors: Srinivasan (Vasu) IYER, David Conrad TANNENBAUM, Stuart F. OBERMAN

IPC: G06F7/544, G06F7/50

CPC classification numbers: G06F7/5443, G06F7/483, G06F7/5336

Abstract: A four cycle fused floating point multiply-add unit includes a radix 8 Booth encoder multiplier that is partitioned over two stages with the compression element allocated to the second stage. The unit further includes an improved shifter design. Processing logic analyzes the input operands, detects values of zero and one, and inhibits portions of the processing logic accordingly. When one of the multiplicand inputs has a value of zero or one, the required multiplication becomes trivial, and the unit inhibits the associated coding logic and data transfer to reduce power consumption. The unit then performs an add-only operation. When the addend input has a value of zero, the addition becomes trivial, and the unit inhibits the improved shifter and data transfer to further reduce power consumption. The unit then performs a multiply-only operation.
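The zero/one bypass described above can be sketched at the value level. This Python sketch assumes finite inputs and ignores signed-zero corner cases; the function names and mode strings are hypothetical. The "full" path uses exact `fractions.Fraction` arithmetic with a single final rounding as a software stand-in for the fused datapath; it does not model the radix-8 Booth multiplier, the shifter, or the four-cycle pipeline.

```python
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Reference a*b + c with a single rounding (finite inputs only)."""
    # Fraction arithmetic is exact; the final float() conversion rounds once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def fma_with_gating(a: float, b: float, c: float):
    """Compute a*b + c and report which datapath portions could be inhibited.

    Modes (hypothetical labels):
      "add-only"      multiplier inhibited because a or b is 0 or 1
      "multiply-only" addend shifter inhibited because c is 0
      "full"          multiplier and shifter both active
    """
    if a in (0.0, 1.0) or b in (0.0, 1.0):
        # Trivial multiplication: the product is exactly 0, a, or b.
        if a == 0.0 or b == 0.0:
            product = 0.0
        else:
            product = b if a == 1.0 else a
        return product + c, "add-only"      # one rounding, same as the fused result
    if c == 0.0:
        # Trivial addition: only the product needs rounding.
        return a * b, "multiply-only"
    return fused_multiply_add(a, b, c), "full"
```

In the trivial cases the shortcut matches a fused operation because only one rounding remains. The full path matters for inputs such as `fma_with_gating(0.1, 10.0, -1.0)`, which returns `(2**-54, "full")`, whereas the unfused expression `0.1 * 10.0 - 1.0` evaluates to `0.0`.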

4.

Patent application
APPROACH FOR EFFICIENT ARITHMETIC OPERATIONS (pending, published)

Publication No.: US20140129807A1

Publication date: 2014-05-08

Application No.: US13671485

Application date: 2012-11-07

Applicant: NVIDIA CORPORATION

Inventors: David Conrad TANNENBAUM, Ming Y. SIU, Stuart F. OBERMAN, Colin SPRINKLE, Srinivasan IYER, Ian Chi Yan KWONG

IPC: G06F9/302, G06F9/30

Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.

5.

Patent application
PRIORITY ENCODER-BASED TECHNIQUES FOR COMPUTING THE MINIMUM OR THE MAXIMUM OF MULTIPLE VALUES (in force)

Publication No.: US20230100785A1

Publication date: 2023-03-30

Application No.: US17487813

Application date: 2021-09-28

Applicant: NVIDIA CORPORATION

Inventors: Ilyas ELKIN, Brent Ralph BOSWELL, Stuart F. OBERMAN, Ming Y. SIU

IPC: G06F7/60, G06F9/54

Abstract: In various embodiments, the maximum or minimum of multiple input values is determined. For each of a set of possible values, a corresponding detection result is set to indicate whether at least one of the input values matches the possible value. The detection results are used to ascertain the maximum or minimum of the multiple input values.
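The detection-result idea maps onto a bit vector with one bit per possible value; a priority encoder (here, "highest set bit" or "lowest set bit") then yields the maximum or minimum. This Python sketch assumes small unsigned integer inputs of a known `width`; the function names are hypothetical, and the abstract does not say how the patented hardware sizes or encodes its detection results.

```python
def detection_results(values, width):
    """Set bit v iff at least one input equals v, for every possible v in [0, 2**width)."""
    detection = 0
    for x in values:
        if not 0 <= x < (1 << width):
            raise ValueError(f"{x} is not a {width}-bit unsigned value")
        detection |= 1 << x
    return detection

def maximum(detection):
    """Priority encoder favoring the most significant set bit."""
    if detection == 0:
        raise ValueError("no input values")
    return detection.bit_length() - 1

def minimum(detection):
    """Priority encoder favoring the least significant set bit."""
    if detection == 0:
        raise ValueError("no input values")
    # detection & -detection isolates the lowest set bit.
    return (detection & -detection).bit_length() - 1
```

The detection vector has `2**width` bits, so this direct form only suits narrow values; it is meant to show the principle, not the patented circuit's layout.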

6.

Patent application
EFFICIENCY THROUGH A DISTRIBUTED INSTRUCTION SET ARCHITECTURE (pending, published)

Publication No.: US20150113254A1

Publication date: 2015-04-23

Application No.: US14061666

Application date: 2013-10-23

Applicant: NVIDIA CORPORATION

Inventors: David Conrad TANNENBAUM, Srinivasan (Vasu) IYER, Stuart F. OBERMAN, Ming Y. SIU, Michael Alan FETTERMAN, John Matthew BURGESS, Shirish GADRE

IPC: G06F9/38

CPC classification number: G06F9/3836

Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include the power efficiency with which the instruction can be executed and the availability of execution units to support execution of the instruction.
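The dispatch decision can be sketched as a filter on capability and availability followed by a choice on power cost. In this Python sketch the opcode groups other than FFMA, the relative energy figures, and all names are hypothetical; the abstract says only that the criteria "may include" power efficiency and execution-unit availability.

```python
from dataclasses import dataclass

# Hypothetical opcode groups; only FFMA is named in the abstract.
FREQUENT_OPS = {"OP_FREQ_A", "OP_FREQ_B"}    # executed only by the primary pipeline
INFREQUENT_OPS = {"OP_RARE_A", "OP_RARE_B"}  # executed only by the secondary pipeline
COMMON_OPS = {"FFMA", "OP_COMMON"}           # executable by both pipelines

@dataclass
class Pipeline:
    name: str
    supported: frozenset
    energy_per_op: float   # hypothetical relative energy cost per instruction
    busy: bool = False

def select_pipeline(opcode, pipelines):
    """Pick an available pipeline that supports opcode, preferring the lower energy cost."""
    candidates = [p for p in pipelines if opcode in p.supported and not p.busy]
    if not candidates:
        return None   # no available execution unit: the scheduler would have to wait
    return min(candidates, key=lambda p: p.energy_per_op)

primary = Pipeline("primary", frozenset(FREQUENT_OPS | COMMON_OPS), energy_per_op=1.0)
secondary = Pipeline("secondary", frozenset(INFREQUENT_OPS | COMMON_OPS), energy_per_op=1.5)
```

With both pipelines idle, an FFMA goes to whichever pipeline is assumed cheaper; once that pipeline is busy, the other one can still accept it because FFMA is in the shared subset.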

7.

Patent application
FFMA OPERATIONS USING A MULTI-STEP APPROACH TO DATA SHIFTING (in force)

Publication No.: US20150039662A1

Publication date: 2015-02-05

Application No.: US13959397

Application date: 2013-08-05

Applicant: NVIDIA CORPORATION

Inventors: Srinivasan IYER, David Conrad TANNENBAUM, Stuart F. OBERMAN, Ming (Michael) Y. SIU

IPC: G06F5/01

CPC classification numbers: G06F5/012, G06F7/483, G06F7/5443

Abstract: A fused floating-point multiply-add element includes a multiplier that generates a product, and a shifter that shifts an addend within a narrow range. Interpreting logic analyzes the magnitude of the addend relative to the product and then causes logic arrays to position the shifted addend within the left, center, or right portion of a composite register, depending on the magnitude of the addend relative to the product. The interpreting logic also forces the other portions of the composite register to zero. When the addend is zero, the interpreting logic forces all portions of the composite register to zero. Final combining logic then adds the contents of the composite register to the product.
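The multi-step shift can be illustrated on integer mantissas: a narrow shifter moves the addend by less than one portion width, and a coarse select places the result starting in the right, center, or left portion of a three-portion composite register, leaving the rest at zero. This Python toy is a hypothetical model, not the patented datapath: the portion width `W`, the W-bit addend mantissa, the alignment of the product to the center portion, and the limit of the exponent difference `d` to `[-W, W]` are all assumptions made for illustration.

```python
W = 8  # hypothetical width of each composite-register portion, in bits

def align_addend(addend_mantissa: int, d: int) -> int:
    """Position a W-bit addend in a 3-portion composite register (3*W bits).

    d is the addend exponent minus the product exponent, assumed to lie in [-W, W].
    The product's least significant bit sits at the bottom of the center portion (bit W).
    """
    if addend_mantissa == 0:
        return 0                             # zero addend: all portions forced to zero
    if not 0 < addend_mantissa < (1 << W):
        raise ValueError("addend mantissa must fit in W bits")
    if not -W <= d <= W:
        raise ValueError("exponent difference outside the modeled range")
    position = W + d                         # target bit position of the addend LSB, 0..2W
    portion, fine = divmod(position, W)      # portion 0 = right, 1 = center, 2 = left
    narrow = addend_mantissa << fine         # narrow shifter: at most W-1 bit positions
    # Coarse placement into the chosen portion; the shifted value may spill into
    # the next-higher portion but always stays within the 3*W-bit register.
    return narrow << (portion * W)

def fused_add(product_mantissa: int, addend_mantissa: int, d: int) -> int:
    """Final combining logic: add the composite register to the center-aligned product."""
    return (product_mantissa << W) + align_addend(addend_mantissa, d)
```

Splitting the shift this way keeps the variable shifter narrow, while the coarse left/center/right choice reduces to selecting a fixed offset.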

8.

Patent application
PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (in force)

Publication No.: US20140285500A1

Publication date: 2014-09-25

Application No.: US13850175

Application date: 2013-03-25

Applicant: NVIDIA CORPORATION

Inventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH

IPC: G06T1/20

CPC classification numbers: G06T1/20, G06F9/38, G06F9/3851

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

9.

Patent application
APPROACH TO POWER REDUCTION IN FLOATING-POINT OPERATIONS (in force)

Publication No.: US20140143564A1

Publication date: 2014-05-22

Application No.: US13683362

Application date: 2012-11-21

Applicant: NVIDIA CORPORATION

Inventors: David Conrad TANNENBAUM, Colin SPRINKLE, Stuart F. OBERMAN, Ming Y. SIU, Srinivasan IYER, Ian-Chi Yan KWONG

IPC: G06F1/32, G06F7/483

CPC classification numbers: G06F1/3234, G06F1/3237, G06F1/3243, G06F1/3287, G06F7/483, G06F7/4876, G06F9/30014, G06F9/30189, Y02D10/128, Y02D10/152, Y02D10/171

Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.
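The steps in this abstract (receive the operands of an FMA instruction, decide whether the precision standard applies, generate gating signals, send them to the circuit) can be sketched as follows. Everything concrete here is hypothetical: the per-instruction `requires_ieee_precision` flag, the assumption that the standard in question is IEEE 754, and the particular gate groups chosen for shutdown. The abstract does not say which logic gates are turned off.

```python
from dataclasses import dataclass
from enum import Flag

class GateGroup(Flag):
    """Hypothetical groups of logic gates in an FMA datapath that can be switched off."""
    NONE = 0
    LOW_PRODUCT_COLUMNS = 1   # least-significant partial-product columns
    STICKY_LOGIC = 2          # sticky-bit logic used for exact rounding
    DENORMAL_LOGIC = 4        # subnormal operand and result handling

@dataclass(frozen=True)
class FmaInstruction:
    a: float
    b: float
    c: float
    requires_ieee_precision: bool   # hypothetical per-instruction precision flag

def generate_gating_signals(instr: FmaInstruction) -> GateGroup:
    """Return the gate groups to turn off while this instruction executes."""
    if instr.requires_ieee_precision:
        return GateGroup.NONE       # the full-precision path must stay powered
    return (GateGroup.LOW_PRODUCT_COLUMNS
            | GateGroup.STICKY_LOGIC
            | GateGroup.DENORMAL_LOGIC)

def dispatch(instr: FmaInstruction, send_to_circuit) -> GateGroup:
    """Compute the gating signals and hand them to the (simulated) integrated circuit."""
    signals = generate_gating_signals(instr)
    send_to_circuit(signals)
    return signals
```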