Patent search ap:("NVIDIA CORPORATION") AND inv:"Ming Y. SIU" Page 1

1.

Invention application
PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (status: under examination, published)

Publication No.: US20170256022A1

Publication date: 2017-09-07

Application No.: US15603294

Application date: 2017-05-23

Applicant: NVIDIA Corporation

Inventor: John Erik LINDHOLM , Brett W. COON , Stuart F. OBERMAN , Ming Y. SIU , Matthew P. GERLACH

IPC: G06T1/20 , G06F9/38

CPC classification number: G06T1/20 , G06F9/38 , G06F9/3851

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

2.

Invention application
APPROACH FOR EFFICIENT ARITHMETIC OPERATIONS (status: under examination, published)

Publication No.: US20140129807A1

Publication date: 2014-05-08

Application No.: US13671485

Application date: 2012-11-07

Applicant: NVIDIA CORPORATION

Inventor: David Conrad TANNENBAUM , Ming Y. SIU , Stuart F. OBERMAN , Colin SPRINKLE , Srinivasan IYER , Ian Chi Yan KWONG

IPC: G06F9/302 , G06F9/30

Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.

3.

Invention application
COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS (status: granted)

Publication No.: US20160357560A1

Publication date: 2016-12-08

Application No.: US15238428

Application date: 2016-08-16

Applicant: NVIDIA Corporation

Inventor: Brian FAHS , Ming Y. SIU , Brett W. COON , John R. NICKOLLS , Lars NYLAND

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/522 , G06F8/458 , G06F9/3004 , G06F9/30087 , G06F9/30145 , G06F9/3851

Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
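
The barrier aggregation described in this abstract can be illustrated with a minimal software sketch. It is not the patented hardware instruction: the class `AggregatingBarrier`, the helper `run_threads`, the choice of addition as the aggregation operator, and the use of Python threads are all assumptions made for illustration. Each thread contributes a value on arrival, its scan value (the sum of earlier arrivals) is fixed at that moment, and the reduction is read only after every thread has arrived.

```python
import threading


class AggregatingBarrier:
    """Single-use barrier that also sums the values its threads contribute."""

    def __init__(self, parties):
        self._barrier = threading.Barrier(parties)
        self._lock = threading.Lock()
        self._running = 0  # sum of the contributions received so far

    def arrive(self, value):
        # Contribute on arrival. The exclusive scan (sum of earlier arrivals)
        # is already known here, before the barrier releases.
        with self._lock:
            scan = self._running
            self._running += value
        # Wait until every thread has contributed.
        self._barrier.wait()
        # Only now is the reduction over all threads complete.
        return scan, self._running


def run_threads(values):
    """Run one thread per value; return (value, scan, reduction) per thread."""
    barrier = AggregatingBarrier(len(values))
    results = [None] * len(values)

    def worker(i):
        scan, total = barrier.arrive(values[i])
        results[i] = (values[i], scan, total)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every thread sees the same reduction, while the scan values depend on arrival order, so they vary from run to run. The sketch is single-use because the running sum is never reset.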

4.

Invention application
PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (status: granted)

Publication No.: US20160300319A9

Publication date: 2016-10-13

Application No.: US13850175

Application date: 2013-03-25

Applicant: NVIDIA CORPORATION

Inventor: John Erik LINDHOLM , Brett W. COON , Stuart F. OBERMAN , Ming Y. SIU , Matthew P. GERLACH

IPC: G06T1/20

CPC classification number: G06T1/20 , G06F9/38 , G06F9/3851

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

5.

Invention application
COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS (status: granted)

Publication No.: US20140019724A1

Publication date: 2014-01-16

Application No.: US14025482

Application date: 2013-09-12

Applicant: NVIDIA Corporation

Inventor: Brian FAHS , Ming Y. SIU , Brett W. COON , John R. NICKOLLS , Lars NYLAND

IPC: G06F9/30

CPC classification number: G06F9/522 , G06F8/458 , G06F9/3004 , G06F9/30087 , G06F9/30145 , G06F9/3851

Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

6.

Invention publication
EFFICIENTLY LAUNCHING TASKS ON A PROCESSOR (status: under examination, published)

Publication No.: US20230236878A1

Publication date: 2023-07-27

Application No.: US17583957

Application date: 2022-01-25

Applicant: NVIDIA CORPORATION

Inventor: Jack Hilaire CHOQUETTE , Rajballav DASH , Shayani DEB , Gentaro HIROTA , Ronny M. KRASHINSKY , Ze LONG , Chen MEI , Manan PATEL , Ming Y. SIU

IPC: G06F9/48

CPC classification number: G06F9/4881

Abstract: In various embodiments, scheduling dependencies associated with tasks executed on a processor are decoupled from data dependencies associated with the tasks. Before the completion of a first task that is executing in the processor, a scheduling dependency specifying that a second task is dependent on the first task is resolved based on a pre-exit trigger. In response to the resolution of the scheduling dependency, the second task is launched on the processor.
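
The decoupling of scheduling and data dependencies in this abstract can be sketched in software as follows. This is only an analogy under stated assumptions: the function `run_pair`, the two `threading.Event` objects, and the placeholder value 42 are hypothetical and do not come from the application. The first task fires a pre-exit trigger before it finishes, which lets the scheduler launch the second task early, while a separate event guards the data the second task consumes.

```python
import threading


def run_pair():
    """Launch a dependent task on a pre-exit trigger; return the event log."""
    log = []
    log_lock = threading.Lock()
    pre_exit = threading.Event()    # scheduling dependency (hypothetical name)
    data_ready = threading.Event()  # data dependency (hypothetical name)
    result = {}

    def record(message):
        with log_lock:
            log.append(message)

    def first_task():
        record("first: main work")
        pre_exit.set()              # pre-exit trigger: successor may launch now
        result["value"] = 42        # placeholder output, written after the trigger
        record("first: output written")
        data_ready.set()            # data dependency resolved separately

    def second_task():
        record("second: launched")
        data_ready.wait()           # wait for the data, not for the first task's exit
        record(f"second: consumed {result['value']}")

    t1 = threading.Thread(target=first_task)
    t1.start()
    pre_exit.wait()                 # scheduler resolves the scheduling dependency
    t2 = threading.Thread(target=second_task)
    t2.start()
    t1.join()
    t2.join()
    return log
```

The second task can begin its launch overhead while the first is still writing its output. The data event still guarantees that the value is read only after it has been written.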

7.

Invention application
PRIORITY ENCODER-BASED TECHNIQUES FOR COMPUTING THE MINIMUM OR THE MAXIMUM OF MULTIPLE VALUES (status: granted)

Publication No.: US20230100785A1

Publication date: 2023-03-30

Application No.: US17487813

Application date: 2021-09-28

Applicant: NVIDIA CORPORATION

Inventor: Ilyas ELKIN , Brent Ralph BOSWELL , Stuart F. OBERMAN , Ming Y. SIU

IPC: G06F7/60 , G06F9/54

Abstract: In various embodiments, the maximum or minimum of multiple input values is determined. For each of a set of possible values, a corresponding detection result is set to indicate whether at least one of the input values matches the possible value. The detection results are used to ascertain the maximum or minimum of the multiple input values.
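
One way to read this abstract is sketched below, assuming small unsigned inputs. One detection bit is set per possible value, and the highest or lowest set bit is then located, which is the job a priority encoder does in hardware. The function names, the `width` parameter, and the use of Python integers as the bit vector are illustrative assumptions, not details from the application.

```python
def detect(values, width):
    """Return a bit vector where bit v is set if some input equals v."""
    mask = 0
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        mask |= 1 << v
    return mask


def maximum(values, width=4):
    """Maximum = index of the highest set detection bit."""
    mask = detect(values, width)
    if mask == 0:
        raise ValueError("no input values")
    return mask.bit_length() - 1


def minimum(values, width=4):
    """Minimum = index of the lowest set detection bit."""
    mask = detect(values, width)
    if mask == 0:
        raise ValueError("no input values")
    return (mask & -mask).bit_length() - 1
```

The detection vector has one bit per possible value, so this direct form only suits narrow inputs. With `width=4` it needs 16 detection bits.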

8.

Invention application
EFFICIENCY THROUGH A DISTRIBUTED INSTRUCTION SET ARCHITECTURE (status: under examination, published)

Publication No.: US20150113254A1

Publication date: 2015-04-23

Application No.: US14061666

Application date: 2013-10-23

Applicant: NVIDIA CORPORATION

Inventor: David Conrad TANNENBAUM , Srinivasan (Vasu) IYER , Stuart F. OBERMAN , Ming Y. SIU , Michael Alan FETTERMAN , John Matthew BURGESS , Shirish GADRE

IPC: G06F9/38

CPC classification number: G06F9/3836

Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.
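
The pipeline selection described in this abstract might be modeled roughly as below. Only FFMA and the two criteria (availability and power efficiency) come from the abstract. Every other opcode name, the `POWER_COST` table, and the rule that availability is checked before power cost are hypothetical choices made for this sketch.

```python
# Hypothetical opcode sets; only FFMA is named in the abstract.
PRIMARY_ONLY = {"IADD", "MOV"}    # assumed frequently issued instructions
SECONDARY_ONLY = {"RCP", "SIN"}   # assumed less frequently issued instructions
COMMON = {"FFMA", "FMUL"}         # supported by both pipelines

# Hypothetical relative energy per operation on each pipeline.
POWER_COST = {"primary": 2, "secondary": 1}


def select_pipeline(opcode, busy):
    """Pick a pipeline for opcode; busy is the set of occupied pipelines."""
    if opcode in PRIMARY_ONLY:
        candidates = ["primary"]
    elif opcode in SECONDARY_ONLY:
        candidates = ["secondary"]
    elif opcode in COMMON:
        candidates = ["primary", "secondary"]
    else:
        raise ValueError(f"unknown opcode {opcode!r}")
    # Availability first: prefer pipelines that are free right now.
    free = [p for p in candidates if p not in busy]
    pool = free or candidates  # if none is free, the instruction must wait
    # Power efficiency second: take the cheapest remaining pipeline.
    return min(pool, key=POWER_COST.__getitem__)
```

For example, `select_pipeline("FFMA", set())` picks the pipeline with the lower assumed power cost, and the same call with that pipeline marked busy falls back to the other one.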

9.

Invention application
PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (status: granted)

Publication No.: US20140285500A1

Publication date: 2014-09-25

Application No.: US13850175

Application date: 2013-03-25

Applicant: NVIDIA CORPORATION

Inventor: John Erik LINDHOLM , Brett W. COON , Stuart F. OBERMAN , Ming Y. SIU , Matthew P. GERLACH

IPC: G06T1/20

CPC classification number: G06T1/20 , G06F9/38 , G06F9/3851

Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

10.

Invention application
APPROACH TO POWER REDUCTION IN FLOATING-POINT OPERATIONS (status: granted)

Publication No.: US20140143564A1

Publication date: 2014-05-22

Application No.: US13683362

Application date: 2012-11-21

Applicant: NVIDIA CORPORATION

Inventor: David Conrad TANNENBAUM , Colin SPRINKLE , Stuart F. OBERMAN , Ming Y. SIU , Srinivasan IYER , Ian-Chi Yan KWONG

IPC: G06F1/32 , G06F7/483

CPC classification number: G06F1/3234 , G06F1/3237 , G06F1/3243 , G06F1/3287 , G06F7/483 , G06F7/4876 , G06F9/30014 , G06F9/30189 , Y02D10/128 , Y02D10/152 , Y02D10/171

Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.
