Patent search: ap:("Brett W. Coon" OR "John R. Nickolls" OR "John Erik Lindholm" OR "Robert J. Stoll" OR "Nicholas Wang" OR "Jack Hilaire Choquette" OR "Kathleen Elliott Nickolls") AND inv:"John R. Nickolls" (page 1)

1.

Patent application
THREAD GROUP SCHEDULER FOR COMPUTING ON A PARALLEL THREAD PROCESSOR [In force]

Publication No.: US20120110586A1

Publication date: 2012-05-03

Application No.: US13247819

Filing date: 2011-09-28

Applicant: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls

Inventor: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls

IPC classification: G06F9/46

CPC classification: G06F9/4881, G06F2209/483

Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
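The three-step selection in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ThreadGroup` record, its field names, the `ready` flag, and the tie-breaking rule (first match wins) are hypothetical and not taken from the patent, and the sketch assumes seniority is compared only among CTAs that currently have an available thread group.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ThreadGroup:
    """Hypothetical per-thread-group scheduler state (names are assumptions)."""
    group_id: int
    cta_id: int
    cta_seniority: int  # seniority value of the owning CTA
    credit: int         # credit value of this thread group
    ready: bool         # True if the group can issue this cycle


def select_thread_group(groups: Sequence[ThreadGroup]) -> Optional[ThreadGroup]:
    # (i) Identify the pool of available thread groups.
    pool = [g for g in groups if g.ready]
    if not pool:
        return None
    # (ii) Identify the CTA with the greatest seniority value
    #      (assumption: only CTAs represented in the pool are considered).
    senior_cta = max(pool, key=lambda g: g.cta_seniority).cta_id
    # (iii) Within that CTA, select the thread group with the greatest credit.
    candidates = [g for g in pool if g.cta_id == senior_cta]
    return max(candidates, key=lambda g: g.credit)
```

In this sketch a thread group with a very high credit value in a junior CTA is never chosen while a senior CTA still has an available group, which is the ordering the abstract describes.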

2.

Granted patent
Thread group scheduler for computing on a parallel thread processor [In force]

Publication No.: US08732713B2

Publication date: 2014-05-20

Application No.: US13247819

Filing date: 2011-09-28

Applicant: Brett W. Coon, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls

Inventor: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette

IPC classification: G06F9/46

CPC classification: G06F9/4881, G06F2209/483

Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.

3.

Patent application
EFFICIENT IMPLEMENTATION OF ARRAYS OF STRUCTURES ON SIMT AND SIMD ARCHITECTURES [In force]

Publication No.: US20120089792A1

Publication date: 2012-04-12

Application No.: US13247855

Filing date: 2011-09-28

Applicant: Brian FAHS, John R. Nickolls, Kathleen Elliott Nickolls, Henry Packard Moreton, Brett W. Coon

Inventor: Brian FAHS, John R. Nickolls, Kathleen Elliott Nickolls, Henry Packard Moreton, Brett W. Coon

IPC classification: G06F12/00

CPC classification: G06F9/3885, G06F9/30036, G06F9/3009, G06F9/30123, G06F9/345, G06F9/3824, G06F9/3851, G06F9/3887, G06F12/0207, G06T1/20

Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
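To make the "array of structures of arrays" layout concrete, the following Python sketch computes a byte address from the lane count and the element coordinates. It is a generic illustration of that layout, not the address computation claimed in the patent: the function name, the parameters, and the assumption that every field has the same size are all hypothetical.

```python
def aosoa_address(base: int, index: int, field: int,
                  num_fields: int, lanes: int, elem_size: int = 4) -> int:
    """Byte address of `field` of logical element `index` in an
    array-of-structures-of-arrays layout (all names are assumptions).

    Elements are grouped into blocks of `lanes` elements. Inside a block,
    each field is stored as `lanes` contiguous values, so the lanes of one
    execution group read consecutive addresses for the same field.
    """
    if lanes <= 0 or num_fields <= 0 or elem_size <= 0:
        raise ValueError("lanes, num_fields and elem_size must be positive")
    if index < 0 or not 0 <= field < num_fields:
        raise ValueError("index or field out of range")
    block, lane = divmod(index, lanes)
    return base + ((block * num_fields + field) * lanes + lane) * elem_size
```

With `lanes=4`, `num_fields=3` and 4-byte fields, elements 0 to 3 read field 0 from byte offsets 0, 4, 8 and 12, while field 1 of element 0 sits at offset 16, directly after the four field-0 values of the same block.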

4.

Granted patent
Register based queuing for texture requests [In force]

Publication No.: US07456835B2

Publication date: 2008-11-25

Application No.: US11339937

Filing date: 2006-01-25

Applicant: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

Inventor: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

IPC classification: G06T11/40, G06T15/00, G06T1/00, G09G5/00

CPC classification: G06T11/60, G09G5/363

Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
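The queuing scheme in this abstract can be modeled in software as a small command queue plus a register file that temporarily holds the arguments. The sketch below is a toy model only: the class, its method names, and the `sample` callback that stands in for the texture unit are hypothetical and do not come from the patent.

```python
from collections import deque
from typing import Any, Callable


class TextureQueueModel:
    """Toy model: small commands are queued, large arguments live in the
    destination register until the texture unit overwrites them."""

    def __init__(self, num_registers: int) -> None:
        self.registers: list = [None] * num_registers  # general purpose registers
        self.request_buffer: deque = deque()           # holds (command, dest_reg)

    def enqueue(self, command: str, dest_reg: int, arguments: Any) -> None:
        # Park the (large) arguments in the register that will later
        # receive the final texture value, so no extra register is used.
        self.registers[dest_reg] = arguments
        self.request_buffer.append((command, dest_reg))

    def service_one(self, sample: Callable[[str, Any], Any]) -> int:
        # The texture unit pops a command, fetches its arguments from the
        # destination register, and writes the result back to that register.
        command, dest_reg = self.request_buffer.popleft()
        arguments = self.registers[dest_reg]
        self.registers[dest_reg] = sample(command, arguments)
        return dest_reg
```

The point of the design is visible in `enqueue`: the request buffer only ever stores the command and a register index, so its entries stay small regardless of how large the argument set is.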

5.

Granted patent
Structured programming control flow in a SIMD architecture [In force]

Publication No.: US07877585B1

Publication date: 2011-01-25

Application No.: US11845429

Filing date: 2007-08-27

Applicant: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov

Inventor: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov

IPC classification: G06F15/76, G06F7/38, G06F9/00, G06F9/44

CPC classification: G06F9/3851, G06F9/30072, G06F9/3885, G06F9/3887

Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.

6.

Granted patent
Indirect function call instructions in a synchronous parallel thread processor [In force]

Publication No.: US08312254B2

Publication date: 2012-11-13

Application No.: US12054255

Filing date: 2008-03-24

Applicant: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm

Inventor: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm

IPC classification: G06F9/00

CPC classification: G06F9/38, G06F9/30054, G06F9/30101, G06F9/3851, G06F9/3885

Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.

7.

Granted patent
Register based queuing for texture requests [In force]

Publication No.: US07027062B2

Publication date: 2006-04-11

Application No.: US10789735

Filing date: 2004-02-27

Applicant: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

Inventor: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

IPC classification: G06T11/40

CPC classification: G06T11/60, G09G5/363

Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

8.

Granted patent
Processing an indirect branch instruction in a SIMD architecture [In force]

Publication No.: US07761697B1

Publication date: 2010-07-20

Application No.: US11557082

Filing date: 2006-11-06

Applicant: Brett W. Coon, John Erik Lindholm, Peter C. Mills, John R. Nickolls

Inventor: Brett W. Coon, John Erik Lindholm, Peter C. Mills, John R. Nickolls

IPC classification: G06F7/38, G06F9/00, G06F9/44

CPC classification: G06F9/30072, G06F9/3009, G06F9/30185, G06F9/322, G06F9/3851, G06F9/3887

Abstract: One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.
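One way to picture "a sequence of two-way branches" is to split the active threads repeatedly into those that share one chosen target and everyone else. The Python sketch below does exactly that with bit masks. It is an illustration under stated assumptions rather than the patented mechanism: the function name and the choice of the lowest-numbered active lane as leader are hypothetical, and the token stack mentioned in the abstract is not modeled.

```python
from typing import List, Sequence, Tuple


def split_indirect_branch(targets: Sequence[int],
                          active_mask: int) -> List[Tuple[int, int]]:
    """Serialize a divergent indirect branch into (target, lane_mask) passes.

    targets[lane] is the branch address held by that lane; bit `lane` of
    active_mask is 1 if the lane is active. Each loop iteration is one
    two-way decision: lanes matching the leader's target branch now, the
    remaining lanes are deferred to a later iteration.
    """
    if active_mask < 0 or active_mask >> len(targets):
        raise ValueError("active_mask refers to lanes that have no target")
    passes = []
    remaining = active_mask
    while remaining:
        # Assumption: the lowest-numbered active lane picks the target.
        leader = (remaining & -remaining).bit_length() - 1
        target = targets[leader]
        taken = 0
        for lane, lane_target in enumerate(targets):
            if (remaining >> lane) & 1 and lane_target == target:
                taken |= 1 << lane
        passes.append((target, taken))
        remaining &= ~taken
    return passes
```

The number of passes equals the number of distinct targets among the active lanes, so a thread group whose lanes all agree on one address needs only a single pass.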

9.

Patent application
Indirect Function Call Instructions in a Synchronous Parallel Thread Processor [In force]

Publication No.: US20090240931A1

Publication date: 2009-09-24

Application No.: US12054255

Filing date: 2008-03-24

Applicant: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm

Inventor: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm

IPC classification: G06F9/38

CPC classification: G06F9/38, G06F9/30054, G06F9/30101, G06F9/3851, G06F9/3885

Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.

10.

Granted patent
Register based queuing for texture requests [In force]

Publication No.: US07864185B1

Publication date: 2011-01-04

Application No.: US12256848

Filing date: 2008-10-23

Applicant: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

Inventor: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

IPC classification: G06T11/40, G06T15/00, G06T15/20, G06T1/00

CPC classification: G06T11/60, G09G5/363

Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
