Patent search: ap:("Roger L. Allen" OR "Brett W. Coon" OR "Ian A. Buck" OR "John R. Nickolls") AND inv:"John R. Nickolls" (page 2)

11.

Granted invention patent
Shared single-access memory with management of multiple parallel requests (Active)

Publication No.: US08176265B2

Publication date: 2012-05-08

Application No.: US13165638

Filing date: 2011-06-21

Applicant: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventor: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, Y02D10/13

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
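The serialization behaviour described in the abstract of US08176265B2 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware logic: the function name `serialize_requests`, the lane numbering, and the rule of selecting the lowest-numbered pending lane's address are all hypothetical choices made for the example.

```python
def serialize_requests(targets):
    """Group parallel requests into passes, one distinct target address per pass.

    targets[i] is the address requested by lane i (hypothetical model).
    Returns a list of (address, lanes_served) tuples in service order.
    """
    pending = list(range(len(targets)))
    passes = []
    while pending:
        # Assumed selection rule: take the address of the lowest-numbered pending lane.
        selected = targets[pending[0]]
        # Every request that names the selected address proceeds in this pass.
        served = [lane for lane in pending if targets[lane] == selected]
        # All other requests are deferred and go through the logic again.
        pending = [lane for lane in pending if targets[lane] != selected]
        passes.append((selected, served))
    return passes


if __name__ == "__main__":
    for address, lanes in serialize_requests([0x10, 0x20, 0x10, 0x30, 0x20]):
        print(hex(address), lanes)
```

In this model the number of passes equals the number of distinct addresses in the group, which matches the abstract's statement that each different target address is accessed exactly once.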

12.

Granted invention patent
Shared memory with parallel access and access conflict resolution mechanism (Active)

Publication No.: US08108625B1

Publication date: 2012-01-31

Application No.: US11554546

Filing date: 2006-10-30

Applicant: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventor: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F13/1663

Abstract: Concurrent threads in a multithreaded processor share access to a memory, with any location in the shared memory being accessible by any thread. In one embodiment, the shared memory has multiple independently-addressable memory banks, and one location per bank can be accessed in parallel. Parallel processing engines executing the threads generate a group of parallel memory access requests. Address conflict logic determines whether the requests can be satisfied in parallel (e.g., based on bank access constraints) and serializes the requests to the extent needed to avoid conflicts. In some embodiments, data read from one address in the shared memory can be broadcast to multiple processing engines.
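The bank-conflict handling summarized in the abstract of US08108625B1 can be modelled in a few lines. The sketch below is an illustration under assumptions that the abstract does not state: addresses are word addresses, the bank is `address % num_banks`, four banks are used by default, the first pending lane to reach a bank wins it for that cycle, and requests to the identical address share one access (the broadcast case). The name `schedule_banked` is hypothetical.

```python
def schedule_banked(targets, num_banks=4):
    """Split parallel requests into cycles so each bank serves one address per cycle.

    targets[i] is the word address requested by lane i (hypothetical model).
    Returns a list of cycles; each cycle is the list of lanes served in it.
    """
    pending = list(range(len(targets)))
    cycles = []
    while pending:
        chosen = {}  # bank index -> address selected for this cycle
        for lane in pending:
            bank = targets[lane] % num_banks  # assumed address-to-bank mapping
            chosen.setdefault(bank, targets[lane])  # first pending lane wins the bank
        # Lanes whose address was chosen for their bank are served together;
        # several lanes reading the same address model the broadcast case.
        served = [lane for lane in pending
                  if chosen[targets[lane] % num_banks] == targets[lane]]
        pending = [lane for lane in pending if lane not in served]
        cycles.append(served)
    return cycles


if __name__ == "__main__":
    print(schedule_banked([0, 1, 2, 3]))   # distinct banks
    print(schedule_banked([0, 4, 8, 1]))   # three lanes collide on bank 0
    print(schedule_banked([5, 5, 5, 5]))   # same address for every lane
```

Requests that fall in different banks complete in one cycle, while requests to different addresses in the same bank are spread over as many cycles as that bank needs.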

13.

Granted invention patent
Structured programming control flow in a SIMD architecture (Active)

Publication No.: US07877585B1

Publication date: 2011-01-25

Application No.: US11845429

Filing date: 2007-08-27

Applicant: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov

Inventor: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov

IPC classification: G06F15/76, G06F7/38, G06F9/00, G06F9/44

CPC classification: G06F9/3851, G06F9/30072, G06F9/3885, G06F9/3887

Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.

14.

Granted invention patent
Cache operations and policies for a multi-threaded client (Active)

Publication No.: US09952977B2

Publication date: 2018-04-24

Application No.: US12890476

Filing date: 2010-09-24

Applicant: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs

Inventor: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs

IPC classification: G06F12/00, G06F12/0842, G06F12/0897

CPC classification: G06F12/0842, G06F12/0897

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.

15.

Granted invention patent
Cooperative thread array reduction and scan operations (Active)

Publication No.: US08539204B2

Publication date: 2013-09-17

Application No.: US12890227

Filing date: 2010-09-24

Applicant: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland

Inventor: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland

IPC classification: G06F9/30, G06F9/40, G06F15/00

CPC classification: G06F9/522, G06F8/458, G06F9/3004, G06F9/30087, G06F9/30145, G06F9/3851

Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
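The idea of folding an aggregation into a barrier, as summarized in the abstract of US08539204B2, can be mimicked on a CPU with Python's standard `threading.Barrier`. The sketch below is a loose software analogy, not the patented instruction: the class `BarrierAggregator`, the choice of addition as the operator, the exclusive scan in arrival order, and the single-use design are all assumptions made for illustration.

```python
import threading


class BarrierAggregator:
    """Single-use barrier that also sums the values contributed by each thread."""

    def __init__(self, num_threads):
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(num_threads)
        self._total = 0

    def arrive_and_aggregate(self, value):
        """Contribute value, wait for all threads, return (scan, reduction)."""
        with self._lock:
            # Scan result is fixed at arrival time: the running sum so far
            # (assumed exclusive scan in arrival order).
            scan = self._total
            self._total += value
        # No thread continues until every thread has contributed.
        self._barrier.wait()
        # After the barrier the total is final, so every thread sees the same reduction.
        return scan, self._total


def run_threads(values):
    """Run one thread per value and collect each thread's (scan, reduction) pair."""
    aggregator = BarrierAggregator(len(values))
    results = [None] * len(values)

    def worker(index):
        results[index] = aggregator.arrive_and_aggregate(values[index])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for scan, reduction in run_threads([1, 2, 3, 4]):
        print("scan:", scan, "reduction:", reduction)
```

Every thread receives the same reduction value, while the scan value each thread receives depends on the order in which the threads happened to arrive.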

16.

Granted invention patent
Trap handler architecture for a parallel processing unit (Active)

Publication No.: US08522000B2

Publication date: 2013-08-27

Application No.: US12569831

Filing date: 2009-09-29

Applicant: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

Inventor: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

IPC classification: G06F9/00

CPC classification: G06F9/327, G06F9/3851, G06F9/3861

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.

17.

Granted invention patent
Lock mechanism to enable atomic updates to shared memory (Active)

Publication No.: US08375176B2

Publication date: 2013-02-12

Application No.: US13276224

Filing date: 2011-10-18

Applicant: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills

Inventor: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, G06F9/3004, G06F9/30087, G06F9/30185, G06F9/526, G06F2209/521

Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
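The read-with-lock and write-with-unlock pairing described in the abstract of US08375176B2 can be illustrated with a single-threaded software model. This is a sketch under assumptions, not the patented circuit: the class `LockedSharedMemory`, its per-location lock flags, the method names, and the bounded retry loop in `atomic_add` are hypothetical and chosen only to show how returning the lock status together with the read data removes the need for a separate lock check.

```python
class LockedSharedMemory:
    """Toy model: every location carries a lock flag next to its data."""

    def __init__(self, size):
        self.data = [0] * size
        self.locked = [False] * size

    def load_and_lock(self, addr):
        """Read a location and try to lock it in the same operation.

        Returns (value, acquired); acquired is False if the location was
        already locked by an earlier request.
        """
        acquired = not self.locked[addr]
        if acquired:
            self.locked[addr] = True
        return self.data[addr], acquired

    def store_and_unlock(self, addr, value):
        """Write a location and release its lock in the same operation."""
        self.data[addr] = value
        self.locked[addr] = False


def atomic_add(mem, addr, delta, max_attempts=8):
    """Read-modify-write built from the two combined operations.

    Returns True if the update was applied, False if the lock was never obtained.
    """
    for _ in range(max_attempts):
        value, acquired = mem.load_and_lock(addr)
        if acquired:
            # Lock status arrived with the data, so no separate check is needed.
            mem.store_and_unlock(addr, value + delta)
            return True
    return False


if __name__ == "__main__":
    mem = LockedSharedMemory(4)
    print(atomic_add(mem, 2, 5), mem.data[2])
```

A plain read that does not modify the location would simply skip the lock, which corresponds to the abstract's point that lock operations are specified per operation.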

18.

Granted invention patent
Indirect function call instructions in a synchronous parallel thread processor (Active)

Publication No.: US08312254B2

Publication date: 2012-11-13

Application No.: US12054255

Filing date: 2008-03-24

Applicant: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm

Inventor: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm

IPC classification: G06F9/00

CPC classification: G06F9/38, G06F9/30054, G06F9/30101, G06F9/3851, G06F9/3885

Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.

19.

Granted invention patent
Lock mechanism to enable atomic updates to shared memory (Active)

Publication No.: US08055856B2

Publication date: 2011-11-08

Application No.: US12054267

Filing date: 2008-03-24

Applicant: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills

Inventor: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, G06F9/3004, G06F9/30087, G06F9/30185, G06F9/526, G06F2209/521

Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.

20.

Invention application
TRAP HANDLER ARCHITECTURE FOR A PARALLEL PROCESSING UNIT (Active)

Publication No.: US20110078427A1

Publication date: 2011-03-31

Application No.: US12569831

Filing date: 2009-09-29

Applicant: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

Inventor: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

IPC classification: G06F9/38

CPC classification: G06F9/327, G06F9/3851, G06F9/3861

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
