Patent search: ap:("Steven James Heinrich" OR "Alexander L. Minkin" OR "Brett W. Coon" OR "Rajeshwaran Selvanesan" OR "Robert Steven Glanville" OR "Charles McCarver" OR "Anjana Rajendran" OR "Stewart Glenn Carlton" OR "John R. Nickolls" OR "Brian Fahs") AND inv:"John R. Nickolls" (page 2)

11.

Patent application
THREAD GROUP SCHEDULER FOR COMPUTING ON A PARALLEL THREAD PROCESSOR
Legal status: In force

Publication No.: US20120110586A1

Publication date: 2012-05-03

Application No.: US13247819

Filing date: 2011-09-28

Applicants: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls

Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Robert J. Stoll, Nicholas Wang, Jack Hilaire Choquette, Kathleen Elliott Nickolls

IPC classification: G06F9/46

CPC classification: G06F9/4881, G06F2209/483

Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
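
Illustrative sketch (not taken from the patent): the three selection steps in the abstract could be modelled roughly as below. The ThreadGroup fields, the seniority table, and the choice to consider only CTAs that currently have an available group are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class ThreadGroup:
    cta_id: int   # CTA this thread group belongs to
    credit: int   # hypothetical per-group credit value
    ready: bool   # whether the group is available to issue

def select_thread_group(groups, seniority):
    """Pick a group: most senior CTA first, then highest credit within it."""
    # (i) pool of available thread groups
    pool = [g for g in groups if g.ready]
    if not pool:
        return None
    # (ii) CTA with the greatest seniority (assumed: among CTAs in the pool)
    senior_cta = max({g.cta_id for g in pool}, key=lambda c: seniority[c])
    # (iii) greatest credit value within that CTA
    return max((g for g in pool if g.cta_id == senior_cta),
               key=lambda g: g.credit)

groups = [
    ThreadGroup(cta_id=0, credit=5, ready=True),
    ThreadGroup(cta_id=0, credit=9, ready=False),
    ThreadGroup(cta_id=1, credit=7, ready=True),
    ThreadGroup(cta_id=1, credit=3, ready=True),
]
seniority = {0: 1, 1: 2}  # CTA 1 is the more senior one in this example
chosen = select_thread_group(groups, seniority)
print(chosen)
```

How seniority and credit values are assigned and updated is not stated in the abstract, so the sketch treats them as given inputs.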

12.

Granted patent
Generating event signals for performance register control using non-operative instructions
Legal status: In force

Publication No.: US07809928B1

Publication date: 2010-10-05

Application No.: US11313872

Filing date: 2005-12-20

Applicants: Roger L. Allen, Brett W. Coon, Ian A. Buck, John R. Nickolls

Inventors: Roger L. Allen, Brett W. Coon, Ian A. Buck, John R. Nickolls

IPC classification: G06F9/30, G06F17/00, G09G5/02

CPC classification: G06T1/20, G06F9/30072, G06F9/30076, G06F11/3466, G06F2201/86, G06F2201/865, G06F2201/88

Abstract: One embodiment of an instruction decoder includes an instruction parser configured to process a first non-operative instruction and to generate a first event signal corresponding to the first non-operative instruction, and a first event multiplexer configured to receive the first event signal from the instruction parser, to select the first event signal from one or more event signals and to transmit the first event signal to an event logic block. The instruction decoder may be implemented in a multithreaded processing unit, such as a shader unit, and the occurrences of the first event signal may be tracked when one or more threads are executed within the processing unit. The resulting event signal count may provide a designer with a better understanding of the behavior of a program, such as a shader program, executed within the processing unit, thereby facilitating overall processing unit and program design.
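
Illustrative sketch (not taken from the patent): a software model of the parser, event multiplexer, and event counting described in the abstract. The "NOP_EVENT <id>" instruction encoding and all function names are hypothetical.

```python
from collections import Counter

def parse(instruction):
    """Return an event ID for a non-operative instruction, else None.

    Assumed encoding: "NOP_EVENT <id>" has no effect other than
    raising event <id>.
    """
    op, *args = instruction.split()
    return int(args[0]) if op == "NOP_EVENT" else None

def event_mux(signals, select):
    """Forward only the event signal chosen by `select`."""
    return [s for s in signals if s == select]

def count_events(program, select):
    """Count occurrences of the selected event while the program runs."""
    counts = Counter()  # stands in for the event logic block
    for instruction in program:
        event = parse(instruction)
        if event is not None:
            for signal in event_mux([event], select):
                counts[signal] += 1
    return counts

program = ["ADD r0 r1 r2", "NOP_EVENT 1", "MUL r3 r0 r0",
           "NOP_EVENT 2", "NOP_EVENT 1"]
counts = count_events(program, select=1)
print(counts)
```

Placing such markers around a region of a shader program gives a count of how often that region is reached, which is the profiling use the abstract describes.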

13.

Granted patent
Register based queuing for texture requests
Legal status: In force

Publication No.: US07456835B2

Publication date: 2008-11-25

Application No.: US11339937

Filing date: 2006-01-25

Applicants: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

IPC classification: G06T11/40, G06T15/00, G06T1/00, G09G5/00

CPC classification: G06T11/60, G09G5/363

Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.
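
Illustrative sketch (not taken from the patent): the idea of parking texture arguments in the destination register can be modelled as below. The TextureQueue class, the register names, and the stand-in sampling function are assumptions made for this example.

```python
from collections import deque

class TextureQueue:
    def __init__(self, registers):
        self.registers = registers  # general purpose register file (a dict here)
        self.commands = deque()     # small dedicated texture request buffer

    def enqueue(self, command, dest_reg, args):
        # The destination register is allocated for the result anyway,
        # so the (larger) arguments are stored there while the request waits.
        self.registers[dest_reg] = args
        self.commands.append((command, dest_reg))

    def process_one(self, sample):
        command, dest_reg = self.commands.popleft()
        args = self.registers[dest_reg]                   # fetch the arguments
        self.registers[dest_reg] = sample(command, args)  # overwrite with the result

registers = {}
queue = TextureQueue(registers)
queue.enqueue("TEX2D", "r4", (0.25, 0.75))
queued_args = registers["r4"]
queue.process_one(lambda cmd, uv: sum(uv))  # stand-in for a texture lookup
print(registers["r4"])
```

Only the small (command, register) pair occupies the request buffer, which is why the buffer can stay small even when many requests are queued.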

14.

Granted patent
Instructions for managing a parallel cache hierarchy
Legal status: In force

Publication No.: US09639479B2

Publication date: 2017-05-02

Application No.: US12888409

Filing date: 2010-09-22

Applicants: John R. Nickolls, Brett W. Coon, Michael C. Shebanow

Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow

IPC classification: G06F12/121, G06F12/0811, G06F12/0862, G06F9/30

CPC classification: G06F9/3887, G06F9/30043, G06F9/3009, G06F9/3836, G06F12/0811, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/121, G06F2212/452

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
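
Illustrative sketch (not taken from the patent): a toy two-level hierarchy in which each load carries a modifier that decides where the data is cached. The three policy names and the Hierarchy class are hypothetical; the abstract does not list specific policies.

```python
from enum import Enum

class CacheOp(Enum):        # hypothetical policy names
    CACHE_ALL = "all"       # cache at L1 and L2
    CACHE_L2_ONLY = "l2"    # do not fill L1, cache at L2
    NO_CACHE = "none"       # do not fill either level

class Hierarchy:
    def __init__(self, memory):
        self.memory, self.l1, self.l2 = memory, {}, {}

    def load(self, addr, op=CacheOp.CACHE_ALL):
        # Simplified lookup: L1, then L2, then backing memory.
        if addr in self.l1:
            value = self.l1[addr]
        elif addr in self.l2:
            value = self.l2[addr]
        else:
            value = self.memory[addr]
        # The modifier on the instruction selects which levels are filled.
        if op is CacheOp.CACHE_ALL:
            self.l1[addr] = value
        if op in (CacheOp.CACHE_ALL, CacheOp.CACHE_L2_ONLY):
            self.l2[addr] = value
        return value

h = Hierarchy({0: 10, 4: 20, 8: 30})
h.load(0)                          # default: cached at both levels
h.load(4, CacheOp.CACHE_L2_ONLY)   # cached at L2 only
h.load(8, CacheOp.NO_CACHE)        # not cached
print(sorted(h.l1), sorted(h.l2))
```

Stores, eviction, and coherence between levels are left out to keep the example short.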

15.

Patent application
SHARED SINGLE-ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS
Legal status: In force

Publication No.: US20120221808A1

Publication date: 2012-08-30

Application No.: US13466057

Filing date: 2012-05-07

Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, Y02D10/13

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
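
Illustrative sketch (not taken from the patent): the serialization step can be modelled as repeatedly picking one target address, serving every request for it, and deferring the rest. Choosing the first pending request's address is an assumption; the abstract does not say how the address is selected.

```python
def serialize(requests):
    """requests: list of (thread_id, address).

    Returns one pass per distinct address as (address, [thread ids served]).
    """
    pending = list(requests)
    passes = []
    while pending:
        target = pending[0][1]  # select one target address (assumed policy)
        served = [tid for tid, addr in pending if addr == target]
        passes.append((target, served))  # these requests proceed in parallel
        # Everything else is deferred and goes through the logic again.
        pending = [(tid, addr) for tid, addr in pending if addr != target]
    return passes

requests = [(0, 0x10), (1, 0x20), (2, 0x10), (3, 0x30)]
passes = serialize(requests)
print(passes)
```

With three distinct addresses among the four requests, the memory is accessed three times, once per address.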

16.

Granted patent
Shared single-access memory with management of multiple parallel requests
Legal status: In force

Publication No.: US08176265B2

Publication date: 2012-05-08

Application No.: US13165638

Filing date: 2011-06-21

Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, Y02D10/13

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

17.

Granted patent
Shared memory with parallel access and access conflict resolution mechanism
Legal status: In force

Publication No.: US08108625B1

Publication date: 2012-01-31

Application No.: US11554546

Filing date: 2006-10-30

Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F13/1663

Abstract: Concurrent threads in a multithreaded processor share access to a memory, with any location in the shared memory being accessible by any thread. In one embodiment, the shared memory has multiple independently-addressable memory banks, and one location per bank can be accessed in parallel. Parallel processing engines executing the threads generate a group of parallel memory access requests. Address conflict logic determines whether the requests can be satisfied in parallel (e.g., based on bank access constraints) and serializes the requests to the extent needed to avoid conflicts. In some embodiments, data read from one address in the shared memory can be broadcast to multiple processing engines.
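
Illustrative sketch (not taken from the patent): a model of bank conflict resolution for a group of read requests. The interleaved mapping (bank = address mod number of banks), the bank count, and the first-come ordering are assumptions made for this example.

```python
def schedule_bank_accesses(addresses, num_banks=4):
    """addresses: one word address per thread (reads only).

    Returns a list of passes; each pass maps bank -> (address, [thread ids]).
    """
    pending = list(enumerate(addresses))
    passes = []
    while pending:
        current, deferred = {}, []
        for tid, addr in pending:
            bank = addr % num_banks  # assumed interleaved bank mapping
            if bank not in current:
                current[bank] = (addr, [tid])   # first access to this bank
            elif current[bank][0] == addr:
                current[bank][1].append(tid)    # same address: broadcast the read
            else:
                deferred.append((tid, addr))    # bank conflict: serialize
        passes.append(current)
        pending = deferred
    return passes

passes = schedule_bank_accesses([0, 1, 4, 1, 8])
print(len(passes), passes)
```

In this example addresses 0, 4, and 8 all map to bank 0, so they take three passes, while the two reads of address 1 are served together in the first pass.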

18.

Granted patent
Structured programming control flow in a SIMD architecture
Legal status: In force

Publication No.: US07877585B1

Publication date: 2011-01-25

Application No.: US11845429

Filing date: 2007-08-27

Applicants: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov

Inventors: Brett W. Coon, John R. Nickolls, John Erik Lindholm, Svetoslav D. Tzvetkov

IPC classification: G06F15/76, G06F7/38, G06F9/00, G06F9/44

CPC classification: G06F9/3851, G06F9/30072, G06F9/3885, G06F9/3887

Abstract: One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
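
Illustrative sketch (not taken from the patent): a heavily simplified model of divergence handling for a single if/else across SIMD lanes, using a stack of (mask, path) entries. It does not model the disable mask, breaks, returns, or the setup instructions mentioned in the abstract; the function and its parameters are hypothetical.

```python
def run_if_else(values, cond, then_fn, else_fn):
    """Execute `then_fn if cond(v) else else_fn` on every lane.

    Returns (results, number of masked SIMD steps executed).
    """
    taken = [cond(v) for v in values]
    not_taken = [not t for t in taken]
    stack = []
    if any(taken) and any(not_taken):
        # Threads diverge: push the else path, then run the then path first.
        stack.append((not_taken, else_fn))
        stack.append((taken, then_fn))
    else:
        # All lanes agree: a single pass with every lane active.
        stack.append(([True] * len(values),
                      then_fn if any(taken) else else_fn))
    out, steps = list(values), 0
    while stack:
        mask, fn = stack.pop()
        # One SIMD step: only lanes enabled in the mask are updated.
        out = [fn(v) if m else v for v, m in zip(out, mask)]
        steps += 1
    return out, steps

out, steps = run_if_else([1, 2, 3, 4],
                         cond=lambda v: v % 2 == 0,
                         then_fn=lambda v: v * 10,
                         else_fn=lambda v: -v)
print(out, steps)
```

Divergent groups cost two steps because each path runs with part of the lanes masked off, while a uniform group needs only one.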

19.

Granted patent
Apparatus and method for monitoring and debugging a graphics processing unit
Legal status: In force

Publication No.: US07600155B1

Publication date: 2009-10-06

Application No.: US11302950

Filing date: 2005-12-13

Applicants: John R. Nickolls, Roger L. Allen, Brian K. Cabral, Brett W. Coon, Robert C. Keller

Inventors: John R. Nickolls, Roger L. Allen, Brian K. Cabral, Brett W. Coon, Robert C. Keller

IPC classification: G06F11/00

CPC classification: G06F11/36

Abstract: A system has a graphics processing unit with a processor to monitor selected criteria and circuitry to initiate the storage of execution state information when the selected criteria reaches a specified state. A memory stores execution state information. A central processing unit executes a debugging program to analyze the execution state information.
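
Illustrative sketch (not taken from the patent): the monitor-then-capture flow can be modelled as scanning a trace of execution states and storing a snapshot when a watched value reaches a threshold. The "error_count" criterion, the trace format, and the function name are hypothetical.

```python
def capture_on_trigger(trace, threshold):
    """Return a snapshot of the first state whose watched value
    reaches `threshold`, or None if it never does."""
    for cycle, state in enumerate(trace):
        if state["error_count"] >= threshold:  # selected criterion reached
            # Stands in for storing execution state to memory.
            return {"cycle": cycle, "state": dict(state)}
    return None

trace = [
    {"pc": 0x100, "error_count": 0},
    {"pc": 0x104, "error_count": 1},
    {"pc": 0x108, "error_count": 3},
]
snapshot = capture_on_trigger(trace, threshold=2)
print(snapshot)
```

A debugging program on the CPU side would then inspect the stored snapshot rather than the live GPU state.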

20.

Granted patent
Trap handler architecture for a parallel processing unit
Legal status: In force

Publication No.: US08522000B2

Publication date: 2013-08-27

Application No.: US12569831

Filing date: 2009-09-29

Applicants: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang

IPC classification: G06F9/00

CPC classification: G06F9/327, G06F9/3851, G06F9/3861

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
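
Illustrative sketch (not taken from the patent): a model of the all-or-nothing property, in which a trap moves every thread group on a streaming multiprocessor into the handler together and they all return together. The class, its fields, and the program counter values are hypothetical.

```python
class StreamingMultiprocessor:
    def __init__(self, num_groups):
        self.pc = [0] * num_groups            # current PC per thread group
        self.saved_pc = [None] * num_groups   # PC to resume at after the trap
        self.in_trap = [False] * num_groups

    def raise_trap(self, handler_pc):
        # A trap redirects every thread group to the handler at once.
        for g in range(len(self.pc)):
            self.saved_pc[g] = self.pc[g]
            self.pc[g] = handler_pc
            self.in_trap[g] = True

    def return_from_trap(self):
        # All groups leave the handler together and resume their own code.
        for g in range(len(self.pc)):
            self.pc[g] = self.saved_pc[g]
            self.in_trap[g] = False

    def invariant_holds(self):
        # Either every group is in the handler, or none is.
        return all(self.in_trap) or not any(self.in_trap)

sm = StreamingMultiprocessor(num_groups=3)
sm.pc = [0x10, 0x24, 0x38]
sm.raise_trap(handler_pc=0x1000)
in_handler = list(sm.in_trap)
sm.return_from_trap()
print(in_handler, sm.pc, sm.invariant_holds())
```

Because no mixed state is ever visible, the handler does not need to reason about some groups running ordinary code while others are trapped.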
