Patent search: ap:("Won S. Kim" OR "David M. Bulfer" OR "John R. Nickolls" OR "W. Thomas Blank" OR "Hannes Figel") AND inv:"John R. Nickolls" (page 2)

11.

Patent application
SYSTEMS AND METHODS FOR COALESCING MEMORY ACCESSES OF PARALLEL THREADS [In force]

Publication No.: US20090240895A1

Publication date: 2009-09-24

Application No.: US12054330

Filing date: 2008-03-24

Applicants: Lars Nyland, John R. Nickolls, Gentaro Hirota, Tanmoy Mandal

Inventors: Lars Nyland, John R. Nickolls, Gentaro Hirota, Tanmoy Mandal

IPC classification: G06F12/00

CPC classification: G06F9/3824, G06F9/3851, G06F9/3885, G06F9/3891

Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data. When all the memory access requests associated with a particular PRT entry are complete, the core interface satisfies the corresponding application request and frees the PRT entry.
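
Note: the flow in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented design: the 128-byte segment size, the 4-byte word per thread, word-aligned addresses, and the names `MemRequest`, `PrtEntry`, `coalesce`, and `complete` are all hypothetical. One read request from a thread group becomes one PRT entry plus one memory access request per touched segment, each request is shrunk to fit the spread of its addresses, and the PRT entry counts down as requests complete.

```python
from dataclasses import dataclass, field

SEGMENT = 128  # assumed maximum request size in bytes (hypothetical)
WORD = 4       # assumed bytes read per thread (hypothetical)


@dataclass
class MemRequest:
    base: int      # aligned start address of the request
    size: int      # request size in bytes
    threads: list  # IDs of the threads this request services


@dataclass
class PrtEntry:
    pending: int                              # outstanding memory access requests
    data: dict = field(default_factory=dict)  # thread ID -> returned word


def coalesce(addresses):
    """Turn one read application request into a PRT entry and memory requests."""
    by_segment = {}
    for tid, addr in enumerate(addresses):
        by_segment.setdefault(addr // SEGMENT, []).append(tid)
    requests = []
    for _, tids in sorted(by_segment.items()):
        lo = min(addresses[t] for t in tids)
        hi = max(addresses[t] for t in tids) + WORD
        size = SEGMENT
        # Shrink to the smallest aligned power-of-two block covering the spread.
        while size // 2 >= WORD and lo // (size // 2) == (hi - 1) // (size // 2):
            size //= 2
        requests.append(MemRequest((lo // size) * size, size, tids))
    return PrtEntry(pending=len(requests)), requests


def complete(entry, request, read_word, addresses):
    """Route returned data to its threads; True means the PRT entry can be freed."""
    for tid in request.threads:
        entry.data[tid] = read_word(addresses[tid])
    entry.pending -= 1
    return entry.pending == 0
```

With addresses `[0, 4, 256, 260, 1024]`, this sketch produces three requests (two of 8 bytes and one of 4 bytes), and the entry is freed only after the third completes.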

12.

Granted patent
Parallel data processing systems and methods using cooperative thread arrays and SIMD instruction issue [In force]

Publication No.: US07584342B1

Publication date: 2009-09-01

Application No.: US11305479

Filing date: 2005-12-15

Applicants: Bryon S. Nordquist, John R. Nickolls, Luis I. Bacayo

Inventors: Bryon S. Nordquist, John R. Nickolls, Luis I. Bacayo

IPC classification: G06F15/80

CPC classification: G06F9/52, G06F9/3851, G06F9/3887

Abstract: Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time and that controls various aspects of the thread's processing behavior, such as the portion of the input data set to be processed by each thread, the portion of the output data set to be produced by each thread, and/or sharing of intermediate results among threads. Where groups of threads are executed in SIMD parallelism, thread IDs for threads in the same SIMD group are generated and assigned in parallel, allowing different SIMD groups to be launched in rapid succession.
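
Note: a minimal sketch of how a thread ID can select each thread's share of the data, assuming a sequential Python simulation in place of real hardware threads. The names `launch_cta` and `scale_by_two` and the strided partitioning are hypothetical choices for illustration; the abstract does not prescribe them.

```python
def launch_cta(num_threads, simd_width, program, data_in):
    """Run `program` once per thread ID, launching one SIMD group at a time."""
    out = [None] * len(data_in)
    for base in range(0, num_threads, simd_width):
        # IDs for a whole SIMD group are generated together, standing in for
        # the parallel ID assignment described in the abstract.
        group_ids = list(range(base, min(base + simd_width, num_threads)))
        for tid in group_ids:
            program(tid, num_threads, data_in, out)
    return out


def scale_by_two(tid, num_threads, data_in, out):
    """Every thread runs the same program; tid picks its input/output elements."""
    for i in range(tid, len(data_in), num_threads):
        out[i] = 2 * data_in[i]
```

For example, `launch_cta(8, 4, scale_by_two, list(range(20)))` runs two SIMD groups of four threads, and each thread writes only the output elements its ID selects.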

13.

Granted patent
Defect tolerant redundancy [In force]

Publication No.: US07477091B2

Publication date: 2009-01-13

Application No.: US11105326

Filing date: 2005-04-12

Applicant: John R. Nickolls

Inventor: John R. Nickolls

IPC classification: G06F11/16

CPC classification: G11C29/848

Abstract: Circuits, methods, and apparatus for using redundant circuitry on integrated circuits in order to increase manufacturing yields. One exemplary embodiment of the present invention provides a circuit configuration wherein functional circuit blocks in a group of circuit blocks are selected by multiplexers. Multiplexers at the input and output of the group of circuit blocks steer input and output signals to and from functional circuit blocks, avoiding circuit blocks found to be defective or nonfunctional. Multiple groups of these circuit blocks may be arranged in series and in parallel. Alternate multiplexer configurations may be used in order to provide a higher level of redundancy. Other embodiments use all functional circuit blocks and sort integrated circuits based on the level of functionality or performance. Other embodiments provide methods of testing integrated circuits having one or more of these circuit configurations.

14.

Granted patent
Register based queuing for texture requests [In force]

Publication No.: US07456835B2

Publication date: 2008-11-25

Application No.: US11339937

Filing date: 2006-01-25

Applicants: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

IPC classification: G06T11/40, G06T15/00, G06T1/00, G09G5/00

CPC classification: G06T11/60, G09G5/363

Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

15.

Granted patent
Galois field arithmetic unit for use within a processor [In force]

Publication No.: US07313583B2

Publication date: 2007-12-25

Application No.: US10460599

Filing date: 2003-06-12

Applicants: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls

Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls

IPC classification: G06F15/00, H03M13/00

CPC classification: G06F7/724

Abstract: A Galois field arithmetic unit includes a Galois field multiplier section and a Galois field adder section. The Galois field multiplier section includes a plurality of Galois field multiplier arrays that perform a Galois field multiplication by multiplying, in accordance with a generating polynomial, a 1st operand and a 2nd operand. The bit size of the 1st and 2nd operands correspond to the bit size of a processor data path, where each of the Galois field multiplier arrays performs a portion of the Galois field multiplication by multiplying, in accordance with a corresponding portion of the generating polynomial, corresponding portions of the 1st and 2nd operands. The bit size of the corresponding portions of the 1st and 2nd operands corresponds to a symbol size of symbols of a coding scheme being implemented by the corresponding processor.
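
Note: the arithmetic named in this abstract can be illustrated in software. The sketch below is not the patented circuit. It assumes 8-bit symbols, a 32-bit data path holding four symbols, and a default generating polynomial of 0x11D (common in Reed-Solomon codes); `gf_mul` and `gf_mul_packed` are hypothetical names. Each lane of `gf_mul_packed` plays the role of one multiplier array working on its own portion of the two operands. Galois field addition over GF(2^m) is a bitwise XOR.

```python
def gf_mul(a, b, poly=0x11D, bits=8):
    """Multiply two symbols in GF(2^bits), reducing by the generating polynomial."""
    result = 0
    for _ in range(bits):
        if b & 1:
            result ^= a      # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1
        if a & (1 << bits):  # degree overflow: reduce modulo the polynomial
            a ^= poly
    return result


def gf_mul_packed(x, y, poly=0x11D, bits=8, lanes=4):
    """Multiply `lanes` packed symbols of x and y lane by lane."""
    mask = (1 << bits) - 1
    out = 0
    for lane in range(lanes):
        shift = lane * bits
        prod = gf_mul((x >> shift) & mask, (y >> shift) & mask, poly, bits)
        out |= prod << shift
    return out
```

As a check against a published value, `gf_mul(0x57, 0x83, poly=0x11B)` returns `0xC1`, the worked example given in FIPS 197 for the AES field.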

16.

Granted patent
Instructions for managing a parallel cache hierarchy [In force]

Publication No.: US09639479B2

Publication date: 2017-05-02

Application No.: US12888409

Filing date: 2010-09-22

Applicants: John R. Nickolls, Brett W. Coon, Michael C. Shebanow

Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow

IPC classification: G06F12/121, G06F12/0811, G06F12/0862, G06F9/30

CPC classification: G06F9/3887, G06F9/30043, G06F9/3009, G06F9/3836, G06F12/0811, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/121, G06F2212/452

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

17.

Granted patent
Sharing data crossbar for reads and writes in a data cache [In force]

Publication No.: US09286256B2

Publication date: 2016-03-15

Application No.: US12892862

Filing date: 2010-09-28

Applicants: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls

Inventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls

IPC classification: G06F12/00, G06F13/40, G06F13/00, G06F13/28

CPC classification: G06F13/4022, G06F13/4031

Abstract: The invention sets forth an L1 cache architecture that includes a crossbar unit configured to transmit data associated with both read data requests and write data requests. Data associated with read data requests is retrieved from a cache memory and transmitted to the client subsystems. Similarly, data associated with write data requests is transmitted from the client subsystems to the cache memory. To allow for the transmission of both read and write data on the crossbar unit, an arbiter is configured to schedule the crossbar unit transmissions as well and arbitrate between data requests received from the client subsystems.

18.

Patent application
SYSTEMS AND METHODS FOR VOTING AMONG PARALLEL THREADS [Under examination, published]

Publication No.: US20120239909A1

Publication date: 2012-09-20

Application No.: US13485622

Filing date: 2012-05-31

Applicants: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke

Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke

IPC classification: G06F9/00

CPC classification: G06F9/3851, G06F9/30087, G06F9/3009, G06F9/3887

Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
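
Note: the semantics of a group vote can be sketched as a small function in which every thread posts a boolean and every thread receives the same combined result. This is a software illustration only. The function name `vote` and the two modes "any" and "all" are assumptions; the abstract does not list the supported vote modes.

```python
def vote(predicates, mode="any"):
    """Combine one boolean vote per thread and return the result to each thread."""
    if mode == "any":
        result = any(predicates)
    elif mode == "all":
        result = all(predicates)
    else:
        raise ValueError(f"unsupported vote mode: {mode}")
    # Every thread in the group receives the same outcome.
    return [result] * len(predicates)
```

A group can use the result to agree on a branch direction, for example leaving a loop only when `vote(done_flags, "all")` reports that every thread has finished.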

19.

Patent application
SHARED SINGLE-ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS [In force]

Publication No.: US20120221808A1

Publication date: 2012-08-30

Application No.: US13466057

Filing date: 2012-05-07

Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, Y02D10/13

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
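
Note: the serialization behaviour described in this abstract can be sketched as a loop over passes. The function name `serialize` and the policy of selecting the first pending request's address are assumptions made for illustration; the abstract says only that one target address is selected per pass.

```python
def serialize(requests):
    """Group (thread_id, address) requests into passes, one address per pass."""
    pending = list(requests)
    passes = []
    while pending:
        target = pending[0][1]  # assumed policy: take the first pending address
        # All requests naming the selected address proceed together.
        granted = [tid for tid, addr in pending if addr == target]
        passes.append((target, granted))
        # Every other request is deferred to a later pass.
        pending = [(tid, addr) for tid, addr in pending if addr != target]
    return passes
```

For the requests `[(0, 16), (1, 32), (2, 16), (3, 48)]` the sketch needs three passes, one per distinct address, with threads 0 and 2 served together in the first pass.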

20.

Granted patent
Shared single-access memory with management of multiple parallel requests [In force]

Publication No.: US08176265B2

Publication date: 2012-05-08

Application No.: US13165638

Filing date: 2011-06-21

Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/00

CPC classification: G06F12/084, Y02D10/13

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
