Abstract:
A Galois field arithmetic unit includes a Galois field multiplier section and a Galois field adder section. The Galois field multiplier section includes a plurality of Galois field multiplier arrays that perform a Galois field multiplication by multiplying, in accordance with a generating polynomial, a 1st operand and a 2nd operand. The bit sizes of the 1st and 2nd operands correspond to the bit size of a processor data path, and each of the Galois field multiplier arrays performs a portion of the Galois field multiplication by multiplying, in accordance with a corresponding portion of the generating polynomial, corresponding portions of the 1st and 2nd operands. The bit size of these portions of the 1st and 2nd operands corresponds to the symbol size of the coding scheme being implemented by the processor.
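For illustration, the sketch below multiplies two GF(2^8) symbols modulo a generating polynomial and then applies the same symbol-size routine to each 8-bit lane of a 32-bit data path, mirroring how the multiplier arrays each handle a portion of the operands. The polynomial argument, the 8-bit symbol size, and the 32-bit path width are assumptions chosen for the example, not values fixed by the abstract.

```cuda
#include <stdint.h>

// Multiply two GF(2^8) symbols modulo a generating polynomial.
// poly holds the low 8 bits of the polynomial (the x^8 term is implicit),
// e.g. poly = 0x1D for the Reed-Solomon polynomial x^8+x^4+x^3+x^2+1 (0x11D).
__host__ __device__ uint8_t gf8_mul(uint8_t a, uint8_t b, uint8_t poly)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & 1) p ^= a;            // GF(2) add (XOR) of a into the product
        uint8_t carry = a & 0x80;     // does a*x overflow the 8-bit field?
        a <<= 1;
        if (carry) a ^= poly;         // reduce by the generating polynomial
        b >>= 1;
    }
    return p;
}

// Apply the symbol-size multiply to each 8-bit lane of a 32-bit data path,
// mirroring the plurality of multiplier arrays acting on operand portions.
__host__ __device__ uint32_t gf8_mul_lanes(uint32_t a, uint32_t b, uint8_t poly)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint8_t ra = (a >> (8 * lane)) & 0xFF;
        uint8_t rb = (b >> (8 * lane)) & 0xFF;
        r |= (uint32_t)gf8_mul(ra, rb, poly) << (8 * lane);
    }
    return r;
}
```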
Abstract:
A Galois field multiplier array includes a 1st register, a 2nd register, a 3rd register, and a plurality of multiplier cells. The 1st register stores bits of a 1st operand. The 2nd register stores bits of a 2nd operand. The 3rd register stores bits of a generating polynomial that corresponds to one of a plurality of applications (e.g., FEC, CRC, Reed-Solomon, et cetera). The multiplier cells are arranged in rows and columns. Each multiplier cell outputs a sum and a product and includes five inputs. The 1st input receives a preceding cell's multiply output, the 2nd input receives at least one bit of the 2nd operand, the 3rd input receives a preceding cell's sum output, the 4th input receives at least one bit of the generating polynomial, and the 5th input receives a feedback term from a preceding cell in a preceding row. The multiplier cells in the 1st row have the 1st, 3rd, and 5th inputs set to corresponding initialization values in accordance with the 2nd operand.
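The abstract does not give the cell's logic equations, so the following is a hedged behavioral model of one cell: AND serves as the GF(2) multiply, XOR as the add, and the feedback-term/polynomial-bit product models the modular reduction folded in row by row. All signal names are assumptions.

```cuda
// Hedged behavioral model of one multiplier cell. The abstract enumerates
// five inputs and two outputs; the combinational function shown is one
// plausible GF(2) multiply-accumulate realization, not the patent's netlist.
struct CellOut { int mul; int sum; };

__host__ __device__ CellOut gf_cell(int mul_in,  // 1st: preceding cell's multiply output
                                    int b_bit,   // 2nd: bit of the 2nd operand
                                    int sum_in,  // 3rd: preceding cell's sum output
                                    int g_bit,   // 4th: bit of the generating polynomial
                                    int fb)      // 5th: feedback term from preceding row
{
    CellOut o;
    o.mul = mul_in;                              // operand bit passes along the row
    // AND is the GF(2) multiply, XOR the add; the feedback term gated by the
    // polynomial bit accounts for the reduction by the generating polynomial.
    o.sum = sum_in ^ (mul_in & b_bit) ^ (fb & g_bit);
    return o;
}
```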
Abstract:
A processor includes an instruction memory, an arithmetic logic unit, a finite field arithmetic unit, at least one digital storage device, and an instruction decoder. The instruction memory temporarily stores an instruction that includes at least one of an operational code, destination information, and source information. The instruction decoder is operably coupled to interpret the instruction to identify whether the arithmetic logic unit and/or the finite field arithmetic unit is to perform the operational code of the instruction. The instruction decoder then identifies at least one destination location within the digital storage device based on the destination information contained within the instruction, and at least one source location within the digital storage device based on the source information of the instruction. When the finite field arithmetic unit is to perform the operational code, it performs a finite field arithmetic function upon data stored in the at least one source location in accordance with the operational code and provides the resultant to the destination location.
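A minimal sketch of this decode-and-dispatch flow follows, assuming a hypothetical 16-bit encoding with a 4-bit operational code and 6-bit destination and source fields; the actual field widths and the opcode split between the two units are not specified by the abstract.

```cuda
#include <stdint.h>

// Hedged sketch of decode/dispatch. The 4/6/6-bit layout and the rule that
// the upper half of the opcode space selects the finite field unit are
// illustrative assumptions, not the patent's encoding.
enum Unit { ALU, FINITE_FIELD };

struct Decoded { Unit unit; uint32_t opcode, dest, src; };

__host__ __device__ Decoded decode(uint32_t instr)
{
    Decoded d;
    d.opcode = (instr >> 12) & 0xF;   // operational code
    d.dest   = (instr >> 6)  & 0x3F;  // destination location in digital storage
    d.src    =  instr        & 0x3F;  // source location in digital storage
    // Identify which unit is to perform the operational code.
    d.unit = (d.opcode >= 0x8) ? FINITE_FIELD : ALU;
    return d;
}
```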
Abstract:
A reconfigurable processing system executes instructions and configurations in parallel. Initially, a first instruction loads configurations into configuration registers. The configuration field of a subsequently fetched instruction selects a configuration register, and the controls of the configuration in the selected configuration register are decoded and modified as specified by the instruction. The controls provide data operands to the execution units, which process the operands and generate results. Scalar data, vector data, or a combination of the two can be processed. The processing is controlled by instructions executed in parallel with configurations invoked by configuration fields within the instructions. Vectors are processed using a vector register file that stores the vectors, and a vector address unit provides addresses that stride through each element of each vector to be processed.
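A minimal sketch of the vector addressing in the last sentence, assuming a configuration holds a base address, a stride, and a length; these field names and the flat register-file model are illustrative assumptions.

```cuda
// Hedged sketch of a vector address unit striding through the elements of a
// vector held in a vector register file.
struct VectorConfig { int base; int stride; int length; };

// Address of element i of the configured vector in the register file.
__host__ __device__ int vector_element_addr(const VectorConfig& c, int i)
{
    return c.base + i * c.stride;     // stride through each element
}

// A configuration register selected by an instruction's configuration field
// drives operand fetch for an execution unit.
void fetch_operands(const VectorConfig& c, const float* vrf, float* ops)
{
    for (int i = 0; i < c.length; ++i)
        ops[i] = vrf[vector_element_addr(c, i)];
}
```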
Abstract:
One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact on the rest of the system. Additionally, memory barrier requests may specify the level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to the level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to the level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
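These three barrier levels map naturally onto the thread-fence intrinsics that CUDA exposes; the mapping to this particular patent's hardware is an assumption, but the intrinsics and their scopes below are standard.

```cuda
__global__ void fence_levels(volatile int* flag, volatile int* data)
{
    data[threadIdx.x] = threadIdx.x;

    // CTA level: orders writes for cooperating threads of the same block,
    // i.e. the set of threads that share an L1 cache.
    __threadfence_block();

    // Device level: orders writes for all threads sharing global memory.
    __threadfence();

    // System level: orders writes for all threads sharing all system
    // memories (host and peer devices). Widest scope, highest latency.
    __threadfence_system();

    if (threadIdx.x == 0) *flag = 1;   // publish only after ordering writes
}
```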
Abstract:
A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.
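A hedged sketch of the three-step selection, assuming per-group availability and credit fields and a per-CTA seniority table; the structure and field names are illustrative, not the patent's.

```cuda
// (i) scan the pool of available groups, (ii) prefer the CTA with the
// greatest seniority, (iii) break ties by the greatest credit value.
struct ThreadGroup { int cta; int credit; bool available; };

int select_group(const ThreadGroup* g, int n, const int* cta_seniority)
{
    int best = -1;
    for (int i = 0; i < n; ++i) {
        if (!g[i].available) continue;                 // (i) availability pool
        if (best < 0) { best = i; continue; }
        int ds = cta_seniority[g[i].cta] - cta_seniority[g[best].cta];
        if (ds > 0 || (ds == 0 && g[i].credit > g[best].credit))
            best = i;                                  // (ii) seniority, (iii) credit
    }
    return best;   // thread group to issue next cycle, or -1 if none available
}
```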
Abstract:
One embodiment of an instruction decoder includes an instruction parser configured to process a first non-operative instruction and to generate a first event signal corresponding to the first non-operative instruction, and a first event multiplexer configured to receive the first event signal from the instruction parser, to select the first event signal from one or more event signals and to transmit the first event signal to an event logic block. The instruction decoder may be implemented in a multithreaded processing unit, such as a shader unit, and the occurrences of the first event signal may be tracked when one or more threads are executed within the processing unit. The resulting event signal count may provide a designer with a better understanding of the behavior of a program, such as a shader program, executed within the processing unit, thereby facilitating overall processing unit and program design.
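A rough software model of the parse-select-count path follows, assuming a hypothetical non-operative opcode value and a single selected event line; none of these encodings come from the abstract.

```cuda
#include <stdint.h>

enum { OP_NOP_EVENT0 = 0xE0 };   // hypothetical non-operative instruction

struct EventLogic { uint64_t count[4]; };   // assumed four event counters

// The parser raises an event signal for the non-operative opcode; the event
// multiplexer forwards the selected signal (sel) to the event logic block,
// which tallies occurrences across the executed instruction stream.
void parse_and_count(EventLogic* ev, const uint8_t* prog, int n, int sel)
{
    for (int pc = 0; pc < n; ++pc) {
        int signal = (prog[pc] == OP_NOP_EVENT0) ? 1 : 0;
        if (signal && sel == 0)
            ev->count[0]++;
    }
}
```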
Abstract:
Parallelism in a processor is exploited to permute a data set based on bit reversal of indices associated with data points in the data set. Permuted data can be stored in a memory having entries arranged in banks, where entries in different banks can be accessed in parallel. A destination location in the memory for a particular data point from the data set is determined based on the bit-reversed index associated with that data point. The bit-reversed index can be further modified so that at least some of the destination locations determined by different parallel processes are in different banks, allowing multiple points of the bit-reversed data set to be written in parallel.
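The sketch below shows the two ingredients: a bit-reversal of each index, and a skewed (padded) shared-memory layout so that parallel writers of bit-reversed locations land in distinct banks. The 256-element tile and the one-pad-per-16-entries skew are assumptions for the example.

```cuda
// Reverse the low `bits` bits of idx (e.g. FFT reordering; bits = 8 here).
__host__ __device__ unsigned bit_reverse(unsigned idx, int bits)
{
    unsigned r = 0;
    for (int i = 0; i < bits; ++i) {
        r = (r << 1) | (idx & 1);
        idx >>= 1;
    }
    return r;
}

// Permute a 256-element tile into bit-reversed order. Launch with
// blockDim.x == 256. Adding r/16 pads one element per 16-entry row, so
// simultaneous writes to bit-reversed destinations spread across banks.
__global__ void bitrev_permute(const float* in, float* out, int bits)
{
    __shared__ float buf[256 + 256 / 16];
    unsigned i = threadIdx.x;
    unsigned r = bit_reverse(i, bits);            // bit-reversed destination
    buf[r + r / 16] = in[blockIdx.x * 256 + i];   // skewed (modified) location
    __syncthreads();
    out[blockIdx.x * 256 + i] = buf[i + i / 16];  // read back in linear order
}
```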
Abstract:
One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data. When all the memory access requests associated with a particular PRT entry are complete, the core interface satisfies the corresponding application request and frees the PRT entry.
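A hedged host-side model of one PRT entry and the address-spread splitting, assuming a 32-thread group and 128-byte memory access requests; the field names and sizes are illustrative, not the patent's.

```cuda
#include <stdint.h>

// Model of a pending request table entry.
struct PrtEntry {
    uint32_t thread_mask;   // threads the application request services
    int      pending;       // memory access requests still outstanding
    bool     valid;
};

// Split one application request into per-128-byte-line memory access
// requests, each recording exactly which threads it services, and track
// the number of pending requests in the PRT entry.
int issue_coalesced(PrtEntry* e, const uint64_t addr[32], uint32_t active)
{
    e->thread_mask = active;
    e->valid = true;
    e->pending = 0;
    uint32_t todo = active;
    while (todo) {
        int lead = __builtin_ctz(todo);            // first unserviced thread
        uint64_t line = addr[lead] & ~127ull;      // assumed 128B request size
        uint32_t served = 0;
        for (int t = 0; t < 32; ++t)               // every thread on this line
            if (((todo >> t) & 1) && (addr[t] & ~127ull) == line)
                served |= 1u << t;
        todo &= ~served;
        e->pending++;                              // PRT tracks pending count
    }
    return e->pending;
}

// As each memory access request completes, decrement the count; at zero the
// application request is satisfied and the PRT entry is freed.
void complete_one(PrtEntry* e) { if (--e->pending == 0) e->valid = false; }
```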
Abstract:
Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time and that controls various aspects of the thread's processing behavior, such as the portion of the input data set to be processed by each thread, the portion of the output data set to be produced by each thread, and/or sharing of intermediate results among threads. Where groups of threads are executed in SIMD parallelism, thread IDs for threads in the same SIMD group are generated and assigned in parallel, allowing different SIMD groups to be launched in rapid succession.
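In CUDA terms, the per-thread ID and its use to steer each thread's portion of the data look like the kernel below; reading warps as the SIMD groups whose consecutive IDs are derived in parallel at launch is the standard interpretation, though the patent's hardware mechanism is not shown here.

```cuda
// Each thread of the CTA computes its unique thread ID from the built-in
// indices; the ID selects this thread's portions of the input and output.
__global__ void cta_kernel(const float* in, float* out, int n)
{
    int tid = threadIdx.x + blockDim.x * (threadIdx.y + blockDim.y * threadIdx.z);
    int stride = blockDim.x * blockDim.y * blockDim.z;
    for (int i = tid; i < n; i += stride)
        out[i] = 2.0f * in[i];    // per-thread slice of the data set
}
```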