Patent search: ap:("Won S. Kim" OR "David M. Bulfer" OR "John R. Nickolls" OR "W. Thomas Blank" OR "Hannes Figel") AND inv:"John R. Nickolls" (page 8)

71.

Patent application
SHARED SINGLE ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS (status: in force)

Publication No.: US20110252204A1

Publication date: 2011-10-13

Application No.: US13165638

Filing date: 2011-06-21

Applicants: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills

IPC classification: G06F12/08

CPC classification: G06F12/084, Y02D10/13

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
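Illustrative sketch (not part of the patent record): the serialization behavior described in the abstract of US20110252204A1 can be modeled in a few lines of Python. The function name `serialize_requests`, the list-of-addresses representation, and the "pick the first pending request's address" selection policy are all assumptions made for illustration; the abstract does not specify how the target address is selected.

```python
def serialize_requests(addresses):
    """Model of serializing a group of parallel memory requests.

    addresses[i] is the target address of request i (hypothetical encoding).
    Returns a list of (address, [request indices]) passes in service order.
    """
    pending = list(range(len(addresses)))
    passes = []
    while pending:
        # Select one target address. Taking the first pending request's
        # address is an assumed policy, not one stated in the abstract.
        selected = addresses[pending[0]]
        # Every request that names the selected address proceeds in parallel.
        served = [i for i in pending if addresses[i] == selected]
        # All other requests are deferred and presented again next pass.
        pending = [i for i in pending if addresses[i] != selected]
        passes.append((selected, served))
    return passes
```

Under this model the number of passes equals the number of distinct target addresses in the group, which matches the abstract's statement that each different address is accessed exactly once.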

72.

Patent application
METHODS FOR SCALABLY EXPLOITING PARALLELISM IN A PARALLEL PROCESSING SYSTEM (status: in force)

Publication No.: US20110238955A1

Publication date: 2011-09-29

Application No.: US13099035

Filing date: 2011-05-02

Applicants: John R. Nickolls, Stephen D. Lew

Inventors: John R. Nickolls, Stephen D. Lew

IPC classification: G06F9/30

CPC classification: G06F9/3851, G06F9/30072, G06F9/3012, G06F9/3889, G06F9/5066

Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
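Illustrative sketch (not part of the patent record): the thread / thread array / grid hierarchy in the abstract of US20110238955A1 can be mimicked sequentially in Python. The example problem (a sum of squares), the function name `run_grid`, and the round-robin placement of thread arrays on cores are assumptions chosen for illustration only.

```python
def run_grid(data, threads_per_array, num_cores):
    """Sequential model of a grid of independent thread arrays.

    Returns (sum of squares of data, {thread array id: core id}).
    """
    # Top level: split the whole problem into tiles, one per thread array.
    tiles = [data[i:i + threads_per_array]
             for i in range(0, len(data), threads_per_array)]
    placement = {}
    partial_sums = []
    for array_id, tile in enumerate(tiles):
        # Thread arrays are independent, so any core may run any of them.
        # Round-robin placement is an assumption made for this sketch.
        placement[array_id] = array_id % num_cores
        # Lowest level: each thread squares exactly one element of the tile.
        thread_results = [x * x for x in tile]
        # The thread array combines its threads' results into a partial sum.
        partial_sums.append(sum(thread_results))
    # The grid as a whole combines the partial sums into the final answer.
    return sum(partial_sums), placement
```

Because the result does not depend on `num_cores`, the same decomposition runs unchanged on a system with more or fewer cores, which is the scalability property the abstract emphasizes.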

73.

Patent application
Sharing Data Crossbar for Reads and Writes in a Data Cache (status: in force)

Publication No.: US20110082961A1

Publication date: 2011-04-07

Application No.: US12892862

Filing date: 2010-09-28

Applicants: Alexander L. Minkin, Steven L. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls

Inventors: Alexander L. Minkin, Steven L. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls

IPC classification: G06F13/36, G06F13/00

CPC classification: G06F13/4022, G06F13/4031

Abstract: The invention sets forth an L1 cache architecture that includes a crossbar unit configured to transmit data associated with both read data requests and write data requests. Data associated with read data requests is retrieved from a cache memory and transmitted to the client subsystems. Similarly, data associated with write data requests is transmitted from the client subsystems to the cache memory. To allow for the transmission of both read and write data on the crossbar unit, an arbiter is configured to schedule the crossbar unit transmissions as well and arbitrate between data requests received from the client subsystems.

74.

Patent application
Unified Addressing and Instructions for Accessing Parallel Memory Spaces (status: in force)

Publication No.: US20110078406A1

Publication date: 2011-03-31

Application No.: US12567637

Filing date: 2009-09-25

Applicants: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville

Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville

IPC classification: G06F12/10

CPC classification: G06F12/1054, G06F12/0284, G06F12/109, G06F13/404, G06F2212/302, G06F2212/656

Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.
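Illustrative sketch (not part of the patent record): one way to picture the address conversion in the abstract of US20110078406A1 is a window lookup that maps a unified address to a (memory space, offset) pair. The space names, window base addresses, and window sizes below are hypothetical; the abstract does not say how the unified space is laid out.

```python
# Hypothetical windows carved out of the unified address space.
LOCAL_BASE, LOCAL_SIZE = 0x0100_0000, 0x0001_0000
SHARED_BASE, SHARED_SIZE = 0x0200_0000, 0x0001_0000


def resolve(unified_addr):
    """Convert a unified address into (space name, offset within that space)."""
    if LOCAL_BASE <= unified_addr < LOCAL_BASE + LOCAL_SIZE:
        return ("local", unified_addr - LOCAL_BASE)
    if SHARED_BASE <= unified_addr < SHARED_BASE + SHARED_SIZE:
        return ("shared", unified_addr - SHARED_BASE)
    # Anything outside the windows is treated as global memory (assumption).
    return ("global", unified_addr)


def load(memories, unified_addr):
    """A single load operation that works for every memory space.

    memories maps a space name to a dict of {offset: value}.
    """
    space, offset = resolve(unified_addr)
    return memories[space][offset]
```

The point of the sketch is that callers use one `load` regardless of which space holds the data, instead of a separate load per space.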

75.

Patent application
UNANIMOUS BRANCH INSTRUCTIONS IN A PARALLEL THREAD PROCESSOR (status: in force)

Publication No.: US20110072249A1

Publication date: 2011-03-24

Application No.: US12815226

Filing date: 2010-06-14

Applicants: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas

Inventors: John R. Nickolls, Richard Craig Johnson, Robert Steven Glanville, Guillermo Juan Rozas

IPC classification: G06F9/38

CPC classification: G06F9/30072, G06F9/3851, G06F9/3887

Abstract: One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.
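Illustrative sketch (not part of the patent record): the two branch decisions in the abstract of US20110072249A1 reduce to an "all" vote and an "any" vote over the thread group. The function names and the choice to count only active threads as voters are assumptions made for this sketch.

```python
def branch_unanimous(active, predicates):
    """Return True if the whole group should branch under a unanimous vote.

    active[i] says whether thread i is active; predicates[i] is its vote.
    Only active threads are counted as voters (an assumption of this sketch).
    """
    votes = [p for a, p in zip(active, predicates) if a]
    # With no active threads there is nobody to branch, so report False.
    return bool(votes) and all(votes)


def branch_any(active, predicates):
    """Return True if at least one active thread votes to take the branch."""
    return any(p for a, p in zip(active, predicates) if a)
```

In both cases the group receives a single decision, so every active thread follows the same path and the group does not diverge at that branch.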

76.

Granted patent
Register based queuing for texture requests (status: in force)

Publication No.: US07864185B1

Publication date: 2011-01-04

Application No.: US12256848

Filing date: 2008-10-23

Applicants: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

IPC classification: G06T11/40, G06T15/00, G06T15/20, G06T1/00

CPC classification: G06T11/60, G09G5/363

Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

77.

Granted patent
Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior (status: in force)

Publication No.: US07861060B1

Publication date: 2010-12-28

Application No.: US11305178

Filing date: 2005-12-15

Applicants: John R. Nickolls, Stephen D. Lew

Inventors: John R. Nickolls, Stephen D. Lew

IPC classification: G06F15/16

CPC classification: G06F9/544, G06F9/3851, G06F9/3887, G06F9/522

Abstract: Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.
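Illustrative sketch (not part of the patent record): the abstract of US07861060B1 says the thread ID determines which portion of the input each thread processes. A sequential Python model of that idea follows; the contiguous-chunk partitioning, the "double each element" workload, and the function names are assumptions made for illustration.

```python
def thread_slice(thread_id, num_threads, n):
    """Return the [start, end) range of an n-element input owned by one thread."""
    chunk = (n + num_threads - 1) // num_threads  # ceiling division
    start = min(thread_id * chunk, n)
    end = min(start + chunk, n)
    return start, end


def run_cta(data, num_threads):
    """Every thread runs the same program; only thread_id differs."""
    out = [None] * len(data)
    for thread_id in range(num_threads):  # threads modeled sequentially
        start, end = thread_slice(thread_id, num_threads, len(data))
        for i in range(start, end):
            out[i] = 2 * data[i]  # each thread writes only its own portion
    return out
```

Since the slices are disjoint and together cover the input, the threads never write the same output element, which is why no synchronization is needed in this particular workload.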

78.

Granted patent
Fast fourier transforms and related transforms using cooperative thread arrays (status: in force)

Publication No.: US07836116B1

Publication date: 2010-11-16

Application No.: US11424511

Filing date: 2006-06-15

Applicants: Nolan D. Goodnight, John R. Nickolls, Radoslav Danilak

Inventors: Nolan D. Goodnight, John R. Nickolls, Radoslav Danilak

IPC classification: G06F17/14

CPC classification: G06F17/142, G06F9/3012, G06F9/3851, G06F9/3885, G06F9/3887

Abstract: A linear transform such as a Fast Fourier Transform (FFT) is performed on an input data set having a number of points using one or more arrays of concurrent threads that are capable of sharing data with each other. Each thread of one thread array reads two or more of the points, performs an appropriate “butterfly” calculation to generate two or more new points, then stores the new points in a memory location that is accessible to other threads of the array. Each thread determines which points it is to read based at least in part on a unique thread identifier assigned thereto. Multiple transform stages can be handled by a single thread array, or different levels can be handled by different thread arrays.
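Illustrative sketch (not part of the patent record): the abstract of US07836116B1 describes threads that each pick two points by thread ID, apply a butterfly, and store the results where other threads can read them. The Python model below uses a textbook radix-2 decimation-in-time FFT with a bit-reversed input ordering; that specific variant, the function names, and the use of a fresh list per stage to stand in for a barrier are assumptions, not details taken from the patent.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_thread_array(x):
    """Radix-2 FFT where each of n/2 modeled threads does one butterfly per stage."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # "Shared memory" visible to all threads, loaded in bit-reversed order.
    shared = [x[bit_reverse(i, bits)] for i in range(n)]
    half = 1
    while half < n:
        nxt = list(shared)  # writing to a copy stands in for a barrier
        for tid in range(n // 2):  # threads modeled sequentially
            # The thread ID alone determines which two points are read.
            group, k = divmod(tid, half)
            i = group * 2 * half + k
            j = i + half
            w = cmath.exp(-2j * cmath.pi * k / (2 * half))  # twiddle factor
            a, b = shared[i], w * shared[j]
            nxt[i], nxt[j] = a + b, a - b  # the butterfly
        shared = nxt
        half *= 2
    return shared
```

Here a single modeled thread array handles all stages, which corresponds to the first of the two options named in the abstract's final sentence.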

79.

Granted patent
Apparatus and method for debugging a graphics processing unit in response to a debug instruction (status: in force)

Publication No.: US07711990B1

Publication date: 2010-05-04

Application No.: US11302952

Filing date: 2005-12-13

Applicants: John R. Nickolls, Roger L. Allen, Brian K. Cabral, Brett W. Coon, Robert C. Keller

Inventors: John R. Nickolls, Roger L. Allen, Brian K. Cabral, Brett W. Coon, Robert C. Keller

IPC classification: G06F11/00

CPC classification: G06F11/3648

Abstract: A system includes a graphics processing unit with a processor responsive to a debug instruction that initiates the storage of execution state information. A memory stores the execution state information. A central processing unit executes a debugging program to analyze the execution state information.

80.

Granted patent
Register file allocation (status: in force)

Publication No.: US07634621B1

Publication date: 2009-12-15

Application No.: US11556677

Filing date: 2006-11-03

Applicants: Brett W. Coon, John Erik Lindholm, Gary Tarolli, Svetoslav D. Tzvetkov, John R. Nickolls, Ming Y. Siu

Inventors: Brett W. Coon, John Erik Lindholm, Gary Tarolli, Svetoslav D. Tzvetkov, John R. Nickolls, Ming Y. Siu

IPC classification: G06F12/00

CPC classification: G06F9/3012, G06F9/30123, G06F9/3824, G06F9/3851, G06F9/3885, G06F12/0223, Y02D10/13

Abstract: Circuits, methods, and apparatus that provide the die area and power savings of a single-ported memory with the performance advantages of a multiported memory. One example provides register allocation methods for storing data in a multiple-bank register file. In a thin register allocation method, data for a process is stored in a single bank. In this way, different processes use different banks to avoid conflicts. In a fat register allocation method, processes store data in each bank. In this way, if one process uses a large number of registers, those registers are spread among the banks, avoiding a situation where one bank is filled and other processes are forced to share a reduced number of banks. In a hybrid register allocation method, processes store data in more than one bank, but fewer than all the banks. Each of these methods may be combined in varying ways.
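Illustrative sketch (not part of the patent record): the thin, fat, and hybrid allocation methods in the abstract of US07634621B1 can be expressed as three ways of mapping a process's registers to bank indices. The function name, the modulo-based bank choice, and the `hybrid_width` parameter are assumptions for illustration; the abstract does not give concrete mapping formulas.

```python
def allocate(process_id, num_regs, num_banks, mode, hybrid_width=2):
    """Return the bank index assigned to each of a process's registers."""
    if mode == "thin":
        # All registers of one process live in a single bank.
        return [process_id % num_banks] * num_regs
    if mode == "fat":
        # Registers are striped across every bank.
        return [r % num_banks for r in range(num_regs)]
    if mode == "hybrid":
        # Registers are striped across a subset of hybrid_width banks.
        first = (process_id * hybrid_width) % num_banks
        return [(first + r % hybrid_width) % num_banks for r in range(num_regs)]
    raise ValueError(f"unknown mode: {mode}")
```

For example, with four banks, process 1 in hybrid mode with `hybrid_width=2` uses only banks 2 and 3, leaving banks 0 and 1 free for another process.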
