Patent search: ap:("Steven James Heinrich" OR "Alexander L. Minkin" OR "Brett W. Coon" OR "Rajeshwaran Selvanesan" OR "Robert Steven Glanville" OR "Charles McCarver" OR "Anjana Rajendran" OR "Stewart Glenn Carlton" OR "John R. Nickolls" OR "Brian Fahs") AND inv:"John R. Nickolls", page 1

1.

Granted invention patent
Cache operations and policies for a multi-threaded client (status: in force)

Publication No.: US09952977B2

Publication date: 2018-04-24

Application No.: US12890476

Filing date: 2010-09-24

Applicants: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs

Inventors: Steven James Heinrich, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs

IPC classification: G06F12/00, G06F12/0842, G06F12/0897

CPC classification: G06F12/0842, G06F12/0897

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.
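
The abstract above describes a per-instruction modifier that chooses the cache level and influences replacement. The following is a minimal Python sketch of that idea, not the patented implementation. The modifier names (`CACHE_ALL`, `CACHE_L2_ONLY`, `STREAMING`), the two-level hierarchy, and the evict-first replacement rule are all assumptions made for illustration.

```python
from collections import OrderedDict
from enum import Enum


class CacheOp(Enum):
    """Hypothetical cache operations modifiers (names are illustrative)."""
    CACHE_ALL = "all"        # cache the line in both L1 and L2
    CACHE_L2_ONLY = "l2"     # bypass L1, cache in L2 only
    STREAMING = "stream"     # cache in L1 and L2, but mark as evict-first


class Level:
    """A tiny fully associative cache: LRU order plus an evict-first hint."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> (value, evict_first)

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh the LRU position
            return self.lines[addr][0]
        return None

    def fill(self, addr, value, evict_first=False):
        if addr not in self.lines and len(self.lines) >= self.capacity:
            # Prefer a line marked evict-first; otherwise evict the LRU line.
            victim = next((a for a, (_, ef) in self.lines.items() if ef),
                          next(iter(self.lines)))
            del self.lines[victim]
        self.lines[addr] = (value, evict_first)
        self.lines.move_to_end(addr)


def load(addr, op, l1, l2, memory):
    """Service a load, caching the data at the levels the modifier selects."""
    use_l1 = op is not CacheOp.CACHE_L2_ONLY
    if use_l1:
        value = l1.lookup(addr)
        if value is not None:
            return value
    value = l2.lookup(addr)
    if value is None:
        value = memory[addr]  # miss in every consulted level
    streaming = op is CacheOp.STREAMING
    l2.fill(addr, value, evict_first=streaming)
    if use_l1:
        l1.fill(addr, value, evict_first=streaming)
    return value
```

In this sketch a load tagged `CACHE_L2_ONLY` never allocates in L1, and a line loaded with `STREAMING` is the first candidate for eviction when its level is full.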

2.

Invention patent application
Cache Operations and Policies For A Multi-Threaded Client (status: in force)

Publication No.: US20110078381A1

Publication date: 2011-03-31

Application No.: US12890476

Filing date: 2010-09-24

Applicants: Steven James HEINRICH, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs

Inventors: Steven James HEINRICH, Alexander L. Minkin, Brett W. Coon, Rajeshwaran Selvanesan, Robert Steven Glanville, Charles McCarver, Anjana Rajendran, Stewart Glenn Carlton, John R. Nickolls, Brian Fahs

IPC classification: G06F12/08, G06F12/00

CPC classification: G06F12/0842, G06F12/0897

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.

3.

Granted invention patent
Sharing data crossbar for reads and writes in a data cache (status: in force)

Publication No.: US09286256B2

Publication date: 2016-03-15

Application No.: US12892862

Filing date: 2010-09-28

Applicants: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls

Inventors: Alexander L. Minkin, Steven J. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls

IPC classification: G06F12/00, G06F13/40, G06F13/00, G06F13/28

CPC classification: G06F13/4022, G06F13/4031

Abstract: The invention sets forth an L1 cache architecture that includes a crossbar unit configured to transmit data associated with both read data requests and write data requests. Data associated with read data requests is retrieved from a cache memory and transmitted to the client subsystems. Similarly, data associated with write data requests is transmitted from the client subsystems to the cache memory. To allow for the transmission of both read and write data on the crossbar unit, an arbiter is configured to schedule the crossbar unit transmissions as well as arbitrate between data requests received from the client subsystems.
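
The arbitration described in this abstract can be sketched as a small cycle-level model in Python. This is only an illustration under stated assumptions: the class name `SharedCrossbarCache`, the one-transfer-per-cycle limit, and the round-robin grant policy are hypothetical and are not taken from the patent.

```python
from collections import deque


class SharedCrossbarCache:
    """Toy model: one crossbar carries either read or write data each cycle."""

    def __init__(self, num_clients):
        self.storage = {}                                  # cache memory
        self.queues = [deque() for _ in range(num_clients)]
        self.next_client = 0       # round-robin pointer (assumed policy)

    def request(self, client, kind, addr, data=None):
        """Queue a 'read' or 'write' request from a client subsystem."""
        self.queues[client].append((kind, addr, data))

    def cycle(self):
        """Grant one request and drive the single crossbar for this cycle."""
        n = len(self.queues)
        for offset in range(n):
            client = (self.next_client + offset) % n
            if self.queues[client]:
                kind, addr, data = self.queues[client].popleft()
                self.next_client = (client + 1) % n
                if kind == "write":
                    # Crossbar carries data from the client to the cache.
                    self.storage[addr] = data
                    return (client, "write", addr, data)
                # Crossbar carries data from the cache to the client.
                return (client, "read", addr, self.storage.get(addr))
        return None  # no pending requests: idle cycle
```

Because reads and writes share the same `cycle` call, the model makes the key constraint visible: a write transfer and a read transfer can never occupy the crossbar in the same cycle.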

4.

Invention patent application
Sharing Data Crossbar for Reads and Writes in a Data Cache (status: in force)

Publication No.: US20110082961A1

Publication date: 2011-04-07

Application No.: US12892862

Filing date: 2010-09-28

Applicants: Alexander L. Minkin, Steven L. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls

Inventors: Alexander L. Minkin, Steven L. Heinrich, Rajeshwaran Selvanesan, Stewart Glenn Carlton, John R. Nickolls

IPC classification: G06F13/36, G06F13/00

CPC classification: G06F13/4022, G06F13/4031

Abstract: The invention sets forth an L1 cache architecture that includes a crossbar unit configured to transmit data associated with both read data requests and write data requests. Data associated with read data requests is retrieved from a cache memory and transmitted to the client subsystems. Similarly, data associated with write data requests is transmitted from the client subsystems to the cache memory. To allow for the transmission of both read and write data on the crossbar unit, an arbiter is configured to schedule the crossbar unit transmissions as well as arbitrate between data requests received from the client subsystems.

5.

Granted invention patent
Unified addressing and instructions for accessing parallel memory spaces (status: in force)

Publication No.: US08271763B2

Publication date: 2012-09-18

Application No.: US12567637

Filing date: 2009-09-25

Applicants: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville

Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville

IPC classification: G06F12/10

CPC classification: G06F12/1054, G06F12/0284, G06F12/109, G06F13/404, G06F2212/302, G06F2212/656

Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.
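
One way to picture the conversion described in this abstract is a window-based address decoder. The Python sketch below is an illustration only: the window bases and sizes (`LOCAL_BASE`, `SHARED_BASE`, and so on), the three space names, and the `ThreadMemory` class are hypothetical, and the patent may define the mapping differently.

```python
# Assumed window layout inside the unified address space (illustrative values).
LOCAL_BASE, LOCAL_SIZE = 0x1000_0000, 0x0100_0000
SHARED_BASE, SHARED_SIZE = 0x2000_0000, 0x0001_0000


def translate(unified_addr):
    """Map a unified address to (memory space, address within that space)."""
    if LOCAL_BASE <= unified_addr < LOCAL_BASE + LOCAL_SIZE:
        return ("local", unified_addr - LOCAL_BASE)
    if SHARED_BASE <= unified_addr < SHARED_BASE + SHARED_SIZE:
        return ("shared", unified_addr - SHARED_BASE)
    return ("global", unified_addr)  # everything else falls through to global


class ThreadMemory:
    """Per-thread view: private local memory plus shared and global memories."""

    def __init__(self, shared_mem, global_mem):
        self.spaces = {"local": {}, "shared": shared_mem, "global": global_mem}

    def load(self, unified_addr):
        """A single load routine serves all three spaces."""
        space, offset = translate(unified_addr)
        return self.spaces[space].get(offset, 0)

    def store(self, unified_addr, value):
        """A single store routine serves all three spaces."""
        space, offset = translate(unified_addr)
        self.spaces[space][offset] = value
```

Two `ThreadMemory` objects built on the same `shared_mem` and `global_mem` dictionaries see each other's shared and global stores, while stores into the local window stay private to each thread.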

6.

Invention patent application
Unified Addressing and Instructions for Accessing Parallel Memory Spaces (status: in force)

Publication No.: US20110078406A1

Publication date: 2011-03-31

Application No.: US12567637

Filing date: 2009-09-25

Applicants: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville

Inventors: John R. Nickolls, Brett W. Coon, Ian A. Buck, Robert Steven Glanville

IPC classification: G06F12/10

CPC classification: G06F12/1054, G06F12/0284, G06F12/109, G06F13/404, G06F2212/302, G06F2212/656

Abstract: One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.

7.

Granted invention patent
Cooperative thread array reduction and scan operations (status: in force)

Publication No.: US08539204B2

Publication date: 2013-09-17

Application No.: US12890227

Filing date: 2010-09-24

Applicants: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland

Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland

IPC classification: G06F9/30, G06F9/40, G06F15/00

CPC classification: G06F9/522, G06F8/458, G06F9/3004, G06F9/30087, G06F9/30145, G06F9/3851

Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
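
The combination of barrier and aggregation described in this abstract can be imitated in software with Python's `threading.Barrier`. The sketch below is a host-side analogy, not the hardware instruction: the class `AggregatingBarrier`, the choice of addition as the operator, the inclusive scan in arrival order, and the single-use restriction are all assumptions made for illustration.

```python
import threading


class AggregatingBarrier:
    """Single-use barrier that also sums the values supplied by each thread."""

    def __init__(self, parties):
        self._barrier = threading.Barrier(parties)
        self._lock = threading.Lock()
        self._running = 0

    def reduce_and_wait(self, value):
        """Contribute `value`, wait for all threads, return (scan, reduction)."""
        with self._lock:
            self._running += value
            scan = self._running   # inclusive scan, known on arrival
        self._barrier.wait()       # block until every thread has contributed
        with self._lock:
            return scan, self._running  # reduction is complete only now


def run_demo(values):
    """Run one thread per value and collect each thread's (scan, reduction)."""
    barrier = AggregatingBarrier(len(values))
    results = [None] * len(values)

    def worker(i):
        results[i] = barrier.reduce_and_wait(values[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The scan value is available as soon as a thread arrives, whereas the reduction is read only after the barrier releases, which mirrors the timing distinction the abstract draws between the two results. Arrival order is not deterministic, so the scan values differ from run to run while the reduction does not.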

8.

Invention patent application
EFFICIENT IMPLEMENTATION OF ARRAYS OF STRUCTURES ON SIMT AND SIMD ARCHITECTURES (status: in force)

Publication No.: US20120089792A1

Publication date: 2012-04-12

Application No.: US13247855

Filing date: 2011-09-28

Applicants: Brian FAHS, John R. Nickolls, Kathleen Elliott Nickolls, Henry Packard Moreton, Brett W. Coon

Inventors: Brian FAHS, John R. Nickolls, Kathleen Elliott Nickolls, Henry Packard Moreton, Brett W. Coon

IPC classification: G06F12/00

CPC classification: G06F9/3885, G06F9/30036, G06F9/3009, G06F9/30123, G06F9/345, G06F9/3824, G06F9/3851, G06F9/3887, G06F12/0207, G06T1/20

Abstract: One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
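
An address computation for an array of structures of arrays (AoSoA) can be sketched in a few lines of Python. The formula below is the commonly used AoSoA layout and is an assumption here, since the abstract does not give the exact computation. The function name `aosoa_address` and its parameters are hypothetical.

```python
def aosoa_address(base, index, field, num_fields, group_width, elem_size):
    """Byte address of `field` for element `index` in an AoSoA layout.

    Elements are grouped into blocks of `group_width` lanes. Inside a block,
    each field is stored as a contiguous array with one entry per lane, so
    lanes that access the same field touch consecutive addresses.
    """
    block = index // group_width   # which structure-of-arrays block
    lane = index % group_width     # position inside that block
    slot = (block * num_fields + field) * group_width + lane
    return base + slot * elem_size


def group_addresses(base, first_index, field, num_fields, group_width,
                    elem_size):
    """Addresses touched when one execution group reads the same field."""
    return [aosoa_address(base, first_index + lane, field, num_fields,
                          group_width, elem_size)
            for lane in range(group_width)]
```

With a group width of 4, three 4-byte fields, and a base of 0, the four lanes of the first group reading field 1 touch addresses 16, 20, 24, and 28, which is one contiguous 16-byte span. That contiguity is the efficiency argument the abstract makes for sizing the inner arrays to the SIMT/SIMD group width.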

9.

Invention patent application
COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS (status: in force)

Publication No.: US20110078417A1

Publication date: 2011-03-31

Application No.: US12890227

Filing date: 2010-09-24

Applicants: Brian FAHS, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland

Inventors: Brian FAHS, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland

IPC classification: G06F9/38

CPC classification: G06F9/522, G06F8/458, G06F9/3004, G06F9/30087, G06F9/30145, G06F9/3851

Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

10.

Granted invention patent
Coalescing memory barrier operations across multiple parallel threads (status: in force)

Publication No.: US09223578B2

Publication date: 2015-12-29

Application No.: US12887081

Filing date: 2010-09-21

Applicants: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow

Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow

IPC classification: G06F9/46, G06F9/38, G06F9/30

CPC classification: G06F9/3834, G06F9/3004, G06F9/30087, G06F9/3851

Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
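
The coalescing of barrier requests by level can be illustrated with a short Python sketch. It assumes, purely for illustration, that requests of the same level issued by threads of one processing unit are merged into a single outgoing request. The `Scope` names and the `coalesce` function are hypothetical, and the patent may merge requests differently.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Hypothetical names for the three barrier levels in the abstract."""
    CTA = 1      # cooperating threads that share an L1 cache
    GLOBAL = 2   # threads that share a global memory
    SYSTEM = 3   # all threads sharing all system memories


def coalesce(requests):
    """Merge per-thread barrier requests into one request per scope.

    `requests` is a list of (thread_id, Scope) pairs from one processing
    unit. The result maps each requested scope to the sorted thread ids
    waiting on it, so at most three requests leave the unit regardless of
    how many threads issued a barrier.
    """
    merged = {}
    for thread_id, scope in requests:
        merged.setdefault(scope, set()).add(thread_id)
    return {scope: sorted(ids) for scope, ids in sorted(merged.items())}
```

In this model five per-thread requests spread over three scopes leave the unit as three requests, and a thread is released only when the request for its own scope completes, so a `CTA` barrier is not delayed by a slower `SYSTEM` barrier.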
