Patent search: ap:("Timothy John Purcell" OR "Lacky V. Shah" OR "Jerome F. Duluk, Jr." OR "Sean J. Treichler" OR "Karim M. Abdalla" OR "Philip Alexander Cuadra" OR "Brian Pharris") AND inv:"Philip Alexander Cuadra", page 1

1.

Granted invention patent
Signaling, ordering, and execution of dynamically generated tasks in a processing system (Active)

Publication No.: US08984183B2

Publication date: 2015-03-17

Application No.: US13329169

Filing date: 2011-12-16

Applicant: Timothy John Purcell; Lacky V. Shah; Jerome F. Duluk, Jr.; Sean J. Treichler; Karim M. Abdalla; Philip Alexander Cuadra; Brian Pharris

Inventor: Timothy John Purcell; Lacky V. Shah; Jerome F. Duluk, Jr.; Sean J. Treichler; Karim M. Abdalla; Philip Alexander Cuadra; Brian Pharris

IPC classification: G06F3/00, G06F9/46, G06F9/48

CPC classification: G06F9/4843

Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system. The technique allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
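
The notification path described in this abstract can be illustrated with a minimal Python sketch. It is only an analogy under stated assumptions: the names `Scheduler`, `TaskQueue`, `notify`, and `running_task` are hypothetical and do not come from the patent, and a plain method call stands in for the hardware "reflected notification signal".

```python
from collections import deque


class Scheduler:
    """Hypothetical scheduling unit: records which queues have pending work."""

    def __init__(self):
        self.ready = deque()

    def notify(self, queue_id):
        # Stands in for the reflected notification raised by a queue write.
        self.ready.append(queue_id)


class TaskQueue:
    """Hypothetical queue that holds data for dynamically generated tasks."""

    def __init__(self, queue_id, scheduler):
        self.queue_id = queue_id
        self.scheduler = scheduler
        self.entries = []

    def write(self, data):
        # The write itself triggers the notification; no host-side step is needed.
        self.entries.append(data)
        self.scheduler.notify(self.queue_id)


def running_task(queue):
    # A compute task that is already executing generates a dynamic task
    # simply by writing that task's data to the queue.
    queue.write({"work": "child"})


scheduler = Scheduler()
queue = TaskQueue(queue_id=7, scheduler=scheduler)
running_task(queue)
```

After `running_task` returns, the scheduler already knows that queue 7 has work, even though no code outside the task and the queue was involved.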

2.

Invention application
SIGNALING, ORDERING, AND EXECUTION OF DYNAMICALLY GENERATED TASKS IN A PROCESSING SYSTEM (Active)

Publication No.: US20130160021A1

Publication date: 2013-06-20

Application No.: US13329169

Filing date: 2011-12-16

Applicant: Timothy John Purcell; Lacky V. Shah; Jerome F. Duluk, Jr.; Sean J. Treichler; Karim M. Abdalla; Philip Alexander Cuadra; Brian Pharris

Inventor: Timothy John Purcell; Lacky V. Shah; Jerome F. Duluk, Jr.; Sean J. Treichler; Karim M. Abdalla; Philip Alexander Cuadra; Brian Pharris

IPC classification: G06F9/46

CPC classification: G06F9/4843

Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system. The technique allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.

3.

Granted invention patent
Compute work distribution reference counters (Active)

Publication No.: US09507638B2

Publication date: 2016-11-29

Application No.: US13291369

Filing date: 2011-11-08

Applicant: Philip Alexander Cuadra; Karim M. Abdalla; Jerome F. Duluk, Jr.; Luke Durant; Gerald F. Luiz; Timothy John Purcell; Lacky V. Shah

Inventor: Philip Alexander Cuadra; Karim M. Abdalla; Jerome F. Duluk, Jr.; Luke Durant; Gerald F. Luiz; Timothy John Purcell; Lacky V. Shah

IPC classification: G06F9/455, G06F9/50

CPC classification: G06F9/5022

Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.
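
As a rough illustration of the counter behavior summarized above, the following Python sketch models a shared counter that caps allocation and an exit path that can skip the release. The class `ReferenceCounter`, the function `run_thread_array`, and the `retain_for_child` flag are assumptions made for this sketch, not names or interfaces from the patent.

```python
class ReferenceCounter:
    """Hypothetical counter that caps the resources shared by a group of tasks."""

    def __init__(self, limit):
        self.available = limit

    def acquire(self, amount):
        # Refuse the allocation if it would exceed the programmed limit.
        if amount > self.available:
            return False
        self.available -= amount
        return True

    def release(self, amount):
        self.available += amount


def run_thread_array(counter, amount, retain_for_child=False):
    """Run one thread array; return the amount still held afterwards.

    Returns None if the resources could not be acquired (launch must wait).
    """
    if not counter.acquire(amount):
        return None
    # ... the thread array would execute here ...
    if retain_for_child:
        # The exit overrides the normal release so a child task can reuse
        # the resources.
        return amount
    counter.release(amount)
    return 0


counter = ReferenceCounter(limit=4)
held = run_thread_array(counter, 3, retain_for_child=True)  # keeps 3 units
blocked = run_thread_array(counter, 2)                      # only 1 unit left
counter.release(held)                                       # child task finishes
second = run_thread_array(counter, 2)                       # now succeeds
```

The second launch is refused while the retained resources are outstanding and succeeds once they are released, which is the limiting effect the abstract attributes to the counter's initial value.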

4.

Invention application
COMPUTE WORK DISTRIBUTION REFERENCE COUNTERS (Active)

Publication No.: US20130117758A1

Publication date: 2013-05-09

Application No.: US13291369

Filing date: 2011-11-08

Applicant: Philip Alexander Cuadra; Karim M. Abdalla; Jerome F. Duluk, Jr.; Luke Durant; Gerald F. Luiz; Timothy John Purcell; Lacky V. Shah

Inventor: Philip Alexander Cuadra; Karim M. Abdalla; Jerome F. Duluk, Jr.; Luke Durant; Gerald F. Luiz; Timothy John Purcell; Lacky V. Shah

IPC classification: G06F9/46

CPC classification: G06F9/5022

Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.

5.

Invention application
TECHNIQUE FOR COMPUTATIONAL NESTED PARALLELISM (Active)

Publication No.: US20130298133A1

Publication date: 2013-11-07

Application No.: US13462649

Filing date: 2012-05-02

Applicant: Stephen Jones; Philip Alexander Cuadra; Daniel Elliot Wexler; Ignacio Llamas; Lacky V. Shah; Jerome F. Duluk, Jr.; Christopher Lamb

Inventor: Stephen Jones; Philip Alexander Cuadra; Daniel Elliot Wexler; Ignacio Llamas; Lacky V. Shah; Jerome F. Duluk, Jr.; Christopher Lamb

IPC classification: G06F9/50

CPC classification: G06F9/5027, G06F9/522, G06F2209/483, G06T1/20

Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.
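
The parent/child pattern in this abstract can be approximated on a CPU with Python thread pools. This is a loose analogy, not GPU code: `parent_thread`, `child_thread`, and the pool sizes are hypothetical, and `concurrent.futures.wait` plays the role of the synchronization barrier on the child grid.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def child_thread(index, data):
    # One element of work performed by a single child in the nested grid.
    return data[index] * 2


def parent_thread(data):
    # Conditional nested launch: only spawn a child grid when there is work.
    if not data:
        return []
    with ThreadPoolExecutor(max_workers=4) as child_grid:
        futures = [child_grid.submit(child_thread, i, data)
                   for i in range(len(data))]
        wait(futures)  # barrier: the parent resumes only after all children finish
        return [f.result() for f in futures]


# Two parents run concurrently; each launches its own child grid without
# any coordinating code outside the parents themselves.
with ThreadPoolExecutor(max_workers=2) as parents:
    results = list(parents.map(parent_thread, [[1, 2, 3], [], [10]]))
```

Each parent creates a separate pool for its children, so a parent blocked at the barrier never occupies a worker that its own children need.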

6.

Granted invention patent
Technique for computational nested parallelism (Active)

Publication No.: US09513975B2

Publication date: 2016-12-06

Application No.: US13462649

Filing date: 2012-05-02

Applicant: Stephen Jones; Philip Alexander Cuadra; Daniel Elliot Wexler; Ignacio Llamas; Lacky V. Shah; Jerome F. Duluk, Jr.; Christopher Lamb

Inventor: Stephen Jones; Philip Alexander Cuadra; Daniel Elliot Wexler; Ignacio Llamas; Lacky V. Shah; Jerome F. Duluk, Jr.; Christopher Lamb

IPC classification: G06F9/46, G06F9/52, G06T1/20

CPC classification: G06F9/5027, G06F9/522, G06F2209/483, G06T1/20

Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions without the additional complexity of CPU involvement.

7.

Granted invention patent
Methods and apparatus for auto-throttling encapsulated compute tasks (Active)

Publication No.: US09710306B2

Publication date: 2017-07-18

Application No.: US13442730

Filing date: 2012-04-09

Applicant: Jerome F. Duluk, Jr.; Jesse David Hall; Philip Alexander Cuadra; Karim M. Abdalla

Inventor: Jerome F. Duluk, Jr.; Jesse David Hall; Philip Alexander Cuadra; Karim M. Abdalla

IPC classification: G06F9/46, G06F9/48, G06F9/50

CPC classification: G06F9/4881, G06F9/5016, G06F2209/483

Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.
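
One way to picture the mode decision described above is the Python sketch below. The policy shown (throttle whenever any scheduled task carries the flag) is an assumption chosen for illustration; the abstract does not state the actual rule, and `Task`, `needs_throttle`, `choose_mode`, and `active_units` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needs_throttle: bool  # hypothetical per-task flag stored with the task data


def choose_mode(tasks):
    # Assumed policy: one flagged task is enough to select the throttled mode.
    return "throttled" if any(t.needs_throttle for t in tasks) else "non-throttled"


def active_units(mode, total_units, throttled_units):
    # In the throttled mode only a subset of processing units is used.
    return total_units if mode == "non-throttled" else throttled_units


light = [Task("a", False), Task("b", False)]
mixed = [Task("a", False), Task("c", True)]
light_units = active_units(choose_mode(light), total_units=16, throttled_units=4)
mixed_units = active_units(choose_mode(mixed), total_units=16, throttled_units=4)
```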

8.

Invention application
METHODS AND APPARATUS FOR AUTO-THROTTLING ENCAPSULATED COMPUTE TASKS (Active)

Publication No.: US20130268942A1

Publication date: 2013-10-10

Application No.: US13442730

Filing date: 2012-04-09

Applicant: Jerome F. Duluk, Jr.; Jesse David Hall; Philip Alexander Cuadra; Karim M. Abdalla

Inventor: Jerome F. Duluk, Jr.; Jesse David Hall; Philip Alexander Cuadra; Karim M. Abdalla

IPC classification: G06F9/46

CPC classification: G06F9/4881, G06F9/5016, G06F2209/483

Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.

9.

Granted invention patent
Error checking in out-of-order task scheduling (Active)

Publication No.: US09965321B2

Publication date: 2018-05-08

Application No.: US13316344

Filing date: 2011-12-09

Applicant: Jerome F. Duluk, Jr.; Timothy John Purcell; Jesse David Hall; Philip Alexander Cuadra

Inventor: Jerome F. Duluk, Jr.; Timothy John Purcell; Jesse David Hall; Philip Alexander Cuadra

IPC classification: G06F9/46, G06F9/48

CPC classification: G06F9/4843

Abstract: One embodiment of the present invention sets forth a technique for error-checking a compute task. The technique involves receiving a pointer to a compute task, storing the pointer in a scheduling queue, determining that the compute task should be executed, retrieving the pointer from the scheduling queue, determining via an error-check procedure that the compute task is eligible for execution, and executing the compute task.
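
The sequence of steps in this abstract (store pointer, retrieve pointer, error-check, execute) can be sketched in Python as follows. Everything here is illustrative: `task_memory`, `is_eligible`, the `grid` field, and the rule that a task needs a positive grid size are assumptions, since the abstract does not say what the error-check procedure inspects.

```python
from collections import deque

# Hypothetical task memory: integer "pointers" map to task descriptors.
task_memory = {
    0x10: {"grid": 4, "fn": lambda n: n * n},
    0x20: {"grid": 0, "fn": lambda n: n},  # invalid: empty grid
}


def is_eligible(task):
    # Assumed error check: a task must have a positive grid size to run.
    return task["grid"] > 0


def schedule(pointers):
    queue = deque(pointers)              # store the received pointers
    executed, rejected = {}, []
    while queue:
        ptr = queue.popleft()            # retrieve the next pointer
        task = task_memory[ptr]
        if is_eligible(task):            # error-check before execution
            executed[ptr] = task["fn"](task["grid"])
        else:
            rejected.append(ptr)
    return executed, rejected


executed, rejected = schedule([0x10, 0x20])
```

Deferring the check until the pointer is retrieved means only tasks that are actually about to run are validated.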

10.

Invention application
INSTRUCTION LEVEL EXECUTION PREEMPTION (Under examination, published)

Publication No.: US20130124838A1

Publication date: 2013-05-16

Application No.: US13294045

Filing date: 2011-11-10

Applicant: Lacky V. Shah; Gregory Scott Palmer; Gernot Schaufler; Samuel H. Duncan; Philip Browning Johnson; Shirish Gadre; Robert Ohannessian; Nicholas Wang; Christopher Lamb; Philip Alexander Cuadra; Timothy John Purcell

Inventor: Lacky V. Shah; Gregory Scott Palmer; Gernot Schaufler; Samuel H. Duncan; Philip Browning Johnson; Shirish Gadre; Robert Ohannessian; Nicholas Wang; Christopher Lamb; Philip Alexander Cuadra; Timothy John Purcell

IPC classification: G06F9/38

CPC classification: G06F9/461

Abstract: One embodiment of the present invention sets forth a technique for instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
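
The fallback from thread-array-boundary preemption to instruction-level preemption can be modeled with a small Python simulation. It is a sketch under assumptions: `preempt`, the per-instruction cycle counts, and the cycle-based threshold are hypothetical stand-ins for hardware behavior the abstract only describes in general terms.

```python
def preempt(in_flight_cycles, threshold):
    """Drain in-flight instructions, falling back if draining takes too long.

    in_flight_cycles: remaining cycles for each in-flight instruction.
    Returns (mode, saved_state), where saved_state lists the unfinished
    work that must be stored as context.
    """
    elapsed = 0
    remaining = list(in_flight_cycles)
    while any(remaining):
        if elapsed > threshold:
            # Draining exceeded the threshold: stop and save in-flight state.
            return "instruction-level", remaining
        remaining = [max(c - 1, 0) for c in remaining]
        elapsed += 1
    # Pipeline is idle, so no in-flight state needs to be saved.
    return "thread-array-boundary", []


fast = preempt([2, 3], threshold=5)
slow = preempt([2, 10], threshold=4)
```

The short workload drains completely and needs no saved in-flight state, while the long one switches modes partway through and must save what is left.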
