Compute work distribution reference counters
    1. Granted patent, in force

    Publication No.: US09507638B2

    Publication Date: 2016-11-29

    Application No.: US13291369

    Filing Date: 2011-11-08

    IPC Classes: G06F9/455 G06F9/50

    CPC Classes: G06F9/5022

    Abstract: One embodiment of the present invention sets forth a technique for managing the allocation and release of resources during multi-threaded program execution. Programmable reference counters are initialized to values that limit the amount of resources for allocation to tasks that share the same reference counter. Resource parameters are specified for each task to define the amount of resources allocated for consumption by each array of execution threads that is launched to execute the task. The resource parameters also specify the behavior of the array for acquiring and releasing resources. Finally, during execution of each thread in the array, an exit instruction may be configured to override the release of the resources that were allocated to the array. The resources may then be retained for use by a child task that is generated during execution of a thread.
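
    A minimal sketch of the reference-counter idea follows. All names here (RefCounter, TaskParams, acquireForArray, onArrayExit) are hypothetical and only model the behavior described in the abstract, not the patented hardware: a programmable counter caps how many resource units the concurrently running thread arrays of a task may hold, and an exit that skips the release lets a child task inherit the parent's allocation. CUDA/C++ host code:

        #include <cstdio>

        struct RefCounter {          // one programmable counter shared by a group of tasks
            int available;           // remaining units of the managed resource
        };

        struct TaskParams {          // per-task resource parameters
            int  unitsPerArray;      // units consumed by each array of execution threads
            bool releaseOnExit;      // false: the exit overrides the release
        };

        // Charge the counter before an array of threads is launched for the task.
        bool acquireForArray(RefCounter& rc, const TaskParams& p) {
            if (rc.available < p.unitsPerArray) return false;   // launch must wait
            rc.available -= p.unitsPerArray;
            return true;
        }

        // Called when the array exits; the exit may be configured to skip the release.
        void onArrayExit(RefCounter& rc, const TaskParams& p) {
            if (p.releaseOnExit)
                rc.available += p.unitsPerArray;
            // otherwise the units stay reserved for the child task generated by this array
        }

        int main() {
            RefCounter rc{8};                               // budget for tasks sharing this counter
            TaskParams parent{4, /*releaseOnExit=*/false};  // parent hands its units to a child
            if (acquireForArray(rc, parent))
                onArrayExit(rc, parent);
            printf("units still reserved for the child: %d\n", 8 - rc.available);
            return 0;
        }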

Low latency concurrent computation
    2. Granted patent, in force

    Publication No.: US08928677B2

    Publication Date: 2015-01-06

    Application No.: US13357569

    Filing Date: 2012-01-24

    IPC Classes: G06F15/80 G06T1/20

    CPC Classes: G06F9/5027 G06F2209/509

    Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.
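
    CUDA stream priorities expose a comparable host-visible mechanism, so the hedged sketch below uses the real cudaDeviceGetStreamPriorityRange and cudaStreamCreateWithPriority calls only as an analogy for a low latency queue that is served ahead of generic work; the kernels are empty placeholders and this is not the driver mechanism claimed here.

        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void genericKernel()          { /* stands in for long-running generic work */ }
        __global__ void latencySensitiveKernel() { /* stands in for a low latency task */ }

        int main() {
            int leastPri, greatestPri;                       // lower number = higher priority
            cudaDeviceGetStreamPriorityRange(&leastPri, &greatestPri);

            cudaStream_t genericStream, lowLatencyStream;
            cudaStreamCreateWithPriority(&genericStream,    cudaStreamNonBlocking, leastPri);
            cudaStreamCreateWithPriority(&lowLatencyStream, cudaStreamNonBlocking, greatestPri);

            genericKernel<<<64, 256, 0, genericStream>>>();           // bulk work
            latencySensitiveKernel<<<1, 32, 0, lowLatencyStream>>>(); // scheduled ahead of pending generic blocks

            cudaDeviceSynchronize();
            cudaStreamDestroy(genericStream);
            cudaStreamDestroy(lowLatencyStream);
            printf("done\n");
            return 0;
        }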

LOW LATENCY CONCURRENT COMPUTATION
    5. Patent application, in force

    Publication No.: US20130187935A1

    Publication Date: 2013-07-25

    Application No.: US13357569

    Filing Date: 2012-01-24

    IPC Classes: G06F15/80

    CPC Classes: G06F9/5027 G06F2209/509

    Abstract: One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks.
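
    Since the abstract is shared with the granted patent above, the sketch here takes a different angle: a toy host-side dispatcher with hypothetical Command and FunctionalNode types that models two command buffers feeding one processor resource, where the low latency buffer is always drained before the next generic command. It mirrors only the scheduling order described in the abstract.

        #include <cstdio>
        #include <queue>
        #include <string>

        struct Command { std::string name; };

        struct FunctionalNode {                  // one node exposed to the operating system
            std::queue<Command> commandBuffer;   // commands queued for this node
        };

        // Serve the low latency buffer first whenever it has pending work.
        void dispatch(FunctionalNode& lowLatency, FunctionalNode& generic) {
            while (!lowLatency.commandBuffer.empty() || !generic.commandBuffer.empty()) {
                FunctionalNode& src = !lowLatency.commandBuffer.empty() ? lowLatency : generic;
                printf("executing %s\n", src.commandBuffer.front().name.c_str());
                src.commandBuffer.pop();
            }
        }

        int main() {
            FunctionalNode generic, lowLatency;
            generic.commandBuffer.push({"generic task A"});
            generic.commandBuffer.push({"generic task B"});
            lowLatency.commandBuffer.push({"low latency task"});
            dispatch(lowLatency, generic);       // prints the low latency task before A and B
            return 0;
        }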

Software-Assisted Instruction Level Execution Preemption
    6. Patent application, in force

    Publication No.: US20130117760A1

    Publication Date: 2013-05-09

    Application No.: US13291476

    Filing Date: 2011-11-08

    IPC Classes: G06F9/46

    Abstract: One embodiment of the present invention sets forth a technique for instruction level execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. Any in-flight instructions that follow the preemption command in the processing pipeline are captured and stored in a processing task buffer to be reissued when the preempted program is resumed. The processing task buffer is designated as a high priority task to ensure that the preempted instructions are reissued before any new instructions for the preempted context when execution of that context is restored.
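
    The preemption itself happens inside the GPU, so the sketch below only mimics, with hypothetical names (Instruction, TaskBuffer, preemptAt, resume), the capture of in-flight instructions behind the preemption point into a buffer that is replayed at high priority when the context resumes.

        #include <cstdio>
        #include <deque>
        #include <vector>

        struct Instruction { int pc; };          // stand-in for an in-flight instruction

        struct TaskBuffer {                      // processing task buffer holding captured work
            std::vector<Instruction> captured;
            bool highPriority = false;
        };

        // Capture everything behind the preemption point without draining the pipeline.
        TaskBuffer preemptAt(std::deque<Instruction>& pipeline, int preemptPc) {
            TaskBuffer buf;
            buf.highPriority = true;                                 // replayed before new work
            while (!pipeline.empty() && pipeline.back().pc > preemptPc) {
                buf.captured.insert(buf.captured.begin(), pipeline.back());
                pipeline.pop_back();
            }
            return buf;
        }

        // On resume, reissue the captured instructions ahead of any new instructions.
        void resume(std::deque<Instruction>& pipeline, const TaskBuffer& buf) {
            for (auto it = buf.captured.rbegin(); it != buf.captured.rend(); ++it)
                pipeline.push_front(*it);
        }

        int main() {
            std::deque<Instruction> pipeline = {{100}, {101}, {102}, {103}};
            TaskBuffer saved = preemptAt(pipeline, 101);    // 102 and 103 are captured
            pipeline.clear();                               // 100 and 101 complete normally
            printf("captured %zu in-flight instructions\n", saved.captured.size());
            resume(pipeline, saved);                        // 102 then 103 replay first
            return 0;
        }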

TECHNIQUE FOR COMPUTATIONAL NESTED PARALLELISM
    7. Patent application, in force

    Publication No.: US20130298133A1

    Publication Date: 2013-11-07

    Application No.: US13462649

    Filing Date: 2012-05-02

    IPC Classes: G06F9/50

    Abstract: One embodiment of the present invention sets forth a technique for performing nested kernel execution within a parallel processing subsystem. The technique involves enabling a parent thread to launch a nested child grid on the parallel processing subsystem, and enabling the parent thread to perform a thread synchronization barrier on the child grid for proper execution semantics between the parent thread and the child grid. This technique advantageously enables the parallel processing subsystem to perform a richer set of programming constructs, such as conditionally executed and nested operations and externally defined library functions, without the additional complexity of CPU involvement.
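
    CUDA dynamic parallelism exposes this programming model directly, so the sketch below is an ordinary device-side launch of a child grid plus a synchronization barrier on it. The kernel names are hypothetical; it assumes a device of compute capability 3.5 or later, compilation with nvcc -rdc=true, and a toolkit that still supports device-side cudaDeviceSynchronize (newer toolkits replace it with tail-launch semantics).

        #include <cstdio>

        __global__ void childKernel(int parentBlock) {
            printf("child grid of parent block %d, thread %d\n", parentBlock, (int)threadIdx.x);
        }

        __global__ void parentKernel() {
            // One thread per block launches a nested child grid from device code.
            if (threadIdx.x == 0) {
                childKernel<<<1, 4>>>(blockIdx.x);
                cudaDeviceSynchronize();   // device-side barrier on the child grid
            }
            __syncthreads();               // other parent threads wait for the launcher
        }

        int main() {
            parentKernel<<<2, 32>>>();
            cudaDeviceSynchronize();       // host waits for parent grids and their children
            return 0;
        }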

AUTOMATIC DEPENDENT TASK LAUNCH
    8. Patent application, under examination (published)

    Publication No.: US20130198760A1

    Publication Date: 2013-08-01

    Application No.: US13360581

    Filing Date: 2012-01-27

    IPC Classes: G06F9/46

    Abstract: One embodiment of the present invention sets forth a technique for automatic launching of a dependent task when execution of a first task completes. Automatically launching the dependent task reduces the latency incurred during the transition from the first task to the dependent task. Information associated with the dependent task is encoded as part of the metadata for the first task. When execution of the first task completes, a task scheduling unit is notified and the dependent task is launched without requiring any release or acquisition of a semaphore. The information associated with the dependent task includes an enable flag and a pointer to the dependent task. Once the dependent task is launched, the first task is marked as complete so that memory storing the metadata for the first task may be reused to store metadata for a new task.
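
    A host-side sketch with hypothetical structures (TaskMetadata, launchTask, onTaskComplete) of the idea in the abstract: the first task's metadata carries an enable flag and a pointer to the dependent task, the scheduler consults them when the first task completes, no semaphore is released or acquired, and the metadata slot is then marked reusable.

        #include <cstdio>

        struct TaskMetadata {
            const char*   name;
            bool          dependentEnable;   // launch a dependent task on completion?
            TaskMetadata* dependentTask;     // which task to launch
            bool          complete;          // slot can be reused once true
        };

        void launchTask(TaskMetadata* task) {
            printf("launching %s\n", task->name);
        }

        // Called by the task scheduling unit when `task` finishes executing.
        void onTaskComplete(TaskMetadata* task) {
            if (task->dependentEnable && task->dependentTask != nullptr)
                launchTask(task->dependentTask);   // no semaphore release or acquisition
            task->complete = true;                 // metadata memory may now be recycled
        }

        int main() {
            TaskMetadata child  = {"dependent task", false, nullptr, false};
            TaskMetadata parent = {"first task", true, &child, false};
            launchTask(&parent);
            onTaskComplete(&parent);   // automatically launches the dependent task
            return 0;
        }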
