APPROACH FOR EFFICIENT ARITHMETIC OPERATIONS
    2.
    Invention application
    APPROACH FOR EFFICIENT ARITHMETIC OPERATIONS (under examination, published)

    Publication number: US20140129807A1

    Publication date: 2014-05-08

    Application number: US13671485

    Filing date: 2012-11-07

    Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.
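The hinting idea in this abstract can be illustrated with a toy model. This is a minimal sketch, not the patented design: it assumes the "preparation" is powering up an idle functional unit, and the names `ArithmeticUnit`, `WARMUP_CYCLES`, and `run_ops` are hypothetical.

```python
from dataclasses import dataclass

WARMUP_CYCLES = 3  # hypothetical cost of powering up an idle unit


@dataclass
class ArithmeticUnit:
    """Toy model of a functional unit that is powered down when idle."""
    powered: bool = False
    stall_cycles: int = 0

    def hint(self) -> None:
        # A hint says the unit is likely to be needed soon, so it is
        # powered up early and the warm-up overlaps with other work.
        self.powered = True

    def execute(self, a: int, b: int) -> int:
        if not self.powered:
            # No hint arrived: the warm-up latency lands on the critical path.
            self.stall_cycles += WARMUP_CYCLES
            self.powered = True
        return a * b

    def idle(self) -> None:
        # Power the unit down again to save energy.
        self.powered = False


def run_ops(ops, use_hints):
    """Execute (a, b) multiplies and return (results, total stall cycles)."""
    unit = ArithmeticUnit()
    results = []
    for a, b in ops:
        if use_hints:
            unit.hint()
        results.append(unit.execute(a, b))
        unit.idle()
    return results, unit.stall_cycles
```

With correct hints the unit never stalls; without them every operation pays the assumed warm-up cost, which is the trade-off the abstract describes.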

    COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS
    3.
    Invention application
    COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS (granted)

    Publication number: US20160357560A1

    Publication date: 2016-12-08

    Application number: US15238428

    Filing date: 2016-08-16

    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
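A software analogy may help make the barrier-with-aggregation idea concrete. The sketch below assumes a sum as the aggregation operator and a single-use barrier; `AggregatingBarrier`, `arrive_and_reduce`, and `run_threads` are hypothetical names, and the patent describes a hardware instruction rather than this Python class.

```python
import threading


class AggregatingBarrier:
    """Single-use barrier that also sums the values of arriving threads."""

    def __init__(self, parties):
        self._parties = parties
        self._cond = threading.Condition()
        self._arrived = 0
        self._total = 0

    def arrive_and_reduce(self, value):
        """Contribute value, then block until every party has arrived.

        Returns (scan, reduction). The scan is the inclusive running sum
        fixed at the moment this thread arrives; the reduction is the sum
        over all threads and is only final once the barrier completes.
        """
        with self._cond:
            self._total += value
            scan = self._total
            self._arrived += 1
            if self._arrived == self._parties:
                self._cond.notify_all()  # last arrival releases everyone
            else:
                self._cond.wait_for(lambda: self._arrived == self._parties)
            return scan, self._total


def run_threads(values):
    """Run one thread per value and collect each thread's (scan, reduction)."""
    barrier = AggregatingBarrier(len(values))
    out = [None] * len(values)

    def worker(i):
        out[i] = barrier.arrive_and_reduce(values[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Every thread sees the same reduction, while the scan values depend on arrival order, so they differ from run to run.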

    PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS
    4.
    Invention application
    PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (granted)

    Publication number: US20160300319A9

    Publication date: 2016-10-13

    Application number: US13850175

    Filing date: 2013-03-25

    CPC classification number: G06T1/20 G06F9/38 G06F9/3851

    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
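The data flow in this abstract can be sketched as a toy model. It assumes a single pipeline (the abstract describes several), a trivial translation as the vertex program, a constant color as the pixel program, and horizontal spans as the only primitive; all function and class names are hypothetical.

```python
from collections import deque


def vertex_program(v):
    # Hypothetical vertex shader: translate a 2D point by (1, 1).
    x, y = v
    return (x + 1, y + 1)


def pixel_program(p):
    # Hypothetical pixel shader: tag each pixel with a constant color.
    return (p, "red")


def rasterize(v0, v1):
    # Toy scan conversion: a horizontal span from v0 to v1 becomes pixels.
    (x0, y), (x1, _) = v0, v1
    return [(x, y) for x in range(x0, x1 + 1)]


class ExecutionPipeline:
    """One pipeline with separate input and output sections per data type."""

    def __init__(self):
        self.vertex_in, self.pixel_in = deque(), deque()
        self.vertex_out, self.pixel_out = [], []

    def step(self):
        # Process one item; the input section it came from selects the program.
        if self.pixel_in:
            self.pixel_out.append(pixel_program(self.pixel_in.popleft()))
        elif self.vertex_in:
            self.vertex_out.append(vertex_program(self.vertex_in.popleft()))
        else:
            return False
        return True


def render_span(v0, v1):
    pipe = ExecutionPipeline()
    pipe.vertex_in.extend([v0, v1])
    while pipe.step():
        pass
    # Processed vertex data is rasterized and fed back in as pixel input.
    pipe.pixel_in.extend(rasterize(*pipe.vertex_out))
    while pipe.step():
        pass
    return pipe.pixel_out  # in the abstract, this goes to the raster analyzer
```

The point of the sketch is that the same execution resource handles both workloads, with the rasterizer linking the vertex output section to the pixel input section.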

    COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS
    5.
    Invention application
    COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS (granted)

    Publication number: US20140019724A1

    Publication date: 2014-01-16

    Application number: US14025482

    Filing date: 2013-09-12

    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

    EFFICIENCY THROUGH A DISTRIBUTED INSTRUCTION SET ARCHITECTURE
    8.
    Invention application
    EFFICIENCY THROUGH A DISTRIBUTED INSTRUCTION SET ARCHITECTURE (under examination, published)

    Publication number: US20150113254A1

    Publication date: 2015-04-23

    Application number: US14061666

    Filing date: 2013-10-23

    CPC classification number: G06F9/3836

    Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.
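The selection step can be sketched as follows. This is an illustration under stated assumptions, not the patented scheduler: apart from FFMA, the opcode sets are invented, the energy figures are arbitrary relative costs, and `Pipeline` and `select_pipeline` are hypothetical names.

```python
# Hypothetical opcode sets: only FFMA is named in the abstract as shared.
PRIMARY_OPS = {"FFMA", "FADD", "IADD"}     # frequently issued subset plus common ops
SECONDARY_OPS = {"FFMA", "FADD", "RSQRT"}  # less frequent subset plus common ops


class Pipeline:
    def __init__(self, name, ops, energy_per_op):
        self.name = name
        self.ops = ops
        self.energy_per_op = energy_per_op  # hypothetical relative cost
        self.busy = False


def select_pipeline(opcode, pipelines):
    """Pick the cheapest idle pipeline that supports opcode, or None to stall."""
    candidates = [p for p in pipelines if opcode in p.ops and not p.busy]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.energy_per_op)
```

Here availability filters the candidates first and power efficiency breaks the tie, which is one plausible ordering of the two criteria the abstract lists.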

    PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS
    9.
    Invention application
    PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS (granted)

    Publication number: US20140285500A1

    Publication date: 2014-09-25

    Application number: US13850175

    Filing date: 2013-03-25

    CPC classification number: G06T1/20 G06F9/38 G06F9/3851

    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
