PIXEL SHADER OUTPUT MAP
    1.
    Invention Application
    PIXEL SHADER OUTPUT MAP (Granted)

    Publication No.: US20110080407A1

    Publication Date: 2011-04-07

    Application No.: US12898998

    Filing Date: 2010-10-06

    IPC Classification: G06T15/80

    CPC Classification: G06T15/005

    Abstract: One embodiment of the present invention sets forth a technique for storing only the enabled components for each enabled vector and writing only enabled components to one or more specified render targets. A shader program header (SPH) file provides per-component mask bits for each render target. Each enabled mask bit indicates that the pixel shader generates the corresponding component as an output to the raster operations unit. In the hardware, the per-component mask bits are combined with the application programming interface (API)-level per-component write masks to determine the components that are updated by the shader program. The combined mask is used as the write enable bits for components in one or more render targets. One advantage of the combined mask is that the components that are not updated are not forwarded from the pixel shader to the ROP, thereby saving bandwidth between those processing units.
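    The masking scheme in this abstract can be sketched in a few lines. This is an illustrative model only, not the patented hardware: the RGBA bit encoding, the function names, and the dict-based pixel are all invented for the example; only the AND of the SPH mask with the API write mask comes from the abstract.

    ```python
    # Illustrative model of the combined write mask (assumed RGBA bit encoding).
    R, G, B, A = 1, 2, 4, 8
    COMPONENTS = [(R, "r"), (G, "g"), (B, "b"), (A, "a")]

    def combined_write_mask(sph_mask: int, api_mask: int) -> int:
        """Shader output mask ANDed with the API-level write mask."""
        return sph_mask & api_mask

    def forward_to_rop(pixel: dict, sph_mask: int, api_mask: int) -> dict:
        """Forward only components with an enabled combined bit, modeling the
        bandwidth saving between the pixel shader and the ROP."""
        mask = combined_write_mask(sph_mask, api_mask)
        return {name: pixel[name] for bit, name in COMPONENTS if mask & bit}
    ```

    For example, a shader that emits RGB under an API mask allowing only green and alpha forwards just the green component.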

    Hardware For Parallel Command List Generation
    2.
    Invention Application
    Hardware For Parallel Command List Generation (Pending, Published)

    Publication No.: US20110072211A1

    Publication Date: 2011-03-24

    Application No.: US12853161

    Filing Date: 2010-08-09

    IPC Classification: G06F9/38 G06T1/00 G06F12/08

    CPC Classification: G06F9/461 G06F12/00

    Abstract: A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list the state associated with the processing unit.
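    A toy model of the inheritance described above may help. The class and method names are invented, and real command lists are hardware token streams rather than Python lists; the sketch only shows that the second list begins from whatever state the first list left on the processing unit.

    ```python
    # Toy model: two command lists run back-to-back on one processing unit;
    # the second list inherits the state the first list left behind instead
    # of starting from a reset state.
    class ProcessingUnit:
        def __init__(self):
            self.state = {}                    # state associated with the unit

        def execute(self, command_list):
            for key, value in command_list:    # each command sets one state item
                self.state[key] = value
            return dict(self.state)            # state observed after this list
    ```

    Running a first thread's list that enables blending, then a second thread's list that only sets the depth test, leaves both settings in effect for the second list.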

    METHODS AND APPARATUS FOR AUTO-THROTTLING ENCAPSULATED COMPUTE TASKS
    3.
    Invention Application
    METHODS AND APPARATUS FOR AUTO-THROTTLING ENCAPSULATED COMPUTE TASKS (Granted)

    Publication No.: US20130268942A1

    Publication Date: 2013-10-10

    Application No.: US13442730

    Filing Date: 2012-04-09

    IPC Classification: G06F9/46

    Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttled modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.
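    A hedged sketch of the mode decision described above: the flag name, the function names, and the "half the units" subset are all invented for illustration; only the idea that a per-task flag drives the throttled/non-throttled choice, and that throttled modes use a subset of processing units, comes from the abstract.

    ```python
    # Invented-name sketch: any task flagged as throttled-only forces the
    # processor into a throttled configuration with a subset of units.
    def choose_mode(tasks):
        """Pick throttled mode iff some scheduled task requires it."""
        return "throttled" if any(t["requires_throttled"] for t in tasks) else "non-throttled"

    def units_for_mode(all_units, mode):
        # Non-throttled mode may use every processing unit; a throttled mode
        # gets memory allocated for only a subset (half, for illustration).
        units = list(all_units)
        return units if mode == "non-throttled" else units[: len(units) // 2]
    ```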

    PROVIDING PIPELINE STATE THROUGH CONSTANT BUFFERS
    4.
    Invention Application
    PROVIDING PIPELINE STATE THROUGH CONSTANT BUFFERS (Granted)

    Publication No.: US20110087864A1

    Publication Date: 2011-04-14

    Application No.: US12899454

    Filing Date: 2010-10-06

    IPC Classification: G06F9/38

    Abstract: One embodiment of the present invention sets forth a technique for providing state information to one or more shader engines within a processing pipeline. State information received from an application accessing the processing pipeline is stored in constant buffer memory accessible to each of the shader engines. The shader engines can then retrieve the state information during execution.
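    A minimal sketch of the idea, with invented names: the driver writes application pipeline state into a shared constant buffer once, and any shader engine reads it back during execution.

    ```python
    # Minimal model: pipeline state is written once into a constant buffer
    # that every shader engine can read while executing.
    class ConstantBuffer:
        def __init__(self):
            self._slots = {}

        def write(self, slot, value):      # driver stores pipeline state
            self._slots[slot] = value

        def read(self, slot):              # any shader engine retrieves it
            return self._slots[slot]
    ```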

    HARDWARE FOR PARALLEL COMMAND LIST GENERATION
    5.
    Invention Application
    HARDWARE FOR PARALLEL COMMAND LIST GENERATION (Pending, Published)

    Publication No.: US20110072245A1

    Publication Date: 2011-03-24

    Application No.: US12868596

    Filing Date: 2010-08-25

    IPC Classification: G06F9/38

    CPC Classification: G06F9/461 G06F12/00

    Abstract: A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list the state associated with the processing unit.

    Draw Commands With Built-In Begin/End
    6.
    Invention Application
    Draw Commands With Built-In Begin/End (Granted)

    Publication No.: US20110084975A1

    Publication Date: 2011-04-14

    Application No.: US12893617

    Filing Date: 2010-09-29

    IPC Classification: G06T1/00

    Abstract: One embodiment of the present invention sets forth a technique for reducing the overhead for transmitting explicit begin and explicit end commands that are needed in primitive draw command sequences. A draw method includes a header to specify an implicit begin command, an implicit end command, and instancing information for a primitive draw command sequence. The header is followed by a packet including one or more data words (dwords) that each specify a primitive topology, a starting offset into a vertex or index buffer, and a vertex or index count. Only a single clock cycle is consumed to transmit and process the header. The performance of graphics application programs that have many small batches of geometry (as is typical of many workstation applications) may be improved since the overhead of transmitting and processing the explicit begin and explicit end draw commands is reduced.
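    The dword layout described above can be sketched with invented field widths; the patent does not specify this encoding, so the 4/14/14-bit split below is purely illustrative of packing a topology, a start offset, and a count into one data word with no separate begin/end commands.

    ```python
    # Assumed 32-bit layout: [31:28] topology, [27:14] start offset, [13:0] count.
    TOPOLOGIES = {"points": 0, "lines": 1, "triangles": 2}

    def pack_draw(topology: str, start: int, count: int) -> int:
        """Pack one draw into a single dword (illustrative encoding)."""
        assert start < (1 << 14) and count < (1 << 14)
        return (TOPOLOGIES[topology] << 28) | (start << 14) | count

    def unpack_draw(word: int):
        """Recover (topology, start, count) from a packed dword."""
        topo = {v: k for k, v in TOPOLOGIES.items()}[word >> 28]
        return topo, (word >> 14) & 0x3FFF, word & 0x3FFF
    ```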

    INTER-SHADER ATTRIBUTE BUFFER OPTIMIZATION
    7.
    Invention Application
    INTER-SHADER ATTRIBUTE BUFFER OPTIMIZATION (Granted)

    Publication No.: US20110080415A1

    Publication Date: 2011-04-07

    Application No.: US12895579

    Filing Date: 2010-09-30

    IPC Classification: G06T1/20

    Abstract: One embodiment of the present invention sets forth a technique for reducing the amount of memory required to store vertex data processed within a processing pipeline that includes a plurality of shading engines. The method includes determining a first active shading engine and a second active shading engine included within the processing pipeline, wherein the second active shading engine receives vertex data output by the first active shading engine. An output map is received and indicates one or more attributes that are included in the vertex data and output by the first active shading engine. An input map is received and indicates one or more attributes that are included in the vertex data and received by the second active shading engine from the first active shading engine. Then, a buffer map is generated based on the input map, the output map, and a pre-defined set of rules that includes rule data associated with both the first shading engine and the second shading engine, wherein the buffer map indicates one or more attributes that are included in the vertex data and stored in a memory that is accessible by both the first active shading engine and the second active shading engine.
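    The buffer-map construction above reduces, at its core, to a set intersection, which can be sketched briefly. The `always_kept` parameter is an invented stand-in for the patent's pre-defined rule set, which the abstract does not detail.

    ```python
    # Sketch: keep only attributes the producer writes AND the consumer reads,
    # plus any attributes a fixed rule set forces into the shared buffer.
    def buffer_map(output_map, input_map, always_kept=frozenset()):
        """Attributes stored in memory between the two shading engines."""
        return (set(output_map) & set(input_map)) | set(always_kept)
    ```

    Attributes the first engine emits but the second never reads (here, `normal` and `tangent`) are dropped from the buffer, saving memory.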

    COMPUTE TASK STATE ENCAPSULATION
    8.
    Invention Application
    COMPUTE TASK STATE ENCAPSULATION (Pending, Published)

    Publication No.: US20130117751A1

    Publication Date: 2013-05-09

    Application No.: US13292951

    Filing Date: 2011-11-09

    IPC Classification: G06F9/46

    Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
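    The per-priority grouping described above can be modeled as queues of TMD handles. This is an assumed software analogue (the class and policy below are invented): the patent's scheduling circuitry maintains hardware linked lists and supports multiple scheduling schemes, of which highest-priority-first is just one example.

    ```python
    # Software analogue of priority-grouped TMD scheduling: one queue per
    # priority level; pop from the highest non-empty group, so tasks may run
    # out of submission order.
    from collections import defaultdict, deque

    class TmdScheduler:
        def __init__(self):
            self.groups = defaultdict(deque)   # priority -> queue of TMDs

        def submit(self, tmd, priority):
            self.groups[priority].append(tmd)

        def next_task(self):
            for prio in sorted(self.groups, reverse=True):
                if self.groups[prio]:
                    return self.groups[prio].popleft()
            return None
    ```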

    RESTART INDEX THAT SETS A TOPOLOGY
    9.
    Invention Application
    RESTART INDEX THAT SETS A TOPOLOGY (Granted)

    Publication No.: US20110109638A1

    Publication Date: 2011-05-12

    Application No.: US12897622

    Filing Date: 2010-10-04

    IPC Classification: G06T1/00

    Abstract: One embodiment of the present invention sets forth a technique for reducing overhead associated with transmitting primitive draw commands from memory to a graphics processing unit (GPU). Command pairs comprising an end draw command and a begin draw command associated with a conventional graphics application programming interface (API) are selectively replaced with a new construct. The new construct is a reset topology index, which implements a combined function of the end draw command and begin draw command. The new construct improves efficiency by reducing total data transmitted from memory to the GPU.
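    As a sketch of the reset topology index, a reserved range of index values can both end the current primitive batch and select the next topology, replacing an end-draw/begin-draw pair. The `0xFFFFFF00`-based encoding and topology table below are invented for this example; the patent does not specify them.

    ```python
    # Invented encoding: indices >= RESTART_BASE act as "end current batch,
    # start a new one with topology (index - RESTART_BASE)".
    RESTART_BASE = 0xFFFFFF00
    TOPOLOGY_IDS = {0: "triangles", 1: "lines"}

    def split_batches(index_stream, initial_topology="triangles"):
        """Split an index stream into (topology, indices) batches."""
        batches, topo, current = [], initial_topology, []
        for idx in index_stream:
            if idx >= RESTART_BASE:          # reset topology index: flush batch
                if current:
                    batches.append((topo, current))
                topo, current = TOPOLOGY_IDS[idx - RESTART_BASE], []
            else:
                current.append(idx)
        if current:
            batches.append((topo, current))
        return batches
    ```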

    Multi-Channel Time Slice Groups
    10.
    Invention Application
    Multi-Channel Time Slice Groups (Granted)

    Publication No.: US20130152093A1

    Publication Date: 2013-06-13

    Application No.: US13316334

    Filing Date: 2011-12-09

    IPC Classification: G06F9/46

    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls during processing, the parallel processing unit can switch to the next channel with independent work so that it remains fully loaded. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
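    The cost saving described above can be illustrated by counting context switches over a channel schedule. The function and the (name, tsg_id) representation are invented; the point is only that moving between channels within one TSG costs nothing, while entering a different TSG requires a full context switch.

    ```python
    # Count full context switches for a schedule of (channel_name, tsg_id)
    # pairs: channels within the same TSG share context, so only a change
    # of TSG triggers a context switch.
    def count_context_switches(schedule):
        switches, current_tsg = 0, None
        for _name, tsg in schedule:
            if tsg != current_tsg:           # new TSG: load its context
                switches += 1
                current_tsg = tsg
        return switches
    ```

    Four channels where the first three share a TSG need only two context switches instead of four.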