Methods and apparatus for source operand collector caching
    1.
    Granted patent
    Legal status: in force

    Publication No.: US08639882B2

    Publication date: 2014-01-28

    Application No.: US13326183

    Filing date: 2011-12-14

    IPC classification: G06F12/00

    Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.

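The cache-table idea in this abstract can be illustrated with a toy Python model. It is only a sketch under stated assumptions: the class name `OperandCollector`, the slot count, and the round-robin replacement policy are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class OperandCollector:
    """Toy model of a few storage slots that feed the datapath inputs."""
    num_slots: int = 4
    # Cache table kept by the scheduling unit: slot index -> register number.
    cache_table: dict = field(default_factory=dict)
    register_file_reads: int = 0
    next_victim: int = 0  # round-robin replacement (an assumption)

    def collect(self, source_regs):
        """Return the slot selected for each source operand of one instruction."""
        selected = []
        for reg in source_regs:
            # Hit: the register value is already held in some slot.
            slot = next((s for s, r in self.cache_table.items() if r == reg), None)
            if slot is None:
                # Miss: read the register file and fill a free or victim slot.
                slot = self._allocate(exclude=selected)
                self.cache_table[slot] = reg
                self.register_file_reads += 1
            selected.append(slot)
        return selected

    def _allocate(self, exclude):
        """Pick the next slot that this instruction is not already using."""
        for i in range(self.num_slots):
            slot = (self.next_victim + i) % self.num_slots
            if slot not in exclude:
                self.next_victim = (slot + 1) % self.num_slots
                return slot
        raise ValueError("instruction has more source operands than slots")
```

In this model, collecting registers `[1, 2, 3]` and then `[2, 3, 4]` with four slots costs four register-file reads instead of six, because registers 2 and 3 are reused from the collector.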

    METHODS AND APPARATUS FOR SOURCE OPERAND COLLECTOR CACHING
    2.
    Patent application
    Legal status: in force

    Publication No.: US20130159628A1

    Publication date: 2013-06-20

    Application No.: US13326183

    Filing date: 2011-12-14

    IPC classification: G06F12/08

    Abstract: Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.


    Thread group scheduler for computing on a parallel thread processor
    4.
    Granted patent
    Legal status: in force

    Publication No.: US08732713B2

    Publication date: 2014-05-20

    Application No.: US13247819

    Filing date: 2011-09-28

    IPC classification: G06F9/46

    CPC classification: G06F9/4881 G06F2209/483

    Abstract: A parallel thread processor executes thread groups belonging to multiple cooperative thread arrays (CTAs). At each cycle of the parallel thread processor, an instruction scheduler selects a thread group to be issued for execution during a subsequent cycle. The instruction scheduler selects a thread group to issue for execution by (i) identifying a pool of available thread groups, (ii) identifying a CTA that has the greatest seniority value, and (iii) selecting the thread group that has the greatest credit value from within the CTA with the greatest seniority value.

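The three selection steps (i) to (iii) in this abstract can be sketched in Python as follows. The `ThreadGroup` fields, the `cta_seniority` mapping, and the choice to consider only CTAs that have at least one available thread group are assumptions made for illustration; the abstract does not say how seniority and credit values are computed.

```python
from dataclasses import dataclass


@dataclass
class ThreadGroup:
    group_id: int
    cta_id: int
    credit: int   # hypothetical per-group credit value
    ready: bool   # True if the group can issue this cycle


def select_thread_group(groups, cta_seniority):
    """Pick the thread group to issue, or None if nothing is available."""
    # (i) Identify the pool of available thread groups.
    pool = [g for g in groups if g.ready]
    if not pool:
        return None
    # (ii) Identify the CTA with the greatest seniority value
    #      (restricted here to CTAs that have a group in the pool).
    senior_cta = max({g.cta_id for g in pool}, key=lambda c: cta_seniority[c])
    # (iii) Within that CTA, select the group with the greatest credit value.
    return max((g for g in pool if g.cta_id == senior_cta),
               key=lambda g: g.credit)
```

Tie-breaking between equal seniority or credit values is left unspecified in this sketch.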

    Architecture and instructions for accessing multi-dimensional formatted surface memory
    5.
    Granted patent
    Legal status: in force

    Publication No.: US09519947B2

    Publication date: 2016-12-13

    Application No.: US12890171

    Filing date: 2010-09-24

    IPC classification: G06F12/00 G06T1/60

    CPC classification: G06T1/60

    Abstract: One embodiment of the present invention sets forth a technique for a program to access multi-dimensional formatted graphics surface memory. Multi-dimensional memory objects called “surfaces”, stored in a user-specified data or pixel format and arranged in a graphics-optimized layout, are accessed by programs using surface instructions. A set of memory access instructions (e.g., load, store, reduce, and atomic), referred to as surface instructions, may be used to access the surfaces. Coordinate bounds checking is performed with configurable clamping. Caching behavior may also be specified by the surface instructions. Data format conversion and packing to a specified storage format are supported for store, reduction, and atomic surface instructions. Data format conversion and unpacking from a specified storage format are supported for load and atomic surface instructions.

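Two of the behaviors named in this abstract, bounds checking with configurable clamping and format conversion with packing on store, can be illustrated with a small Python model. Everything here is a hypothetical sketch: the `Surface2D` class, the mode names `"clamp"`, `"zero"`, and `"trap"`, and the RGBA8 storage format are assumptions, not details from the patent.

```python
def pack_rgba8(r, g, b, a):
    """Convert four floats in [0, 1] to one packed 32-bit RGBA8 word."""
    def to_unorm8(x):
        return int(round(min(max(x, 0.0), 1.0) * 255))
    return (to_unorm8(r) | (to_unorm8(g) << 8)
            | (to_unorm8(b) << 16) | (to_unorm8(a) << 24))


def unpack_rgba8(word):
    """Inverse of pack_rgba8: one packed word back to four floats."""
    return tuple(((word >> shift) & 0xFF) / 255 for shift in (0, 8, 16, 24))


class Surface2D:
    """Toy 2D surface holding packed RGBA8 words in row-major order."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.words = [0] * (width * height)

    def _resolve(self, x, y, mode):
        """Bounds-check (x, y); return a word index, or None to skip access."""
        if 0 <= x < self.width and 0 <= y < self.height:
            return y * self.width + x
        if mode == "clamp":   # snap to the nearest valid coordinate
            cx = min(max(x, 0), self.width - 1)
            cy = min(max(y, 0), self.height - 1)
            return cy * self.width + cx
        if mode == "zero":    # loads return zeros, stores are dropped
            return None
        if mode == "trap":    # report the out-of-bounds access
            raise IndexError(f"surface access out of bounds: ({x}, {y})")
        raise ValueError(f"unknown mode {mode!r}")

    def store(self, x, y, rgba, mode="clamp"):
        index = self._resolve(x, y, mode)
        if index is not None:
            self.words[index] = pack_rgba8(*rgba)   # convert and pack

    def load(self, x, y, mode="clamp"):
        index = self._resolve(x, y, mode)
        if index is None:
            return (0.0, 0.0, 0.0, 0.0)
        return unpack_rgba8(self.words[index])      # unpack and convert
```

With the default `"clamp"` mode, a store to `(10, -3)` on a 4x4 surface lands on texel `(3, 0)`.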

    Using condition codes in the presence of non-numeric values
    6.
    Granted patent
    Legal status: in force

    Publication No.: US09195460B1

    Publication date: 2015-11-24

    Application No.: US11415781

    Filing date: 2006-05-02

    IPC classification: G06F9/45 G06F9/30

    CPC classification: G06F9/30094 G06F9/30058

    Abstract: Systems and methods for compiling programs using condition codes and executing those programs when non-numeric values are present allow for explicit handling of non-numeric values. In addition to the conventional condition code values of positive, negative, and zero, a fourth value, not-a-number (NaN), may be encoded to represent a non-numeric value. New condition tests are defined that explicitly account for condition code values of NaN. A compiler may produce code using the new condition tests to represent if and if-else statements. The code including the new condition tests generates deterministic results during execution when non-numeric values are present.

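The four-valued condition code and NaN-aware condition tests described in this abstract can be sketched in Python. The test names `LT`, `GE`, `LTU`, and `GEU` and the way the if-else is lowered are assumptions chosen for illustration; the abstract does not list the actual tests.

```python
import math
from enum import Enum


class CC(Enum):
    NEGATIVE = 0
    ZERO = 1
    POSITIVE = 2
    NAN = 3   # the fourth value, for non-numeric results


def condition_code(value):
    """Classify a floating-point result into one of the four codes."""
    if math.isnan(value):
        return CC.NAN
    if value < 0:
        return CC.NEGATIVE
    if value > 0:
        return CC.POSITIVE
    return CC.ZERO


# Each condition test is the set of condition codes for which it passes.
# The "U" variants also pass on NaN (names are hypothetical).
CONDITION_TESTS = {
    "LT": {CC.NEGATIVE},
    "GE": {CC.ZERO, CC.POSITIVE},
    "LTU": {CC.NEGATIVE, CC.NAN},
    "GEU": {CC.ZERO, CC.POSITIVE, CC.NAN},
}


def passes(cc, test_name):
    return cc in CONDITION_TESTS[test_name]


def if_else_less_than(a, b):
    """Model of `if (a < b) then-branch else else-branch`."""
    cc = condition_code(a - b)   # condition code of the comparison result
    if passes(cc, "LT"):
        return "then"
    # The else branch uses GEU, the exact complement of LT, so exactly one
    # branch runs even when an operand is NaN.
    if passes(cc, "GEU"):
        return "else"
    raise AssertionError("unreachable: LT and GEU cover all condition codes")
```

If the else branch were guarded by `GE` instead of `GEU`, neither branch would run for a NaN operand, which is the kind of non-deterministic outcome the abstract aims to rule out.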

    Using a pixel offset for evaluating a plane equation
    7.
    Granted patent
    Legal status: in force

    Publication No.: US09058672B2

    Publication date: 2015-06-16

    Application No.: US12898537

    Filing date: 2010-10-05

    IPC classification: G06K9/32 G06T3/40

    CPC classification: G06T3/4007

    Abstract: One embodiment of the present invention sets forth a technique for controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified, each of which defines a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.

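Evaluating an attribute plane equation at offset sample positions can be sketched in Python. The plane form `a*x + b*y + c`, the function names, and the coverage list are assumptions for illustration; the abstract only states that offsets (dx, dy) select the sub-pixel position at which the plane equation is evaluated.

```python
def evaluate_plane(plane, x, y, offset=(0.5, 0.5)):
    """Evaluate a*x + b*y + c at pixel (x, y) shifted by offset (dx, dy)."""
    a, b, c = plane
    dx, dy = offset
    return a * (x + dx) + b * (y + dy) + c


def shade_samples(plane, x, y, offsets, covered):
    """Compute the attribute only at sample positions the primitive covers."""
    return {
        index: evaluate_plane(plane, x, y, offset)
        for index, offset in enumerate(offsets)
        if covered[index]
    }


# Example: four sample offsets inside pixel (3, 5); sample 1 is not covered.
offsets = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
samples = shade_samples((2.0, 4.0, 1.0), 3, 5, offsets,
                        covered=[True, False, True, True])
```

Here `samples` holds one attribute value per covered sample, so the attribute is supersampled only where the primitive actually covers the pixel.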

    Generating clip state for a batch of vertices
    8.
    Granted patent
    Legal status: in force

    Publication No.: US08976195B1

    Publication date: 2015-03-10

    Application No.: US12579352

    Filing date: 2009-10-14

    IPC classification: G09G5/00 G06T15/30

    CPC classification: G06T15/30

    Abstract: One embodiment of the present invention sets forth a technique for generating a batch clip state stored in a clip state machine (CSM) associated with a batch of vertices. Per-vertex clip state is generated for each vertex in the batch of vertices based on the position of each vertex relative to each clip plane. For a given vertex, per-vertex clip state indicates whether the vertex is inside or outside each of the one or more clip planes. The per-vertex clip states of all the vertices in the batch of vertices are coalesced into a batch clip state by determining whether each vertex in the batch of vertices is inside every clip plane, whether each vertex is outside at least one clip plane, or neither. The batch clip state is stored in the CSM associated with the thread group that processes the batch of vertices and can be accessed by further stages of the graphics pipeline.

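The coalescing step in this abstract can be sketched in Python. The bitmask encoding, the sign convention (a vertex is inside a plane when the dot product is non-negative), and the state names are assumptions for illustration only.

```python
def vertex_clip_state(vertex, planes):
    """Return a bitmask with bit i set if the vertex is outside plane i."""
    mask = 0
    for i, plane in enumerate(planes):
        # Signed distance of homogeneous vertex (x, y, z, w) to the plane.
        distance = sum(p * v for p, v in zip(plane, vertex))
        if distance < 0:
            mask |= 1 << i
    return mask


def batch_clip_state(vertices, planes):
    """Coalesce per-vertex clip states into one state for the batch."""
    masks = [vertex_clip_state(v, planes) for v in vertices]
    if all(m == 0 for m in masks):
        return "ALL_INSIDE"    # each vertex is inside every clip plane
    if all(m != 0 for m in masks):
        return "ALL_OUTSIDE"   # each vertex is outside at least one plane
    return "MIXED"             # neither of the above


# Two clip planes enforcing -w <= x <= w.
PLANES = [(1, 0, 0, 1), (-1, 0, 0, 1)]
```

Later pipeline stages could then consult the single batch state instead of re-testing every vertex.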

    Cull before vertex attribute fetch and vertex lighting
    9.
    Granted patent
    Legal status: in force

    Publication No.: US08564616B1

    Publication date: 2013-10-22

    Application No.: US12505402

    Filing date: 2009-07-17

    IPC classification: G09G5/00

    Abstract: One embodiment of the invention sets forth a mechanism for compiling a vertex shader program into two portions, a culling portion and a shading portion. The culling portion of the compiled vertex shader program specifies vertex attributes and instructions of the vertex shader program needed to determine whether early vertex culling operations should be performed on a batch of vertices associated with one or more primitives of a graphics scene. The shading portion of the compiled vertex shader program specifies the remaining vertex attributes and instructions of the vertex shader program for performing vertex lighting and performing other operations on the vertices in the batch of vertices. When the compiled vertex shader program is executed by graphics processing hardware, the shading portion of the compiled vertex shader is executed only when early vertex culling operations are not performed on the batch of vertices.

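The run-time control flow implied by this abstract can be sketched in Python. This is not the compiler: the two portions are written by hand here, and the vertex layout, the identity transform, the single-plane cull test, and all function names are hypothetical.

```python
def culling_portion(vertex):
    """Needs only the position attribute; returns a clip-space position."""
    x, y, z = vertex["position"]
    return (x, y, z, 1.0)   # identity transform, for illustration only


def batch_is_culled(clip_positions):
    """Hypothetical cull test: every vertex lies beyond the plane x = w."""
    return all(x > w for x, y, z, w in clip_positions)


def shading_portion(vertex, clip_position):
    """Fetches the remaining attributes and performs simple lighting."""
    nx, ny, nz = vertex["normal"]
    light = (0.0, 0.0, 1.0)
    intensity = max(0.0, nx * light[0] + ny * light[1] + nz * light[2])
    return {"position": clip_position, "intensity": intensity}


def run_vertex_batch(batch):
    """Run the culling portion first; shade only if the batch survives."""
    clip_positions = [culling_portion(v) for v in batch]
    if batch_is_culled(clip_positions):
        return None   # shading portion and its attribute fetches are skipped
    return [shading_portion(v, p) for v, p in zip(batch, clip_positions)]
```

For a batch that is entirely off-screen, `run_vertex_batch` returns `None` without ever reading the `normal` attribute.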

    Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict
    10.
    Granted patent
    Legal status: in force

    Publication No.: US08533435B2

    Publication date: 2013-09-10

    Application No.: US12875843

    Filing date: 2010-09-03

    IPC classification: G06F9/34

    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.

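The port assignment and conflict-free read scheduling described in this abstract can be sketched in Python. The bank mapping `register % num_banks`, the greedy per-port selection, and the function names are assumptions for illustration; the patent may use a different selection policy.

```python
from collections import deque


def assign_to_ports(instructions, num_ports):
    """Assign operand k of every instruction to port k."""
    ports = [deque() for _ in range(num_ports)]
    for instr_id, regs in enumerate(instructions):
        if len(regs) > num_ports:
            raise ValueError("instruction has more operands than ports")
        for port, reg in zip(ports, regs):
            port.append((instr_id, reg))
    return ports


def schedule_reads(ports, num_banks):
    """Build one read request per cycle with at most one operand per bank."""
    cycles = []
    while any(ports):
        used_banks = set()
        request = []
        for port in ports:
            # Reorder within the port: take the oldest operand whose bank
            # is not already used by this cycle's request.
            pick = next((entry for entry in port
                         if entry[1] % num_banks not in used_banks), None)
            if pick is not None:
                port.remove(pick)
                used_banks.add(pick[1] % num_banks)
                request.append(pick)
        cycles.append(request)
    return cycles
```

For two instructions reading registers `(0, 4, 1)` and `(2, 5, 8)` from a four-bank file, this sketch issues all six operand reads in three conflict-free cycles, pulling register 5 ahead of register 4 in the first cycle because register 4 shares a bank with register 0.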