CONTROLLING MULTI-PASS RENDERING SEQUENCES IN A CACHE TILING ARCHITECTURE
    1.
    Invention application
    Status: under examination (published)

    Publication number: US20170053375A1

    Publication date: 2017-02-23

    Application number: US14829617

    Filing date: 2015-08-18

    Inventor: Jeffrey A. BOLZ

    Abstract: In one embodiment of the present invention a driver configures a graphics pipeline implemented in a cache tiling architecture to perform dynamically-defined multi-pass rendering sequences. In operation, based on sequence-specific configuration data, the driver determines an optimized tile size and, for each pixel in each pass, the set of pixels in each previous pass that influence the processing of the pixel. The driver then configures the graphics pipeline to perform per-tile rendering operations in a region that is translated by a pass-specific offset backward—vertically and/or horizontally—along a tiled caching traversal line. Notably, the offset ensures that the required pixel data from previous passes is available. The driver further configures the graphics pipeline to store the rendered data in cache lines. Advantageously, the disclosed approach exploits the efficiencies inherent in cache tiling architecture while honoring highly configurable data dependencies between passes in multi-pass rendering sequences.
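A minimal one-dimensional sketch of the backward offset described in this abstract. The abstract does not say how the offset is computed, so the rule used here (each pass shifts back by the sum of the dependency radii of the preceding passes) is an assumption, and `pass_regions`, `dependencies_available`, and `radii` are hypothetical names introduced for illustration.

```python
from typing import List, Tuple


def pass_regions(tile_start: int, tile_width: int,
                 radii: List[int]) -> List[Tuple[int, int]]:
    """Return the half-open pixel range [start, end) rendered by each pass.

    radii[k] is the assumed number of neighbouring pixels of pass k that a
    pixel of pass k + 1 reads on either side.
    """
    regions = []
    offset = 0
    for k in range(len(radii) + 1):
        # Each pass renders the tile translated backward by its offset.
        regions.append((tile_start - offset, tile_start + tile_width - offset))
        if k < len(radii):
            offset += radii[k]
    return regions


def dependencies_available(regions: List[Tuple[int, int]],
                           radii: List[int]) -> bool:
    """Check the leading edge only: the furthest pixel a pass reads must
    already have been rendered by the previous pass. Pixels behind the tile
    are assumed to have been rendered while processing earlier tiles."""
    for k, r in enumerate(radii):
        prev_last = regions[k][1] - 1
        cur_last = regions[k + 1][1] - 1
        if cur_last + r > prev_last:
            return False
    return True
```

With a tile starting at pixel 16, a width of 8, and radii `[1, 2]`, the three passes cover `(16, 24)`, `(15, 23)`, and `(13, 21)`, so each pass reads only pixels the previous pass has already produced. Without the shift, the same check fails at the tile's leading edge.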

    STENCIL THEN COVER PATH RENDERING WITH SHARED EDGES

    Publication number: US20140267373A1

    Publication date: 2014-09-18

    Application number: US14028393

    Filing date: 2013-09-16

    CPC classification number: G06T7/0079 G06T1/20 G06T1/60 G06T3/0012 G06T11/40

    Abstract: One embodiment of the present invention includes techniques for rasterizing primitives that include edges shared between paths. For each edge, a rasterizer unit selects and applies a sample rule from multiple sample rules. If the edge is shared, then the selected sample rule causes each group of coverage samples associated with a single color sample to be considered as either fully inside or fully outside the edge. Consequently, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Advantageously, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with shared edges.

    TECHNIQUE FOR PERFORMING VARIABLE WIDTH DATA COMPRESSION USING A PALETTE OF ENCODINGS
    3.
    Invention application
    Status: under examination (published)

    Publication number: US20170053376A1

    Publication date: 2017-02-23

    Application number: US14831840

    Filing date: 2015-08-20

    CPC classification number: H04N19/91 G06T9/005 H04N19/93

    Abstract: A subsystem configured to encode an RGBA8 data stream assembles sequences of four-byte groups from the data stream. The subsystem decorrelates the red and blue channels, and computes a difference between each four-byte group and an anchor value. The anchor is encoded at full value. The subsystem then assigns each group a five-bit header based on the number and location of non-zero bytes and on the data content of the non-zero bytes within the group. The subsystem favors zero valued bytes. Thus, when a group includes only zero valued bytes, the header is sufficient to encode the group; no data bits are necessary. Further, two successive groups of zero-valued bytes may be encoded as a single header with no data bits, achieving further data reduction. Finally, the subsystem concatenates all the headers with associated data to yield the source data stream compressed to some ratio, e.g. four-to-one.
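The encoding steps in this abstract can be sketched roughly as follows. This is a simplified illustration, not the patented format. It assumes decorrelation means subtracting green from red and blue, it uses the first group as the anchor, and it replaces the five-bit header with a plain four-bit mask of non-zero bytes. The function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]  # (R, G, B, A), one byte each


def decorrelate(px: Pixel) -> Pixel:
    # Assumption: red and blue are stored relative to green (mod 256).
    r, g, b, a = px
    return ((r - g) & 0xFF, g, (b - g) & 0xFF, a)


def recorrelate(px: Pixel) -> Pixel:
    r, g, b, a = px
    return ((r + g) & 0xFF, g, (b + g) & 0xFF, a)


def encode(pixels: List[Pixel]) -> list:
    dec = [decorrelate(p) for p in pixels]
    anchor = dec[0]
    stream = [("anchor", anchor)]  # the anchor is kept at full value
    for p in dec[1:]:
        diff = tuple((c - a) & 0xFF for c, a in zip(p, anchor))
        # Header: bit i is set when byte i of the difference is non-zero.
        mask = sum(1 << i for i, d in enumerate(diff) if d)
        payload = bytes(d for d in diff if d)  # only non-zero bytes are stored
        stream.append((mask, payload))
    return stream


def decode(stream: list) -> List[Pixel]:
    anchor = stream[0][1]
    pixels = [recorrelate(anchor)]
    for mask, payload in stream[1:]:
        data = iter(payload)
        diff = [next(data) if (mask >> i) & 1 else 0 for i in range(4)]
        group = tuple((d + a) & 0xFF for d, a in zip(diff, anchor))
        pixels.append(recorrelate(group))
    return pixels
```

A group identical to the anchor encodes as a header of 0 with an empty payload, which is the "header is sufficient" case from the abstract. Merging two successive all-zero groups into one header is not modelled here.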

    STENCIL-THEN-COVER PATH RENDERING WITH SHARED EDGES
    4.
    Invention application
    Status: under examination (published)

    Publication number: US20170024897A1

    Publication date: 2017-01-26

    Application number: US15289694

    Filing date: 2016-10-10

    CPC classification number: G06T7/0079 G06T1/20 G06T1/60 G06T3/0012 G06T11/40

    Abstract: One embodiment of the present invention includes techniques for rasterizing primitives that include edges shared between paths. For each edge, a rasterizer unit selects and applies a sample rule from multiple sample rules. If the edge is shared, then the selected sample rule causes each group of coverage samples associated with a single color sample to be considered as either fully inside or fully outside the edge. Consequently, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Advantageously, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with shared edges.

    STENCIL BUFFER DATA COMPRESSION
    5.
    Invention application
    Status: granted

    Publication number: US20150154733A1

    Publication date: 2015-06-04

    Application number: US14097124

    Filing date: 2013-12-04

    CPC classification number: G06T1/60 G06T15/005 H04N19/436 H04N19/593

    Abstract: A raster operations (ROP) unit is configured to compress stencil values included in a stencil buffer. The ROP unit divides the stencil values into groups, subdivides each group into two halves, and selects an anchor value for each half. If the difference between each of the stencil values and the corresponding anchor lies within an offset range, and the difference between the two anchors lies within a delta range, then the group is compressible. For a compressible group, the ROP unit encodes the anchor value, offsets from anchors, and an anchor delta. This encoding enables the ROP unit to operate on the compressed group instead of the uncompressed stencil values, reducing the number of memory and computational operations associated with the stencil values. Consequently, the ROP unit reduces memory bandwidth use, reduces power consumption, and increases rendering rate compared to conventional ROP units that implement less flexible compression techniques.
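The compressibility test in this abstract can be illustrated with a short sketch. The bit widths, the choice of each half's minimum as its anchor, and the function names are assumptions made for illustration. The abstract does not specify them.

```python
from typing import List, Optional, Tuple


def compress_group(values: List[int], offset_bits: int = 2,
                   delta_bits: int = 4) -> Optional[Tuple[int, int, List[int]]]:
    """Return (anchor, anchor_delta, offsets), or None if not compressible."""
    half = len(values) // 2
    first, second = values[:half], values[half:]
    # Assumption: the anchor of each half is its smallest stencil value.
    a0, a1 = min(first), min(second)
    offsets = [v - a0 for v in first] + [v - a1 for v in second]
    delta = a1 - a0
    max_offset = (1 << offset_bits) - 1
    delta_min = -(1 << (delta_bits - 1))
    delta_max = (1 << (delta_bits - 1)) - 1
    # Both conditions from the abstract: offsets within the offset range,
    # and the difference between the two anchors within the delta range.
    if max(offsets) > max_offset or not (delta_min <= delta <= delta_max):
        return None
    return a0, delta, offsets


def decompress_group(anchor: int, delta: int, offsets: List[int]) -> List[int]:
    half = len(offsets) // 2
    second_anchor = anchor + delta
    return ([anchor + o for o in offsets[:half]]
            + [second_anchor + o for o in offsets[half:]])
```

For the group `[5, 6, 5, 7, 9, 9, 10, 8]` the sketch stores one anchor (5), an anchor delta (3), and eight small offsets. A group containing an outlier such as 200 fails the offset check and would be left uncompressed.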

    RENDERING COVER GEOMETRY WITHOUT INTERNAL EDGES
    6.
    Invention application
    Status: granted

    Publication number: US20140267386A1

    Publication date: 2014-09-18

    Application number: US13971639

    Filing date: 2013-08-20

    CPC classification number: G06T15/30

    Abstract: One embodiment of the present invention includes techniques for rasterizing geometries. First, a processing unit defines a bounding primitive that covers the geometry and does not include any internal edges. If the bounding primitive intersects any enabled clip plane, then the processing unit generates fragments to fill a current viewport. Alternatively, the processing unit generates fragments to fill the bounding primitive. Because the rasterized region includes no internal edges, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Consequently, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with cover geometries.

    LOAD/STORE OPERATIONS IN TEXTURE HARDWARE
    7.
    Invention application
    Status: granted

    Publication number: US20150084975A1

    Publication date: 2015-03-26

    Application number: US14038599

    Filing date: 2013-09-26

    CPC classification number: G06T1/60 G06F2212/302 G06T1/20 G06T15/04 G09G5/363

    Abstract: Approaches are disclosed for performing memory access operations in a texture processing pipeline having a first portion configured to process texture memory access operations and a second portion configured to process non-texture memory access operations. A texture unit receives a memory access request. The texture unit determines whether the memory access request includes a texture memory access operation. If the memory access request includes a texture memory access operation, then the texture unit processes the memory access request via at least the first portion of the texture processing pipeline, otherwise, the texture unit processes the memory access request via at least the second portion of the texture processing pipeline. One advantage of the disclosed approach is that the same processing and cache memory may be used for both texture operations and load/store operations to various other address spaces, leading to reduced surface area and power consumption.

    STENCIL THEN COVER PATH RENDERING WITH SHARED EDGES

    Publication number: US20140267374A1

    Publication date: 2014-09-18

    Application number: US14028400

    Filing date: 2013-09-16

    CPC classification number: G06T7/0079 G06T1/20 G06T1/60 G06T3/0012 G06T11/40

    Abstract: One embodiment of the present invention includes techniques for rasterizing primitives that include edges shared between paths. For each edge, a rasterizer unit selects and applies a sample rule from multiple sample rules. If the edge is shared, then the selected sample rule causes each group of coverage samples associated with a single color sample to be considered as either fully inside or fully outside the edge. Consequently, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Advantageously, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with shared edges.

    TARGET INDEPENDENT RASTERIZATION WITH MULTIPLE COLOR SAMPLES
    9.
    Invention application
    Status: granted

    Publication number: US20140267366A1

    Publication date: 2014-09-18

    Application number: US14019344

    Filing date: 2013-09-05

    CPC classification number: G06T15/503 G06T11/203

    Abstract: A graphics processing pipeline within a parallel processing unit (PPU) is configured to perform path rendering by generating a collection of graphics primitives that represent each path to be rendered. The graphics processing pipeline determines the coverage of each primitive at a number of stencil sample locations within each different pixel. Then, the graphics processing pipeline reduces the number of stencil samples down to a smaller number of color samples, for each pixel. The graphics processing pipeline is configured to modulate a given color sample associated with a given pixel based on the color values of any graphics primitives that cover the stencil samples from which the color sample was reduced. The final color of the pixel is determined by downsampling the color samples associated with the pixel.

    WORK-QUEUE-BASED GRAPHICS PROCESSING UNIT WORK CREATION
    10.
    Invention application
    Status: granted

    Publication number: US20140123144A1

    Publication date: 2014-05-01

    Application number: US13662274

    Filing date: 2012-10-26

    CPC classification number: G06F9/52 G06F9/546 G06F2209/548

    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
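A toy software model of the work-queue setup in this abstract. Only the `GP_GET` and `GP_PUT` pointer names and their initial positions come from the abstract. The class, its methods, the `_next` cursor, and the use of Python callables as commands are hypothetical, and real hardware semantics will differ.

```python
from typing import Callable, List, Optional

CommandBlock = List[Callable[[], object]]


class WorkQueue:
    def __init__(self, num_entries: int) -> None:
        self.entries: List[Optional[CommandBlock]] = [None] * num_entries
        # Initialization as described: GP_GET at the first entry,
        # GP_PUT at the last free entry, bounding the usable range.
        self.gp_get = 0
        self.gp_put = num_entries - 1
        self._next = 0  # hypothetical cursor: next entry a thread may fill

    def enqueue(self, block: CommandBlock) -> None:
        """A thread points the next free entry at its populated command block."""
        if self._next > self.gp_put:
            raise RuntimeError("work queue full")
        self.entries[self._next] = block
        self._next += 1

    def execute_pending(self) -> list:
        """The processor consumes entries from GP_GET up to the filled range."""
        results = []
        while self.gp_get < self._next:
            block = self.entries[self.gp_get]
            results.extend(cmd() for cmd in block)
            self.gp_get += 1
        return results
```

Enqueuing two command blocks and then calling `execute_pending()` runs their commands in submission order and advances `gp_get` past both entries.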
