Technique for performing variable width data compression using a palette of encodings

    Publication number: US10187663B2

    Publication date: 2019-01-22

    Application number: US14831840

    Filing date: 2015-08-20

    Abstract: A subsystem configured to encode an RGBA8 data stream assembles sequences of four-byte groups from the data stream. The subsystem decorrelates the red and blue channels, and computes a difference between each four-byte group and an anchor value. The anchor is encoded at full value. The subsystem then assigns each group a five-bit header based on the number and location of non-zero bytes and on the data content of the non-zero bytes within the group. The subsystem favors zero valued bytes. Thus, when a group includes only zero valued bytes, the header is sufficient to encode the group; no data bits are necessary. Further, two successive groups of zero-valued bytes may be encoded as a single header with no data bits, achieving further data reduction. Finally, the subsystem concatenates all the headers with associated data to yield the source data stream compressed to some ratio, e.g. four-to-one.
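
The first stages described in this abstract can be sketched as follows. This is a minimal illustration, not the patented encoder: the function names are hypothetical, the decorrelation is assumed to subtract the green channel from red and blue, the anchor is assumed to be the first group, and the header palette itself is not reproduced. The sketch only computes, for each group, a mask recording which bytes are non-zero, which is the information the abstract says the header depends on.

```python
def decorrelate(pixel):
    """Assumed decorrelation: subtract green from red and blue (mod 256)."""
    r, g, b, a = pixel
    return ((r - g) & 0xFF, g, (b - g) & 0xFF, a)


def delta_groups(pixels):
    """Return (anchor, deltas): the anchor is kept at full value and every
    later four-byte group is replaced by its byte-wise difference from it."""
    groups = [decorrelate(p) for p in pixels]
    anchor = groups[0]  # assumption: the first group serves as the anchor
    deltas = [
        tuple((c - ac) & 0xFF for c, ac in zip(group, anchor))
        for group in groups[1:]
    ]
    return anchor, deltas


def nonzero_mask(delta):
    """Bit i is set when byte i of the delta group is non-zero."""
    return sum(1 << i for i, byte in enumerate(delta) if byte != 0)


pixels = [(10, 20, 30, 255), (10, 20, 30, 255), (11, 20, 30, 255)]
anchor, deltas = delta_groups(pixels)
masks = [nonzero_mask(d) for d in deltas]
```

A group whose mask is 0 consists only of zero bytes, which is the case the abstract says needs a header and no data bits.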

    Stencil then cover path rendering with shared edges

    Publication number: US09418437B2

    Publication date: 2016-08-16

    Application number: US14028400

    Filing date: 2013-09-16

    CPC classification number: G06T7/0079 G06T1/20 G06T1/60 G06T3/0012 G06T11/40

    Abstract: One embodiment of the present invention includes techniques for rasterizing primitives that include edges shared between paths. For each edge, a rasterizer unit selects and applies a sample rule from multiple sample rules. If the edge is shared, then the selected sample rule causes each group of coverage samples associated with a single color sample to be considered as either fully inside or fully outside the edge. Consequently, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Advantageously, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with shared edges.

    EFFICIENT BINDING OF RESOURCE GROUPS IN A GRAPHICS APPLICATION PROGRAMMING INTERFACE
    13.
    Patent application
    Status: pending (published)

    Publication number: US20160111059A1

    Publication date: 2016-04-21

    Application number: US14855524

    Filing date: 2015-09-16

    Inventor: Jeffrey A. Bolz

    CPC classification number: G09G5/001 G06T1/20 G09G5/363 G09G2360/08 G09G2370/10

    Abstract: A method of binding graphics resources is provided that includes: (1) identifying graphics resources for binding, (2) generating a bind group for the graphics resources, (3) organizing the bind group into a bind group memory using a bind group layout and (4) providing bind group control for processing of the bind group. A method of organizing graphics resources and a resource organizing unit are also provided.

    EFFICIENT SETUP AND EVALUATION OF FILLED CUBIC BEZIER PATHS
    14.
    Patent application
    Status: active

    Publication number: US20150077420A1

    Publication date: 2015-03-19

    Application number: US14028042

    Filing date: 2013-09-16

    Inventor: Jeffrey A. Bolz

    CPC classification number: G06T11/203

    Abstract: A graphics processing system includes a central processing unit that processes a cubic Bezier curve corresponding to a filled cubic Bezier path. Additionally, the graphics processing system includes a cubic preprocessor coupled to the central processing unit that formats the cubic Bezier curve to provide a formatted cubic Bezier curve having quadrilateral control points corresponding to a mathematically simple cubic curve. The graphics processing system further includes a graphics processing unit coupled to the cubic preprocessor that employs the formatted cubic Bezier curve in rendering the filled cubic Bezier path. A rendering unit and a display cubic Bezier path filling method are also provided.

    APPLICATION LOAD TIMES BY CACHING SHADER BINARIES IN A PERSISTENT STORAGE
    15.
    Patent application
    Status: pending (published)

    Publication number: US20140043333A1

    Publication date: 2014-02-13

    Application number: US13731785

    Filing date: 2012-12-31

    Abstract: A method for compiling a shader for execution by a graphics processor. The method comprises selecting a shader for execution. A key is computed for the selected shader. A memory is searched for a copy of the computed key. If the copy of the computed key is located in the memory, a shader binary stored in the memory is passed to the graphics processor for execution. Otherwise, the shader is compiled to produce the shader binary for execution by the graphics processor, and the shader binary is stored in the memory. The shader binary is associated with the computed key and the copy of the computed key.
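
The lookup-or-compile flow in this abstract can be sketched as below. This is a hedged illustration under stated assumptions: `ShaderCache`, `get_binary`, and `_compile` are hypothetical names, the key is assumed to be a SHA-256 hash of the shader source, an in-memory dict stands in for the persistent storage, and the "compiler" is a placeholder that only tags the source bytes.

```python
import hashlib


class ShaderCache:
    """Toy cache mapping a key computed from shader source to a binary."""

    def __init__(self):
        self._store = {}        # key -> shader binary (stand-in for persistent storage)
        self.compile_count = 0  # how many times the placeholder compiler ran

    def _compile(self, source):
        # Placeholder: a real driver would invoke its shader compiler here.
        self.compile_count += 1
        return b"BIN:" + source.encode("utf-8")

    def get_binary(self, source):
        # Compute the key for the selected shader (assumed: hash of the source).
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        binary = self._store.get(key)
        if binary is None:
            # Key not found: compile, then store the binary under the key.
            binary = self._compile(source)
            self._store[key] = binary
        return binary


cache = ShaderCache()
first = cache.get_binary("void main() {}")
second = cache.get_binary("void main() {}")  # served from the cache
```

The second request returns the stored binary without recompiling, which is where the load-time saving named in the title would come from.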

    Controlling multi-pass rendering sequences in a cache tiling architecture

    Publication number: US10535114B2

    Publication date: 2020-01-14

    Application number: US14829617

    Filing date: 2015-08-18

    Inventor: Jeffrey A. Bolz

    Abstract: In one embodiment of the present invention a driver configures a graphics pipeline implemented in a cache tiling architecture to perform dynamically-defined multi-pass rendering sequences. In operation, based on sequence-specific configuration data, the driver determines an optimized tile size and, for each pixel in each pass, the set of pixels in each previous pass that influence the processing of the pixel. The driver then configures the graphics pipeline to perform per-tile rendering operations in a region that is translated by a pass-specific offset backward—vertically and/or horizontally—along a tiled caching traversal line. Notably, the offset ensures that the required pixel data from previous passes is available. The driver further configures the graphics pipeline to store the rendered data in cache lines. Advantageously, the disclosed approach exploits the efficiencies inherent in cache tiling architecture while honoring highly configurable data dependencies between passes in multi-pass rendering sequences.

    Target independent rasterization with multiple color samples

    Publication number: US09767600B2

    Publication date: 2017-09-19

    Application number: US14019344

    Filing date: 2013-09-05

    CPC classification number: G06T15/503 G06T11/203

    Abstract: A graphics processing pipeline within a parallel processing unit (PPU) is configured to perform path rendering by generating a collection of graphics primitives that represent each path to be rendered. The graphics processing pipeline determines the coverage of each primitive at a number of stencil sample locations within each different pixel. Then, the graphics processing pipeline reduces the number of stencil samples down to a smaller number of color samples, for each pixel. The graphics processing pipeline is configured to modulate a given color sample associated with a given pixel based on the color values of any graphics primitives that cover the stencil samples from which the color sample was reduced. The final color of the pixel is determined by downsampling the color samples associated with the pixel.

    Stencil buffer data compression
    19.
    Granted patent
    Status: active

    Publication number: US09390464B2

    Publication date: 2016-07-12

    Application number: US14097124

    Filing date: 2013-12-04

    CPC classification number: G06T1/60 G06T15/005 H04N19/436 H04N19/593

    Abstract: A raster operations (ROP) unit is configured to compress stencil values included in a stencil buffer. The ROP unit divides the stencil values into groups, subdivides each group into two halves, and selects an anchor value for each half. If the difference between each of the stencil values and the corresponding anchor lies within an offset range, and the difference between the two anchors lies within a delta range, then the group is compressible. For a compressible group, the ROP unit encodes the anchor value, offsets from anchors, and an anchor delta. This encoding enables the ROP unit to operate on the compressed group instead of the uncompressed stencil values, reducing the number of memory and computational operations associated with the stencil values. Consequently, the ROP unit reduces memory bandwidth use, reduces power consumption, and increases rendering rate compared to conventional ROP units that implement less flexible compression techniques.
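
The compressibility test in this abstract can be sketched as follows. This is an illustration under assumptions the abstract does not state: the anchor of each half is taken to be its minimum value, offsets are unsigned and `offset_bits` wide, the anchor delta is signed and `delta_bits` wide, and the function names are hypothetical.

```python
def compress_group(values, offset_bits=2, delta_bits=4):
    """Return (anchor, delta, offsets) if the group is compressible, else None."""
    half = len(values) // 2
    halves = (values[:half], values[half:])
    anchors = [min(h) for h in halves]  # assumption: anchor = minimum of each half
    offsets = [v - a for h, a in zip(halves, anchors) for v in h]
    delta = anchors[1] - anchors[0]     # difference between the two anchors

    max_offset = (1 << offset_bits) - 1
    delta_lo = -(1 << (delta_bits - 1))
    delta_hi = (1 << (delta_bits - 1)) - 1
    if all(o <= max_offset for o in offsets) and delta_lo <= delta <= delta_hi:
        return anchors[0], delta, offsets
    return None  # not compressible: keep the group uncompressed


def decompress_group(anchor, delta, offsets):
    """Rebuild the stencil values from one anchor, the anchor delta, and offsets."""
    half = len(offsets) // 2
    first = [anchor + o for o in offsets[:half]]
    second = [anchor + delta + o for o in offsets[half:]]
    return first + second


group = [10, 11, 12, 10, 14, 15, 14, 16]
packed = compress_group(group)
```

Only one anchor is stored at full width; the second is recovered from the delta, so a group of nearby stencil values needs far fewer bits than eight full bytes.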

    Work-queue-based graphics processing unit work creation
    20.
    Granted patent
    Status: active

    Publication number: US09135081B2

    Publication date: 2015-09-15

    Application number: US13662274

    Filing date: 2012-10-26

    CPC classification number: G06F9/52 G06F9/546 G06F2209/548

    Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
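
The initialization and submission steps in this abstract can be modeled roughly as below. This is a single-threaded toy model, not the patented mechanism: `WorkQueue`, `submit`, `execute_ready`, and the `next_free` cursor are hypothetical, command blocks are modeled as lists of Python callables, and real GPU threads would need to claim entries atomically.

```python
class WorkQueue:
    """Toy work queue with GP_GET and GP_PUT pointers set at initialization."""

    def __init__(self, num_entries):
        self.entries = [None] * num_entries
        self.gp_get = 0                # points to the first entry
        self.gp_put = num_entries - 1  # points to the last free entry
        self.next_free = 0             # hypothetical cursor for claiming entries

    def submit(self, command_block):
        """Point the next free entry at a populated command block."""
        if self.next_free > self.gp_put:
            raise RuntimeError("work queue full")
        index = self.next_free
        self.next_free += 1
        self.entries[index] = command_block
        return index

    def execute_ready(self):
        """Run every submitted command block, advancing GP_GET past each one."""
        results = []
        while self.gp_get < self.next_free:
            block = self.entries[self.gp_get]
            results.append([command() for command in block])
            self.gp_get += 1
        return results


queue = WorkQueue(4)
queue.submit([lambda: 1, lambda: 2])
queue.submit([lambda: 3])
results = queue.execute_ready()
```

The range between the two pointers is what bounds how much locally generated work can be queued before the processor consumes it.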
