Graphics system with configurable caches
    91.
    Invention grant (in force)

    Publication No.: US08766995B2

    Publication date: 2014-07-01

    Application No.: US11412678

    Filing date: 2006-04-26

    IPC classes: G09G5/36 G06T1/20

    CPC classes: G06T1/60 G06T15/005

    Abstract: A graphics system includes a graphics processor and a cache memory system. The graphics processor includes processing units that perform various graphics operations to render graphics images. The cache memory system may include fully configurable caches, partially configurable caches, or a combination of configurable and dedicated caches. The cache memory system may further include a control unit, a crossbar, and an arbiter. The control unit may determine memory utilization by the processing units and assign the configurable caches to the processing units based on memory utilization. The configurable caches may be assigned to achieve good utilization of these caches and to avoid memory access bottlenecks. The crossbar couples the processing units to their assigned caches. The arbiter facilitates data exchanges between the caches and a main memory.
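
    To make the cache-assignment idea concrete, the sketch below models a control unit that distributes a pool of configurable cache banks to processing units in proportion to their observed memory utilization. It is a minimal illustration, not the claimed hardware: the unit names, the utilization numbers, and the largest-remainder policy are assumptions made for this example.

```python
# Hypothetical utilization-driven cache assignment (largest-remainder rounding).
# Unit names and the proportional policy are assumptions for this sketch.

def assign_caches(utilization: dict, num_caches: int) -> dict:
    """Assign configurable cache banks to processing units in proportion
    to their observed memory utilization."""
    total = sum(utilization.values())
    if total == 0:
        return {unit: 0 for unit in utilization}
    shares = {u: num_caches * v / total for u, v in utilization.items()}
    assignment = {u: int(s) for u, s in shares.items()}
    # Hand leftover banks to the units with the largest fractional remainders.
    leftover = num_caches - sum(assignment.values())
    by_remainder = sorted(shares, key=lambda u: shares[u] - assignment[u], reverse=True)
    for u in by_remainder[:leftover]:
        assignment[u] += 1
    return assignment

if __name__ == "__main__":
    util = {"vertex_shader": 0.15, "fragment_shader": 0.55, "texture_unit": 0.30}
    print(assign_caches(util, num_caches=8))
    # e.g. {'vertex_shader': 1, 'fragment_shader': 5, 'texture_unit': 2}
```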


Convolution filtering in a graphics processor
    92.
    Invention grant (in force)

    Publication No.: US08644643B2

    Publication date: 2014-02-04

    Application No.: US11453436

    Filing date: 2006-06-14

    IPC classes: G06K9/40

    CPC classes: G06F17/153 G06T5/20 G06T15/04

    Abstract: Techniques for performing convolution filtering using hardware normally available in a graphics processor are described. Convolution filtering of an arbitrary H×W grid of pixels is achieved by partitioning the grid into smaller sections, performing computation for each section, and combining the intermediate results for all sections to obtain a final result. In one design, a command to perform convolution filtering on a grid of pixels with a kernel of coefficients is received, e.g., from a graphics application. The grid is partitioned into multiple sections, where each section may be 2×2 or smaller. Multiple instructions are generated for the multiple sections, with each instruction performing convolution computation on at least one pixel in one section. Each instruction may include pixel position information and applicable kernel coefficients. Instructions to combine the intermediate results from the multiple instructions are also generated.
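
    The partition-and-combine scheme can be illustrated numerically: the sketch below splits an H×W kernel into 2×2 (or smaller) sections, computes a partial sum per section (standing in for the per-section instructions), and sums the partial results. The top-left anchoring of the pixel window and the plain-Python arithmetic are simplifications, not the patent's instruction format.

```python
# Numerical sketch: split the kernel into <=2x2 sections, accumulate a partial
# sum per section, then combine the partial sums.

def convolve_pixel_partitioned(image, kernel, px, py):
    """Convolve the pixel window anchored at (px, py) with the full kernel,
    one <=2x2 kernel section at a time."""
    H, W = len(kernel), len(kernel[0])
    partials = []
    for sy in range(0, H, 2):                       # one "instruction" per section
        for sx in range(0, W, 2):
            acc = 0.0
            for ky in range(sy, min(sy + 2, H)):
                for kx in range(sx, min(sx + 2, W)):
                    acc += kernel[ky][kx] * image[py + ky][px + kx]
            partials.append(acc)                    # intermediate result for this section
    return sum(partials)                            # combine intermediate results

if __name__ == "__main__":
    img = [[float(x + y) for x in range(8)] for y in range(8)]
    k = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]        # 3x3 kernel
    direct = sum(k[j][i] * img[2 + j][3 + i] for j in range(3) for i in range(3))
    assert convolve_pixel_partitioned(img, k, 3, 2) == direct
    print(direct)
```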


MULTI-BANK CACHE MEMORY
    93.
    Invention application (in force)

    Publication No.: US20130205091A1

    Publication date: 2013-08-08

    Application No.: US13364901

    Filing date: 2012-02-02

    Applicant: Jian Liang Chun Yu

    Inventor(s): Jian Liang Chun Yu

    IPC classes: G06F12/08

    Abstract: In general, this disclosure describes techniques for increasing the throughput of multi-bank cache memory systems accessible by multiple clients. Requests for data from a client may be stored in a pending buffer associated with the client for a first cache memory bank. For each of the requests for data, a determination may be made as to whether the request can be fulfilled by a cache memory within the first cache memory bank, regardless of the status of requests by the client for data at a second cache memory bank. Data requested from the cache memory by the client may be stored in a read data buffer associated with the client according to the order of receipt of the requests for data in the pending buffer.
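
    A toy model of the per-client, per-bank pending buffers is sketched below. It shows the key property from the abstract: each bank drains its own queue independently, so a head-of-line miss at one bank does not stall requests at another. The address-interleaving rule, the class names, and the simplified return-order bookkeeping are assumptions made for this example.

```python
from collections import deque

# Toy model of per-client, per-bank pending buffers. The address-to-bank
# interleaving rule and the data-return bookkeeping are simplified assumptions.

class Bank:
    def __init__(self, lines):
        self.lines = lines                          # address -> cached data

    def lookup(self, addr):
        return self.lines.get(addr)                 # None models a miss still in flight

class Client:
    def __init__(self, num_banks):
        self.pending = [deque() for _ in range(num_banks)]   # one FIFO per bank
        self.read_data = deque()                    # data returned to this client

def issue(client, banks, addr):
    client.pending[addr % len(banks)].append(addr)  # assumed bank interleaving

def service(client, banks):
    # Each bank drains its own pending buffer independently, so a stalled
    # request at one bank does not block requests queued at another bank.
    for bank_id, bank in enumerate(banks):
        queue = client.pending[bank_id]
        while queue:
            data = bank.lookup(queue[0])
            if data is None:                        # head-of-line miss stalls this bank only
                break
            client.read_data.append((queue.popleft(), data))

if __name__ == "__main__":
    banks = [Bank({0: "a", 2: "c"}), Bank({1: "b"})]
    cl = Client(num_banks=2)
    for addr in (0, 1, 2, 3):
        issue(cl, banks, addr)
    service(cl, banks)
    print(list(cl.read_data))                       # addr 3 misses, yet both banks made progress
```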


Fragment shader bypass in a graphics processing unit, and apparatus and method thereof
    94.
    Invention grant (in force)

    Publication No.: US08325184B2

    Publication date: 2012-12-04

    Application No.: US11855832

    Filing date: 2007-09-14

    CPC classes: G06T15/005

    Abstract: Configuration information is used to make a determination to bypass fragment shading by a shader unit of a graphics processing unit, the shader unit being capable of performing both vertex shading and fragment shading. Based on the determination, the shader unit performs vertex shading and bypasses fragment shading. A processing element other than the shader unit, such as a pixel blender, can be used to perform some fragment shading. Power management "turns off" power to unused components when fragment shading is bypassed. For example, power can be turned off to a number of arithmetic logic units, with the shader unit using the reduced number of arithmetic logic units to perform vertex shading. At least one register bank of the shader unit can be used as a FIFO buffer that stores pixel attribute data for use, with texture data, in fragment shading operations by another processing element.
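
    The control decision described above can be summarized in a small configuration sketch: when fragment shading is not needed, the shader core is set up for vertex-only work on a reduced ALU set, the remaining ALUs are treated as power-gated, and a register bank is repurposed as a FIFO toward the pixel blender. The field names and ALU counts below are invented for illustration, not drawn from the patent.

```python
from collections import deque
from dataclasses import dataclass

# Illustrative control flow only; configuration fields and ALU counts are assumptions.

@dataclass
class ShaderConfig:
    needs_fragment_shading: bool        # e.g. False for a simple blit or flat-shaded draw
    total_alus: int = 8
    vertex_only_alus: int = 2

def configure_shader(cfg: ShaderConfig) -> dict:
    if cfg.needs_fragment_shading:
        return {"fragment_bypass": False, "powered_alus": cfg.total_alus, "attr_fifo": None}
    # Bypass case: vertex shading runs on a reduced ALU set, the unused ALUs
    # are power-gated, and a register bank is repurposed as a FIFO that feeds
    # pixel attributes (with texture data) to the pixel blender.
    return {"fragment_bypass": True,
            "powered_alus": cfg.vertex_only_alus,
            "attr_fifo": deque()}

if __name__ == "__main__":
    state = configure_shader(ShaderConfig(needs_fragment_shading=False))
    state["attr_fifo"].append({"x": 10, "y": 4, "rgba": (255, 0, 0, 255)})
    print(state["fragment_bypass"], state["powered_alus"], state["attr_fifo"].popleft())
```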


Multi-media processor cache with cache line locking and unlocking
    95.
    Invention grant (in force)

    Publication No.: US08200917B2

    Publication date: 2012-06-12

    Application No.: US11862063

    Filing date: 2007-09-26

    IPC classes: G06F12/14

    Abstract: The disclosure relates to techniques for locking and unlocking cache lines in a cache included within a multi-media processor that performs read-modify-write functions using batch read and write requests for data stored in either an external memory or an embedded memory. The techniques may comprise receiving a read request in a batch of read requests for data included in a section of a cache line and setting a lock bit associated with the section in response to the read request. When the lock bit is set, additional read requests in the batch of read requests are unable to access data in that section of the cache line. The lock bit may be unset in response to a write request in a batch of write requests to update the data previously read out from that section of the cache line.
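
    The lock-bit behavior can be modeled in a few lines: a batch read sets the lock bit for the section it touches, later reads to a locked section are refused, and the matching write in the write batch clears the bit. The section count and the exception used to signal a blocked read are choices made for this sketch, not details from the claims.

```python
# Behavioral sketch of per-section lock bits on a cache line.

class CacheLine:
    def __init__(self, data, sections=4):
        assert len(data) % sections == 0
        self.data = list(data)
        self.sections = sections
        self.lock = [False] * sections          # one lock bit per section

    def _section(self, offset):
        return offset * self.sections // len(self.data)

    def batch_read(self, offset):
        s = self._section(offset)
        if self.lock[s]:
            raise RuntimeError(f"section {s} locked: later reads in the batch must wait")
        self.lock[s] = True                     # lock on read (start of read-modify-write)
        return self.data[offset]

    def batch_write(self, offset, value):
        s = self._section(offset)
        self.data[offset] = value
        self.lock[s] = False                    # write-back releases the section

if __name__ == "__main__":
    line = CacheLine(bytearray(16))
    v = line.batch_read(5)                      # locks the section containing offset 5
    line.batch_write(5, v + 1)                  # read-modify-write, then unlock
    print(line.data[5], line.lock)
```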


Graphics processing unit with extended vertex cache
    96.
    Invention grant (in force)

    Publication No.: US07952588B2

    Publication date: 2011-05-31

    Application No.: US11499187

    Filing date: 2006-08-03

    IPC classes: G06T1/20 G06T1/00 G09G5/36

    CPC classes: G06T15/005

    Abstract: Techniques are described for processing computerized images with a graphics processing unit (GPU) using an extended vertex cache. The techniques include creating an extended vertex cache coupled to a GPU pipeline to reduce the amount of data passing through the GPU pipeline. The GPU pipeline receives an image geometry for an image and stores attributes for vertices within the image geometry in the extended vertex cache. The GPU pipeline passes only vertex coordinates that identify the vertices, and vertex cache index values that indicate the storage locations of the attributes for each of the vertices in the extended vertex cache, to the other processing stages along the GPU pipeline. The techniques described herein defer the setup of attribute gradients until just before attribute interpolation in the GPU pipeline. The vertex attributes may be retrieved from the extended vertex cache for attribute gradient setup just before attribute interpolation.
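
    The data-flow change described above is easy to see in a sketch: the front end stores full vertex attributes in the extended vertex cache and sends only (coordinates, cache index) pairs down the pipeline, and the back end fetches the attributes again only when gradient setup and interpolation are about to run. The structure and function names below are illustrative, not taken from the patent.

```python
# Data-flow sketch of an extended vertex cache; names are illustrative.

class ExtendedVertexCache:
    def __init__(self):
        self._store = {}

    def add(self, index, attributes):
        self._store[index] = attributes         # full attributes stay in the cache

    def fetch(self, index):
        return self._store[index]               # read back only at gradient setup time

def front_end(vertices, cache):
    """Emit only (coordinates, cache index) down the pipeline."""
    stream = []
    for i, v in enumerate(vertices):
        cache.add(i, v["attributes"])
        stream.append((v["coords"], i))         # lightweight per-vertex record
    return stream

def back_end(stream, cache):
    """Just before attribute interpolation, pull attributes for gradient setup."""
    for coords, idx in stream:
        attrs = cache.fetch(idx)
        yield coords, attrs                     # gradient setup/interpolation would go here

if __name__ == "__main__":
    verts = [
        {"coords": (0.0, 0.0, 0.5, 1.0), "attributes": {"uv": (0.0, 0.0), "rgba": (1, 0, 0, 1)}},
        {"coords": (1.0, 0.0, 0.5, 1.0), "attributes": {"uv": (1.0, 0.0), "rgba": (0, 1, 0, 1)}},
        {"coords": (0.0, 1.0, 0.5, 1.0), "attributes": {"uv": (0.0, 1.0), "rgba": (0, 0, 1, 1)}},
    ]
    vc = ExtendedVertexCache()
    for item in back_end(front_end(verts, vc), vc):
        print(item)
```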


GRAPHICS PROCESSING UNIT WITH DEFERRED VERTEX SHADING
    97.
    Invention application (in force)

    Publication No.: US20100302246A1

    Publication date: 2010-12-02

    Application No.: US12557427

    Filing date: 2009-09-10

    IPC classes: G06T1/20 G06T15/60

    CPC classes: G06T15/40 G06T1/20 G06T15/005

    Abstract: Techniques are described for processing graphics images with a graphics processing unit (GPU) using deferred vertex shading. An example method includes the following: generating, within a processing pipeline of a graphics processing unit (GPU), vertex coordinates for vertices of each primitive within an image geometry, wherein the vertex coordinates comprise a location and a perspective parameter for each one of the vertices, and wherein the image geometry represents a graphics image; identifying, within the processing pipeline of the GPU, visible primitives within the image geometry based upon the vertex coordinates; and, responsive to identifying the visible primitives, generating, within the processing pipeline of the GPU, vertex attributes only for the vertices of the visible primitives in order to determine surface properties of the graphics image.
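
    A two-pass sketch of deferred vertex shading follows: a position-only pass produces vertex coordinates, a visibility test culls primitives, and the attribute pass runs only for the survivors. The backface/off-screen test used here is a stand-in for the GPU's actual visibility logic, and all names are assumptions for the example.

```python
# Two-pass sketch: positions first, attributes only for visible primitives.

def position_pass(triangle):
    """Compute only coordinates (x, y, z, w) for each vertex of the primitive."""
    return [v["position"] for v in triangle]

def is_visible(coords):
    # Signed area in screen space culls degenerate/backfacing triangles, plus a
    # crude on-screen check. Both tests are placeholders for the real pipeline.
    (x0, y0, *_), (x1, y1, *_), (x2, y2, *_) = coords
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    on_screen = any(0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 for x, y, *_ in coords)
    return area > 0.0 and on_screen

def attribute_pass(triangle):
    """Run attribute shading only for surviving primitives."""
    return [{"uv": v["uv"], "normal": v["normal"]} for v in triangle]

def shade_deferred(triangles):
    for tri in triangles:
        coords = position_pass(tri)
        if is_visible(coords):                  # attributes are never computed otherwise
            yield coords, attribute_pass(tri)

if __name__ == "__main__":
    def tri(pts):
        return [{"position": (x, y, 0.5, 1.0), "uv": (x, y), "normal": (0, 0, 1)} for x, y in pts]
    scene = [tri([(0, 0), (1, 0), (0, 1)]), tri([(0, 0), (0, 1), (1, 0)])]  # second is backfacing
    print(len(list(shade_deferred(scene))))     # -> 1
```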


Universal rasterization of graphic primitives
    98.
    Invention grant (in force)

    Publication No.: US07791605B2

    Publication date: 2010-09-07

    Application No.: US11742753

    Filing date: 2007-05-01

    IPC classes: G06T11/20 G09G5/00

    Abstract: A technique for universally rasterizing graphic primitives used in computer graphics is described. Configurations of the technique include determining three edges and a bounded region in a retrofitting bounding box. Each primitive has real and intrinsic edges. The process uses no more than three real edges of any one graphic primitive. In the case of a line, a third edge is set coincident with one of its two real edges. The area between the two real edges is enclosed by opposing perimeter edges of the bounding box. In the case of a rectangle, only three real edges are used. The fourth edge corresponds to a bounding edge provided by the retrofitting bounding box. In exemplary applications, the technique may be used in mobile video-enabled devices, such as cellular phones, video game consoles, PDAs, laptop computers, video-enabled MP3 players, and the like.
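
    The "three edge tests inside a bounding box" idea can be shown for the triangle case: the sketch below walks the bounding box of a primitive and keeps the pixels that pass exactly three signed edge-function tests. The edge function, the sample-at-pixel-center rule, and the inclusive comparisons are conventional rasterizer choices rather than the patent's exact rules; lines and rectangles would reuse the same three tests with one edge coincident or supplied by the bounding box, as the abstract describes.

```python
# Scalar sketch of three-edge rasterization within a bounding box.

def edge(a, b, p):
    """Signed edge function: >= 0 when p is on or inside edge a->b (CCW winding)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize_triangle(v0, v1, v2):
    xs, ys = zip(v0, v1, v2)
    covered = []
    for y in range(int(min(ys)), int(max(ys)) + 1):        # bounding-box walk
        for x in range(int(min(xs)), int(max(xs)) + 1):
            p = (x + 0.5, y + 0.5)                          # sample at pixel center
            # Exactly three edge tests, regardless of primitive type.
            if edge(v0, v1, p) >= 0 and edge(v1, v2, p) >= 0 and edge(v2, v0, p) >= 0:
                covered.append((x, y))
    return covered

if __name__ == "__main__":
    print(len(rasterize_triangle((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))))   # -> 36 pixels
```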


Multi-stage floating-point accumulator
    99.
    Invention grant (in force)

    Publication No.: US07543013B2

    Publication date: 2009-06-02

    Application No.: US11506349

    Filing date: 2006-08-18

    IPC classes: G06F7/38

    Abstract: A multi-stage floating-point accumulator includes at least two stages and is capable of operating at higher speed. In one design, the floating-point accumulator includes first and second stages. The first stage includes three operand alignment units, two multiplexers, and three latches. The three operand alignment units operate on a current floating-point value, a prior floating-point value, and a prior accumulated value. A first multiplexer provides zero or the prior floating-point value to the second operand alignment unit. A second multiplexer provides zero or the prior accumulated value to the third operand alignment unit. The three latches couple to the three operand alignment units. The second stage includes a 3-operand adder to sum the operands generated by the three operand alignment units, a latch, and a post-alignment unit.
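
    The two-stage structure can be mimicked behaviorally: stage 1 latches a value and waits to pair it with the next one (the multiplexers supplying zero when no prior value or accumulated value is available), and stage 2 performs a single 3-operand add per pair. The sketch below captures that dataflow with ordinary Python floats; the latch timing and flush handling are simplifications, not the hardware design.

```python
# Behavioral model: fold in two new values per 3-operand add.

def accumulate_two_stage(values):
    acc = 0.0
    pending = None                              # value latched in stage 1, waiting to pair
    for x in values:
        if pending is None:
            pending = x                         # stage 1: latch current value, muxes select 0
        else:
            acc = pending + x + acc             # stage 2: 3-operand add (prior, current, acc)
            pending = None
    if pending is not None:                     # flush an unpaired trailing value
        acc = pending + 0.0 + acc
    return acc

if __name__ == "__main__":
    data = [0.5, 1.25, -2.0, 3.75, 10.0]        # exactly representable, so the check is exact
    assert accumulate_two_stage(data) == sum(data)
    print(accumulate_two_stage(data))
```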


MULTI-MEDIA PROCESSOR CACHE WITH CACHE LINE LOCKING AND UNLOCKING
    100.
    Invention application (in force)

    Publication No.: US20090083497A1

    Publication date: 2009-03-26

    Application No.: US11862063

    Filing date: 2007-09-26

    IPC classes: G06F12/12 G06F12/08

    Abstract: The disclosure relates to techniques for locking and unlocking cache lines in a cache included within a multi-media processor that performs read-modify-write functions using batch read and write requests for data stored in either an external memory or an embedded memory. The techniques may comprise receiving a read request in a batch of read requests for data included in a section of a cache line and setting a lock bit associated with the section in response to the read request. When the lock bit is set, additional read requests in the batch of read requests are unable to access data in that section of the cache line. The lock bit may be unset in response to a write request in a batch of write requests to update the data previously read out from that section of the cache line. (This application corresponds to the granted patent in entry 95 above; see the lock-bit sketch there.)
