Graphics processors with parallel scheduling and execution of threads
    11.
    Granted invention patent (in force)

    Publication No.: US08345053B2

    Publication date: 2013-01-01

    Application No.: US11533880

    Filing date: 2006-09-21

    IPC classification: G06F15/80 G06F15/00 G06T1/00

    CPC classification: G06T15/005

    Abstract: A graphics processor capable of parallel scheduling and execution of multiple threads, and techniques for achieving parallel scheduling and execution, are described. The graphics processor may include multiple hardware units and a scheduler. The hardware units are operable in parallel, with each hardware unit supporting a respective set of operations. The hardware units may include an ALU core, an elementary function core, a logic core, a texture sampler, a load control unit, some other hardware unit, or a combination thereof. The scheduler dispatches instructions for multiple threads to the hardware units concurrently. The graphics processor may further include an instruction cache to store instructions for threads and register banks to store data. The instruction cache and register banks may be shared by the hardware units.

    Indexes of graphics processing objects in graphics processing unit commands
    12.
    Granted invention patent (in force)

    Publication No.: US08022958B2

    Publication date: 2011-09-20

    Application No.: US11696665

    Filing date: 2007-04-04

    IPC classification: G06T15/00 G06T15/50 G09G5/36

    CPC classification: G06T15/00

    Abstract: This disclosure describes techniques of loading batch commands into a graphics processing unit (GPU). As described herein, a GPU driver for the GPU identifies one or more graphics processing objects to be used by the GPU in order to render a batch of graphics primitives. The GPU driver may insert indexes associated with the identified graphics processing objects into a batch command. The GPU driver may then issue the batch command to the GPU. The GPU may use the indexes in the batch command to retrieve the graphics processing objects from memory. After retrieving the graphics processing objects from memory, the GPU may use the graphics processing objects to render the batch of graphics primitives.

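    The driver-to-GPU flow in this abstract can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the dictionary layout of the batch command, the function names build_batch_command and execute_batch, and the string stand-ins for graphics processing objects are all hypothetical and are not taken from the patent.

```python
# Hypothetical model: GPU-visible memory maps an index to a graphics
# processing object (stand-in strings here, e.g. a shader or texture state).
gpu_memory = {
    0: "vertex-shader-A",
    1: "fragment-shader-B",
    2: "texture-state-C",
}


def build_batch_command(primitive_count, object_indexes):
    """Driver side: place indexes, not the objects themselves, in the command."""
    return {"primitive_count": primitive_count,
            "object_indexes": list(object_indexes)}


def execute_batch(command, memory):
    """GPU side: use the indexes in the command to fetch the objects from memory."""
    objects = [memory[index] for index in command["object_indexes"]]
    # Rendering is out of scope for this sketch; return what would be used.
    return command["primitive_count"], objects


command = build_batch_command(primitive_count=128, object_indexes=[0, 2])
count, objects = execute_batch(command, gpu_memory)
```

    Because the command carries only small indexes, the same objects in memory can be referenced by many batches without being copied into each command.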

    Computer memory addressing mode employing memory segmenting and masking
    13.
    Granted invention patent (in force)

    Publication No.: US07921274B2

    Publication date: 2011-04-05

    Application No.: US11737206

    Filing date: 2007-04-19

    IPC classification: G06F12/00 G06F13/00

    Abstract: A computer addressing mode and memory access method rely on a memory segment identifier and a memory segment mask for indicating memory locations. In this addressing mode, a processor receives an instruction comprising the memory segment identifier and memory segment mask. The processor employs a two-level address decoding scheme to access individual memory locations. Under this decoding scheme, the processor decodes the memory segment identifier to select a particular memory segment. Each memory segment includes a predefined number of memory locations. The processor selects memory locations within the memory segment based on mask bits set in the memory segment mask. The disclosed addressing mode is advantageous because it allows non-consecutive memory locations to be efficiently accessed.

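    A minimal Python sketch of the two-level decoding described in this abstract follows. The segment size of 8 locations, the convention that mask bit i selects location i within the segment, and the name decode_addresses are assumptions made for illustration; the abstract does not specify them.

```python
SEGMENT_SIZE = 8  # assumed number of memory locations per segment


def decode_addresses(segment_id, segment_mask, segment_size=SEGMENT_SIZE):
    """Return the memory locations selected by a (segment id, mask) pair.

    Level 1: the segment identifier selects one segment.
    Level 2: each set bit in the mask selects one location in that segment.
    """
    if segment_mask >> segment_size:
        raise ValueError("mask has bits set beyond the segment size")
    base = segment_id * segment_size
    return [base + bit
            for bit in range(segment_size)
            if (segment_mask >> bit) & 1]


# One instruction operand pair selects four non-consecutive locations.
selected = decode_addresses(segment_id=3, segment_mask=0b10100101)
```

    With these assumptions, a single identifier and mask address any subset of the locations in one segment, which is how non-consecutive locations can be reached without one address per location.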

    On-demand multi-thread multimedia processor
    14.
    Granted invention patent (in force)

    Publication No.: US07685409B2

    Publication date: 2010-03-23

    Application No.: US11677362

    Filing date: 2007-02-21

    IPC classification: G06F9/00

    Abstract: A device includes a multimedia processor that can concurrently support multiple applications for various types of multimedia such as graphics, audio, video, camera, games, etc. The multimedia processor includes configurable storage resources to store instructions, data, and state information for the applications and assignable processing units to perform various types of processing for the applications. The configurable storage resources may include an instruction cache to store instructions for the applications, register banks to store data for the applications, context registers to store state information for threads of the applications, etc. The processing units may include an arithmetic logic unit (ALU) core, an elementary function core, a logic core, a texture sampler, a load control unit, a flow controller, etc. The multimedia processor allocates a configurable portion of the storage resources to each application and dynamically assigns the processing units to the applications as requested by these applications.

    SYSTEM AND METHOD OF MAPPING SHADER VARIABLES INTO PHYSICAL REGISTERS
    15.
    Invention patent application (in force)

    Publication No.: US20090085919A1

    Publication date: 2009-04-02

    Application No.: US11864484

    Filing date: 2007-09-28

    IPC classification: G06F13/14 G09G5/36

    CPC classification: G06T15/005 G06F8/441

    Abstract: The present disclosure includes a system and method of mapping shader variables into physical registers. In an embodiment, a graphics processing unit (GPU) and a memory coupled to the GPU are disclosed. The memory includes a processor readable data file that has a register file portion. The register file portion has a rectangular structure including a plurality of data items. At least two of the plurality of data items correspond to data elements of a shader program. The data elements have different data storage types.

    FRAGMENT SHADER BYPASS IN A GRAPHICS PROCESSING UNIT, AND APPARATUS AND METHOD THEREOF
    16.
    Invention patent application (in force)

    Publication No.: US20090073168A1

    Publication date: 2009-03-19

    Application No.: US11855832

    Filing date: 2007-09-14

    IPC classification: G06T15/50

    CPC classification: G06T15/005

    Abstract: Configuration information is used to determine whether to bypass fragment shading by a shader unit of a graphics processing unit, the shader unit being capable of performing both vertex shading and fragment shading. Based on the determination, the shader unit performs vertex shading and bypasses fragment shading. A processing element other than the shader unit, such as a pixel blender, can be used to perform some fragment shading. Power is managed to “turn off” unused components when fragment shading is bypassed. For example, power can be turned off to a number of arithmetic logic units, with the shader unit using the reduced number of arithmetic logic units to perform vertex shading. At least one register bank of the shader unit can be used as a FIFO buffer storing pixel attribute data for use, with texture data, in fragment shading operations by another processing element.

    3-D CLIPPING IN A GRAPHICS PROCESSING UNIT
    17.
    Invention patent application (in force)

    Publication No.: US20080094412A1

    Publication date: 2008-04-24

    Application No.: US11551900

    Filing date: 2006-10-23

    IPC classification: G09G5/00

    Abstract: A graphics processing unit (GPU) efficiently performs 3-dimensional (3-D) clipping using processing units used for other graphics functions. The GPU includes first and second hardware units and at least one buffer. The first hardware unit performs 3-D clipping of primitives using a first processing unit used for a first graphics function, e.g., an ALU used for triangle setup, depth gradient setup, etc. The first hardware unit may perform 3-D clipping by (a) computing clip codes for each vertex of each primitive, (b) determining whether to pass, discard or clip each primitive based on the clip codes for all vertices of the primitive, and (c) clipping each primitive to be clipped against clipping planes. The second hardware unit computes attribute component values for new vertices resulting from the 3-D clipping, e.g., using an ALU used for attribute gradient setup, attribute interpolation, etc. The buffer(s) store intermediate results of the 3-D clipping.

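    Steps (a) and (b) of this abstract can be sketched in Python using the conventional outcode test. This is an assumption for illustration: the abstract does not define the clip-code layout or the decision rule, so the six-plane homogeneous clip volume (-w <= x, y, z <= w), the bit assignments, and the function names below are hypothetical. Step (c), the actual clipping against planes, is not shown.

```python
# Assumed bit assignment: one bit per clipping plane.
LEFT, RIGHT, BOTTOM, TOP, NEAR, FAR = (1 << i for i in range(6))


def clip_code(x, y, z, w):
    """Step (a): set one bit for each clipping plane the vertex lies outside."""
    code = 0
    if x < -w:
        code |= LEFT
    if x > w:
        code |= RIGHT
    if y < -w:
        code |= BOTTOM
    if y > w:
        code |= TOP
    if z < -w:
        code |= NEAR
    if z > w:
        code |= FAR
    return code


def classify_primitive(vertices):
    """Step (b): decide pass, discard, or clip from all vertex clip codes."""
    combined_or = 0
    combined_and = ~0  # all bits set, so the first AND yields the first code
    for vertex in vertices:
        code = clip_code(*vertex)
        combined_or |= code
        combined_and &= code
    if combined_or == 0:
        return "pass"     # every vertex is inside every plane
    if combined_and != 0:
        return "discard"  # every vertex is outside the same plane
    return "clip"         # the primitive may straddle a plane


inside = [(0.0, 0.0, 0.0, 1.0), (0.5, 0.5, 0.0, 1.0), (-0.5, 0.0, 0.0, 1.0)]
result = classify_primitive(inside)
```

    Only primitives classified as "clip" need the more expensive plane-by-plane clipping, which is why the cheap code test runs first.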

    DEPENDENT INSTRUCTION THREAD SCHEDULING
    18.
    Invention patent application (in force)

    Publication No.: US20080059966A1

    Publication date: 2008-03-06

    Application No.: US11468221

    Filing date: 2006-08-29

    IPC classification: G06F9/46

    Abstract: A thread scheduler includes context units for managing the execution of threads where each context unit includes a load reference counter for maintaining a counter value indicative of a difference between a number of data requests and a number of data returns associated with the particular context unit. A context controller of the thread context unit is configured to refrain from forwarding an instruction of a thread when the counter value is nonzero and the instruction includes a data dependency indicator indicating the instruction requires data returned by a previous instruction.

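    The forwarding rule in this abstract can be modeled with a short Python sketch. The class and field names (Instruction, ContextUnit, is_load, depends_on_load) are hypothetical, and the model ignores timing and hardware detail; it only shows how a counter of outstanding data requests gates dependent instructions.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str
    is_load: bool = False          # issues a data request when forwarded
    depends_on_load: bool = False  # models the data dependency indicator


class ContextUnit:
    """Per-thread context with a load reference counter."""

    def __init__(self):
        # Data requests issued minus data returns received.
        self.load_ref_count = 0

    def can_forward(self, instr):
        # Hold back only when data is outstanding AND the instruction needs it.
        return not (instr.depends_on_load and self.load_ref_count != 0)

    def forward(self, instr):
        if not self.can_forward(instr):
            return False
        if instr.is_load:
            self.load_ref_count += 1
        return True

    def data_returned(self):
        if self.load_ref_count == 0:
            raise RuntimeError("data return without an outstanding request")
        self.load_ref_count -= 1


ctx = ContextUnit()
issued_load = ctx.forward(Instruction("load r0", is_load=True))
stalled = not ctx.forward(Instruction("add r1, r0", depends_on_load=True))
```

    Instructions without the dependency indicator still go through while a load is outstanding, so independent work is not blocked by pending data.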

    RELATIVE ADDRESS GENERATION
    19.
    Invention patent application (in force)

    Publication No.: US20080059756A1

    Publication date: 2008-03-06

    Application No.: US11469347

    Filing date: 2006-08-31

    IPC classification: G06F12/10

    Abstract: Techniques to efficiently handle relative addressing are described. In one design, a processor includes an address generator and a storage unit. The address generator receives a relative address comprised of a base address and an offset, obtains a base value for the base address, sums the base value with the offset, and provides an absolute address corresponding to the relative address. The storage unit receives the base address and provides the base value to the address generator. The storage unit also receives the absolute address and provides data at this address. The address generator may derive the absolute address in a first clock cycle of a memory access. The storage unit may provide the data in a second clock cycle of the memory access. The storage unit may have multiple (e.g., two) read ports to support concurrent address generation and data retrieval.

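    The address computation in this abstract reduces to absolute = base_value + offset, where base_value is read from the storage unit at the base address. The Python sketch below models that in software only; StorageUnit, generate_absolute_address, and load_relative are hypothetical names, and the two clock cycles are represented as two sequential function calls rather than real timing.

```python
class StorageUnit:
    """Holds both base values and data in one flat array (an assumption)."""

    def __init__(self, size):
        self.cells = [0] * size

    def read(self, address):
        return self.cells[address]


def generate_absolute_address(storage, base_address, offset):
    """First cycle: fetch the base value and add the offset."""
    base_value = storage.read(base_address)  # would use one read port
    return base_value + offset


def load_relative(storage, base_address, offset):
    """Second cycle: fetch the data at the absolute address."""
    absolute = generate_absolute_address(storage, base_address, offset)
    return storage.read(absolute)            # would use the other read port


storage = StorageUnit(32)
storage.cells[2] = 10    # base value stored at base address 2
storage.cells[13] = 99   # data stored at absolute address 10 + 3
value = load_relative(storage, base_address=2, offset=3)
```

    In hardware, the two read ports mentioned in the abstract would let the address generation for one access overlap with the data read of the previous one; this sequential sketch does not model that overlap.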

    Graphics processing unit with extended vertex cache
    20.
    Invention patent application (in force)

    Publication No.: US20080030513A1

    Publication date: 2008-02-07

    Application No.: US11499187

    Filing date: 2006-08-03

    IPC classification: G06T1/60

    CPC classification: G06T15/005

    Abstract: Techniques are described for processing computerized images with a graphics processing unit (GPU) using an extended vertex cache. The techniques include creating an extended vertex cache coupled to a GPU pipeline to reduce an amount of data passing through the GPU pipeline. The GPU pipeline receives an image geometry for an image, and stores attributes for vertices within the image geometry in the extended vertex cache. The GPU pipeline only passes vertex coordinates that identify the vertices and vertex cache index values that indicate storage locations of the attributes for each of the vertices in the extended vertex cache to other processing stages along the GPU pipeline. The techniques described herein defer the setup of attribute gradients to just before attribute interpolation in the GPU pipeline. The vertex attributes may be retrieved from the extended vertex cache for attribute gradient setup just before attribute interpolation in the GPU pipeline.
