Graphics processing unit with shared arithmetic logic unit
    1.
    Granted invention patent
    Legal status: In force

    Publication number: US08009172B2

    Publication date: 2011-08-30

    Application number: US11550344

    Filing date: 2006-10-17

    IPC classification: G06T1/20

    CPC classification: G06T15/005

    Abstract: This disclosure describes a graphics processing unit (GPU) pipeline that uses one or more shared arithmetic logic units (ALUs). In order to facilitate such sharing of ALUs, the stages of the disclosed GPU pipeline may be rearranged relative to conventional GPU pipelines. In addition, by rearranging the stages of the GPU pipeline, efficiencies may be achieved in the image processing. Unlike conventional GPU pipelines, for example, an attribute gradient setup stage can be located much later in the pipeline, and the attribute interpolator stage may immediately follow the attribute gradient setup stage. This allows sharing of an ALU by the attribute gradient setup and attribute interpolator stages. Several other techniques and features for the GPU pipeline are also described, which may improve performance and possibly achieve additional processing efficiencies.
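
The idea of two adjacent stages sharing one ALU can be sketched as follows. This is only an illustration under stated assumptions: the `SharedALU` class, its `mad` (multiply-add) method, and the plane-equation formulation of gradient setup are hypothetical and are not taken from the patent text.

```python
class SharedALU:
    """Hypothetical ALU model: a single multiply-add datapath with a use counter."""

    def __init__(self):
        self.ops = 0

    def mad(self, a, b, c):
        # Multiply-add: a * b + c, counted as one ALU operation.
        self.ops += 1
        return a * b + c


def cross(alu, a, b, c, d):
    """Compute a * d - b * c with two multiply-adds on the shared ALU."""
    t = alu.mad(b, c, 0.0)
    return alu.mad(a, d, -t)


def gradient_setup(alu, verts):
    """Attribute gradient setup for a triangle of (x, y, attr) vertices."""
    (x0, y0, a0), (x1, y1, a1), (x2, y2, a2) = verts
    dx1, dy1, da1 = x1 - x0, y1 - y0, a1 - a0
    dx2, dy2, da2 = x2 - x0, y2 - y0, a2 - a0
    area = cross(alu, dx1, dx2, dy1, dy2)          # dx1*dy2 - dx2*dy1
    dadx = cross(alu, da1, da2, dy1, dy2) / area   # d(attr)/dx
    dady = cross(alu, dx1, dx2, da1, da2) / area   # d(attr)/dy
    return (x0, y0, a0, dadx, dady)


def interpolate(alu, grad, x, y):
    """Attribute interpolation at (x, y), reusing the same ALU as the setup stage."""
    x0, y0, a0, dadx, dady = grad
    value = alu.mad(dadx, x - x0, a0)
    return alu.mad(dady, y - y0, value)


alu = SharedALU()
# Attribute values chosen so that attr = 1 + x + 2*y across the triangle.
grad = gradient_setup(alu, [(0.0, 0.0, 1.0), (4.0, 0.0, 5.0), (0.0, 4.0, 9.0)])
value = interpolate(alu, grad, 1.0, 1.0)
print(value, alu.ops)
```

Because interpolation runs directly after setup, both stages issue their multiply-adds to the same `alu` object; the counter simply records that a single unit served both.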

    GRAPHICS PROCESSING UNIT WITH SHARED ARITHMETIC LOGIC UNIT
    2.
    Invention patent application
    Legal status: In force

    Publication number: US20080030512A1

    Publication date: 2008-02-07

    Application number: US11550344

    Filing date: 2006-10-17

    IPC classification: G06T1/20

    CPC classification: G06T15/005

    Abstract: This disclosure describes a graphics processing unit (GPU) pipeline that uses one or more shared arithmetic logic units (ALUs). In order to facilitate such sharing of ALUs, the stages of the disclosed GPU pipeline may be rearranged relative to conventional GPU pipelines. In addition, by rearranging the stages of the GPU pipeline, efficiencies may be achieved in the image processing. Unlike conventional GPU pipelines, for example, an attribute gradient setup stage can be located much later in the pipeline, and the attribute interpolator stage may immediately follow the attribute gradient setup stage. This allows sharing of an ALU by the attribute gradient setup and attribute interpolator stages. Several other techniques and features for the GPU pipeline are also described, which may improve performance and possibly achieve additional processing efficiencies.

    Universal rasterization of graphic primitives
    4.
    Granted invention patent
    Legal status: In force

    Publication number: US07791605B2

    Publication date: 2010-09-07

    Application number: US11742753

    Filing date: 2007-05-01

    IPC classification: G06T11/20 G09G5/00

    Abstract: A technique for universally rasterizing graphic primitives used in computer graphics is described. Configurations of the technique include determining three edges and a bounded region in a retrofitting bounding box. Each primitive has real and intrinsic edges. The process uses no more than three real edges of any one graphic primitive. In the case of a line, a third edge is set coincident with one of its two real edges. The area between the two real edges is enclosed by opposing perimeter edges of the bounding box. In the case of a rectangle, only three real edges are used. The fourth edge corresponds to a bounding edge provided by the retrofitting bounding box. In exemplary applications, the technique may be used in mobile video-enabled devices, such as cellular phones, video game consoles, PDAs, laptop computers, video-enabled MP3 players, and the like.
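
The three-edges-plus-bounding-box idea from the abstract can be sketched roughly as below. The half-plane representation `(a, b, c)`, the function names, the integer sample grid, and the axis-aligned rectangle are all assumptions made for illustration; the patent's actual edge setup may differ.

```python
def edge_from_points(p, q):
    """Half-plane (a, b, c) for the directed line p -> q.

    A point (x, y) is inside when a*x + b*y + c >= 0, i.e. it lies on or to
    the left of the directed line (y axis pointing up).
    """
    (px, py), (qx, qy) = p, q
    a, b = py - qy, qx - px
    return (a, b, -(a * px + b * py))


def rasterize(edges, bbox):
    """Return integer sample points inside all three edges and inside bbox.

    bbox is (xmin, ymin, xmax, ymax), inclusive. The bounding box supplies any
    boundary that the three edges do not.
    """
    if len(edges) != 3:
        raise ValueError("exactly three edges are expected")
    xmin, ymin, xmax, ymax = bbox
    covered = []
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            if all(a * x + b * y + c >= 0 for a, b, c in edges):
                covered.append((x, y))
    return covered


# Triangle: three real edges, listed counter-clockwise.
v0, v1, v2 = (0, 0), (4, 0), (0, 4)
triangle = rasterize(
    [edge_from_points(v0, v1), edge_from_points(v1, v2), edge_from_points(v2, v0)],
    (0, 0, 4, 4),
)

# Line (a band |x - y| <= 1): two real edges; the third edge is set coincident
# with one of them, and the bounding box closes off both ends.
lower = (1, -1, 1)   # x - y + 1 >= 0
upper = (-1, 1, 1)   # y - x + 1 >= 0
line = rasterize([lower, upper, lower], (0, 0, 3, 3))

# Axis-aligned rectangle: left, bottom and right edges are real; the top
# (fourth) edge is provided by the bounding box's ymax.
rect = rasterize([(1, 0, -1), (0, 1, -1), (-1, 0, 3)], (1, 1, 3, 2))

print(len(triangle), len(line), len(rect))
```

The same `rasterize` routine handles all three primitive types, which is the sense in which the approach is "universal" in this sketch.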

    UNIVERSAL RASTERIZATION OF GRAPHIC PRIMITIVES
    5.
    Invention patent application
    Legal status: In force

    Publication number: US20080273028A1

    Publication date: 2008-11-06

    Application number: US11742753

    Filing date: 2007-05-01

    IPC classification: G06T17/10

    Abstract: A technique for universally rasterizing graphic primitives used in computer graphics is described. Configurations of the technique include determining three edges and a bounded region in a retrofitting bounding box. Each primitive has real and intrinsic edges. The process uses no more than three real edges of any one graphic primitive. In the case of a line, a third edge is set coincident with one of its two real edges. The area between the two real edges is enclosed by opposing perimeter edges of the bounding box. In the case of a rectangle, only three real edges are used. The fourth edge corresponds to a bounding edge provided by the retrofitting bounding box. In exemplary applications, the technique may be used in mobile video-enabled devices, such as cellular phones, video game consoles, PDAs, laptop computers, video-enabled MP3 players, and the like.

    Multi-threaded processor with deferred thread output control
    6.
    Granted invention patent
    Legal status: In force

    Publication number: US08869147B2

    Publication date: 2014-10-21

    Application number: US11445100

    Filing date: 2006-05-31

    Abstract: A multi-threaded processor is provided that internally reorders output threads thereby avoiding the need for an external output reorder buffer. The multi-threaded processor writes its thread results back to an internal memory buffer to guarantee that thread results are outputted in the same order in which the threads are received. A thread scheduler within the multi-threaded processor manages thread ordering control to avoid the need for an external reorder buffer. A compiler for the multi-threaded processor converts instructions that would normally send processed results directly to an external reorder buffer so that the processed thread results are instead sent to the internal memory buffer of the multi-threaded processor.
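
As a rough software analogy for the in-order output guarantee described above, the sketch below holds completed results in an internal buffer and releases them strictly in arrival order. The class name `InOrderOutputBuffer` and its ticket scheme are hypothetical; the patent describes hardware and compiler support, not this data structure.

```python
class InOrderOutputBuffer:
    """Hypothetical internal buffer that releases thread results in receive order."""

    def __init__(self):
        self._next_ticket = 0   # ticket handed to the next received thread
        self._next_out = 0      # ticket whose result must be output next
        self._done = {}         # ticket -> result, for threads finished early

    def receive(self):
        """Register a newly received thread and return its order ticket."""
        ticket = self._next_ticket
        self._next_ticket += 1
        return ticket

    def complete(self, ticket, result):
        """Store a finished result; return every result that may now be output."""
        self._done[ticket] = result
        released = []
        while self._next_out in self._done:
            released.append(self._done.pop(self._next_out))
            self._next_out += 1
        return released


buf = InOrderOutputBuffer()
t0, t1, t2 = buf.receive(), buf.receive(), buf.receive()
first = buf.complete(t2, "c")    # finishes early, must wait for t0 and t1
second = buf.complete(t0, "a")   # oldest thread, released immediately
third = buf.complete(t1, "b")    # unblocks the buffered result of t2 as well
print(first, second, third)
```

Threads may finish in any order, yet the concatenated outputs always follow the order in which `receive` was called, so no separate reorder stage is needed downstream.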

    Unified virtual addressed register file
    7.
    Granted invention patent
    Legal status: In force

    Publication number: US08766996B2

    Publication date: 2014-07-01

    Application number: US11472701

    Filing date: 2006-06-21

    IPC classification: G09G5/36

    Abstract: A multi-threaded processor is provided, such as a shader processor, having an internal unified memory space that is shared by a plurality of threads and is dynamically assigned to threads as needed. A mapping table maps virtual registers to available internal addresses in the unified memory space so that thread registers can be stored in contiguous or non-contiguous memory addresses. Dynamic sizing of the virtual registers allows flexible allocation of the unified memory space depending on the type and size of data in a thread register. Yet another feature provides an efficient method for storing graphics data in the unified memory space to improve fetch and store operations from the memory space. In particular, pixel data for four pixels in a thread are stored across four memory devices having independent input/output ports that permit the four pixels to be read in a single clock cycle for processing.
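
The mapping-table mechanism can be illustrated with a small model. Everything named here (`UnifiedRegisterFile`, `allocate`, `release`, the free-list policy) is an assumption for the sake of the example; the abstract does not specify how free addresses are tracked.

```python
class UnifiedRegisterFile:
    """Hypothetical unified memory space shared by threads via a mapping table."""

    def __init__(self, size):
        self.memory = [None] * size
        self.free = list(range(size))   # free physical addresses
        self.table = {}                 # (thread_id, vreg) -> physical addresses

    def allocate(self, thread_id, vreg, width=1):
        """Map a virtual register of `width` slots to any free addresses."""
        if (thread_id, vreg) in self.table:
            raise ValueError("virtual register already mapped")
        if len(self.free) < width:
            raise MemoryError("unified memory space exhausted")
        addrs = [self.free.pop(0) for _ in range(width)]
        self.table[(thread_id, vreg)] = addrs
        return addrs

    def write(self, thread_id, vreg, values):
        for addr, value in zip(self.table[(thread_id, vreg)], values):
            self.memory[addr] = value

    def read(self, thread_id, vreg):
        return [self.memory[addr] for addr in self.table[(thread_id, vreg)]]

    def release(self, thread_id):
        """Return all of a finished thread's addresses to the free pool."""
        for key in [k for k in self.table if k[0] == thread_id]:
            self.free.extend(self.table.pop(key))


rf = UnifiedRegisterFile(8)
a = rf.allocate(0, "r0", width=2)   # two slots for thread 0
b = rf.allocate(1, "r0", width=2)   # two slots for thread 1
c = rf.allocate(0, "r1")            # one slot for thread 0
rf.release(1)                       # thread 1 finishes; its slots become free
d = rf.allocate(0, "r2", width=4)   # dynamically sized, non-contiguous mapping
rf.write(0, "r2", [10, 20, 30, 40])
print(a, b, c, d, rf.read(0, "r2"))
```

The last allocation lands on addresses that are not adjacent, yet the thread still reads and writes `r2` as one register because every access goes through the table.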

    Graphics processors with parallel scheduling and execution of threads
    8.
    Granted invention patent
    Legal status: In force

    Publication number: US08345053B2

    Publication date: 2013-01-01

    Application number: US11533880

    Filing date: 2006-09-21

    IPC classification: G06F15/80 G06F15/00 G06T1/00

    CPC classification: G06T15/005

    Abstract: A graphics processor capable of parallel scheduling and execution of multiple threads, and techniques for achieving parallel scheduling and execution, are described. The graphics processor may include multiple hardware units and a scheduler. The hardware units are operable in parallel, with each hardware unit supporting a respective set of operations. The hardware units may include an ALU core, an elementary function core, a logic core, a texture sampler, a load control unit, some other hardware unit, or a combination thereof. The scheduler dispatches instructions for multiple threads to the hardware units concurrently. The graphics processor may further include an instruction cache to store instructions for threads and register banks to store data. The instruction cache and register banks may be shared by the hardware units.
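
A toy model of concurrent dispatch to several hardware units is sketched below. The single-cycle units, the fixed thread-id priority, and the `schedule` function itself are simplifying assumptions introduced for illustration; they do not reflect the scheduler claimed in the patent.

```python
def schedule(threads, units):
    """Dispatch instructions from several threads to hardware units per cycle.

    threads maps a thread id to the ordered list of unit names its
    instructions need. In each cycle every unit accepts at most one
    instruction and every thread issues at most one, in program order.
    Returns one dict per cycle mapping unit name -> thread id.
    """
    for instrs in threads.values():
        for unit in instrs:
            if unit not in units:
                raise ValueError(f"unknown hardware unit: {unit}")
    pending = {tid: list(instrs) for tid, instrs in threads.items()}
    timeline = []
    while any(pending.values()):
        busy = set()
        issued = {}
        for tid in sorted(pending):            # fixed priority by thread id
            queue = pending[tid]
            if queue and queue[0] not in busy:
                unit = queue.pop(0)
                busy.add(unit)
                issued[unit] = tid
        timeline.append(issued)
    return timeline


timeline = schedule(
    {0: ["alu", "tex"], 1: ["alu", "alu"], 2: ["tex"]},
    units={"alu", "tex"},
)
for cycle, issued in enumerate(timeline):
    print(cycle, issued)
```

In the first cycle the ALU core and the texture sampler both receive work from different threads, which is the concurrency the abstract describes; a real scheduler would also account for unit latencies and data dependencies.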

    On-demand multi-thread multimedia processor
    9.
    Granted invention patent
    Legal status: In force

    Publication number: US07685409B2

    Publication date: 2010-03-23

    Application number: US11677362

    Filing date: 2007-02-21

    IPC classification: G06F9/00

    Abstract: A device includes a multimedia processor that can concurrently support multiple applications for various types of multimedia such as graphics, audio, video, camera, games, etc. The multimedia processor includes configurable storage resources to store instructions, data, and state information for the applications and assignable processing units to perform various types of processing for the applications. The configurable storage resources may include an instruction cache to store instructions for the applications, register banks to store data for the applications, context registers to store state information for threads of the applications, etc. The processing units may include an arithmetic logic unit (ALU) core, an elementary function core, a logic core, a texture sampler, a load control unit, a flow controller, etc. The multimedia processor allocates a configurable portion of the storage resources to each application and dynamically assigns the processing units to the applications as requested by these applications.

    FRAGMENT SHADER BYPASS IN A GRAPHICS PROCESSING UNIT, AND APPARATUS AND METHOD THEREOF
    10.
    Invention patent application
    Legal status: In force

    Publication number: US20090073168A1

    Publication date: 2009-03-19

    Application number: US11855832

    Filing date: 2007-09-14

    IPC classification: G06T15/50

    CPC classification: G06T15/005

    Abstract: Configuration information is used to make a determination to bypass fragment shading by a shader unit of a graphics processing unit, the shader unit capable of performing both vertex shading and fragment shading. Based on the determination, the shader unit performs vertex shading and bypasses fragment shading. A processing element other than the shader unit, such as a pixel blender, can be used to perform some fragment shading. Power is managed to "turn off" power to unused components in a case that fragment shading is bypassed. For example, power can be turned off to a number of arithmetic logic units, the shader unit using the reduced number of arithmetic logic units to perform vertex shading. At least one register bank of the shader unit can be used as a FIFO buffer storing pixel attribute data for use, with texture data, in fragment shading operations by another processing element.
