-
Publication No.: US20070252843A1
Publication Date: 2007-11-01
Application No.: US11412678
Filing Date: 2006-04-26
Applicant: Chun Yu, Guofang Jiao, Yun Du
Inventor: Chun Yu, Guofang Jiao, Yun Du
IPC Classification: G09G5/36
CPC Classification: G06T1/60, G06T15/005
Abstract: A graphics system includes a graphics processor and a cache memory system. The graphics processor includes processing units that perform various graphics operations to render graphics images. The cache memory system may include fully configurable caches, partially configurable caches, or a combination of configurable and dedicated caches. The cache memory system may further include a control unit, a crossbar, and an arbiter. The control unit may determine memory utilization by the processing units and assign the configurable caches to the processing units based on memory utilization. The configurable caches may be assigned to achieve good utilization of these caches and to avoid memory access bottleneck. The crossbar couples the processing units to their assigned caches. The arbiter facilitates data exchanges between the caches and a main memory.
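The utilization-based cache assignment described in this abstract could be sketched roughly as follows. The abstract does not specify an allocation policy, so the greedy rule below, the function name `assign_caches`, and the idea of expressing utilization as a single number per processing unit are all assumptions made for illustration only.

```python
def assign_caches(utilization, num_caches):
    """Assign configurable caches to processing units (assumed greedy policy).

    utilization: dict mapping a processing-unit name to a measured memory
                 utilization figure (hypothetical units, e.g. requests/frame).
    num_caches:  number of configurable caches available.
    """
    if not utilization:
        return {}
    assigned = {unit: 0 for unit in utilization}
    for _ in range(num_caches):
        # Give the next cache to the unit with the highest utilization
        # per cache it would then hold, i.e. the least well-served unit.
        neediest = max(utilization,
                       key=lambda u: utilization[u] / (assigned[u] + 1))
        assigned[neediest] += 1
    return assigned
```

Under this assumed policy a heavily loaded unit receives several caches while a lightly loaded one may receive none, which is one way to approximate the goal of good cache utilization stated in the abstract.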
-
Publication No.: US08884972B2
Publication Date: 2014-11-11
Application No.: US11441696
Filing Date: 2006-05-25
Applicant: Yun Du, Guofang Jiao, Chun Yu, Alexei V. Bourd
Inventor: Yun Du, Guofang Jiao, Chun Yu, Alexei V. Bourd
CPC Classification: G06T1/20, G06F9/30167, G06F9/383, G06F9/3851, G06F9/3885
Abstract: A graphics processor capable of efficiently performing arithmetic operations and computing elementary functions is described. The graphics processor has at least one arithmetic logic unit (ALU) that can perform arithmetic operations and at least one elementary function unit that can compute elementary functions. The ALU(s) and elementary function unit(s) may be arranged such that they can operate in parallel to improve throughput. The graphics processor may also include fewer elementary function units than ALUs, e.g., four ALUs and a single elementary function unit. The four ALUs may perform an arithmetic operation on (1) four components of an attribute for one pixel or (2) one component of an attribute for four pixels. The single elementary function unit may operate on one component of one pixel at a time. The use of a single elementary function unit may reduce cost while still providing good performance.
-
Publication No.: US08436854B2
Publication Date: 2013-05-07
Application No.: US12557427
Filing Date: 2009-09-10
Applicant: Guofang Jiao, Yun Du, Lingjun Chen, Chun Yu
Inventor: Guofang Jiao, Yun Du, Lingjun Chen, Chun Yu
IPC Classification: G06T15/40
CPC Classification: G06T15/40, G06T1/20, G06T15/005
Abstract: Techniques are described for processing graphics images with a graphics processing unit (GPU) using deferred vertex shading. An example method includes the following: generating, within a processing pipeline of a graphics processing unit (GPU), vertex coordinates for vertices of each primitive within an image geometry, wherein the vertex coordinates comprise a location and a perspective parameter for each one of the vertices, and wherein the image geometry represents a graphics image; identifying, within the processing pipeline of the GPU, visible primitives within the image geometry based upon the vertex coordinates; and, responsive to identifying the visible primitives, generating, within the processing pipeline of the GPU, vertex attributes only for the vertices of the visible primitives in order to determine surface properties of the graphics image.
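The ordering of work in deferred vertex shading (coordinates first, visibility next, attributes last and only for visible primitives) might be sketched as below. This is a minimal software illustration, not the patented pipeline: the function name and the three callbacks `position_shader`, `is_visible`, and `attribute_shader` are hypothetical, and vertices are assumed to be hashable values such as tuples.

```python
def deferred_vertex_shading(primitives, position_shader, is_visible,
                            attribute_shader):
    """Shade attributes only for vertices of visible primitives.

    primitives: iterable of primitives, each a tuple of hashable vertices.
    The three callbacks are hypothetical stand-ins for pipeline stages.
    """
    # Pass 1: compute only vertex coordinates and test visibility.
    visible = []
    for prim in primitives:
        coords = [position_shader(v) for v in prim]
        if is_visible(coords):
            visible.append(prim)

    # Pass 2: compute the (more expensive) attributes, once per vertex,
    # and only for vertices that belong to a visible primitive.
    shaded = {}
    for prim in visible:
        for v in prim:
            if v not in shaded:
                shaded[v] = attribute_shader(v)
    return visible, shaded
```

The point of the split is that `attribute_shader` never runs for vertices used only by culled primitives.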
-
Publication No.: US08212840B2
Publication Date: 2012-07-03
Application No.: US11551900
Filing Date: 2006-10-23
Applicant: Guofang Jiao, Chun Yu, Lingjun Chen, Yun Du
Inventor: Guofang Jiao, Chun Yu, Lingjun Chen, Yun Du
IPC Classification: G09G5/00
Abstract: A graphics processing unit (GPU) efficiently performs 3-dimensional (3-D) clipping using processing units used for other graphics functions. The GPU includes first and second hardware units and at least one buffer. The first hardware unit performs 3-D clipping of primitives using a first processing unit used for a first graphics function, e.g., an ALU used for triangle setup, depth gradient setup, etc. The first hardware unit may perform 3-D clipping by (a) computing clip codes for each vertex of each primitive, (b) determining whether to pass, discard or clip each primitive based on the clip codes for all vertices of the primitive, and (c) clipping each primitive to be clipped against clipping planes. The second hardware unit computes attribute component values for new vertices resulting from the 3-D clipping, e.g., using an ALU used for attribute gradient setup, attribute interpolation, etc. The buffer(s) store intermediate results of the 3-D clipping.
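Steps (a) and (b) of the clipping procedure in this abstract can be illustrated with the widely used outcode technique. The sketch below assumes homogeneous clip-space vertices (x, y, z, w) and a view volume of -w <= x, y, z <= w; the bit layout and the function names `clip_code` and `classify` are hypothetical and are not taken from the patent.

```python
# Hypothetical bit layout: one bit per clipping plane.
LEFT, RIGHT, BOTTOM, TOP, NEAR, FAR = (1 << i for i in range(6))

def clip_code(vertex):
    """Step (a): compute a 6-bit clip code for one clip-space vertex."""
    x, y, z, w = vertex
    code = 0
    if x < -w: code |= LEFT
    if x > w:  code |= RIGHT
    if y < -w: code |= BOTTOM
    if y > w:  code |= TOP
    if z < -w: code |= NEAR
    if z > w:  code |= FAR
    return code

def classify(primitive):
    """Step (b): decide whether to pass, discard, or clip a primitive."""
    codes = [clip_code(v) for v in primitive]
    any_outside = 0
    all_outside = 0b111111
    for c in codes:
        any_outside |= c
        all_outside &= c
    if any_outside == 0:
        return "pass"      # every vertex is inside all six planes
    if all_outside != 0:
        return "discard"   # every vertex is outside the same plane
    return "clip"          # the primitive straddles at least one plane
```

Only primitives classified as "clip" need step (c), the actual intersection against the clipping planes, which this sketch does not cover.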
-
Publication No.: US08203564B2
Publication Date: 2012-06-19
Application No.: US11675662
Filing Date: 2007-02-16
Applicant: Guofang Jiao, Angus M. Dorbie, Yun Du, Chun Yu, Jay C. Yun
Inventor: Guofang Jiao, Angus M. Dorbie, Yun Du, Chun Yu, Jay C. Yun
CPC Classification: G06T15/005, G06T11/40, G09G5/363
Abstract: Techniques for supporting both 2-D and 3-D graphics are described. A graphics processing unit (GPU) may perform 3-D graphics processing in accordance with a 3-D graphics pipeline to render 3-D images and may also perform 2-D graphics processing in accordance with a 2-D graphics pipeline to render 2-D images. Each stage of the 2-D graphics pipeline may be mapped to at least one stage of the 3-D graphics pipeline. For example, a clipping, masking and scissoring stage in 2-D graphics may be mapped to a depth test stage in 3-D graphics. Coverage values for pixels within paths in 2-D graphics may be determined using rasterization and depth test stages in 3-D graphics. A paint generation stage and an image interpolation stage in 2-D graphics may be mapped to a fragment shader stage in 3-D graphics. A blending stage in 2-D graphics may be mapped to a blending stage in 3-D graphics.
-
Publication No.: US08009172B2
Publication Date: 2011-08-30
Application No.: US11550344
Filing Date: 2006-10-17
Applicant: Guofang Jiao, Brian Ruttenberg, Chun Yu, Yun Du
Inventor: Guofang Jiao, Brian Ruttenberg, Chun Yu, Yun Du
IPC Classification: G06T1/20
CPC Classification: G06T15/005
Abstract: This disclosure describes a graphics processing unit (GPU) pipeline that uses one or more shared arithmetic logic units (ALUs). In order to facilitate such sharing of ALUs, the stages of the disclosed GPU pipeline may be rearranged relative to conventional GPU pipelines. In addition, by rearranging the stages of the GPU pipeline, efficiencies may be achieved in the image processing. Unlike conventional GPU pipelines, for example, an attribute gradient setup stage can be located much later in the pipeline, and the attribute interpolator stage may immediately follow the attribute gradient setup stage. This allows sharing of an ALU by the attribute gradient setup and attribute interpolator stages. Several other techniques and features for the GPU pipeline are also described, which may improve performance and possibly achieve additional processing efficiencies.
-
Publication No.: US20090323453A1
Publication Date: 2009-12-31
Application No.: US12163233
Filing Date: 2008-06-27
CPC Classification: G11C5/025, G11C7/1012, G11C7/1051, G11C7/1078, G11C7/109, G11C7/18, G11C8/12, Y10T29/49002
Abstract: A memory includes multiple interface ports. The memory also includes at least two sub-arrays each having an instance of all of the bit lines of the memory and a portion of the word lines of the memory. The memory has a common decoder coupled to the sub-arrays and configured to control each of the word lines. The memory also includes multiplexers coupled to each of the interface ports. The multiplexers are configured to cause the selection of one of the sub-arrays based upon an address of a memory cell received at one or more of the interface ports.
-
Publication No.: US20080094410A1
Publication Date: 2008-04-24
Application No.: US11550958
Filing Date: 2006-10-19
Applicant: Guofang Jiao, Chun Yu, Lingjun Chen, Yun Du
Inventor: Guofang Jiao, Chun Yu, Lingjun Chen, Yun Du
IPC Classification: G09G5/02
CPC Classification: G06T15/503, G06T2210/32
Abstract: Techniques for implementing blending equations for various blending modes with a base set of operations are described. Each blending equation may be decomposed into a sequence of operations. In one design, a device includes a processing unit that implements a set of operations for multiple blending modes and a storage unit that stores operands and results. The processing unit receives a sequence of instructions for a sequence of operations for a blending mode selected from the plurality of blending modes and executes each instruction in the sequence to perform blending in accordance with the selected blending mode. The processing unit may include (a) an ALU that performs at least one operation in the base set, e.g., a dot product, (b) a pre-formatting unit that performs gamma correction and alpha scaling of inbound color values, and (c) a post-formatting unit that performs gamma compression and alpha scaling of outbound color values.
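The idea of decomposing each blending equation into a sequence of operations from a small base set can be illustrated with a toy interpreter. Everything below is an assumption made for illustration: the three-operation base set, the instruction format, the register names, and the single-channel scalar values are not taken from the patent. The `SRC_OVER` program encodes the standard Porter-Duff "source over" equation for premultiplied colors, C = Cs + (1 - As) * Cd.

```python
# Assumed base set of operations shared by all blending modes.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def run_blend(program, inputs):
    """Execute a blend program: a list of (op, dest, operand_a, operand_b).

    Operands and results live in a register dict, standing in for the
    storage unit mentioned in the abstract.
    """
    regs = dict(inputs)
    for op, dest, a, b in program:
        regs[dest] = OPS[op](regs[a], regs[b])
    return regs

# Source-over with premultiplied alpha: out = src + (1 - src_a) * dst
SRC_OVER = [
    ("sub", "t0", "one", "src_a"),
    ("mul", "t1", "t0", "dst"),
    ("add", "out", "src", "t1"),
]

# Additive blending: out = src + dst
ADDITIVE = [
    ("add", "out", "src", "dst"),
]
```

Selecting a blending mode then amounts to choosing which instruction sequence to feed to the same execution loop.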
-
Publication No.: US20080074433A1
Publication Date: 2008-03-27
Application No.: US11533880
Filing Date: 2006-09-21
Applicant: Guofang Jiao, Yun Du, Chun Yu
Inventor: Guofang Jiao, Yun Du, Chun Yu
IPC Classification: G06T1/00
CPC Classification: G06T15/005
Abstract: A graphics processor capable of parallel scheduling and execution of multiple threads, and techniques for achieving parallel scheduling and execution, are described. The graphics processor may include multiple hardware units and a scheduler. The hardware units are operable in parallel, with each hardware unit supporting a respective set of operations. The hardware units may include an ALU core, an elementary function core, a logic core, a texture sampler, a load control unit, some other hardware unit, or a combination thereof. The scheduler dispatches instructions for multiple threads to the hardware units concurrently. The graphics processor may further include an instruction cache to store instructions for threads and register banks to store data. The instruction cache and register banks may be shared by the hardware units.
-
Publication No.: US20080046495A1
Publication Date: 2008-02-21
Application No.: US11506349
Filing Date: 2006-08-18
Applicant: Yun Du, Chun Yu, Guofang Jiao
Inventor: Yun Du, Chun Yu, Guofang Jiao
IPC Classification: G06F7/38
CPC Classification: G06F7/5095, G06F5/012, G06F7/485, G06F7/49936, G06F2207/3884
Abstract: A multi-stage floating-point accumulator includes at least two stages and is capable of operating at higher speed. In one design, the floating-point accumulator includes first and second stages. The first stage includes three operand alignment units, two multiplexers, and three latches. The three operand alignment units operate on a current floating-point value, a prior floating-point value, and a prior accumulated value. A first multiplexer provides zero or the prior floating-point value to the second operand alignment unit. A second multiplexer provides zero or the prior accumulated value to the third operand alignment unit. The three latches couple to the three operand alignment units. The second stage includes a 3-operand adder to sum the operands generated by the three operand alignment units, a latch, and a post alignment unit.