Method and apparatus for appending memory commands during a direct memory access operation
    1.
    Invention Grant
    Method and apparatus for appending memory commands during a direct memory access operation (Active)

    Publication No.: US06678755B1

    Publication Date: 2004-01-13

    Application No.: US09608544

    Filing Date: 2000-06-30

    IPC Class: G06F13/14

    CPC Class: G06F13/28

    Abstract: A direct memory access (DMA) controller for controlling memory access operations in a memory. During a memory access operation, the DMA controller executes a chain of DMA commands, each stored in a memory at a respective address. The DMA controller can enter a self-linking mode in which additional DMA commands can be appended to the end of the command chain without terminating the memory access operation, regardless of whether the last DMA command of the command chain has been executed by the DMA controller. The self-linking mode is entered when a link address provided by the last DMA command matches a code. The code that causes the DMA controller to enter the self-linking mode may be a link address that points to the last executed DMA command or, alternatively, a predetermined bit pattern. The DMA controller exits the self-linking mode and continues the memory access operation upon detecting a new link address for a new DMA command that is to be appended to the command chain. The new link address may be detected by having the DMA controller periodically check the link address of the last executed DMA command.
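    The chain-following and self-linking behavior described in the abstract can be sketched in a few lines. This is a hypothetical software model, not the patented hardware: the command fields, the polling limit, and the `on_poll` hook (standing in for the host appending commands) are all invented for illustration.

    ```python
    class DmaController:
        """Toy model of a DMA controller that follows a chain of commands
        linked by addresses, entering a self-linking wait state when a
        command's link address points back to the command itself."""

        def __init__(self, memory):
            self.memory = memory      # address -> {"op": ..., "link": addr}
            self.executed = []

        def run(self, start_addr, max_polls=3, on_poll=lambda n: None):
            addr = start_addr
            while True:
                self.executed.append(self.memory[addr]["op"])  # do the transfer
                polls = 0
                # Self-linking mode: the link address equals the command's own
                # address, so keep re-reading the link until the host rewrites
                # it (appending a new command) or the poll budget runs out.
                while self.memory[addr]["link"] == addr:
                    if polls >= max_polls:
                        return self.executed
                    polls += 1
                    on_poll(polls)    # gives the "host" a chance to append
                addr = self.memory[addr]["link"]
    ```

    In the patent's terms, rewriting the last command's link field is how new work is appended without terminating the operation; the model's `on_poll` callback simulates the host doing exactly that while the controller waits.
    
    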


    Cache invalidation method and apparatus for a graphics processing system
    2.
    Invention Grant
    Cache invalidation method and apparatus for a graphics processing system (Active)

    Publication No.: US06937246B2

    Publication Date: 2005-08-30

    Application No.: US10775299

    Filing Date: 2004-02-09

    IPC Class: G06F12/08 G09G5/36

    CPC Class: G06F12/0891 G06F12/0875

    Abstract: A cache for a graphics system stores both an address tag and an identification number for each block of data stored in the data cache. The address and identification number of a requested block of data are provided to the cache and checked against all of the address and identification number entries present. A block of data is provided if both the address and the identification number of the requested data match an entry in the cache. However, if the address of the requested data is not present, or if the address matches an entry but the associated identification number does not match, a cache miss occurs, and the requested graphics data must be retrieved from a system memory. The address and identification number are updated, and the requested data replaces the former graphics data in the data cache. As a result, a block of data stored in the cache that has the same address as the requested data but holds invalid data can be invalidated without invalidating the entire cache.
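    The two-part tag check described above can be modeled compactly. The sketch below is illustrative only (class and field names are invented): a lookup hits only when both the address tag and the identification number match, so bumping the identification number for a block invalidates just that line rather than the whole cache.

    ```python
    class TaggedCache:
        """Toy cache keyed by address, where each line also carries an
        identification number that must match for a hit."""

        def __init__(self):
            self.lines = {}   # address -> (ident, data)

        def read(self, addr, ident, fetch):
            entry = self.lines.get(addr)
            if entry is not None and entry[0] == ident:
                return entry[1], True         # hit: address and ID both match
            # Miss: address absent, or the line at this address has a stale
            # ID.  Fetch from "system memory" and replace the line, updating
            # both the tag and the identification number.
            data = fetch(addr)
            self.lines[addr] = (ident, data)
            return data, False
    ```

    A caller that wants to invalidate stale graphics data at one address simply presents a new identification number for it; every other cached line stays valid.
    
    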


    Cache invalidation method and apparatus for a graphics processing system

    Publication No.: US06734867B1

    Publication Date: 2004-05-11

    Application No.: US09607504

    Filing Date: 2000-06-28

    IPC Class: G09G5/36

    CPC Class: G06F12/0891 G06F12/0875

    Abstract: A cache for a graphics system stores both an address tag and an identification number for each block of data stored in the data cache. The address and identification number of a requested block of data are provided to the cache and checked against all of the address and identification number entries present. A block of data is provided if both the address and the identification number of the requested data match an entry in the cache. However, if the address of the requested data is not present, or if the address matches an entry but the associated identification number does not match, a cache miss occurs, and the requested graphics data must be retrieved from a system memory. The address and identification number are updated, and the requested data replaces the former graphics data in the data cache. As a result, a block of data stored in the cache that has the same address as the requested data but holds invalid data can be invalidated without invalidating the entire cache.

    Parallel runtime execution on multiple processors

    Publication No.: US09304834B2

    Publication Date: 2016-04-05

    Application No.: US13615473

    Filing Date: 2012-09-13

    Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for concurrent execution in one or more physical compute devices, such as CPUs or GPUs, are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in a CPU of the physical compute devices if the GPU is busy with graphics processing threads.
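    The busy-GPU fallback in the last sentence of the abstract can be sketched as a small scheduling routine. This is a simplified stand-in, not the patented runtime: the device records, the `kind`/`busy` fields, and the task tuples are invented for illustration, and dependency tracking is omitted.

    ```python
    def schedule(tasks, devices):
        """Assign each (task, preferred_kind) to an idle device of the
        preferred kind, falling back to any idle device of another kind
        (e.g. a CPU when every GPU is busy)."""
        assignments = {}
        for task, preferred in tasks:
            # First choice: an idle device of the preferred kind.
            dev = next((d for d in devices
                        if d["kind"] == preferred and not d["busy"]), None)
            if dev is None:
                # Preferred kind unavailable: take any idle device instead.
                dev = next(d for d in devices if not d["busy"])
            dev["busy"] = True
            assignments[task] = dev["name"]
        return assignments
    ```
    
    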

    DATA PARALLEL COMPUTING ON MULTIPLE PROCESSORS
    5.
    Invention Application
    DATA PARALLEL COMPUTING ON MULTIPLE PROCESSORS (Pending)

    Publication No.: US20130007774A1

    Publication Date: 2013-01-03

    Application No.: US13614975

    Filing Date: 2012-09-13

    IPC Class: G06F9/46 G06F15/76

    Abstract: A method and an apparatus that allocate one or more physical compute devices, such as CPUs or GPUs attached to a host processing unit running an application, for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices.
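    The capability-based allocation step can be illustrated as a filter over a device list. A minimal sketch, assuming an invented requirement shape (allowed device kinds plus a minimum compute-unit count); real requirements in such a runtime would carry many more fields.

    ```python
    def allocate(devices, requirement):
        """Return the identifiers of devices that meet the application's
        processing capability requirement: an acceptable device kind and
        at least the requested number of compute units."""
        return [d["id"] for d in devices
                if d["kind"] in requirement["kinds"]
                and d["compute_units"] >= requirement["min_units"]]
    ```

    The returned identifiers play the role of the abstract's "compute device identifier": the scheduler later uses them to direct the executable to the allocated devices.
    
    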


    Block-based image compression method and apparatus

    Publication No.: US20060215914A1

    Publication Date: 2006-09-28

    Application No.: US11090378

    Filing Date: 2005-03-25

    IPC Class: G06K9/36

    Abstract: A block-based image compression method and encoder/decoder circuit compress a plurality of pixels in a block, having corresponding original color values and luminance values, according to different modes of operation. The encoding circuit includes a luminance-level-based representative color generator that generates representative color values for each of a plurality of luminance levels derived from the corresponding luminance values, to produce at least a block color offset value and a quantization value. According to mode zero, each of the pixels in the block is associated with one of the plurality of generated representative color values to generate error map values and a mode zero color error value. According to mode one, representative color values for each of at least three luminance levels are also generated to produce at least three representative color values, corresponding bitmap values, and a mode one color error value. A mode-based compressed data generator is capable of operating in mode zero and/or mode one, and produces block color mode zero data when the mode zero color error value is less than the mode one color error value, and block color mode one data otherwise.
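    The mode decision at the end of the abstract (encode both ways, keep the mode with the smaller color error, preferring mode one on a tie) can be sketched generically. This is a heavily simplified stand-in: the two quantizers below (a four-level and a three-level palette over scalar values) are invented just to exercise the decision rule, and do not reproduce the patent's representative-color generation.

    ```python
    def quantize(pixels, levels):
        """Map each pixel to the nearest of `levels` evenly spaced
        representative values; return (indices, total absolute error)."""
        lo, hi = min(pixels), max(pixels)
        step = (hi - lo) / (levels - 1) if levels > 1 and hi > lo else 1
        reps = [lo + i * step for i in range(levels)]
        idx, err = [], 0.0
        for p in pixels:
            best = min(range(levels), key=lambda i: abs(p - reps[i]))
            idx.append(best)
            err += abs(p - reps[best])
        return idx, err

    def choose_mode(pixels, mode0, mode1):
        """Encode a block under both modes and keep the one with the lower
        color error; ties go to mode one, matching the abstract's rule."""
        d0, e0 = mode0(pixels)
        d1, e1 = mode1(pixels)
        return ("mode0", d0) if e0 < e1 else ("mode1", d1)
    ```
    
    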

    Programmable multiple texture combine circuit for a graphics processing system and method for use thereof
    7.
    Invention Grant
    Programmable multiple texture combine circuit for a graphics processing system and method for use thereof (Expired)

    Publication No.: US06784895B1

    Publication Date: 2004-08-31

    Application No.: US09690905

    Filing Date: 2000-10-17

    Applicant: Aaftab Munshi

    Inventor: Aaftab Munshi

    IPC Class: G06T11/40

    CPC Class: G06T15/503 G06T15/04

    Abstract: The present invention is directed toward a texture combine circuit for generating fragment graphics data for a pixel in a graphics processing system. The texture combine circuit includes at least one texture combine unit and is coupled to receive graphics data, such as a plurality of texture graphics data, and perform user-selected graphics combine operations on a set of input data selected from the plurality of texture graphics data to produce the fragment graphics data for the pixel. The texture combine circuit may include several texture combine units in a cascade connection, where each texture combine unit is coupled to receive the plurality of texture graphics data and the resultant output value of the previous texture combine unit in the cascade.
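    The cascade arrangement in the last sentence can be modeled as a pipeline of stages, each choosing its inputs from the texture set or from the previous stage's output. A minimal sketch with invented operation names and an RGB-list color representation; the real circuit is programmable hardware, not Python.

    ```python
    # User-selectable combine operations (illustrative subset).
    OPS = {
        "modulate": lambda a, b: [x * y for x, y in zip(a, b)],
        "add":      lambda a, b: [min(x + y, 1.0) for x, y in zip(a, b)],
    }

    def combine_cascade(textures, stages):
        """Run a cascade of texture combine units.  `stages` is a list of
        (op_name, input_a_index, input_b_index); an index of -1 selects the
        previous stage's result, mirroring the cascade connection."""
        prev = None
        for op, ia, ib in stages:
            a = prev if ia == -1 else textures[ia]
            b = prev if ib == -1 else textures[ib]
            prev = OPS[op](a, b)
        return prev   # the fragment graphics data for the pixel
    ```
    
    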


    PARALLEL RUNTIME EXECUTION ON MULTIPLE PROCESSORS

    Publication No.: US20130063451A1

    Publication Date: 2013-03-14

    Application No.: US13615473

    Filing Date: 2012-09-13

    IPC Class: G06F15/16

    Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for concurrent execution in one or more physical compute devices, such as CPUs or GPUs, are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in a CPU of the physical compute devices if the GPU is busy with graphics processing threads.

    PARALLEL RUNTIME EXECUTION ON MULTIPLE PROCESSORS
    9.
    Invention Application
    PARALLEL RUNTIME EXECUTION ON MULTIPLE PROCESSORS (Pending)

    Publication No.: US20130055272A1

    Publication Date: 2013-02-28

    Application No.: US13597119

    Filing Date: 2012-08-28

    IPC Class: G06F9/46

    Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for concurrent execution in one or more physical compute devices, such as CPUs or GPUs, are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in a CPU of the physical compute devices if the GPU is busy with graphics processing threads.


    Shared stream memory on multiple processors
    10.
    Invention Grant
    Shared stream memory on multiple processors (Active)

    Publication No.: US08108633B2

    Publication Date: 2012-01-31

    Application No.: US11800256

    Filing Date: 2007-05-03

    IPC Class: G06F12/00

    CPC Class: G06F9/5016 G06F9/5044

    Abstract: A method and an apparatus that allocate a stream memory and/or a local memory for a variable in an executable loaded from a host processor to a compute processor, according to whether the compute processor supports a storage capability, are described. The compute processor may be a graphics processing unit (GPU) or a central processing unit (CPU). Alternatively, an application running in a host processor configures storage capabilities in a compute processor, such as a CPU or GPU, to determine a memory location for accessing a variable in an executable executed by a plurality of threads in the compute processor. The configuration and allocation are based on API calls in the host processor.
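    The capability-driven placement decision can be reduced to a small sketch. The capability field and memory-space names below are invented for illustration; in a real runtime of this kind the query would come from a device-info API call on the host.

    ```python
    def place_variable(device_caps, requested):
        """Place a variable in fast 'local' memory only when the target
        compute device reports that capability; otherwise fall back to
        the shared 'stream' (global) memory that all devices support."""
        if requested == "local" and device_caps.get("has_local_memory"):
            return "local"
        return "stream"
    ```

    The same executable can thus run on a GPU with on-chip local memory and on a CPU without one; only the placement of the variable changes.
    
    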
