摘要:
A direct memory access (DMA) controller for controlling memory access operations in a memory. During a memory access operation, the DMA controller executes a chain of DMA commands stored in a memory and having a respective address. The DMA controller can enter a self-linking mode where additional DMA commands can be appended to the end of the command chain without terminating the memory access operation, regardless of whether the last DMA command of the command chain has been executed by the DMA controller. The self-linking mode is entered when a link-address provided by the last DMA command matches a code. The code to cause the DMA controller to enter the self-linking mode may be a link address which points to the last executed DMA command, or alternatively, a predetermined bit pattern. The DMA controller exits the self-linking command and continues the memory access operation upon detecting a new link address for a new DMA command that is to be appended to the command chain. The new link address may be detected by having the DMA controller periodically check the link address of the last executed DMA command.
摘要:
A cache for a graphics system storing both an address tag and an identification number for each block of data stored in the data cache. An address and identification number of a requested block of data is provided to the cache, and is checked against all of the address and identification number entries present. A block of data is provided if both the address and the identification number of the requested data matches an entry in the cache. However, if the address of the requested data is not present, or if the address matches an entry but the associated identification number does not match, a cache miss occurs, and the requested graphics data must be retrieved from a system memory. The address and identification number are updated, and the requested data replaces the former graphics data in the data cache. As a result, a block of data stored in the cache having the same address as the requested data, but having data that is invalid, can be invalidated without invalidating the entire cache.
摘要:
A cache for a graphics system storing both an address tag and an identification number for each block of data stored in the data cache. An address and identification number of a requested block of data is provided to the cache, and is checked against all of the address and identification number entries present. A block of data is provided if both the address and the identification number of the requested data matches an entry in the cache. However, if the address of the requested data is not present, or if the address matches an entry but the associated identification number does not match, a cache miss occurs, and the requested graphics data must be retrieved from a system memory. The address and identification number are updated, and the requested data replaces the former graphics data in the data cache. As a result, a block of data stored in the cache having the same address as the requested data, but having data that is invalid, can be invalidated without invalidating the entire cache.
摘要:
A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
摘要:
A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.
摘要:
A block-based image compression method and encoder/decoder circuit compresses a plurality of pixels having corresponding original color values and luminance values in a block according to different modes of operation. The encoding circuit includes a luminance-level-based representative color generator to generate representative color values for each of a plurality of luminance levels derived from the corresponding luminance levels to produce at least a block color offset value and a quantization value. According to mode zero, each of the pixels in the block is associated with one of the plurality of generated representative color values to generate error map values and a mode zero color error value. According to mode one, representative color values for each of at least three luminance levels are also generated to produce at least three representative color values, corresponding bitmap values and a mode one color error value. A mode based compressed data generator is capable of operating in mode zero and/or one and produces block color mode zero data when the mode zero color error value is less than the mode one color error value, otherwise block color mode one data.
摘要:
The present invention is directed toward a texture combine circuit for generating fragment graphics data for a pixel in a graphics processing system. The texture combine circuit includes at least one texture combine unit and is coupled to receive graphics data, such as a plurality of texture graphics data, and perform user selected graphics combine operations on a set of input data selected from the plurality of texture graphics data to produce the fragment graphics data for the pixel. The texture combine circuit may include several texture combine units in a cascade connection, where each texture combine unit is coupled to receive the plurality of texture graphics data and the resultant output value of the previous texture combine units in the cascade.
摘要:
A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
摘要:
A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
摘要:
A method and an apparatus that allocate a stream memory and/or a local memory for a variable in an executable loaded from a host processor to the compute processor according to whether a compute processor supports a storage capability are described. The compute processor may be a graphics processing unit (GPU) or a central processing unit (CPU). Alternatively, an application running in a host processor configures storage capabilities in a compute processor, such as CPU or GPU, to determine a memory location for accessing a variable in an executable executed by a plurality of threads in the compute processor. The configuration and allocation are based on API calls in the host processor.