Abstract:
One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines whether the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with transmitting a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. Uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.
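The matching step can be modeled in a few lines of Python. This is a minimal sketch, not the patented hardware. The 8-thread group size, the pattern names (`uniform-8`, `uniform-4`, `uniform-2`), the choice of consecutive-lane sub-sets, and the helpers `make_pattern` and `coalesce_loads` are all hypothetical. Patterns derived from interconnect constraints are not modeled.

```python
from typing import Dict, List, Sequence, Tuple

Request = Tuple[int, List[int]]  # (memory address, thread lanes served by that read)

def make_pattern(group_size: int, subset_size: int) -> List[List[int]]:
    """Partition lanes 0..group_size-1 into consecutive sub-sets of subset_size lanes."""
    return [list(range(i, i + subset_size)) for i in range(0, group_size, subset_size)]

def coalesce_loads(addresses: Sequence[int],
                   patterns: Dict[str, List[List[int]]]) -> Tuple[str, List[Request]]:
    """Try each uniform pattern in order; return the first match and its read requests."""
    for name, subsets in patterns.items():
        # A pattern matches only if all lanes within every sub-set request the same address.
        if all(len({addresses[lane] for lane in subset}) == 1 for subset in subsets):
            return name, [(addresses[subset[0]], subset) for subset in subsets]
    # No uniform pattern matched: fall back to one read request per thread.
    return "per-thread", [(addr, [lane]) for lane, addr in enumerate(addresses)]

# Hypothetical patterns for an 8-thread group, ordered from fewest to most requests.
PATTERNS = {
    "uniform-8": make_pattern(8, 8),
    "uniform-4": make_pattern(8, 4),
    "uniform-2": make_pattern(8, 2),
}

name, requests = coalesce_loads([0x40] * 4 + [0x80] * 4, PATTERNS)
print(name, len(requests))  # uniform-4 2
```

The patterns are ordered from coarsest to finest. As a result, the first pattern that matches also yields the fewest read requests. With this input, two reads replace eight.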
Abstract:
One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group, execution of subsequent memory operations for the first thread group is suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. While the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.
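The bookkeeping described above can be modeled in Python. In this minimal sketch, up to `n_way` thread groups share one in-flight barrier. Memory operations are blocked only for groups that have a pending barrier. The class name, the method names, and the overflow queue for barriers that do not fit are hypothetical. The sketch also assumes that a group issues no new barrier while its previous one is still pending.

```python
from collections import deque
from typing import Deque, Set

class BarrierCoalescer:
    """Hypothetical model of N-way memory barrier coalescing."""

    def __init__(self, n_way: int) -> None:
        self.n_way = n_way
        self.coalesced: Set[int] = set()    # thread groups covered by the in-flight barrier
        self.waiting: Deque[int] = deque()  # barriers that did not fit; they form the next one

    def on_barrier(self, group: int) -> None:
        # Merge into the current coalesced barrier if there is room; otherwise queue it.
        if len(self.coalesced) < self.n_way:
            self.coalesced.add(group)
        else:
            self.waiting.append(group)

    def can_issue(self, group: int) -> bool:
        # Memory operations are suspended only for groups that have a pending barrier.
        return group not in self.coalesced and group not in self.waiting

    def complete(self) -> Set[int]:
        # The coalesced barrier has executed: release its groups and start the next barrier.
        released = self.coalesced
        self.coalesced = set()
        while self.waiting and len(self.coalesced) < self.n_way:
            self.coalesced.add(self.waiting.popleft())
        return released
```

For example, with `n_way=2` and barriers arriving from groups 0, 1 and 2, groups 0 and 1 share one barrier, group 2 waits for the next barrier, and an unrelated group 3 keeps issuing memory operations throughout.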
Abstract:
A method for using a pipelined L2 cache to implement memory transfers for a video processor. The method includes accessing a queue of read requests from a video processor. For each of the read requests, a determination is made as to whether there is a cache line hit corresponding to the request. For each cache line miss, a cache line slot is allocated to store a new cache line responsive to the cache line miss. An in-order set of cache lines is output to the video processor responsive to the queue of read requests.
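The hit/miss handling and in-order output can be modeled sequentially in Python. The 32-byte line size, FIFO replacement of cache line slots, and the `fetch_line` callback are assumptions made for illustration. A real pipelined cache overlaps miss latency across queued requests, and this sketch does not model that overlap. It does reproduce the property the method relies on: one cache line is returned per request, in queue order.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

LINE_BYTES = 32  # hypothetical cache line size

def serve_reads(requests: List[int], num_slots: int,
                fetch_line: Callable[[int], bytes]) -> Tuple[List[bytes], int]:
    """Return (cache lines in request order, number of misses)."""
    cache: "OrderedDict[int, bytes]" = OrderedDict()  # line tag -> data, oldest slot first
    lines: List[bytes] = []
    misses = 0
    for addr in requests:
        tag = addr // LINE_BYTES
        if tag not in cache:
            misses += 1
            if len(cache) >= num_slots:
                cache.popitem(last=False)  # free the oldest slot (FIFO replacement, an assumption)
            cache[tag] = fetch_line(tag * LINE_BYTES)  # allocate a slot for the missing line
        lines.append(cache[tag])  # output strictly follows the order of the request queue
    return lines, misses
```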
Abstract:
A method for using a programmable DMA engine to implement memory transfers and video processing for a video processor. A DMA control program is configured for controlling DMA memory transfers between a frame buffer memory and a video processor. The DMA control program is stored in the DMA engine. A DMA request can be received from the video processor. The DMA control program is executable to implement the DMA request for the video processor. The DMA engine is operable to execute low-level commands for accessing the frame buffer memory to implement a high-level command.
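A control program that expands one high-level request into low-level frame buffer accesses might resemble the following Python sketch. The abstract does not specify a command set. The 2D `BlockRead` request, its fields, the `("READ", address, length)` burst format, and the `max_burst` limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class BlockRead:
    """Hypothetical high-level DMA request from the video processor."""
    base: int    # byte address of the block's top-left pixel in the frame buffer
    pitch: int   # bytes per frame buffer row
    width: int   # bytes per block row
    height: int  # number of rows in the block

def block_read_program(req: BlockRead, max_burst: int) -> List[Tuple[str, int, int]]:
    """Expand one high-level request into low-level ('READ', address, length) bursts."""
    cmds: List[Tuple[str, int, int]] = []
    for row in range(req.height):
        addr = req.base + row * req.pitch  # start of this block row in the frame buffer
        remaining = req.width
        while remaining > 0:
            n = min(remaining, max_burst)  # split each row into bursts of at most max_burst
            cmds.append(("READ", addr, n))
            addr += n
            remaining -= n
    return cmds
```

Because the expansion logic lives in the DMA engine rather than in the video processor, the video processor only has to issue a single `BlockRead`.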
Abstract:
A multidimensional datapath processing system for a video processor for executing video processing operations. The video processor includes a scalar execution unit configured to execute scalar video processing operations and a vector execution unit configured to execute vector video processing operations. A data store memory is included for storing data for the vector execution unit. The data store memory includes a plurality of tiles having symmetrical bank data structures arranged in an array. The bank data structures are configured to support accesses to different tiles of each bank.
Abstract:
When switching between a DVD-video mode and a DVD-audio mode in a DVD-A/V player, a current video frame is stored in a current display buffer portion of the memory during the DVD-video mode. The DVD-A/V player is paused in the DVD-video mode and set in the DVD-audio mode. If the current display buffer portion of the memory is determined to be a reserved display buffer portion of the memory, the current video frame is copied to a reconstructed display buffer portion of the memory. At least the current display buffer portion of the memory is designated as an ASV buffer, and a frame buffer management scheme is changed so as to preserve the ASV buffer.
Abstract:
Embodiments of the present invention provide a memory controller comprising a front-end module, a back-end module communicatively coupled to the front-end module, and a physical interface module communicatively coupled to the back-end module. The front-end module generates a plurality of page packets from a plurality of received memory commands, wherein the order of receipt of said memory commands is preserved. The back-end module dynamically issues a next one of the plurality of page packets while issuing a current one of the plurality of page packets. The physical interface module causes a plurality of transfers according to the dynamically issued current one and next one of the plurality of page packets.
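The front-end and back-end roles can be sketched in Python. Several details are hypothetical: the 2048-byte page size, the `(operation, address)` command tuples, and the `ACT`/transfer trace format. The back end's look-ahead also assumes that the next page maps to a different DRAM bank, so opening it does not close the page in use.

```python
from typing import List, Tuple

PAGE_BYTES = 2048  # hypothetical DRAM page size
Command = Tuple[str, int]  # (operation, byte address)

def build_page_packets(cmds: List[Command]) -> List[List[Command]]:
    """Front end: group consecutive commands to the same page, preserving receipt order."""
    packets: List[List[Command]] = []
    for op, addr in cmds:
        if packets and packets[-1][0][1] // PAGE_BYTES == addr // PAGE_BYTES:
            packets[-1].append((op, addr))
        else:
            packets.append([(op, addr)])  # a new page starts a new packet; nothing is reordered
    return packets

def issue_trace(packets: List[List[Command]]) -> List[str]:
    """Back end: while issuing the current packet, open the page of the next packet."""
    trace: List[str] = []
    if packets:
        trace.append(f"ACT {packets[0][0][1] // PAGE_BYTES}")
    for i, packet in enumerate(packets):
        for j, (op, addr) in enumerate(packet):
            trace.append(f"{op} {addr}")  # one transfer driven through the physical interface
            # Look-ahead after the first transfer of the current packet. This assumes the
            # next page maps to a different bank, so the current page stays open.
            if j == 0 and i + 1 < len(packets):
                trace.append(f"ACT {packets[i + 1][0][1] // PAGE_BYTES}")
    return trace
```

Only consecutive commands are merged. In the sequence `RD 0, RD 64, WR 4096, RD 128`, the final read to page 0 therefore forms its own packet instead of being hoisted ahead of the write.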
Abstract:
A method for context switching on a video processor having a scalar execution unit and a vector execution unit. The method includes executing a first task and a second task on the vector execution unit. The first task and the second task can be from different respective contexts. The first task and the second task are each allocated to the vector execution unit from the scalar execution unit, and each comprises a plurality of work packages. In response to a switch notification, a work package boundary of the first task is designated. A context switch from the first task to the second task is then executed on the work package boundary.
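The boundary rule can be modeled in Python: a switch notification never interrupts a work package mid-way. In this minimal sketch, the `Task` fields, the package names, and the `notify_during` parameter that simulates when the notification arrives are hypothetical. The only saved state is the index of the next package to run.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Task:
    context: str              # context the task belongs to
    work_packages: List[str]  # work packages dispatched by the scalar unit (hypothetical names)
    next_package: int = 0     # saved state: first package not yet executed

def run_on_vector_unit(task: Task, trace: List[str], notify_during: Optional[int] = None) -> bool:
    """Execute the task's work packages in order. A switch notification that arrives while
    package `notify_during` is executing takes effect at the end of that package.
    Returns True if the task was switched out before finishing."""
    while task.next_package < len(task.work_packages):
        i = task.next_package
        trace.append(f"{task.context}:{task.work_packages[i]}")
        task.next_package += 1  # package i has run to completion
        if i == notify_during and task.next_package < len(task.work_packages):
            return True  # work package boundary: switch contexts; resume point already saved
    return False

first = Task("ctxA", ["p0", "p1", "p2"])
second = Task("ctxB", ["q0", "q1"])
trace: List[str] = []
run_on_vector_unit(first, trace, notify_during=1)  # switch requested while p1 is running
run_on_vector_unit(second, trace)
run_on_vector_unit(first, trace)  # resumes at p2
print(trace)  # ['ctxA:p0', 'ctxA:p1', 'ctxB:q0', 'ctxB:q1', 'ctxA:p2']
```

Switching only at package boundaries means no partial vector state inside a package has to be saved.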
Abstract:
A dynamic allocation of available ASV buffer memory space is performed on each pack in a DVD-A bitstream, one pack at a time. Concurrently, an ASV buffer table is updated for each type of data pack currently being processed. The ASV buffer table includes pointers corresponding to the various fields that form a particular ASV frame. In this way, only the memory required to store a particular ASV frame is used, thereby allowing the ASV buffer memory to be configured on the fly so as to store the required ASV frame data efficiently. When a particular ASV frame is to be displayed or otherwise processed, the ASV buffer table is accessed, and the pointers for that ASV frame are looked up and used to access the desired ASV frame.
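A pack-at-a-time allocator with a pointer table can be sketched in Python. Several details are assumptions for illustration. Bump allocation is used, packs are placed back to back, and memory is never freed. The class and method names are hypothetical, as are field names such as `"header"` and `"image"`. Each table entry stores an `(offset, length)` pair in place of a raw pointer.

```python
from typing import Dict, Tuple

class AsvBuffer:
    """Hypothetical model: packs are stored one at a time, and a table maps
    (frame, field) to the (offset, length) where that field was placed."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)
        self.free = 0  # next unallocated byte (bump allocation, an assumption)
        self.table: Dict[int, Dict[str, Tuple[int, int]]] = {}

    def store_pack(self, frame: int, field: str, payload: bytes) -> None:
        # Allocate exactly len(payload) bytes for this pack instead of a fixed-size frame slot.
        end = self.free + len(payload)
        if end > len(self.mem):
            raise MemoryError("ASV buffer full")
        self.mem[self.free:end] = payload
        self.table.setdefault(frame, {})[field] = (self.free, len(payload))  # update the table
        self.free = end

    def frame_field(self, frame: int, field: str) -> bytes:
        # Look up the pointer for one field of one ASV frame and read that field back.
        offset, length = self.table[frame][field]
        return bytes(self.mem[offset:offset + length])
```

Because each pack consumes only its own length, a small frame does not reserve the space of the largest possible frame.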