摘要:
Circuits, methods, and apparatus that provide highly integrated digital media processors for digital consumer electronics applications. These digital media processors are capable of performing the parallel processing of multiple format audio, video, and graphics signals. In one embodiment, audio and video signals may be received from a variety of input devices or appliances, such as antennas, VCRs, DVDs, and networked devices such as camcorders and modems, while output audio and video signals may be provided to output devices such as televisions, monitors, and networked devices such as printers and networked video recorders. Another embodiment of the present invention interfaces with a variety of devices such as navigation, entertainment, safety, memory, and networking devices. This embodiment can also be configured for use in a digital TV, set-top box, or home server. In this configuration, video and audio streams may be received from a number of cable, satellite, Internet, and consumer devices.
摘要:
An integrated circuit includes at least two different types of processors, such as a graphics processor and a video processor. At least one operation is commonly by supported by two different types of processors. For each commonly supported operation that is scheduled, a decision is made to determine which type of processor will be selected to implement the operation.
摘要:
Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
摘要:
Video filtering using a programmable graphics processor is described. The programmable graphics processor may be programmed to complete a plurality of video filtering operations in a single pass through a fragment-processing pipeline within the programmable graphics processor. Video filtering functions such as deinterlacing, chroma up-sampling, scaling, and deblocking may be performed by the fragment-processing pipeline. The fragment-processing pipeline may be programmed to perform motion adaptive deinterlacing, wherein a spatially variant filter determines, on a pixel basis, whether a “bob”, a “blend”, or a “weave” operation should be used to process an interlaced image.
摘要:
Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time. The thread ID controls various aspects of the thread's processing behavior such as the portion of the input data set to be processed by each thread, the portion of an output data set to be produced by each thread, and/or sharing of intermediate results among threads. Mechanisms for loading and launching CTAs in a representative processing core and for synchronizing threads within a CTA are also described.
摘要:
A system and method for transitions a computing system between operating modes that have different power consumption characteristics. When a system management unit (SMU) determines that the computing system is in a low activity state, the SMU transitions the central processing unit (CPU) into a low power operating mode after the CPU stores critical operating state of the CPU in a memory. The SMU then intercepts and processes interrupts intended for the CPU, modifying a copy of the critical operating state. This effectively extends the time during which the CPU stays in lower power mode. When the SMU determines that the computing system exits a low activity state, the copy of the critical operating state is stored in the memory and the SMU transitions the CPU into a high power operating mode using the modified critical operating state.
摘要:
A video processor for executing video processing operations. The video processor includes a host interface for implementing communication between the video processor and a host CPU. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A scalar execution unit is coupled to the host interface and the memory interface and is configured to execute scalar video processing operations. A vector execution unit is coupled to the host interface and the memory interface and is configured to execute vector video processing operations.
摘要:
A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.
摘要:
A method for implementing multi context execution on a video processor having a scalar execution unit and a vector execution unit. The method includes allocating a first task to a vector execution unit and allocating a second task to the vector execution unit. The first task is from a first context in the second task is from a second context. The method further includes interleaving a plurality of work packages comprising the first task and the second task to generate a combined work package stream. The combined work package stream is subsequently executed on the vector execution unit.
摘要:
A stream based memory access system for a video processor for executing video processing operations. The video processor includes a scalar execution unit configured to execute scalar video processing operations and a vector execution unit configured to execute vector video processing operations. A frame buffer memory is included for storing data for the scalar execution unit and the vector execution unit. A memory interface is included for establishing communication between the scalar execution unit and the vector execution unit and the frame buffer memory. The frame buffer memory comprises a plurality of tiles. The memory interface implements a first sequential access of tiles and implements a second stream comprising a second sequential access of tiles for the vector execution unit or the scalar execution unit.