GPU Compute Optimization Via Wavefront Reforming
    1.
    Invention application
    GPU Compute Optimization Via Wavefront Reforming (status: in force)

    Publication No.: US20130247067A1

    Publication date: 2013-09-19

    Application No.: US13422430

    Filing date: 2012-03-16

    IPC classification: G06F9/46

    Abstract: Methods and systems are provided for graphics processing unit optimization via wavefront reforming, including queuing one or more work-items of a wavefront into a plurality of queues of a compute unit. Each queue is associated with a particular processor within the compute unit. A plurality of work passes are performed. A determination is made as to which of the plurality of queues are below a threshold amount of work-items. One or more remaining work-items from the queues that still hold work-items are redistributed to the below-threshold queues. A subsequent work pass is performed. The determining, redistributing, and performing of the subsequent work pass are repeated until all the queues are empty.

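The loop described in this abstract can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions: the function names (`work_pass`, `reform`, `run`), the choice of the fullest queue as donor, and the default threshold of 1 are hypothetical and are not taken from the patent text.

```python
from collections import deque

def work_pass(queues):
    """One work pass: every processor with queued work executes one work-item."""
    return [q.popleft() for q in queues if q]

def reform(queues, threshold):
    """Move leftover work-items from the fullest queue to queues below the threshold.

    Donor selection (fullest queue first) is an assumption made for this sketch.
    """
    for low in queues:
        while len(low) < threshold:
            donor = max(queues, key=len)
            if len(donor) - len(low) <= 1:
                break  # moving an item would not even out the queues any further
            low.append(donor.pop())

def run(queues, threshold=1, reforming=True):
    """Repeat work passes (with optional redistribution) until all queues are empty."""
    passes, done = 0, []
    while any(queues):
        done.extend(work_pass(queues))
        passes += 1
        if reforming:
            reform(queues, threshold)
    return passes, done

def skewed_queues():
    # Hypothetical workload: one processor holds most of the work-items.
    return [deque(["a0", "a1", "a2", "a3"]), deque(["b0"]), deque(["c0"]), deque(["d0"])]
```

For the skewed queues above, `run(skewed_queues(), reforming=False)` needs four passes, while `run(skewed_queues())` finishes the same seven work-items in two, because the idle processors pick up the leftover items after the first pass.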

    GPU distributed work-item queuing
    2.
    Granted invention patent
    GPU distributed work-item queuing (status: in force)

    Publication No.: US09009712B2

    Publication date: 2015-04-14

    Application No.: US13422405

    Filing date: 2012-03-16

    IPC classification: G06F9/46 G06F17/30

    Abstract: Methods and systems are provided for graphics processing unit distributed work-item queuing. One or more work-items of a wavefront are queued into a first level queue of a compute unit. When one or more additional work-items exist, a queuing of the additional work-items into a second level queue of the compute unit is performed. The queuing of the work-items into the first and second level queues is performed based on an assignment technique.

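A minimal sketch of two-level queuing follows. The abstract does not say which assignment technique is used or how deep the first level is, so the round-robin assignment, the per-lane queues, and the `first_level_depth` parameter are all assumptions made purely for illustration.

```python
from collections import deque

def enqueue_wavefront(work_items, num_lanes, first_level_depth):
    """Queue work-items into first-level queues, spilling extras to a second level.

    Assumptions (not from the patent): one queue pair per lane, round-robin
    assignment, and a fixed first-level depth.
    """
    first = [deque() for _ in range(num_lanes)]
    second = [deque() for _ in range(num_lanes)]
    for index, item in enumerate(work_items):
        lane = index % num_lanes  # round-robin assignment technique (assumed)
        if len(first[lane]) < first_level_depth:
            first[lane].append(item)
        else:
            second[lane].append(item)  # additional work-items go to the second level
    return first, second
```

With ten work-items, four lanes, and a first-level depth of two, the first eight items fill the first-level queues and the remaining two land in the second-level queues of lanes 0 and 1.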

    APPARATUS AND METHOD FOR DECODING USING COEFFICIENT COMPRESSION
    3.
    Invention application
    APPARATUS AND METHOD FOR DECODING USING COEFFICIENT COMPRESSION (status: under examination, published)

    Publication No.: US20130021350A1

    Publication date: 2013-01-24

    Application No.: US13186007

    Filing date: 2011-07-19

    IPC classification: G06T1/20

    CPC classification: H04N19/436 H04N19/44

    Abstract: Methods and apparatus for utilizing coefficient compression in graphics decoding are provided. In one example, a computer processing unit (CPU) is interfaced with a graphics processing unit (GPU), where the CPU extracts coefficients and passes compressed coefficient data, preferably in uniformly sized data packets, to the GPU for decoding and coefficient processing. Preferably, the extracted coefficients are inverse transform (iT) coefficients and the CPU includes an encoder control component configured to adaptively select a coefficient encoding process for performing the iT coefficient data compression based on the data content of the iT coefficients, such that data packets are generated that include data identifying the selected coefficient encoding process used for encoding the compressed iT coefficient data contained in the data packet. In such a case, the GPU is configured to receive such data packets and decode the iT coefficient data within each packet using a coefficient decoding method complementary to the selected coefficient encoding process identified within the packet. The GPU preferably uses massively parallel coefficient decoding of such data packets.

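The idea of tagging each packet with the encoding process that produced it can be sketched as follows. The two encoders (raw copy and run-length), the "smallest payload wins" selection rule, and the dictionary packet layout are assumptions chosen for illustration; the abstract does not name specific encodings, and a real decoder would run in parallel on the GPU rather than in Python.

```python
RAW, RLE = 0, 1  # hypothetical identifiers for the selectable encoding processes

def encode_raw(coeffs):
    return list(coeffs)

def decode_raw(payload):
    return list(payload)

def encode_rle(coeffs):
    """Run-length encode as flattened (value, run length) pairs."""
    out, i = [], 0
    while i < len(coeffs):
        j = i
        while j < len(coeffs) and coeffs[j] == coeffs[i]:
            j += 1
        out += [coeffs[i], j - i]
        i = j
    return out

def decode_rle(payload):
    out = []
    for k in range(0, len(payload), 2):
        out += [payload[k]] * payload[k + 1]
    return out

ENCODERS = {RAW: encode_raw, RLE: encode_rle}
DECODERS = {RAW: decode_raw, RLE: decode_rle}

def make_packet(coeffs):
    """CPU side: pick the encoding with the smallest payload and record its ID."""
    candidates = {method: enc(coeffs) for method, enc in ENCODERS.items()}
    method = min(candidates, key=lambda m: len(candidates[m]))
    return {"method": method, "payload": candidates[method]}

def decode_packet(packet):
    """Decoder side: dispatch on the method ID carried inside the packet."""
    return DECODERS[packet["method"]](packet["payload"])
```

A mostly-zero coefficient block such as `[5, 0, 0, 0, 0, 0, 0, 0]` is packed with the run-length method, while a dense block such as `[1, 2, 3, 4]` stays raw; in both cases `decode_packet(make_packet(block))` returns the original block.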

    APPARATUS AND METHOD FOR VIDEO PROCESSING
    4.
    Invention application
    APPARATUS AND METHOD FOR VIDEO PROCESSING (status: in force)

    Publication No.: US20130034160A1

    Publication date: 2013-02-07

    Application No.: US13196181

    Filing date: 2011-08-02

    IPC classification: H04N7/32 H04N7/26

    CPC classification: H04N19/43 H04N19/436

    Abstract: Methods and apparatus for facilitating motion estimation in video processing are provided. Preferably, coordinates of a search area within a video frame are determined for each of a plurality of macroblocks (MBs) of a reference frame based upon a predicted location derived from the coordinates of the MB within the reference frame and motion estimation information. The video frame can be segmented into tiles, with an associated overlapping tile defined for at least some tiles. Search data is defined for each tile as pel data for each pixel within that tile and any associated tile. Macroblock searches are preferably conducted on a tile assignment basis, with tile search assignments distributed among a plurality of processing elements. Each processing element preferably has a local memory that it uses for the search data when performing a tile search assignment.

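The tile-based search assignment can be sketched roughly as below. The macroblock size, search radius, clamping to the frame, the rule that a search belongs to the tile containing its top-left corner, and the round-robin distribution are all assumptions for illustration; the abstract does not fix any of them. Search areas that cross a tile edge are the case the abstract's overlapping tiles would cover, which this sketch does not model.

```python
from collections import defaultdict

def search_area(mb_xy, motion_vec, frame_wh, mb_size=16, radius=8):
    """Search rectangle (left, top, right, bottom) around the predicted MB location."""
    pred_x = mb_xy[0] + motion_vec[0]
    pred_y = mb_xy[1] + motion_vec[1]
    width, height = frame_wh
    left = max(0, pred_x - radius)
    top = max(0, pred_y - radius)
    right = min(width, pred_x + mb_size + radius)
    bottom = min(height, pred_y + mb_size + radius)
    return left, top, right, bottom

def group_by_tile(areas, tile_wh, frame_wh):
    """Group macroblock indices by the tile that holds each search area's top-left corner."""
    tiles_per_row = -(-frame_wh[0] // tile_wh[0])  # ceiling division
    groups = defaultdict(list)
    for mb_index, (left, top, _, _) in enumerate(areas):
        tile = (top // tile_wh[1]) * tiles_per_row + (left // tile_wh[0])
        groups[tile].append(mb_index)
    return dict(groups)

def distribute(tile_ids, num_elements):
    """Hand tile search assignments to processing elements in round-robin order."""
    return {pe: tile_ids[pe::num_elements] for pe in range(num_elements)}
```

In a 64x64 frame split into 32x32 tiles, macroblocks at (0, 0) and (16, 0) fall into tile 0 and a macroblock at (48, 48) with motion vector (-4, -4) falls into tile 3, so two processing elements would each receive one tile search assignment.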

    Apparatus and method for video processing
    5.
    Granted invention patent
    Apparatus and method for video processing (status: in force)

    Publication No.: US09167260B2

    Publication date: 2015-10-20

    Application No.: US13196181

    Filing date: 2011-08-02

    CPC classification: H04N19/43 H04N19/436

    Abstract: Methods and apparatus for facilitating motion estimation in video processing are provided. Preferably, coordinates of a search area within a video frame are determined for each of a plurality of macroblocks (MBs) of a reference frame based upon a predicted location derived from the coordinates of the MB within the reference frame and motion estimation information. The video frame can be segmented into tiles, with an associated overlapping tile defined for at least some tiles. Search data is defined for each tile as pel data for each pixel within that tile and any associated tile. Macroblock searches are preferably conducted on a tile assignment basis, with tile search assignments distributed among a plurality of processing elements. Each processing element preferably has a local memory that it uses for the search data when performing a tile search assignment.


    GPU compute optimization via wavefront reforming
    6.
    Granted invention patent
    GPU compute optimization via wavefront reforming (status: in force)

    Publication No.: US09135077B2

    Publication date: 2015-09-15

    Application No.: US13422430

    Filing date: 2012-03-16

    Abstract: Methods and systems are provided for graphics processing unit optimization via wavefront reforming, including queuing one or more work-items of a wavefront into a plurality of queues of a compute unit. Each queue is associated with a particular processor within the compute unit. A plurality of work passes are performed. A determination is made as to which of the plurality of queues are below a threshold amount of work-items. One or more remaining work-items from the queues that still hold work-items are redistributed to the below-threshold queues. A subsequent work pass is performed. The determining, redistributing, and performing of the subsequent work pass are repeated until all the queues are empty.


    GPU Distributed Work-Item Queuing
    7.
    Invention application
    GPU Distributed Work-Item Queuing (status: in force)

    Publication No.: US20130247054A1

    Publication date: 2013-09-19

    Application No.: US13422405

    Filing date: 2012-03-16

    IPC classification: G06F9/46

    Abstract: Methods and systems are provided for graphics processing unit distributed work-item queuing. One or more work-items of a wavefront are queued into a first level queue of a compute unit. When one or more additional work-items exist, a queuing of the additional work-items into a second level queue of the compute unit is performed. The queuing of the work-items into the first and second level queues is performed based on an assignment technique.
