Multi-threading in a video hardware engine

    公开(公告)号:US11736700B2

    公开(公告)日:2023-08-22

    申请号:US17543183

    申请日:2021-12-06

    CPC classification number: H04N19/174 H04N19/436

    Abstract: A video hardware engine with multi-threading functionality is disclosed. The video hardware engine includes a video hardware accelerator unit and a controller. The controller is coupled to the video hardware accelerator unit. The controller operates in an encode mode and a decode mode. In the encode mode, the controller receives a plurality of frames and encode attributes associated with each frame of the plurality of frames. The encode attributes associated with a frame of the plurality of frames is processed to generate encode parameters associated with the frame. The video hardware accelerator unit is configured to process the frame based on the encode parameters to generate an output. The output of the video hardware accelerator unit is processed to generate a compressed bit-stream and an encode status. In decode mode, the controller receives a compressed bit-stream and decode attributes and generates a plurality of frames and a decode status.

    Bandwidth Controlled Data Synchronization for Image and Vision Processor

    公开(公告)号:US20230259402A1

    公开(公告)日:2023-08-17

    申请号:US18302945

    申请日:2023-04-19

    CPC classification number: G06F9/5044 G06F9/4887 G06F11/0757 G06F9/52

    Abstract: Systems include data processors to process a set of image data in parallel, and thread schedulers coupled to the data processors. Each of the thread schedulers provides a respective task start signal for a respective data processor. Such systems also include a bandwidth controller coupled to one or more data processors. The bandwidth controller is configured to, for each of the data processor(s): maintain a respective token count, and determine whether to stall or propagate the respective task start signal from the respective thread scheduler to the data processor based on the respective token count. Other aspects include pattern adaptors respectively provided in the schedulers to allow mixing of multiple data patterns across blocks of data, transaction aggregators that allow re-using the same image data by multiple threads of execution while the image data remains in a given data buffer, and timers to detect failure and hang events.

    Fault detectable and tolerant neural network

    公开(公告)号:US11710030B2

    公开(公告)日:2023-07-25

    申请号:US16556733

    申请日:2019-08-30

    Abstract: A hardware neural network engine which uses checksums of the matrices used to perform the neural network computations. For fault correction, expected checksums are compared with checksums computed from the matrix developed from the matrix operation. The expected checksums are developed from the prior stage of the matrix operations or from the prior stage of the matrix operations combined with the input matrices to a matrix operation. This use of checksums allows reading of the matrices from memory, the dot product of the matrices and the accumulation of the matrices to be fault corrected without triplication of the matrix operation hardware and extensive use of error correcting codes. The nonlinear stage of the neural network computation is done using triplicated nonlinear computational logic. Fault detection is done in a similar manner, with fewer checksums needed and correction logic removed as compared to the fault correction operation.

    Delayed duplicate I-picture for video coding

    公开(公告)号:US11653031B2

    公开(公告)日:2023-05-16

    申请号:US17093695

    申请日:2020-11-10

    CPC classification number: H04N19/895 H04N19/107 H04N19/172 H04N19/39 H04N19/65

    Abstract: A method is provided that includes receiving pictures of a video sequence in a video encoder, and encoding the pictures to generate a compressed video bit stream that is transmitted to a video decoder in real-time, wherein encoding the pictures includes selecting a picture to be encoded as a delayed duplicate intra-predicted picture (DDI), wherein the picture would otherwise be encoded as an inter-predicted picture (P-picture), encoding the picture as an intra-predicted picture (I-picture) to generate the DDI, wherein the I-picture is reconstructed and stored for use as a reference picture for a decoder refresh picture, transmitting the DDI to the video decoder in non-real time, selecting a subsequent picture to be encoded as the decoder refresh picture, and encoding the subsequent picture in the compressed bit stream as the decoder refresh picture, wherein the subsequent P-picture is encoded as a P-picture predicted using the reference picture.

    Universal and adaptive de-mosaicing (CFA) system

    公开(公告)号:US11463664B2

    公开(公告)日:2022-10-04

    申请号:US16911579

    申请日:2020-06-25

    Abstract: A method of de-mosaicing pixel data from an image processor includes generating a pixel block that includes a plurality of image pixels. The method also includes determining a first image gradient between a first set of pixels of the pixel block and a second image gradient between a second set of pixels of the pixel block. The method also includes determining a first adaptive threshold value based on intensity of a third set of pixels of the pixel block. The pixels of the third set of pixels are adjacent to one another. The method also includes filtering the pixel block in a vertical, horizontal, or neutral direction based on the first and second image gradients and the first adaptive threshold value utilizing a plurality of FIR filters to generate a plurality of component images.

    Methods and systems for analyzing images in convolutional neural networks

    公开(公告)号:US11443505B2

    公开(公告)日:2022-09-13

    申请号:US16898972

    申请日:2020-06-11

    Abstract: A method for analyzing images to generate a plurality of output features includes receiving input features of the image and performing Fourier transforms on each input feature. Kernels having coefficients of a plurality of trained features are received and on-the-fly Fourier transforms (OTF-FTs) are performed on the coefficients in the kernels. The output of each Fourier transform and each OTF-FT are multiplied together to generate a plurality of products and each of the products are added to produce one sum for each output feature. Two-dimensional inverse Fourier transforms are performed on each sum.

Patent Agency Ranking