Low latency long short-term memory inference with sequence interleaving

    Publication number: US11769041B2

    Publication date: 2023-09-26

    Application number: US16177218

    Application date: 2018-10-31

    CPC classification number: G06N3/063 G06F7/5443 G06F17/16 G06N20/00

    Abstract: Systems, apparatuses, and methods for implementing a low latency long short-term memory (LSTM) machine learning engine using sequence interleaving techniques are disclosed. A computing system includes at least a host processing unit, a machine learning engine, and a memory. The host processing unit detects a plurality of sequences which will be processed by the machine learning engine. The host processing unit interleaves the sequences into data blocks and stores the data blocks in the memory. When the machine learning engine receives a given data block, the machine learning engine performs, in parallel, a plurality of matrix multiplication operations on the plurality of sequences in the given data block and a plurality of coefficients. Then, the outputs of the matrix multiplication operations are coupled to one or more LSTM layers.
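
    A minimal sketch of the interleaving idea in NumPy (the shapes and names such as num_seqs, seq_len, and W are illustrative assumptions, not taken from the patent): timestep t of every sequence is packed into one data block, so a single matrix multiply against the shared coefficients covers all sequences at that step in parallel.

```python
import numpy as np

num_seqs, seq_len, feat, hidden = 4, 8, 16, 32
sequences = np.random.randn(num_seqs, seq_len, feat)
W = np.random.randn(feat, hidden)        # shared coefficient matrix

for t in range(seq_len):
    block = sequences[:, t, :]           # interleaved data block: one row per sequence
    pre_activations = block @ W          # a single matmul serves all sequences in parallel
    # ...pre_activations would feed the LSTM gate computations for step t
```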

    Method and apparatus for an HDR hardware processor inline to hardware encoder and decoder

    Publication number: US11503310B2

    Publication date: 2022-11-15

    Application number: US16176826

    Application date: 2018-10-31

    Abstract: A device includes an encoder, decoder, codec, or combination thereof and inline hardware conversion units that are operative to convert stored image data into either an HDR/WCG format or an SDR/SCG format during the conversion process. Each of the inline hardware conversion units is operative to perform the conversion process independently of another read operation with the memory that stores the image data to be converted. In one example, an encoding unit is operative to perform a write operation with a memory to store the converted image data after completing the conversion process. In another example, a decoding unit is operative to perform a read operation with the memory to retrieve the image data from the memory before initiating the conversion process. In another example, an encoder/decoder unit is operative to perform at least one of the read operation and the write operation.
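
    A rough software analogy of the inline-conversion idea (the decode and tone-mapping functions below are invented stand-ins, not the patent's hardware): the conversion runs in the same pass as the decode, so the intermediate HDR frame never makes a separate round trip through memory between the two steps.

```python
import numpy as np

def decode(chunk: bytes) -> np.ndarray:
    """Stand-in decoder: pretend each chunk decodes to HDR pixel values."""
    return np.frombuffer(chunk, dtype=np.uint8).astype(np.float32) / 255.0

def tone_map_to_sdr(hdr_frame: np.ndarray) -> np.ndarray:
    """Stand-in HDR-to-SDR conversion (simple Reinhard-style tone map)."""
    return hdr_frame / (1.0 + hdr_frame)

def decode_to_sdr(bitstream):
    # Conversion happens inline with decoding: each decoded frame goes
    # straight into the converter instead of being written back to memory
    # and re-read by a separate conversion pass.
    for chunk in bitstream:
        yield tone_map_to_sdr(decode(chunk))

frames = list(decode_to_sdr([b"\x10\x80\xff", b"\x00\x40\xc0"]))
```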

    INTEGRATED VIDEO CODEC AND INFERENCE ENGINE

    Publication number: US20190028752A1

    Publication date: 2019-01-24

    Application number: US15657613

    Application date: 2017-07-24

    Abstract: Systems, apparatuses, and methods for integrating a video codec with an inference engine are disclosed. A system is configured to implement an inference engine and a video codec while sharing at least a portion of its processing elements between the inference engine and the video codec. By sharing processing elements when combining the inference engine and the video codec, the silicon area of the combination is reduced. In one embodiment, the shared portion of processing elements includes a motion prediction/motion estimation/MACs engine with a plurality of multiplier-accumulator (MAC) units, an internal memory, and peripherals. The peripherals include a memory interface, a direct memory access (DMA) engine, and a microprocessor. The system is configured to perform a context switch to reprogram the processing elements to switch between operating modes. The context switch can occur at a frame boundary or at a sub-frame boundary.
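
    A toy model of the sharing scheme (the class and method names are illustrative assumptions): one pool of MAC units serves either the codec's motion estimation or the inference engine's convolutions, and a context switch, e.g. at a frame boundary, reprograms which mode owns it.

```python
import numpy as np

class SharedMacArray:
    """One set of multiplier-accumulators, time-shared between two modes."""

    def __init__(self):
        self.mode = "codec"

    def context_switch(self, mode: str):
        # In hardware this would reload configuration and weights; here it
        # is just a mode flag recorded at a frame/sub-frame boundary.
        assert mode in ("codec", "inference")
        self.mode = mode

    def run(self, a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Both workloads reduce to multiply-accumulate on the same units.
        return a @ b

macs = SharedMacArray()
macs.context_switch("inference")                 # reprogram at a frame boundary
y = macs.run(np.ones((4, 8)), np.ones((8, 2)))   # inference matmul on the shared MACs
```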

    Exploiting Camera Depth Information for Video Encoding

    Publication number: US20150092856A1

    Publication date: 2015-04-02

    Application number: US14043427

    Application date: 2013-10-01

    Abstract: The present disclosure is directed to a system and method for exploiting camera and depth information associated with rendered video frames, such as those rendered by a server operating as part of a cloud gaming service, to more efficiently encode the rendered video frames for transmission over a network. The method and system of the present disclosure can be used in a server operating in a cloud gaming service to reduce, for example, the latency, downstream bandwidth, and/or computational processing power associated with playing a video game over the service. The method and system of the present disclosure can be further used in other applications where camera and depth information of a rendered or captured video frame is available.
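
    One hedged sketch of how per-block depth might steer an encoder (the QP range and the linear mapping are invented for illustration): blocks covering nearby geometry get a lower quantization parameter, i.e. more bits, than distant background.

```python
import numpy as np

def qp_map_from_depth(depth: np.ndarray, near_qp: int = 22, far_qp: int = 38) -> np.ndarray:
    # Normalize depth to [0, 1], then interpolate the quantization
    # parameter between the near (fine) and far (coarse) settings.
    span = max(float(depth.max() - depth.min()), 1e-6)
    d = (depth - depth.min()) / span
    return np.round(near_qp + d * (far_qp - near_qp)).astype(int)

depth_per_block = np.array([[1.0, 5.0], [20.0, 80.0]])  # toy per-macroblock depths
print(qp_map_from_depth(depth_per_block))               # near blocks -> low QP (more bits)
```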

    MEMORY BANDWIDTH REDUCTION TECHNIQUES FOR LOW POWER CONVOLUTIONAL NEURAL NETWORK INFERENCE APPLICATIONS

    Publication number: US20220129752A1

    Publication date: 2022-04-28

    Application number: US17571045

    Application date: 2022-01-07

    Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
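
    A minimal sketch of the blocking idea, assuming a (channels, height, width) input and 1x1 kernels for brevity (shapes and block sizes are illustrative): each 3D block is loaded from "external" memory once, its partial convolution results are summed across channels on chip, and only the accumulated output is written back.

```python
import numpy as np

C, H, W, F = 8, 16, 16, 4                 # input channels, spatial dims, output features
x = np.random.randn(C, H, W)              # input residing in "external" memory
k = np.random.randn(F, C)                 # 1x1 kernel weights per feature/channel
out = np.zeros((F, H, W))                 # written back once per 3D block

block_c, block_h, block_w = 4, 8, 8
for c0 in range(0, C, block_c):
    for h0 in range(0, H, block_h):
        for w0 in range(0, W, block_w):
            blk = x[c0:c0+block_c, h0:h0+block_h, w0:w0+block_w]  # one external load
            for f in range(F):
                # Accumulate across the block's channels before writing back,
                # so partial sums never round-trip through external memory.
                out[f, h0:h0+block_h, w0:w0+block_w] += np.tensordot(
                    k[f, c0:c0+block_c], blk, axes=1)
```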

    MACHINE LEARNING INFERENCE ENGINE SCALABILITY

    Publication number: US20190325305A1

    Publication date: 2019-10-24

    Application number: US16117302

    Application date: 2018-08-30

    Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch first data and broadcast the first data to the other inference cores of the inference accelerator engine. Each inference core also fetches second data unique to that core. The inference cores then perform computations on the first and second data in order to implement the machine learning model.
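
    A toy sketch of the broadcast mapping (names and shapes are illustrative assumptions): one core fetches the shared first data a single time and broadcasts it, while each core fetches only its own second data, so the shared data crosses the external-memory interface once instead of once per core.

```python
import numpy as np

num_cores = 4
shared_weights = np.random.randn(16, 16)          # first data: fetched once by core 0
per_core_inputs = [np.random.randn(16)            # second data: unique to each core
                   for _ in range(num_cores)]

broadcast = shared_weights                        # core 0 broadcasts to its peers
outputs = [broadcast @ per_core_inputs[c]         # each core combines first + second data
           for c in range(num_cores)]
```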
