VIDEO CAPTIONING GENERATION SYSTEM AND METHOD

    Publication No.: US20240380949A1

    Publication Date: 2024-11-14

    Application No.: US18314019

    Filing Date: 2023-05-08

    Applicant: Lemon Inc.

    Abstract: A system and a method are provided that include a processor executing a caption generation program to receive an input video, sample video frames from the input video, extract video frames from the input video, extract video embeddings and audio embeddings from the video frames, including local video tokens and local audio tokens, respectively, input the local video tokens and the local audio tokens into at least a transformer layer of a cross-modal encoder to generate multi-modal embeddings, and generate video captions based on the multi-modal embeddings using a caption decoder.
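The fusion step in the abstract above can be illustrated with a minimal sketch. It assumes the local video and audio tokens are simply concatenated and mixed by one single-head self-attention layer with identity projections. The function names (`self_attention`, `fuse_modalities`) and the toy token values are hypothetical, and the caption decoder is omitted. The abstract does not specify any of these details, so this is not the applicant's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity projections (assumption:
    no learned weights, purely to show how tokens attend to each other)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this token against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        # Weighted average of all tokens, one output dimension at a time.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out

def fuse_modalities(video_tokens, audio_tokens):
    """Hypothetical cross-modal layer: concatenate both token sets so that
    each output embedding can draw on video and audio tokens alike."""
    return self_attention(video_tokens + audio_tokens)

video_tokens = [[1.0, 0.0], [0.0, 1.0]]  # toy local video tokens
audio_tokens = [[0.5, 0.5]]              # toy local audio token
multi_modal = fuse_modalities(video_tokens, audio_tokens)
```

Each output row is a weighted average over all three input tokens, so every multi-modal embedding carries information from both modalities. A caption decoder would then consume `multi_modal`.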

    VIDEO MATTING
    Invention Application

    Publication No.: US20230044969A1

    Publication Date: 2023-02-09

    Application No.: US17396055

    Filing Date: 2021-08-06

    Applicant: Lemon Inc.

    Abstract: The present disclosure describes techniques of improving video matting. The techniques comprise extracting features from each frame of a video by an encoder of a model, wherein the video comprises a plurality of frames; incorporating, by a decoder of the model, into any particular frame temporal information extracted from one or more frames previous to the particular frame, wherein the particular frame and the one or more previous frames are among the plurality of frames of the video, and the decoder is a recurrent decoder; and generating a representation of a foreground object included in the particular frame by the model, wherein the model is trained using segmentation dataset and matting dataset.
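The recurrent idea in the abstract above, where the decoder folds information from previous frames into the current one, can be sketched as follows. This toy stands in for the learned model with a fixed blend: `encode`, `recurrent_decode`, `matte_video`, and the `keep` factor are all hypothetical names and values. The real encoder and recurrent decoder would be trained networks, and training on segmentation and matting datasets is not shown.

```python
def encode(frame):
    """Toy encoder (assumption): the per-pixel feature is the pixel intensity."""
    return list(frame)

def recurrent_decode(features, state, keep=0.5):
    """Blend the current features with the state carried from earlier frames.

    `keep` is a hypothetical fixed mixing factor; a learned recurrent unit
    would compute this gating from the data instead.
    """
    if state is None:
        new_state = list(features)  # first frame: no history yet
    else:
        new_state = [keep * s + (1.0 - keep) * f for s, f in zip(state, features)]
    # Clamp to [0, 1] so the output can be read as a per-pixel alpha matte.
    alpha = [min(1.0, max(0.0, v)) for v in new_state]
    return alpha, new_state

def matte_video(frames):
    """Run the encoder and recurrent decoder over frames in temporal order."""
    state = None
    mattes = []
    for frame in frames:
        alpha, state = recurrent_decode(encode(frame), state)
        mattes.append(alpha)
    return mattes

frames = [[0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]  # three toy 2-pixel frames
mattes = matte_video(frames)
```

In this toy, the matte for the second frame is not all zeros even though the frame itself is, because the state still carries the foreground seen in the first frame. That carry-over is the temporal information the abstract refers to.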
