MULTIMODAL DATA PROCESSING
    Invention publication

    Publication (announcement) number: US20240144664A1

    Publication (announcement) date: 2024-05-02

    Application number: US18393238

    Application date: 2023-12-21

    CPC classification number: G06V10/82 G06V10/467

    Abstract: Embodiments of the present disclosure provide a solution for multimodal data processing. A method comprises: obtaining image data and text data; and extracting a target visual feature of the image data and a target textual feature of the text data using a feature extraction model. The feature extraction model comprises alternately deployed cross-modal encoding parts and visual encoding parts. The extracting comprises: performing, using a first cross-modal encoding part of the feature extraction model, cross-modal feature encoding on a first intermediate visual feature of the image data and a first intermediate textual feature of the text data, to obtain a second intermediate visual feature and a second intermediate textual feature; and performing, using a first visual encoding part of the feature extraction model, visual modal feature encoding on the second intermediate visual feature, to obtain a third intermediate visual feature.
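
The alternating layout described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how either encoding part is computed, so `cross_modal_part`, `visual_part`, and `extract_features` are hypothetical toy functions operating on plain lists of floats, not the claimed model.

```python
from typing import List, Tuple

Vector = List[float]


def cross_modal_part(visual: Vector, textual: Vector) -> Tuple[Vector, Vector]:
    """Toy cross-modal encoding (assumed): mix each modality with the other's mean."""
    v_mean = sum(visual) / len(visual)
    t_mean = sum(textual) / len(textual)
    new_visual = [0.5 * v + 0.5 * t_mean for v in visual]
    new_textual = [0.5 * t + 0.5 * v_mean for t in textual]
    return new_visual, new_textual


def visual_part(visual: Vector) -> Vector:
    """Toy visual-only encoding (assumed): clamp negative values to zero."""
    return [max(0.0, v) for v in visual]


def extract_features(
    visual: Vector, textual: Vector, num_stages: int = 2
) -> Tuple[Vector, Vector, List[str]]:
    """Alternate cross-modal and visual-only parts; record the order applied."""
    trace: List[str] = []
    for _ in range(num_stages):
        # Cross-modal part: updates both the visual and the textual feature.
        visual, textual = cross_modal_part(visual, textual)
        trace.append("cross_modal")
        # Visual part: updates only the visual feature; text passes through.
        visual = visual_part(visual)
        trace.append("visual")
    return visual, textual, trace


if __name__ == "__main__":
    target_visual, target_textual, order = extract_features([1.0, -1.0], [2.0, 4.0])
    print(order)
    print(target_visual, target_textual)
```

In this sketch the textual feature changes only inside the cross-modal parts, while the visual feature is refined at every step, which mirrors the first, second, and third intermediate features named in the abstract.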

    EFFICIENT VIDEO PROCESSING VIA TEMPORAL PROGRESSIVE LEARNING

    Publication (announcement) number: US20230206067A1

    Publication (announcement) date: 2023-06-29

    Application number: US18111756

    Application date: 2023-02-20

    Applicant: Lemon Inc.

    Abstract: Systems and methods for performing temporal progressive learning for video processing are provided herein. Some examples include receiving a video that includes a plurality of frames, extracting a first subset of frames from the plurality of frames, and inputting the first subset of frames into a model that includes an encoder and a decoder. The examples further include comparing a first output of the model to the first subset of frames and updating the encoder, thereby training the encoder, and then extracting a second subset of frames from the plurality of frames. The second subset of frames includes a larger number of frames than the first subset of frames. The examples further include inputting the second subset of frames into the model, comparing a second output of the model to the second subset of frames, and updating the encoder, thereby further training the encoder.
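
The training schedule in the abstract (train on a small frame subset, then on a larger one) can be sketched as follows. This is a toy under stated assumptions, not the claimed system: each frame is a single float, the encoder is one scalar weight, the decoder is the identity, the comparison is a mean squared reconstruction error, and the names `sample_frames` and `train_progressively` are hypothetical.

```python
from typing import List, Sequence, Tuple


def sample_frames(frames: Sequence[float], count: int) -> List[float]:
    """Pick `count` roughly evenly spaced frames (assumed sampling rule)."""
    if count >= len(frames):
        return list(frames)
    step = len(frames) / count
    return [frames[int(i * step)] for i in range(count)]


def train_progressively(
    frames: Sequence[float],
    sizes: Sequence[int],
    weight: float = 0.0,
    lr: float = 0.1,
    steps_per_stage: int = 20,
) -> Tuple[float, List[Tuple[int, float]]]:
    """Train a scalar 'encoder' on frame subsets of strictly increasing size."""
    if any(b <= a for a, b in zip(sizes, sizes[1:])):
        raise ValueError("each subset must be larger than the previous one")
    history: List[Tuple[int, float]] = []
    for size in sizes:
        subset = sample_frames(frames, size)
        for _ in range(steps_per_stage):
            # Model output: encoder scales the frame, decoder is the identity.
            outputs = [weight * x for x in subset]
            # Gradient of the mean squared error between output and input frames.
            grad = sum(2 * (o - x) * x for o, x in zip(outputs, subset)) / len(subset)
            weight -= lr * grad  # only the encoder weight is updated
        loss = sum((weight * x - x) ** 2 for x in subset) / len(subset)
        history.append((len(subset), loss))
    return weight, history


if __name__ == "__main__":
    video = [i / 10 for i in range(1, 11)]  # ten toy "frames"
    final_weight, stages = train_progressively(video, sizes=[2, 4])
    print(final_weight, stages)
```

The same encoder weight is carried from the first stage into the second, so the later, larger subset continues the training started on the smaller one rather than restarting it.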
