Patent search ap:("Lemon Inc.") AND inv:"Heng WANG" Page 1

1.

Invention application
VIDEO CAPTIONING GENERATION SYSTEM AND METHOD (legal status: granted)

Publication (announcement) number: US20240380949A1

Publication (announcement) date: 2024-11-14

Application number: US18314019

Application date: 2023-05-08

Applicant: Lemon Inc.

Inventor: Linjie YANG, Heng WANG, Yuhan SHEN, Longyin WEN, Haichao YU

IPC: H04N21/488, H04N21/2389, H04N21/84

Abstract: A system and a method are provided that include a processor executing a caption generation program to receive an input video, sample video frames from the input video, extract video frames from the input video, extract video embeddings and audio embeddings from the video frames, including local video tokens and local audio tokens, respectively, input the local video tokens and the local audio tokens into at least a transformer layer of a cross-modal encoder to generate multi-modal embeddings, and generate video captions based on the multi-modal embeddings using a caption decoder.
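The flow described in this abstract (sample frames, form local video and audio tokens, pass both through a cross-modal transformer layer) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the function names are hypothetical, frames are sampled uniformly, the two token sequences are fused by plain concatenation, and the attention uses identity projections with a single head. The embedding extractors and the caption decoder are omitted.

```python
import math


def sample_frame_indices(num_frames, num_samples):
    """Pick one index at the midpoint of each equal-length segment (assumed strategy)."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def cross_modal_encode(video_tokens, audio_tokens):
    """Concatenate both modalities so every token attends to all others (assumed fusion)."""
    return self_attention(video_tokens + audio_tokens)


if __name__ == "__main__":
    print(sample_frame_indices(10, 5))
    video = [[1.0, 0.0], [0.5, 0.5]]   # toy local video tokens
    audio = [[0.0, 1.0]]               # toy local audio token
    for row in cross_modal_encode(video, audio):
        print(row)
```

In this toy version the output holds one fused vector per input token; a real caption decoder would consume such multi-modal embeddings to produce text.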

2.

Invention publication
PROCESSING METHOD, ELECTRONIC DEVICE, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM FOR MULTIMODAL DATA (legal status: under examination, published)

Publication (announcement) number: US20240233350A1

Publication (announcement) date: 2024-07-11

Application number: US18408967

Application date: 2024-01-10

Applicant: Lemon Inc., Beijing Zitiao Network Technology Co., Ltd.

Inventor: Xiaojie JIN, Fan MA, Jiashi FENG, Heng WANG, Jingjia HUANG

IPC: G06V10/80, G06F40/284, G06V10/774, G06V20/40

CPC classification number: G06V10/806, G06F40/284, G06V10/774, G06V20/46

Abstract: The embodiments of the disclosure provide a processing method, apparatus, electronic device, and non-transitory computer-readable storage medium for multimodal data, wherein the method includes: obtaining data to be processed of an original modality; and determining result data of a target modality corresponding to the data to be processed by processing the data to be processed with a target processing model; wherein the target processing model comprises a multimodal submodel, and the pre-training task of the multimodal submodel includes a task of locating local data that matches second modal data from first modal data; wherein when the first modal data belongs to the original modality, the second modal data belongs to the target modality; and when the first modal data belongs to the target modality, the second modal data belongs to the original modality.
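The "locating local data that matches second modal data from first modal data" task can be pictured with a toy sketch. The abstract does not say how the matching is computed, so the following is purely an assumption: local features of the first modality (for example, per-segment video features) are compared with a single query feature of the second modality (for example, a sentence embedding) by cosine similarity, and the best contiguous span is returned. The names `cosine` and `locate_best_span` are hypothetical, and this illustrates the idea of localization only, not the patent's pre-training objective.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def locate_best_span(local_feats, query, max_len):
    """Return (start, end, score) for the contiguous span, end exclusive,
    whose mean feature is most similar to the query. Ties keep the earliest,
    shortest span. Returns None when local_feats is empty."""
    best = None
    n = len(local_feats)
    dim = len(query)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            span = local_feats[start:end]
            mean = [sum(f[d] for f in span) / len(span) for d in range(dim)]
            score = cosine(mean, query)
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


if __name__ == "__main__":
    segments = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # toy first-modality features
    text_query = [0.0, 1.0]                                      # toy second-modality feature
    print(locate_best_span(segments, text_query, max_len=2))
```

Swapping the roles of the two inputs mirrors the abstract's statement that either modality may serve as the first modal data.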
