-
Publication No.: US20240380949A1
Publication Date: 2024-11-14
Application No.: US18314019
Filing Date: 2023-05-08
Applicant: Lemon Inc.
Inventor: Linjie YANG, Heng WANG, Yuhan SHEN, Longyin WEN, Haichao YU
IPC: H04N21/488, H04N21/2389, H04N21/84
Abstract: A system and a method are provided that include a processor executing a caption generation program to receive an input video, sample video frames from the input video, extract video frames from the input video, extract video embeddings and audio embeddings from the video frames, including local video tokens and local audio tokens, respectively, input the local video tokens and the local audio tokens into at least a transformer layer of a cross-modal encoder to generate multi-modal embeddings, and generate video captions based on the multi-modal embeddings using a caption decoder.
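The cross-modal encoding step named in this abstract can be illustrated with a minimal sketch. It assumes the local video tokens and local audio tokens share one embedding width, joins them into a single sequence, and applies one scaled dot-product self-attention pass. The abstract does not disclose the layer internals, so everything here is an assumption for illustration: the function names are hypothetical, and learned projections, multiple heads, residual connections, and the caption decoder are all omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention weights.

    Assumption: identity query/key projections (no learned parameters).
    """
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / scale for key in tokens])
        for query in tokens
    ]


def cross_modal_layer(
    video_tokens: List[Vector], audio_tokens: List[Vector]
) -> List[Vector]:
    """Hypothetical single attention pass over the joint token sequence."""
    tokens = video_tokens + audio_tokens  # joint video + audio sequence
    weights = attention_weights(tokens)
    dim = len(tokens[0])
    # Each output embedding is a weighted mix of all tokens, so a video
    # position can draw on audio tokens and vice versa.
    return [
        [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]


# Toy inputs: two video tokens and one audio token, width 4.
video_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
audio_tokens = [[0.0, 0.0, 1.0, 0.0]]
multi_modal = cross_modal_layer(video_tokens, audio_tokens)
print(len(multi_modal), len(multi_modal[0]))
```

In this toy run the output row for the first video token has a non-zero value in the dimension carried only by the audio token, which is the mixing across modalities that a cross-modal layer is meant to provide.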
-
Publication No.: US20250104423A1
Publication Date: 2025-03-27
Application No.: US18725683
Filing Date: 2022-12-27
Applicant: Lemon Inc.
Inventor: Longyin WEN, Kai XU, Xiaohui SHEN
IPC: G06V20/40, G06V10/26, G06V10/764, G06V10/77, G06V10/82
Abstract: Embodiments of the present disclosure provide a video processing method and device. The video processing method includes: determining a target image to be processed in a video; performing semantic segmentation on the target image through a convolutional neural network to obtain a first feature map, wherein the first feature map comprises a feature map corresponding to at least one semantic class; and determining a target image region corresponding to the at least one semantic class in the target image according to the first feature map, wherein the at least one semantic class comprises an object-in-hand, and a training image used to train the convolutional neural network is marked with an image region corresponding to the at least one semantic class.
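The last step of this abstract, going from a per-class feature map to a target image region, can be sketched as follows. This is a sketch under assumptions, not the patented method: it treats the first feature map as one score grid per semantic class, assigns each pixel to its highest-scoring class, and reports the mask and bounding box for the requested class. The function name `class_region`, the argmax rule, and the class ordering are hypothetical.

```python
from typing import List, Optional, Tuple

ScoreMap = List[List[float]]  # one H x W grid of scores for a single class
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def class_region(
    score_maps: List[ScoreMap], target_class: int
) -> Tuple[List[List[bool]], Optional[Box]]:
    """Return (mask, bounding box) of pixels whose top class is target_class.

    Assumption: per-pixel argmax over class scores; the abstract does not
    state how the region is derived from the feature map.
    """
    height, width = len(score_maps[0]), len(score_maps[0][0])
    mask = [[False] * width for _ in range(height)]
    rows, cols = [], []
    for y in range(height):
        for x in range(width):
            scores = [class_map[y][x] for class_map in score_maps]
            if scores.index(max(scores)) == target_class:
                mask[y][x] = True
                rows.append(y)
                cols.append(x)
    if not rows:
        return mask, None  # the class does not appear in this image
    return mask, (min(rows), min(cols), max(rows), max(cols))


# Toy 3 x 4 feature map with two classes (hypothetical ordering):
# index 0 = background, index 1 = object-in-hand.
background = [
    [0.9, 0.8, 0.9, 0.9],
    [0.9, 0.2, 0.1, 0.9],
    [0.9, 0.3, 0.4, 0.9],
]
object_in_hand = [
    [0.1, 0.2, 0.1, 0.1],
    [0.1, 0.8, 0.9, 0.1],
    [0.1, 0.7, 0.6, 0.1],
]
mask, box = class_region([background, object_in_hand], target_class=1)
print(box)  # (1, 1, 2, 2)
```

A production system would more likely run this on tensors and might threshold class probabilities or post-process connected components. The nested-list version only makes the mapping from class scores to a region explicit.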
-
Publication No.: US20240346820A1
Publication Date: 2024-10-17
Application No.: US18301165
Filing Date: 2023-04-14
Applicant: Lemon Inc.
Inventor: Siqi TAN, Longyin WEN, Xinyao WANG, Erica Lynne RUZIC, Kin Chung WONG, Yi DUAN, Thomas OEFVERSTROEM
CPC classification number: G06V20/41, G06F40/40, G06V10/70, G06V20/62, G10L15/26, G10L25/57, G06F3/0482
Abstract: The present disclosure provides systems and methods for generating comments corresponding to an input video. Given an input video, comments with content relevant to the input video can be generated. One aspect includes a computing system comprising a processor and memory. The processor can be configured to execute a program using portions of the memory to receive an input video in a social networking system, generate at least one predicted comment corresponding to the input video based on video frames of the input video and a user profile of a target user, and present the at least one predicted comment to the target user.
-
-