-
Publication number: US20240338848A1
Publication date: 2024-10-10
Application number: US18296438
Application date: 2023-04-06
Applicant: Lemon Inc.
Inventor: Sijie Zhu , Linjie Yang , Xiaohui Shen , Heng Wang
IPC: G06T7/73 , G06V10/75 , G06V10/77 , G06V10/774
CPC classification number: G06T7/74 , G06V10/751 , G06V10/7715 , G06V10/774 , G06T2207/20081
Abstract: A unified place recognition framework handles both retrieval and re-ranking with a unified transformer model. The re-ranking module takes feature correlation, attention values, and x/y coordinates into account, and learns to determine whether an image pair is from the same location.
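The re-ranking signals named in the abstract (feature correlation, attention values, x/y coordinates) can be illustrated with a toy sketch. This is not the patented model: `pair_tokens`, `same_place_score`, and the linear head `w` are hypothetical stand-ins for the learned re-ranking transformer, assuming per-image local descriptors, attention weights, and normalized coordinates as inputs.

```python
import numpy as np

def pair_tokens(feats_a, feats_b, attn_a, coords_a):
    """Build re-ranking tokens for image A against image B.

    Each local descriptor of A contributes one token combining its best
    feature correlation with B, its attention value, and its normalized
    x/y coordinates (hypothetical token layout)."""
    corr = feats_a @ feats_b.T            # (N, M) feature-correlation map
    best = corr.max(axis=1)               # strongest match per descriptor
    return np.column_stack([best, attn_a, coords_a])   # (N, 4) tokens

def same_place_score(feats_a, feats_b, attn_a, attn_b,
                     coords_a, coords_b, w):
    """Score whether an image pair comes from the same location.

    Tokens from both directions are mean-pooled and passed through a tiny
    linear head `w`, standing in for the learned re-ranking model."""
    tokens = np.vstack([
        pair_tokens(feats_a, feats_b, attn_a, coords_a),
        pair_tokens(feats_b, feats_a, attn_b, coords_b),
    ])
    logit = tokens.mean(axis=0) @ w
    return 1.0 / (1.0 + np.exp(-logit))   # probability in (0, 1)
```

A trained system would learn `w` (or a full transformer in its place) from labeled same-place / different-place pairs.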
-
Publication number: US20240144664A1
Publication date: 2024-05-02
Application number: US18393238
Application date: 2023-12-21
Applicant: Lemon Inc. , Beijing Youzhuju Network Technology Co., Ltd.
Inventor: Song Bai , Rui Yan , Heng Wang , Junhao Zhang , Chuhui Xue , Wenqing Zhang
CPC classification number: G06V10/82 , G06V10/467
Abstract: Embodiments of the present disclosure provide a solution for multimodal data processing. A method comprises: obtaining image data and text data; and extracting a target visual feature of the image data and a target textual feature of the text data using a feature extraction model. The feature extraction model comprises alternately deployed cross-modal encoding parts and visual encoding parts. The extracting comprises: performing, using a first cross-modal encoding part of the feature extraction model, cross-modal feature encoding on a first intermediate visual feature of the image data and a first intermediate textual feature of the text data, to obtain a second intermediate visual feature and a second intermediate textual feature; and performing, using a first visual encoding part of the feature extraction model, visual modal feature encoding on the second intermediate visual feature, to obtain a third intermediate visual feature.
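The alternating structure described by the abstract (cross-modal part, then visual-only part) can be sketched with placeholder encoding functions. The bodies of `cross_modal_part` and `visual_part` are hypothetical stand-ins (pooled-summary mixing instead of real cross-attention and self-attention); only the alternation pattern follows the abstract.

```python
import numpy as np

def cross_modal_part(vis, txt):
    """Stand-in for a cross-modal encoding part: each modality is
    conditioned on a pooled summary of the other."""
    vis2 = vis + txt.mean(axis=0)   # visual tokens conditioned on text
    txt2 = txt + vis.mean(axis=0)   # text tokens conditioned on vision
    return vis2, txt2

def visual_part(vis):
    """Stand-in for a visual-only encoding part (self-attention-like)."""
    return vis + vis.mean(axis=0)

def extract_features(vis, txt, num_blocks=2):
    """Alternate cross-modal and visual encoding parts, as the abstract
    describes: the cross-modal part yields the second intermediate
    features, the visual part yields the third intermediate visual
    feature, and so on."""
    for _ in range(num_blocks):
        vis, txt = cross_modal_part(vis, txt)
        vis = visual_part(vis)
    return vis, txt
```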
-
Publication number: US20240144656A1
Publication date: 2024-05-02
Application number: US18394249
Application date: 2023-12-22
Applicant: Lemon Inc. , Beijing Youzhuju Network Technology Co., Ltd.
Inventor: Song Bai , Junhao Zhang , Heng Wang , Rui Yan , Chuhui Xue , Wenqing Zhang
IPC: G06V10/774 , G06V10/40 , G06V10/74 , G06V10/772 , G06V10/82
CPC classification number: G06V10/774 , G06V10/40 , G06V10/761 , G06V10/772 , G06V10/82
Abstract: A method, apparatus, device, and medium for image processing is provided. The method includes generating, using an image generation process, a first set of synthetic images based on a first set of codes associated with a first image class in a codebook and on a first class feature associated with the first image class; generating, using a feature extraction process, a first set of reference features based on the first set of synthetic images and generating a first set of target features based on a plurality of sets of training images belonging to the first image class in a training image set; and updating the image generation process and the codebook according to at least a first training objective to reduce a difference between each reference feature in the first set of reference features and a corresponding target feature in the first set of target features.
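The first training objective in the abstract (reducing the difference between reference features of synthetic images and target features of real images) can be written as a simple feature-matching loss. Everything here is a sketch: `generate` and `extract` are hypothetical stand-ins for the image generation process and feature extraction process, and mean squared error stands in for whatever distance the claimed objective actually uses.

```python
import numpy as np

def feature_matching_loss(codebook, class_feature, generate, extract,
                          target_feats):
    """Sketch of the first training objective from the abstract.

    Generates synthetic images from class-specific codebook codes and a
    class feature, extracts their reference features, and measures the
    squared distance to target features of real training images."""
    synthetic = generate(codebook, class_feature)  # first set of synthetic images
    ref_feats = extract(synthetic)                 # first set of reference features
    return np.mean((ref_feats - target_feats) ** 2)
```

In training, this loss would be backpropagated to update both the generation process and the codebook entries, as the abstract states.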
-
Publication number: US20230206067A1
Publication date: 2023-06-29
Application number: US18111756
Application date: 2023-02-20
Applicant: Lemon Inc.
Inventor: Peng Wang , Heng Wang , Xianhang Li , Xinyu Li
IPC: G06N3/08 , G06N3/0455 , G06V10/75 , G06V10/771 , G06V10/77 , G06V10/82 , G06V20/40
CPC classification number: G06N3/08 , G06N3/0455 , G06V10/751 , G06V10/771 , G06V10/7715 , G06V10/82 , G06V20/46
Abstract: Systems and methods for performing temporal progressive learning for video processing are provided herein. Some examples include receiving a video that includes a plurality of frames, extracting a first subset of frames from the plurality of frames, and inputting the first subset of frames into a model that includes an encoder and a decoder. The examples further include comparing a first output of the model to the first subset of frames and updating the encoder, thereby training the encoder, and extracting a second subset of frames from the plurality of frames. The second subset of frames includes a number of frames that is larger than a number of frames in the first subset of frames. The examples further include inputting the second subset of frames into the model, comparing a second output of the model to the second subset of frames and updating the encoder, thereby further training the encoder.
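The progressive schedule the abstract describes (train on a small frame subset, then a larger one) can be sketched as a frame-sampling loop. The stage sizes and uniform sampling are assumptions for illustration; the model, comparison, and encoder update are stubbed out as comments.

```python
import numpy as np

def sample_frames(video, num_frames):
    """Uniformly sample `num_frames` from a clip of shape (T, H, W, C)."""
    idx = np.linspace(0, len(video) - 1, num_frames).astype(int)
    return video[idx]

def progressive_schedule(video, stages=(4, 8, 16)):
    """Temporal progressive learning sketch: each stage feeds the
    encoder-decoder a subset with more frames than the previous stage.
    The actual training step is stubbed; this yields the clip shapes."""
    for n in stages:
        clip = sample_frames(video, n)
        # output = model(clip)                 # encoder-decoder forward pass
        # loss = compare(output, clip)         # compare output to the subset
        # update(encoder, loss)                # update (train) the encoder
        yield clip.shape
```

Each later stage sees more frames than the last, matching the abstract's requirement that the second subset be larger than the first.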