-
Publication number: US20240338848A1
Publication date: 2024-10-10
Application number: US18296438
Application date: 2023-04-06
Applicant: Lemon Inc.
Inventor: Sijie Zhu , Linjie Yang , Xiaohui Shen , Heng Wang
IPC: G06T7/73 , G06V10/75 , G06V10/77 , G06V10/774
CPC classification number: G06T7/74 , G06V10/751 , G06V10/7715 , G06V10/774 , G06T2207/20081
Abstract: A unified place recognition framework handles both retrieval and re-ranking with a unified transformer model. The re-ranking module takes feature correlation, attention values, and x/y coordinates into account, and learns to determine whether an image pair is from the same location.
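The re-ranking signals named in the abstract (feature correlation, attention values, x/y coordinates) can be illustrated with a toy sketch. This is not the patented model: `pair_tokens`, `same_place_score`, and the linear head `w` are hypothetical stand-ins for the learned re-ranking transformer, assuming per-image local descriptors, attention weights, and normalized coordinates as inputs.

```python
import numpy as np

def pair_tokens(feats_a, feats_b, attn_a, coords_a):
    """Build re-ranking tokens for image A against image B.

    Each local descriptor of A contributes one token combining its best
    feature correlation with B, its attention value, and its normalized
    x/y coordinates (hypothetical token layout)."""
    corr = feats_a @ feats_b.T            # (N, M) feature-correlation map
    best = corr.max(axis=1)               # strongest match per descriptor
    return np.column_stack([best, attn_a, coords_a])   # (N, 4) tokens

def same_place_score(feats_a, feats_b, attn_a, attn_b,
                     coords_a, coords_b, w):
    """Score whether an image pair comes from the same location.

    Tokens from both directions are mean-pooled and passed through a tiny
    linear head `w`, standing in for the learned re-ranking model."""
    tokens = np.vstack([
        pair_tokens(feats_a, feats_b, attn_a, coords_a),
        pair_tokens(feats_b, feats_a, attn_b, coords_b),
    ])
    logit = tokens.mean(axis=0) @ w
    return 1.0 / (1.0 + np.exp(-logit))   # probability in (0, 1)
```

A trained system would learn `w` (or a full transformer in its place) from labeled same-place / different-place pairs.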
-
Publication number: US20240144664A1
Publication date: 2024-05-02
Application number: US18393238
Application date: 2023-12-21
Applicant: Lemon Inc. , Beijing Youzhuju Network Technology Co., Ltd.
Inventor: Song Bai , Rui Yan , Heng Wang , Junhao Zhang , Chuhui Xue , Wenqing Zhang
CPC classification number: G06V10/82 , G06V10/467
Abstract: Embodiments of the present disclosure provide a solution for multimodal data processing. A method comprises: obtaining image data and text data; and extracting a target visual feature of the image data and a target textual feature of the text data using a feature extraction model. The feature extraction model comprises alternately deployed cross-modal encoding parts and visual encoding parts. The extracting comprises: performing, using a first cross-modal encoding part of the feature extraction model, cross-modal feature encoding on a first intermediate visual feature of the image data and a first intermediate textual feature of the text data, to obtain a second intermediate visual feature and a second intermediate textual feature; and performing, using a first visual encoding part of the feature extraction model, visual modal feature encoding on the second intermediate visual feature, to obtain a third intermediate visual feature.
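The alternating structure described by the abstract (cross-modal part, then visual-only part) can be sketched with placeholder encoding functions. The bodies of `cross_modal_part` and `visual_part` are hypothetical stand-ins (pooled-summary mixing instead of real cross-attention and self-attention); only the alternation pattern follows the abstract.

```python
import numpy as np

def cross_modal_part(vis, txt):
    """Stand-in for a cross-modal encoding part: each modality is
    conditioned on a pooled summary of the other."""
    vis2 = vis + txt.mean(axis=0)   # visual tokens conditioned on text
    txt2 = txt + vis.mean(axis=0)   # text tokens conditioned on vision
    return vis2, txt2

def visual_part(vis):
    """Stand-in for a visual-only encoding part (self-attention-like)."""
    return vis + vis.mean(axis=0)

def extract_features(vis, txt, num_blocks=2):
    """Alternate cross-modal and visual encoding parts, as the abstract
    describes: the cross-modal part yields the second intermediate
    features, the visual part yields the third intermediate visual
    feature, and so on."""
    for _ in range(num_blocks):
        vis, txt = cross_modal_part(vis, txt)
        vis = visual_part(vis)
    return vis, txt
```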
-
Publication number: US20240144656A1
Publication date: 2024-05-02
Application number: US18394249
Application date: 2023-12-22
Applicant: Lemon Inc. , Beijing Youzhuju Network Technology Co., Ltd.
Inventor: Song Bai , Junhao Zhang , Heng Wang , Rui Yan , Chuhui Xue , Wenqing Zhang
IPC: G06V10/774 , G06V10/40 , G06V10/74 , G06V10/772 , G06V10/82
CPC classification number: G06V10/774 , G06V10/40 , G06V10/761 , G06V10/772 , G06V10/82
Abstract: A method, apparatus, device, and medium for image processing is provided. The method includes generating, using an image generation process, a first set of synthetic images based on a first set of codes associated with a first image class in a codebook and on a first class feature associated with the first image class; generating, using a feature extraction process, a first set of reference features based on the first set of synthetic images and generating a first set of target features based on a plurality of sets of training images belonging to the first image class in a training image set; and updating the image generation process and the codebook according to at least a first training objective to reduce a difference between each reference feature in the first set of reference features and a corresponding target feature in the first set of target features.
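The first training objective in the abstract (reducing the difference between reference features of synthetic images and target features of real images) can be written as a simple feature-matching loss. Everything here is a sketch: `generate` and `extract` are hypothetical stand-ins for the image generation process and feature extraction process, and mean squared error stands in for whatever distance the claimed objective actually uses.

```python
import numpy as np

def feature_matching_loss(codebook, class_feature, generate, extract,
                          target_feats):
    """Sketch of the first training objective from the abstract.

    Generates synthetic images from class-specific codebook codes and a
    class feature, extracts their reference features, and measures the
    squared distance to target features of real training images."""
    synthetic = generate(codebook, class_feature)  # first set of synthetic images
    ref_feats = extract(synthetic)                 # first set of reference features
    return np.mean((ref_feats - target_feats) ** 2)
```

In training, this loss would be backpropagated to update both the generation process and the codebook entries, as the abstract states.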
-
Publication number: US20230206067A1
Publication date: 2023-06-29
Application number: US18111756
Application date: 2023-02-20
Applicant: Lemon Inc.
Inventor: Peng Wang , Heng Wang , Xianhang Li , Xinyu Li
IPC: G06N3/08 , G06N3/0455 , G06V10/75 , G06V10/771 , G06V10/77 , G06V10/82 , G06V20/40
CPC classification number: G06N3/08 , G06N3/0455 , G06V10/751 , G06V10/771 , G06V10/7715 , G06V10/82 , G06V20/46
Abstract: Systems and methods for performing temporal progressive learning for video processing are provided herein. Some examples include receiving a video that includes a plurality of frames, extracting a first subset of frames from the plurality of frames, and inputting the first subset of frames into a model that includes an encoder and a decoder. The examples further include comparing a first output of the model to the first subset of frames and updating the encoder, thereby training the encoder, and extracting a second subset of frames from the plurality of frames. The second subset of frames includes a number of frames that is larger than a number of frames in the first subset of frames. The examples further include inputting the second subset of frames into the model, comparing a second output of the model to the second subset of frames and updating the encoder, thereby further training the encoder.
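The progressive schedule the abstract describes (train on a small frame subset, then a larger one) can be sketched as a frame-sampling loop. The stage sizes and uniform sampling are assumptions for illustration; the model, comparison, and encoder update are stubbed out as comments.

```python
import numpy as np

def sample_frames(video, num_frames):
    """Uniformly sample `num_frames` from a clip of shape (T, H, W, C)."""
    idx = np.linspace(0, len(video) - 1, num_frames).astype(int)
    return video[idx]

def progressive_schedule(video, stages=(4, 8, 16)):
    """Temporal progressive learning sketch: each stage feeds the
    encoder-decoder a subset with more frames than the previous stage.
    The actual training step is stubbed; this yields the clip shapes."""
    for n in stages:
        clip = sample_frames(video, n)
        # output = model(clip)                 # encoder-decoder forward pass
        # loss = compare(output, clip)         # compare output to the subset
        # update(encoder, loss)                # update (train) the encoder
        yield clip.shape
```

Each later stage sees more frames than the last, matching the abstract's requirement that the second subset be larger than the first.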