Patent search ap:("Lemon Inc.") AND inv:"Chuhui Xue"

1.

Granted invention patent
Pre-training for scene text detection (Status: Active)

Publication (Announcement) No.: US12254707B2

Publication (Announcement) Date: 2025-03-18

Application No.: US17955285

Application Date: 2022-09-28

Applicant: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Chuhui Xue, Wenqing Zhang, Yu Hao, Song Bai

IPC: G06V20/62, G06V30/18, G06V30/19

Abstract: Embodiments of the present disclosure relate to a method, device and computer-readable storage medium for scene text detection. In the method, a first visual representation of a first image is generated with an image encoding process. A first textual representation of a first text unit in the first image is generated with a text encoding process based on a first plurality of symbols obtained by masking a first symbol of a plurality of symbols in the first text unit. A first prediction of the masked first symbol is determined with a decoding process based on the first visual and textual representations. At least the image encoding process is updated according to at least a first training objective to increase at least a similarity between the first prediction and the masked first symbol.
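The masking step described in this abstract can be illustrated with a minimal sketch. It assumes that a text unit is a word and a symbol is a single character, and it uses a hypothetical `[MASK]` token; none of these choices is stated in the abstract, and the encoders and decoder are not modeled here.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; the abstract does not name one


def mask_one_symbol(text_unit, rng):
    """Mask one randomly chosen symbol of a text unit.

    Returns the masked symbol list (input to a text encoder), the masked
    position, and the original symbol (the prediction target).
    """
    if not text_unit:
        raise ValueError("text unit must contain at least one symbol")
    symbols = list(text_unit)            # assumption: one character = one symbol
    index = rng.randrange(len(symbols))  # position of the symbol to hide
    target = symbols[index]
    symbols[index] = MASK
    return symbols, index, target


rng = random.Random(0)  # seeded only to make the sketch repeatable
masked, index, target = mask_one_symbol("EXIT", rng)
```

In a training loop, `masked` would feed the text encoding process and `target` would be compared with the decoder's prediction; both steps are outside the scope of this sketch.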

2.

Published invention application
METHOD, APPARATUS, DEVICE AND MEDIUM FOR IMAGE PROCESSING (Status: Under examination, published)

Publication (Announcement) No.: US20240144656A1

Publication (Announcement) Date: 2024-05-02

Application No.: US18394249

Application Date: 2023-12-22

Applicant: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Song Bai, Junhao Zhang, Heng Wang, Rui Yan, Chuhui Xue, Wenqing Zhang

IPC: G06V10/774, G06V10/40, G06V10/74, G06V10/772, G06V10/82

CPC classification number: G06V10/774, G06V10/40, G06V10/761, G06V10/772, G06V10/82

Abstract: A method, apparatus, device, and medium for image processing are provided. The method includes: generating, using an image generation process, a first set of synthetic images based on a first set of codes associated with a first image class in a codebook and based on a first class feature associated with the first image class; generating, using a feature extraction process, a first set of reference features based on the first set of synthetic images and generating a first set of target features based on a plurality of sets of training images belonging to the first image class in a training image set; and updating the image generation process and the codebook according to at least a first training objective to reduce a difference between each reference feature in the first set of reference features and a corresponding target feature in the first set of target features.
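The training objective in this abstract reduces a difference between each reference feature and its corresponding target feature. The sketch below shows one possible form of such an objective. The mean squared difference is an assumption (the abstract does not name a distance), and `feature_matching_loss` is a hypothetical helper name.

```python
def feature_matching_loss(reference_features, target_features):
    """Mean squared difference between paired feature vectors.

    Assumption: squared error is used as the "difference"; the abstract
    only says the difference is reduced, not how it is measured.
    """
    if len(reference_features) != len(target_features):
        raise ValueError("each reference feature needs a corresponding target feature")
    total = 0.0
    count = 0
    for ref, tgt in zip(reference_features, target_features):
        if len(ref) != len(tgt):
            raise ValueError("paired features must have the same dimension")
        for r, t in zip(ref, tgt):
            total += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no feature values to compare")
    return total / count


# Features of synthetic images (reference) versus training images (target).
loss = feature_matching_loss([[0.0, 0.0]], [[3.0, 4.0]])
```

A lower value means the synthetic images' features sit closer to those of the training images; an optimizer would update the image generation process and the codebook to reduce it.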

3.

Invention application
INTERACTIVE POINT-BASED IMAGE EDITING (Status: Active)

Publication (Announcement) No.: US20250166267A1

Publication (Announcement) Date: 2025-05-22

Application No.: US18949486

Application Date: 2024-11-15

Applicant: Lemon Inc.

Inventor: Song BAI, Yujun Shi, Chuhui Xue, Wenqing Zhang

IPC: G06T11/60, G06V10/48

Abstract: Embodiments of the disclosure relate to interactive point-based image editing. According to example embodiments of the present disclosure, a user edit input for a source image is obtained to indicate at least one handle point and at least one target point in the source image. A feature map is extracted from the source image using a diffusion model at an iteration step of an inverse denoising diffusion process performed on the source image. The feature map is then updated based on the user edit input. Then a target image is generated based on the updated feature map using the diffusion model through a denoising diffusion process performed on the updated feature map.

4.

Published invention application
OPEN VOCABULARY 3D SCENE PROCESSING (Status: Under examination, published)

Publication (Announcement) No.: US20230290051A1

Publication (Announcement) Date: 2023-09-14

Application No.: US18183869

Application Date: 2023-03-14

Applicant: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Song BAI, Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Xiaojuan Qi

IPC: G06T17/00, G06T7/70, G06V10/44, G06V10/764, G06V20/70

CPC classification number: G06T17/00, G06T7/70, G06V10/44, G06V10/764, G06V20/70, G06V2201/07

Abstract: A method is proposed for detecting an object in a 3D scene. The method includes obtaining a detecting model that describes an association relationship between a plurality of base classes of a plurality of objects and 3D data of the plurality of objects. A plurality of open classes of a plurality of candidate objects to be detected in a 3D scene is received; the plurality of open classes comprises the plurality of base classes and at least one novel class not in the plurality of base classes. A 3D portion is detected in 3D data of the 3D scene based on the detecting model and the plurality of open classes; the 3D portion corresponds to a target candidate object among the plurality of candidate objects. With this method, objects that belong to a novel class, not annotated in training data of the detecting model, may be detected from the 3D data.
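The relationship between open, base, and novel classes stated in this abstract can be written down as a small check. This is only a sketch of the class bookkeeping; the class names and the helper `split_open_classes` are hypothetical, and the detecting model itself is not modeled.

```python
def split_open_classes(open_classes, base_classes):
    """Split open classes into base classes and novel classes.

    Enforces the two conditions from the abstract: the open classes
    include every base class, and at least one class is novel.
    """
    base = set(base_classes)
    missing = base - set(open_classes)
    if missing:
        raise ValueError(f"open classes must include all base classes: {sorted(missing)}")
    novel = [c for c in open_classes if c not in base]
    if not novel:
        raise ValueError("open classes must include at least one novel class")
    return sorted(base), novel


# Hypothetical class names, chosen only for illustration.
base, novel = split_open_classes(["chair", "table", "lamp"], ["chair", "table"])
```

Here `novel` holds the classes that were not annotated in the training data but may still be requested at detection time.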

5.

Invention application
DEBIASING TEXT-TO-IMAGE DIFFUSION MODELS (Status: Active)

Publication (Announcement) No.: US20250139846A1

Publication (Announcement) Date: 2025-05-01

Application No.: US19009706

Application Date: 2025-01-03

Applicant: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Song Bai, Ruifei He, Chuhui Xue, Wenqing Zhang, Yingchen Yu

IPC: G06T11/00, G06V10/25, G06V10/74

Abstract: Methods, devices, and computer program products are provided for image generation, particularly for debiasing text-to-image diffusion models. In a method, a plurality of images is obtained by an image generating model based on a prompt. The plurality of images respectively comprises a plurality of instances of an object, and the object is specified by the prompt. A plurality of attributes of the plurality of instances of the object is determined respectively. The image generating model is updated based on the plurality of attributes and a predetermined distribution of a plurality of predetermined attributes related to the object. With the above method, the images generated by the updated image generating model may follow the predetermined distribution, and the updated image generating model may output debiased results.
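To make the comparison between detected attributes and the predetermined distribution concrete, the sketch below measures how far a batch of generated instances is from a target distribution. The attribute values are hypothetical, and the total variation distance is an assumed measure; the abstract does not say how the model update uses the two distributions.

```python
from collections import Counter


def attribute_distribution(attributes, categories):
    """Empirical frequency of each predetermined attribute in a batch."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    counts = Counter(attributes)
    total = len(attributes)
    return {c: counts[c] / total for c in categories}


def total_variation(observed, target):
    """Total variation distance between two distributions (assumed measure)."""
    return 0.5 * sum(abs(observed[c] - target[c]) for c in target)


# Hypothetical predetermined distribution and detected attributes.
target = {"red": 0.5, "blue": 0.5}
detected = ["red", "red", "red", "blue"]

observed = attribute_distribution(detected, target)
gap = total_variation(observed, target)
```

A gap of zero would mean the generated instances already follow the predetermined distribution; a training signal derived from such a gap is one way an update could push the model toward it.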

6.

Published invention application
MULTIMODAL DATA PROCESSING (Status: Under examination, published)

Publication (Announcement) No.: US20240144664A1

Publication (Announcement) Date: 2024-05-02

Application No.: US18393238

Application Date: 2023-12-21

Applicant: Lemon Inc., Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Song Bai, Rui Yan, Heng Wang, Junhao Zhang, Chuhui Xue, Wenqing Zhang

IPC: G06V10/82, G06V10/46

CPC classification number: G06V10/82, G06V10/467

Abstract: Embodiments of the present disclosure provide a solution for multimodal data processing. A method comprises: obtaining image data and text data; and extracting a target visual feature of the image data and a target textual feature of the text data using a feature extraction model. The feature extraction model comprises alternately deployed cross-modal encoding parts and visual encoding parts. The extracting comprises: performing, using a first cross-modal encoding part of the feature extraction model, cross-modal feature encoding on a first intermediate visual feature of the image data and a first intermediate textual feature of the text data, to obtain a second intermediate visual feature and a second intermediate textual feature; and performing, using a first visual encoding part of the feature extraction model, visual modal feature encoding on the second intermediate visual feature, to obtain a third intermediate visual feature.
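The alternating layout of cross-modal and visual encoding parts can be sketched as a simple control flow. The two encoder functions below are toy placeholders operating on lists of floats, not real network layers; their arithmetic and the names `cross_modal_part`, `visual_part`, and `extract_features` are assumptions made only to show the order of operations.

```python
def cross_modal_part(visual, textual):
    """Placeholder cross-modal encoding: each modality sees the other's mean."""
    v_mean = sum(visual) / len(visual)
    t_mean = sum(textual) / len(textual)
    new_visual = [v + t_mean for v in visual]
    new_textual = [t + v_mean for t in textual]
    return new_visual, new_textual


def visual_part(visual):
    """Placeholder visual encoding: transforms the visual feature only."""
    return [2.0 * v for v in visual]


def extract_features(visual, textual, num_blocks=2):
    """Alternate cross-modal parts and visual parts, in that order."""
    for _ in range(num_blocks):
        # first -> second intermediate features (both modalities updated)
        visual, textual = cross_modal_part(visual, textual)
        # second -> third intermediate visual feature (text left unchanged)
        visual = visual_part(visual)
    return visual, textual


target_visual, target_textual = extract_features([1.0, 3.0], [2.0], num_blocks=1)
```

The point of the sketch is the ordering: the textual feature changes only inside cross-modal parts, while the visual feature is refined by both kinds of parts.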
