Patent search ap:("Salesforce Page Inc.") AND inv:"Shu Zhang"

1.

Published patent application
SYSTEMS AND METHODS FOR MULTIMODAL PRETRAINING FOR THREE-DIMENSIONAL UNDERSTANDING MODELS (Status: under examination, published)

Publication No.: US20240312128A1

Publication date: 2024-09-19

Application No.: US18493035

Filing date: 2023-10-24

Applicant: Salesforce, Inc.

Inventor: Le Xue, Ning Yu, Shu Zhang, Junnan Li, Caiming Xiong, Silvio Savarese, Juan Carlos Niebles Duque, Ran Xu

IPC: G06T17/00, G06F40/40

CPC classification number: G06T17/00, G06F40/40

Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A first plurality of samples of a training dataset are generated using a first 3D model. An image generator with multi-view rendering is used to generate a plurality of two-dimensional (2D) images having different viewpoints of the first 3D model. A first language model is used to generate a plurality of texts corresponding to the plurality of 2D images respectively. A first text for a first image is generated by using one or more text descriptions generated by the first language model. A point cloud is generated by randomly sampling points in the 3D model. The first plurality of samples are generated using the plurality of 2D images, the corresponding plurality of texts, and the point cloud. The neural network based 3D encoder is trained using the training dataset including the first plurality of samples.
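
Illustrative sketch (not taken from the patent): the abstract describes sampling a point cloud from the 3D model and combining it with the rendered views and their texts into training samples. The Python below shows one way this could look, assuming the 3D model is a triangle mesh and that points are drawn uniformly over its surface with area-weighted triangle selection. The function names, the dictionary layout, and the surface-sampling choice are all assumptions made for illustration; the abstract only says points are randomly sampled in the 3D model.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_point_cloud(vertices, faces, num_points, seed=0):
    """Draw num_points points uniformly over a triangle mesh surface (assumed input format)."""
    rng = random.Random(seed)
    # Larger triangles are picked proportionally more often.
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    chosen = rng.choices(faces, weights=areas, k=num_points)
    points = []
    for face in chosen:
        a, b, c = (vertices[i] for i in face)
        # Square-root trick gives uniform barycentric coordinates inside the triangle.
        r1, r2 = rng.random(), rng.random()
        s = math.sqrt(r1)
        w_a, w_b, w_c = 1.0 - s, s * (1.0 - r2), s * r2
        points.append(tuple(w_a * a[i] + w_b * b[i] + w_c * c[i] for i in range(3)))
    return points


def build_samples(images, texts, point_cloud):
    """Pair each rendered view with its text and the shared point cloud (hypothetical layout)."""
    return [
        {"image": image, "text": text, "point_cloud": point_cloud}
        for image, text in zip(images, texts)
    ]
```

A caller would pass the mesh vertices and faces to `sample_point_cloud`, then hand the result to `build_samples` together with the rendered views and their generated texts. The image and text entries here are placeholders for whatever the renderer and language model actually produce.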

2.

Published patent application
SYSTEMS AND METHODS FOR FEEDBACK BASED INSTRUCTIONAL VISUAL EDITING (Status: under examination, published)

Publication No.: US20240303882A1

Publication date: 2024-09-12

Application No.: US18350876

Filing date: 2023-07-12

Applicant: Salesforce, Inc.

Inventor: Shu Zhang, Xinyi Yang, Yihao Feng, Ran Xu, Ning Yu, Chia-Chih Chen

IPC: G06T11/60, G06T5/00

CPC classification number: G06T11/60, G06T5/70, G06T2207/20081, G06T2207/20084

Abstract: Embodiments described herein provide a feedback based instructional image editing framework that employs a diffusion process to follow user instructions for image editing. A diffusion model is fine-tuned using a reward model, which may be trained via human annotation. The training of the reward model may be done by having the image editing model output a number of images, which a human annotator ranks based on their alignment with the original image and a given instruction.
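
Illustrative sketch (not taken from the patent): the abstract says the reward model may be trained from human rankings of edited images but does not state a loss. One common way to learn from rankings is a pairwise (Bradley-Terry style) objective, shown below under that assumption. The function names and the convention that scores are listed best-ranked first are hypothetical.

```python
import math
from itertools import combinations


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(scores_best_first):
    """Mean of -log(sigmoid(s_better - s_worse)) over all ranked pairs.

    scores_best_first holds reward-model scores for one set of edited images,
    ordered by the annotator's ranking with the best image first.
    """
    pairs = list(combinations(scores_best_first, 2))
    if not pairs:
        raise ValueError("need at least two ranked images")
    # -log(sigmoid(d)) equals softplus(-d), with d = better - worse.
    total = sum(softplus(worse - better) for better, worse in pairs)
    return total / len(pairs)
```

The loss is small when the reward model scores images in the same order as the annotator and grows when the order is reversed, so minimizing it pushes the scores toward agreement with the human ranking.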

3.

Published patent application
SYSTEMS AND METHODS FOR TEXT-TO-IMAGE GENERATION USING LANGUAGE MODELS (Status: under examination, published)

Publication No.: US20240185035A1

Publication date: 2024-06-06

Application No.: US18162535

Filing date: 2023-01-31

Applicant: Salesforce, Inc.

Inventor: Ning Yu, Can Qin, Chen Xing, Shu Zhang, Stefano Ermon, Caiming Xiong, Ran Xu

IPC: G06N3/0455, G06T5/00

CPC classification number: G06N3/0455, G06T5/002, G06T2207/20084

Abstract: Embodiments described herein provide a mechanism for replacing existing text encoders in text-to-image generation models with more powerful pre-trained language models. Specifically, a translation network is trained to map features from the pre-trained language model output into the space of the target text encoder. The training preserves the rich structure of the pre-trained language model while allowing it to operate within the text-to-image generation model. The resulting modularized text-to-image model receives a prompt and generates an image representing the features contained in the prompt.
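
Illustrative sketch (not taken from the patent): the abstract describes a translation network that maps language-model features into the target text encoder's feature space. The toy Python below reduces that idea to a single linear layer fitted with per-sample gradient descent on a squared-error objective. The architecture, the loss, the learning rate, and all names are assumptions for illustration; the abstract does not specify them.

```python
def apply_linear(weights, x):
    """Multiply a weight matrix (list of rows) by a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_translation_layer(source_feats, target_feats, lr=0.1, epochs=200):
    """Fit weights so apply_linear(weights, source) approximates target.

    source_feats: feature vectors from the pre-trained language model (assumed given).
    target_feats: feature vectors from the target text encoder for the same prompts.
    """
    in_dim = len(source_feats[0])
    out_dim = len(target_feats[0])
    weights = [[0.0] * in_dim for _ in range(out_dim)]
    for _ in range(epochs):
        for x, y in zip(source_feats, target_feats):
            pred = apply_linear(weights, x)
            for r in range(out_dim):
                err = pred[r] - y[r]
                # Gradient of 0.5 * err**2 with respect to weights[r][c] is err * x[c].
                for c in range(in_dim):
                    weights[r][c] -= lr * err * x[c]
    return weights
```

After fitting, `apply_linear(weights, lm_feature)` yields a vector in the target encoder's space, which is the role the abstract assigns to the translation network inside the text-to-image model.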

4.

Granted patent
Systems and methods for vision-language distribution alignment (Status: granted, in force)

Publication No.: US12112523B2

Publication date: 2024-10-08

Application No.: US17589725

Filing date: 2022-01-31

Applicant: Salesforce, Inc.

Inventor: Shu Zhang, Junnan Li, Ran Xu, Caiming Xiong, Chetan Ramaiah

IPC: G06V10/776, G06F16/56, G06F16/583, G06F40/126, G06F40/166, G06F40/284, G06V10/74, G06V10/80

CPC classification number: G06V10/776, G06F16/56, G06F16/5846, G06F40/126, G06F40/166, G06F40/284, G06V10/761, G06V10/806

Abstract: Embodiments described herein provide a CROss-Modal Distribution Alignment (CROMDA) model for vision-language pretraining, which can be used for downstream retrieval tasks. In the CROMDA model, global cross-modal representations are aligned on each unimodality. Specifically, a uni-modal global similarity between an image/text and the image/text feature queue is computed. A softmax-normalized distribution is then generated based on the computed similarity. The distribution thus takes advantage of the global structure of the queue. CROMDA then aligns the two distributions and learns a modal-invariant global representation. In this way, CROMDA is able to obtain an invariant property in each modality, where images with similar text representations should be similar and vice versa.
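
Illustrative sketch (not taken from the patent): the abstract describes computing a uni-modal similarity between a feature and its feature queue, applying a softmax, and aligning the image-side and text-side distributions. The Python below follows those steps, assuming dot-product similarity, a temperature of 0.07, and a symmetric KL divergence as the alignment measure. Those three choices and all function names are assumptions; the abstract only says the two distributions are aligned.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(logits, temperature=1.0):
    """Softmax with max-subtraction for numerical stability."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distribution_alignment_loss(image_feat, text_feat, image_queue, text_queue,
                                temperature=0.07):
    """Symmetric KL between image-to-image-queue and text-to-text-queue distributions.

    Entry k of both queues is assumed to come from the same image-text pair.
    """
    if len(image_queue) != len(text_queue):
        raise ValueError("queues must have the same length")
    p_image = softmax([dot(image_feat, key) for key in image_queue], temperature)
    p_text = softmax([dot(text_feat, key) for key in text_queue], temperature)
    return 0.5 * (kl_divergence(p_image, p_text) + kl_divergence(p_text, p_image))
```

The loss is zero when an image relates to the queued images exactly as its paired text relates to the queued texts, and it grows as the two similarity patterns diverge, which matches the invariance property stated in the abstract.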
