-
1.
Publication number: US20240160917A1
Publication date: 2024-05-16
Application number: US18182939
Application date: 2023-03-13
Applicant: Salesforce, Inc.
Inventor: Le XUE, Chen XING, Juan Carlos NIEBLES DUQUE, Caiming XIONG, Ran XU, Silvio SAVARESE
CPC classification number: G06N3/08, G06T19/20, G06T2210/56, G06T2219/2004
Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A training dataset is generated using a plurality of 3D models of a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering is used to generate a plurality of image candidates of a first 3D model. A word is chosen from metadata associated with the first 3D model. A language model is used to generate one or more text descriptions using the selected word and a plurality of prompts. A point cloud is generated by randomly sampling points in the first 3D model. The first sample is generated to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the generated point cloud. The 3D encoder is trained using the training dataset including the first sample.
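The sample-construction steps in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout (`views`, `metadata`, `vertices`), the `PROMPTS` templates, and the function name `make_sample` are all hypothetical. Plain template filling stands in for the language model, the word is picked at random (the abstract only says it is "chosen"), and the multi-view renders are assumed to exist already.

```python
import random

# Hypothetical prompt templates; the abstract does not list the actual prompts.
PROMPTS = ["a 3D model of {}", "a point cloud of {}"]


def make_sample(model, num_points, rng):
    """Build one (image, texts, point cloud) training sample.

    `model` is an assumed dict with three keys:
      "views":    pre-rendered image candidates (any objects),
      "metadata": list of candidate words,
      "vertices": list of (x, y, z) tuples standing in for the 3D model.
    """
    # Pick one image at random from the multi-view candidates.
    image = rng.choice(model["views"])
    # Choose a word from the metadata (random choice is an assumption).
    word = rng.choice(model["metadata"])
    # Template filling stands in for the language model here.
    texts = [prompt.format(word) for prompt in PROMPTS]
    # Randomly sample points without replacement to form the point cloud.
    points = rng.sample(model["vertices"], num_points)
    return {"image": image, "texts": texts, "points": points}
```

Passing a seeded `random.Random` instance as `rng` keeps the sampling reproducible across runs.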
-
2.
Publication number: US20240386623A1
Publication date: 2024-11-21
Application number: US18477764
Application date: 2023-09-29
Applicant: Salesforce, Inc.
Inventor: Ning YU, Can QIN, Shu ZHANG, Yihao FENG, Xinyi YANG, Ran XU
IPC: G06T11/00, G06T5/20, G06V10/771
Abstract: Embodiments described herein provide a method of image generation. The method uses a fixed diffusion model and a trainable diffusion model. The fixed diffusion model may be pretrained on a large training corpus. The trainable diffusion model may be used to control the image generation of the fixed diffusion model by modifying internal representations of the fixed diffusion model. A task instruction may be provided in addition to a text prompt, and the task instruction may guide the trainable diffusion model together with the visual conditions. The visual conditions may be adapted according to the task instruction. During training, a fixed number of task instructions may be used. At inference, unseen task instructions may be used by combining convolutional kernels of the visual condition adapter.
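The last sentence of this abstract mentions combining convolutional kernels to handle unseen task instructions. One plausible reading is a weighted sum of the per-task kernels, sketched below. The abstract does not say how the kernels are combined or where the weights come from, so the weighted-sum rule, the caller-supplied `weights`, and the function name `combine_kernels` are all assumptions made for illustration.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally shaped 2D kernels.

    `kernels` is a list of 2D kernels (lists of lists of floats), one per
    task instruction seen in training. `weights` holds one mixing
    coefficient per kernel; how they would be derived from an unseen
    instruction is not specified in the abstract.
    """
    if not kernels or len(kernels) != len(weights):
        raise ValueError("need one weight per kernel and at least one kernel")
    rows, cols = len(kernels[0]), len(kernels[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for kernel, weight in zip(kernels, weights):
        for r in range(rows):
            for c in range(cols):
                combined[r][c] += weight * kernel[r][c]
    return combined
```

Because convolution is linear in its kernel, convolving with the combined kernel gives the same result as taking the same weighted sum of the per-task convolution outputs.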
-
3.
Publication number: US20240169704A1
Publication date: 2024-05-23
Application number: US18182952
Application date: 2023-03-13
Applicant: Salesforce, Inc.
Inventor: Le XUE, Chen XING, Juan Carlos NIEBLES DUQUE, Caiming XIONG, Ran XU, Silvio SAVARESE
IPC: G06V10/774, G06F40/126, G06F40/40, G06V10/764, G06V10/776, G06V10/82
CPC classification number: G06V10/774, G06F40/126, G06F40/40, G06V10/764, G06V10/776, G06V10/82
Abstract: Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
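The abstract states only that a loss objective is computed from the image, text, and 3D representations. A common choice for aligning representations across modalities is a symmetric contrastive (InfoNCE-style) loss, and the sketch below assumes that choice; it is not confirmed by the abstract. The function names, the temperature value, and the decision to sum a 3D-to-image term and a 3D-to-text term are hypothetical. Embeddings are plain lists of nonzero float vectors, with row `i` of each modality belonging to the same sample.

```python
import math


def _normalize(vec):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(a, b, temperature=0.07):
    """Symmetric contrastive loss between two batches of paired embeddings."""
    a = [_normalize(v) for v in a]
    b = [_normalize(v) for v in b]
    # Cosine similarity of every a-row with every b-row, scaled by temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed))


def tri_modal_loss(points_emb, image_emb, text_emb, temperature=0.07):
    """Assumed objective: align 3D embeddings with both image and text embeddings."""
    return (contrastive_loss(points_emb, image_emb, temperature)
            + contrastive_loss(points_emb, text_emb, temperature))
```

In the setup the abstract describes, only the 3D encoder's parameters would receive gradient updates from such a loss, since the image and text encoders come from a pretrained vision and language model.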
-