-
1.
Publication No.: US20240160917A1
Publication Date: 2024-05-16
Application No.: US18182939
Filing Date: 2023-03-13
Applicant: Salesforce, Inc.
Inventor: Le XUE, Chen XING, Juan Carlos NIEBLES DUQUE, Caiming XIONG, Ran XU, Silvio SAVARESE
CPC classification number: G06N3/08, G06T19/20, G06T2210/56, G06T2219/2004
Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A training dataset is generated using a plurality of 3D models of a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering is used to generate a plurality of image candidates of a first 3D model. A word is chosen from metadata associated with the first 3D model. A language model is used to generate one or more text descriptions using the selected word and a plurality of prompts. A point cloud is generated by randomly sampling points in the 3D model. The first sample is generated to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the point cloud. The 3D encoder is trained using the training dataset including the first sample.
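The sample-generation steps in this abstract can be illustrated with a minimal sketch. It is not the applicant's implementation, and several parts are assumptions: `render_views` is a hypothetical stand-in for the multi-view renderer that only returns file names, `generate_texts` replaces the language model with plain template filling over hypothetical `PROMPTS`, the metadata word is chosen at random, and "randomly sampling points" is interpreted as area-weighted uniform sampling on the mesh surface.

```python
import math
import random
from dataclasses import dataclass

Point = tuple[float, float, float]

# Hypothetical prompt templates standing in for the abstract's "plurality of prompts".
PROMPTS = ["a point cloud model of {}.", "a 3D rendering of a {}.", "an image of a {}."]


@dataclass
class Sample:
    image: str           # one view picked at random from the rendered candidates
    texts: list[str]     # text descriptions built from one metadata word
    points: list[Point]  # point cloud sampled from the model surface


def render_views(model_id: str, n_views: int) -> list[str]:
    # Hypothetical stand-in for a multi-view renderer: returns view file names only.
    return [f"{model_id}_view{k:02d}.png" for k in range(n_views)]


def generate_texts(word: str, prompts: list[str]) -> list[str]:
    # Stand-in for the language model: plain template filling, not text generation.
    return [p.format(word) for p in prompts]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    # Half the length of the cross product of two edge vectors.
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_surface_points(vertices, faces, n, rng):
    # Assumed sampling scheme: pick triangles with probability proportional to
    # area, then draw a uniform point inside each via barycentric coordinates.
    areas = [triangle_area(*(vertices[i] for i in f)) for f in faces]
    points = []
    for f in rng.choices(faces, weights=areas, k=n):
        a, b, c = (vertices[i] for i in f)
        s, r = math.sqrt(rng.random()), rng.random()
        w = (1.0 - s, s * (1.0 - r), s * r)
        points.append(tuple(w[0] * a[k] + w[1] * b[k] + w[2] * c[k] for k in range(3)))
    return points


def build_sample(model_id, metadata_words, vertices, faces, rng, n_views=30, n_points=1024):
    candidates = render_views(model_id, n_views)  # multi-view image candidates
    word = rng.choice(metadata_words)             # assumption: word chosen at random
    texts = generate_texts(word, PROMPTS)
    points = sample_surface_points(vertices, faces, n_points, rng)
    return Sample(image=rng.choice(candidates), texts=texts, points=points)
```

For example, `build_sample("chair_001", ["chair"], verts, faces, random.Random(0))` yields one (image, texts, point cloud) triplet for a mesh given as vertex and face lists. Area weighting keeps large faces from being undersampled; sampling only vertices would be simpler but would ignore the surface between them.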
-
2.
Publication No.: US20240169704A1
Publication Date: 2024-05-23
Application No.: US18182952
Filing Date: 2023-03-13
Applicant: Salesforce, Inc.
Inventor: Le XUE, Chen XING, Juan Carlos NIEBLES DUQUE, Caiming XIONG, Ran XU, Silvio SAVARESE
IPC: G06V10/774, G06F40/126, G06F40/40, G06V10/764, G06V10/776, G06V10/82
CPC classification number: G06V10/774, G06F40/126, G06F40/40, G06V10/764, G06V10/776, G06V10/82
Abstract: Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
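The abstract does not state the form of the loss objective. A minimal sketch, assuming a CLIP-style symmetric contrastive (InfoNCE) loss that pulls each point cloud's 3D representation toward the image and text representations of the same sample and away from the other samples in the batch; the temperature and the weights `w_img` and `w_txt` are illustrative assumptions. Per the abstract, only the 3D encoder's parameters are updated; the backpropagation step would be handled by an autodiff framework and is not shown.

```python
import math


def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def logsumexp(xs):
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce(a, b, temperature=0.07):
    # a[i] and b[i] form a positive pair; every other row in the batch is a negative.
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    logits = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    n = len(a)
    # Cross-entropy with the diagonal as target, in both directions (a->b and b->a).
    loss_ab = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_ba = sum(logsumexp(cols[j]) - cols[j][j] for j in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)


def triplet_loss(img, txt, pts, w_img=1.0, w_txt=1.0):
    # Assumed objective: align 3D features with the (frozen) image and text features.
    return w_img * info_nce(pts, img) + w_txt * info_nce(pts, txt)
```

With aligned inputs such as `img = [[1.0, 0.0], [0.0, 1.0]]` and `pts` equal to `img`, the loss is far lower than when the rows of `pts` are swapped, which is the signal that gradient updates to the 3D encoder would exploit.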
-