Patent search ap:("Salesforce Page Inc.") AND inv:"Caiming XIONG"

1.

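Published invention application
SYSTEMS AND METHODS FOR LEARNING UNIFIED REPRESENTATIONS OF LANGUAGE, IMAGE, AND POINT CLOUD FOR THREE-DIMENSIONAL RECOGNITION (Pending, published)

Publication (announcement) number: US20240160917A1

Publication (announcement) date: 2024-05-16

Application number: US18182939

Application date: 2023-03-13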

Applicant: Salesforce, Inc.

Inventor: Le XUE, Chen XING, Juan Carlos NIEBLES DUQUE, Caiming XIONG, Ran XU, Silvio SAVARESE

IPC: G06N3/08, G06T19/20

CPC classification number: G06N3/08, G06T19/20, G06T2210/56, G06T2219/2004

Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A training dataset is generated using a plurality of 3D models of a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering is used to generate a plurality of image candidates of a first 3D model. A word is chosen from metadata associated with the first 3D model. A language model is used to generate one or more text descriptions using the selected word and a plurality of prompts. A point cloud is generated by randomly sampling points in the 3D model. The first sample is generated to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the point cloud. The 3D encoder is trained using the training dataset including the first sample.
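The sample-generation steps in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method claimed in the application: `build_sample`, the dictionary layout of `model`, and the prompt templates are hypothetical; placeholder file names stand in for multi-view rendering; plain template filling stands in for the language model; the word is picked at random because the abstract does not say how it is chosen; and points are drawn from a vertex list rather than from real 3D geometry.

```python
import random


def build_sample(model, prompts, num_views=4, num_points=8, rng=None):
    """Assemble one (image, texts, point cloud) training sample.

    All names and data layouts here are hypothetical stand-ins.
    """
    rng = rng or random.Random()

    # Stand-in for multi-view rendering: one placeholder name per camera angle.
    step = 360 // num_views
    image_candidates = [
        f"{model['name']}_view_{i * step:03d}.png" for i in range(num_views)
    ]

    # Choose a word from the model's metadata (random choice is an assumption).
    word = rng.choice(model["metadata"])

    # Stand-in for the language model: fill each prompt template with the word.
    texts = [prompt.format(word) for prompt in prompts]

    # Randomly sample points; here the 3D model is just a list of vertices.
    point_cloud = [rng.choice(model["vertices"]) for _ in range(num_points)]

    return {
        "image": rng.choice(image_candidates),  # one randomly selected view
        "texts": texts,
        "point_cloud": point_cloud,
        "word": word,
    }
```

A call such as `build_sample(model, ["a picture of {}", "a 3D model of {}"])` returns one sample; repeating it over every 3D model in the dataset would yield the training set the abstract describes.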

2.

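Published invention application
SYSTEMS AND METHODS FOR LEARNING UNIFIED REPRESENTATIONS OF LANGUAGE, IMAGE, AND POINT CLOUD FOR THREE-DIMENSIONAL RECOGNITION (Pending, published)

Publication (announcement) number: US20240169704A1

Publication (announcement) date: 2024-05-23

Application number: US18182952

Application date: 2023-03-13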

Applicant: Salesforce, Inc.

Inventor: Le XUE, Chen XING, Juan Carlos NIEBLES DUQUE, Caiming XIONG, Ran XU, Silvio SAVARESE

IPC: G06V10/774, G06F40/126, G06F40/40, G06V10/764, G06V10/776, G06V10/82

CPC classification number: G06V10/774, G06F40/126, G06F40/40, G06V10/764, G06V10/776, G06V10/82

Abstract: Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
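The abstract says only that a loss objective is computed from the image, text, and 3D representations. One plausible form, assumed here rather than taken from the application, is a symmetric contrastive (InfoNCE-style) loss that pulls each point-cloud embedding toward the image and text embeddings of the same sample. The sketch below computes such a loss in plain Python on small embedding lists; the function names and the `temperature` value are hypothetical, and the backpropagation step that would update the 3D encoder is left out.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(a, b, temperature=0.07):
    """Symmetric contrastive loss between two batches of paired embeddings."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    # Cosine similarity of every a[i] with every b[j], scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b]
        for va in a
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


def tri_modal_loss(point_emb, image_emb, text_emb, temperature=0.07):
    """Align 3D embeddings with (frozen) image and text embeddings."""
    return (
        contrastive_loss(point_emb, image_emb, temperature)
        + contrastive_loss(point_emb, text_emb, temperature)
    )
```

In a real training loop the image and text embeddings would come from the pretrained vision and language model, and only the 3D encoder's parameters would receive gradients from this loss.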
