-
Publication No.: US20240169746A1
Publication date: 2024-05-23
Application No.: US18161661
Filing date: 2023-01-30
Applicant: Salesforce, Inc.
Inventor: Manli Shu , Le Xue , Ning Yu , Roberto Martín-Martín , Juan Carlos Niebles Duque , Caiming Xiong , Ran Xu
CPC classification number: G06V20/64 , G06T3/4007 , G06V10/46 , G06V10/82
Abstract: Embodiments described herein provide a system for three-dimensional (3D) object detection. The system includes an input interface configured to obtain 3D point data describing spatial information of a plurality of points, and a memory storing a neural network based 3D object detection model having an encoder and a decoder. The system also includes processors to perform operations including: encoding, by the encoder, a first set of coordinates into a first set of point features and a set of object features; sampling a second set of point features from the first set of point features; generating, by attention layers at the decoder, a set of attention weights by applying cross-attention over at least the set of object features and the second set of point features; and generating, by the decoder, a predicted bounding box among the plurality of points based at least in part on the set of attention weights.
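The sampling and cross-attention steps in this abstract can be illustrated with a small sketch. It is not the patented model: the uniform random sampling, the scaled dot-product form of the attention, and all names (sample_point_features, cross_attention_weights) are assumptions made for illustration, and the toy features stand in for encoder outputs.

```python
import math
import random


def sample_point_features(point_features, k, seed=0):
    # Assumption: uniform random sampling. The abstract does not say how
    # the second set of point features is chosen from the first.
    rng = random.Random(seed)
    return rng.sample(point_features, k)


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_weights(object_features, point_features):
    # Assumption: scaled dot-product attention. Each object feature acts as
    # a query and each sampled point feature as a key; the result has one
    # row of weights per object feature.
    dim = len(object_features[0])
    rows = []
    for query in object_features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in point_features
        ]
        rows.append(softmax(scores))
    return rows


# Toy data: 6 point features and 2 object features, each 3-dimensional.
points = [[float(i + j) for j in range(3)] for i in range(6)]
objects = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
sampled = sample_point_features(points, k=4)
weights = cross_attention_weights(objects, sampled)
```

Each row of weights sums to 1 and says how strongly one object feature attends to each sampled point feature; a real decoder would use such weights to aggregate point information before predicting a bounding box.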
-
Publication No.: US20240070868A1
Publication date: 2024-02-29
Application No.: US18159318
Filing date: 2023-01-25
Applicant: Salesforce, Inc.
Inventor: Ning Yu , Vibashan Vishnukumar Sharmini , Chen Xing , Juan Carlos Niebles Duque , Ran Xu
CPC classification number: G06T7/11 , G06V10/273
Abstract: Embodiments described herein provide an open-vocabulary instance segmentation framework that adopts a pre-trained vision-language model to develop a pipeline in detecting novel categories of instances.
-
Publication No.: US20250045567A1
Publication date: 2025-02-06
Application No.: US18498257
Filing date: 2023-10-31
Applicant: Salesforce, Inc.
Inventor: Weiran Yao , Shelby Heinecke , Juan Carlos Niebles Duque , Zhiwei Liu , Yihao Feng , Le Xue , Rithesh Murthy , Zeyuan Chen , Jianguo Zhang , Devansh Arpit , Ran Xu , Lik Mui , Huan Wang , Caiming Xiong , Silvio Savarese
IPC: G06N3/0455 , G06N3/092
Abstract: Embodiments described herein provide for optimizing a language model (LM) agent. In at least one embodiment, an LM agent comprises an “actor” LM and a “retrospective” LM, which provides reflections on attempts by the actor LM. The reflections are used to update subsequent prompts to the actor LM. Optimizing the LM agent comprises fine-tuning parameters of the retrospective LM while keeping parameters of the actor LM frozen. A gradient may be determined by a change in reward from the environment based on actions taken by the actor LM with and without a reflection of the retrospective LM. Using this gradient, parameters of the retrospective LM may be updated via backpropagation.
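The reward-difference signal described in this abstract can be sketched as a scalar computation. The abstract does not name a specific estimator, so the REINFORCE-style surrogate loss below is an assumption, as are the function names and the use of a single log-probability for the whole reflection.

```python
def reflection_advantage(reward_with_reflection, reward_without_reflection):
    # Change in environment reward attributable to the reflection.
    return reward_with_reflection - reward_without_reflection


def retrospective_loss(reflection_log_prob, reward_with_reflection,
                       reward_without_reflection):
    # Assumption: a policy-gradient-style surrogate. Minimizing this loss
    # raises the log-probability of reflections that improved the reward
    # and lowers it for reflections that hurt. Only the retrospective LM
    # would receive this gradient; the actor LM stays frozen.
    advantage = reflection_advantage(reward_with_reflection,
                                     reward_without_reflection)
    return -advantage * reflection_log_prob


# A reflection with log-probability -2.0 that lifted the reward from 0.25 to 1.0.
loss = retrospective_loss(-2.0, 1.0, 0.25)
```

In a real training loop the log-probability would come from the retrospective LM's own forward pass, so that backpropagating the loss updates only that model's parameters.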
-
Publication No.: US20250053793A1
Publication date: 2025-02-13
Application No.: US18494393
Filing date: 2023-10-25
Applicant: Salesforce, Inc.
Inventor: Zhiwei Liu , Weiran Yao , Jianguo Zhang , Le Xue , Shelby Heinecke , Rithesh Murthy , Yihao Feng , Zeyuan Chen , Juan Carlos Niebles Duque , Devansh Arpit , Ran Xu , Lik Mui , Huan Wang , Caiming Xiong , Silvio Savarese
Abstract: Embodiments described herein provide a method of predicting an action by a plurality of language model augmented agents (LAAs). In at least one embodiment, a controller receives a task instruction to be performed using an environment. The controller receives an observation of a first state from the environment. The controller selects an LAA from the plurality of LAAs based on the task instruction and the observation. The controller obtains an output from the selected LAA generated using an input combining the task instruction, the observation, and an LAA-specific prompt template. The controller determines the action based on the output. The controller causes the action to be performed on the environment, thereby causing the first state of the environment to change to a second state.
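One controller step from this abstract can be sketched as follows. The keyword-overlap selection rule, the LAA dataclass, the prompt templates, and the stubbed generate callables (which stand in for language model calls) are all hypothetical; the abstract specifies none of them.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass
class LAA:
    name: str
    keywords: FrozenSet[str]         # hypothetical cue words for selection
    prompt_template: str             # LAA-specific prompt template
    generate: Callable[[str], str]   # stand-in for a language model call


def select_laa(laas: List[LAA], task_instruction: str, observation: str) -> LAA:
    # Assumption: pick the LAA whose keywords occur most often as
    # substrings of the combined task instruction and observation.
    text = (task_instruction + " " + observation).lower()
    return max(laas, key=lambda laa: sum(word in text for word in laa.keywords))


def controller_step(laas: List[LAA], task_instruction: str,
                    observation: str) -> Tuple[str, str]:
    laa = select_laa(laas, task_instruction, observation)
    # Input combining the task instruction, the observation, and the template.
    prompt = laa.prompt_template.format(task=task_instruction,
                                        observation=observation)
    output = laa.generate(prompt)
    action = output.strip()  # the controller determines the action from the output
    return laa.name, action


laas = [
    LAA("search", frozenset({"search", "query"}),
        "Task: {task}\nPage: {observation}\nWrite a search action:",
        lambda prompt: " search[red shoes] "),
    LAA("click", frozenset({"results", "link", "item"}),
        "Task: {task}\nPage: {observation}\nChoose an item to click:",
        lambda prompt: " click[item 1] "),
]

first = controller_step(laas, "Buy red shoes", "Search page with a query box")
second = controller_step(laas, "Buy red shoes", "Results page listing item links")
```

The sketch stops at returning the action; applying it to an environment and reading back the next observation would close the loop described in the abstract.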
-
Publication No.: US20240370718A1
Publication date: 2024-11-07
Application No.: US18400477
Filing date: 2023-12-29
Applicant: Salesforce, Inc.
Inventor: Artemis Panagopoulou , Le Xue , Ning Yu , Junnan Li , Dongxu Li , Silvio Savarese , Shafiq Rayhan Joty , Ran Xu , Caiming Xiong , Juan Carlos Niebles Duque
IPC: G06N3/08 , G06N3/0455
Abstract: Embodiments described herein provide a method of generating a multi-modal task output to a text instruction relating to inputs of multiple different modalities (e.g., text, audio, video, 3D). The method comprises receiving, via a data interface, a first input of a first modality, a second input of a second modality and the text instruction relating to the first and the second inputs; encoding, by a first multimodal encoder adapted for the first modality, the first input of the first modality into a first encoded representation conditioned on the text instruction; encoding, by a second multimodal encoder adapted for the second modality, the second input of the second modality into a second encoded representation conditioned on the text instruction; and generating, by a neural network based language model, the multi-modal task output based on an input combining the first encoded representation, the second encoded representation, and the text instruction.
-
Publication No.: US20240312128A1
Publication date: 2024-09-19
Application No.: US18493035
Filing date: 2023-10-24
Applicant: Salesforce, Inc.
Inventor: Le Xue , Ning Yu , Shu Zhang , Junnan Li , Caiming Xiong , Silvio Savarese , Juan Carlos Niebles Duque , Ran Xu
Abstract: A method of training a neural network based three-dimensional (3D) encoder is provided. A first plurality of samples of a training dataset are generated using a first 3D model. An image generator with multi-view rendering is used to generate a plurality of two-dimensional (2D) images having different viewpoints of the first 3D model. A first language model is used to generate a plurality of texts corresponding to the plurality of 2D images respectively. A first text for a first image is generated by using one or more text descriptions generated by the first language model. A point cloud is generated by randomly sampling points in the 3D model. The first plurality of samples are generated using the plurality of 2D images, the corresponding plurality of texts, and the point cloud. The neural network based 3D encoder is trained using the training dataset including the first plurality of samples.
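The point cloud step in this abstract can be illustrated with a minimal sketch. It assumes the 3D model is a triangle mesh and that points are drawn uniformly from its surface (area-weighted triangle choice plus barycentric sampling); the abstract only says points are randomly sampled in the 3D model, so both the representation and the function names are assumptions.

```python
import math
import random


def triangle_area(a, b, c):
    # Half the norm of the cross product of two edge vectors.
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = [
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    ]
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_point_cloud(vertices, faces, num_points, seed=0):
    rng = random.Random(seed)
    triangles = [(vertices[i], vertices[j], vertices[k]) for i, j, k in faces]
    areas = [triangle_area(*tri) for tri in triangles]
    cloud = []
    # Larger triangles are chosen proportionally more often.
    for a, b, c in rng.choices(triangles, weights=areas, k=num_points):
        u, v = rng.random(), rng.random()
        if u + v > 1.0:
            # Reflect the sample back into the triangle.
            u, v = 1.0 - u, 1.0 - v
        w = 1.0 - u - v
        cloud.append(tuple(w * a[i] + u * b[i] + v * c[i] for i in range(3)))
    return cloud


# Toy mesh: a unit square in the z = 0 plane, split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
cloud = sample_point_cloud(vertices, faces, num_points=100)
```

In the described pipeline, a point cloud like this would be grouped with the rendered 2D views and their generated texts to form one training sample.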
-
Publication No.: US20240104809A1
Publication date: 2024-03-28
Application No.: US18161680
Filing date: 2023-01-30
Applicant: Salesforce, Inc.
Inventor: Ning Yu , Chia-Chih Chen , Zeyuan Chen , Caiming Xiong , Juan Carlos Niebles Duque , Ran Xu , Rui Meng
IPC: G06T11/60 , G06F40/106 , G06F40/126 , G06N20/00 , G06T9/00
CPC classification number: G06T11/60 , G06F40/106 , G06F40/126 , G06N20/00 , G06T9/00 , G06T2200/24 , G06T2210/12
Abstract: Embodiments described herein provide systems and methods for multimodal layout generation for digital publications. The system may receive, as inputs, a background image, one or more foreground texts, and one or more foreground images. Feature representations of the background image may be generated. The foreground inputs may be input to a layout generator that applies cross-attention to the background image feature representations in order to generate a layout comprising bounding box parameters for each input item. A composite layout may be generated based on the inputs and the generated bounding boxes. The resulting composite layout may then be displayed on a user interface.