-
Publication No.: US20250148633A1
Publication Date: 2025-05-08
Application No.: US18666502
Application Date: 2024-05-16
Applicant: QUALCOMM Incorporated
Inventor: Rajeev YASARLA, Hong CAI, Risheek GARREPALLI, Yinhao ZHU, Jisoo JEONG, Yunxiao SHI, Manish Kumar SINGH, Fatih Murat PORIKLI
Abstract: Systems and techniques are provided for generating depth information. For example, a process can include obtaining a first feature volume including visual features corresponding to each respective frame included in a first set of frames. A first query generator network can generate reconstruction features associated with a reconstructed feature volume corresponding to the first feature volume. Based on the first feature volume, a second query generator network can generate motion features associated with predicted future motion corresponding to the first feature volume. An initial depth prediction can be generated for each respective frame based on cross-attention between features of a depth prediction decoder, the reconstruction features, and the motion features. A refined depth prediction can be generated for each respective frame based on cross-attention between the initial depth prediction, the reconstruction features, and the motion features.
-
Publication No.: US20250094793A1
Publication Date: 2025-03-20
Application No.: US18469909
Application Date: 2023-09-19
Applicant: QUALCOMM Incorporated
Inventor: Manish Kumar SINGH, Tianyu JIANG, Hsin-Pai CHENG, Kartikeya BHARDWAJ, Hong CAI, Mingu LEE, Munawar HAYAT, Christopher LOTT, Fatih Murat PORIKLI
IPC: G06N3/0499
Abstract: A processor-implemented method for image or text processing includes receiving, by an artificial neural network (ANN) model, a set of tokens corresponding to an input. A token interaction block of the ANN model processes the set of tokens according to each channel of the input to generate a spatial mixture of a set of features for each channel of the input. A feed forward network block of the ANN model generates a mixture of channel features based on the spatial mixture of the set of features for each channel of the input. An attention block of the ANN model determines a set of attended features of the mixture of channel features according to a set of attention weights. In turn, the ANN model generates an inference based on the set of attended features of the mixture of channel features.
-
Publication No.: US20250148628A1
Publication Date: 2025-05-08
Application No.: US18633302
Application Date: 2024-04-11
Applicant: QUALCOMM Incorporated
Inventor: Yunxiao SHI, Hong CAI, Manish Kumar SINGH, Shizhong Steve HAN, Yinhao ZHU, Fatih Murat PORIKLI
Abstract: Systems and techniques are provided for generating depth information from one or more images. For example, a process can include obtaining a first depth map corresponding to an input comprising an image of the one or more images and a sparse depth measurement. A three-dimensional (3D) point cloud can be generated based on the first depth map and multi-scale visual features of the input, wherein the 3D point cloud includes a plurality of 3D point features uplifted from the multi-scale visual features. At least a portion of the plurality of 3D point features can be processed using one or more self-attention layers to generate refined 3D point features. A two-dimensional (2D) projection of the refined 3D point features can be generated and a second depth map can be generated based on the 2D projection of the refined 3D point features.
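Illustrative note: the geometric part of this abstract (lifting a depth map to a 3D point cloud and projecting it back to 2D) can be sketched as below. This is a minimal sketch under stated assumptions, not the method claimed in the application: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, treats non-positive depth as missing, and omits the multi-scale visual features and the self-attention refinement entirely. The function names are illustrative.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def unproject_depth(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point3D]:
    """Lift each pixel with positive depth to a 3D camera-space point.

    Assumes a pinhole camera model (an assumption, not stated in the abstract).
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # non-positive depth is treated as a missing measurement
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def project_points(points: List[Point3D], fx: float, fy: float,
                   cx: float, cy: float,
                   height: int, width: int) -> List[List[float]]:
    """Project 3D points to a depth map, keeping the nearest point per pixel."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

In a fuller pipeline, each 3D point would also carry a feature vector that is refined before the 2D projection; here only the depth value is carried through.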
-
Publication No.: US20240428576A1
Publication Date: 2024-12-26
Application No.: US18613263
Application Date: 2024-03-22
Applicant: QUALCOMM Incorporated
Inventor: Tianyu JIANG, Manish Kumar SINGH, Hsin-Pai CHENG, Hong CAI, Mingu LEE, Kartikeya BHARDWAJ, Christopher LOTT, Fatih Murat PORIKLI
Abstract: Certain aspects of the present disclosure provide techniques and apparatus for improved machine learning. A transformed version of image pixels is accessed as input to an attention layer of a machine learning model. A number of local attention operations to apply, in one transformer, to the transformed version of image pixels is selected based at least in part on a size of the transformed version of image pixels. A transformer output for the attention layer of the machine learning model is generated based on applying the number of local attention operations and at least one global attention operation to the transformed version of image pixels.
-
Publication No.: US20240412493A1
Publication Date: 2024-12-12
Application No.: US18537404
Application Date: 2023-12-12
Applicant: QUALCOMM Incorporated
Inventor: Risheek GARREPALLI, Yunxiao SHI, Hong CAI, Yinhao ZHU, Shubhankar Mangesh BORSE, Jisoo JEONG, Debasmit DAS, Manish Kumar SINGH, Rajeev YASARLA, Shizhong Steve HAN, Fatih Murat PORIKLI
IPC: G06V10/776, G06T7/50, G06V10/764, G06V10/82, G06V20/70
Abstract: Systems and techniques are provided for processing image data. According to some aspects, a computing device can generate a gradient (e.g., a classifier gradient using a trained classifier) associated with a current sample. The computing device can combine the gradient with an iterative model estimated score function or data associated with the current sample to generate a score function estimate. The computing device can predict, using the diffusion machine learning model and based on the score function estimate, a new sample.
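Illustrative note: combining a classifier gradient with a model-estimated score resembles the widely used classifier-guidance pattern, which can be sketched as below. This is a hedged sketch, not the procedure claimed in the application: the additive combination, the `guidance_scale` weight, and the noise-free update step are all assumptions, and the function names are hypothetical.

```python
from typing import List


def guided_score(model_score: List[float], classifier_grad: List[float],
                 guidance_scale: float = 1.0) -> List[float]:
    """Combine the model's score estimate with a classifier gradient.

    Assumed form: score + guidance_scale * gradient, element-wise.
    """
    return [s + guidance_scale * g for s, g in zip(model_score, classifier_grad)]


def update_sample(sample: List[float], score: List[float],
                  step_size: float) -> List[float]:
    """Move the current sample along the score estimate.

    A real sampler would typically also add noise and follow a schedule;
    both are omitted here to keep the example deterministic.
    """
    return [x + step_size * s for x, s in zip(sample, score)]


# Usage: one guided update on a two-dimensional toy sample.
score = guided_score([1.0, -2.0], [0.5, 0.5], guidance_scale=2.0)
new_sample = update_sample([0.0, 1.0], score, step_size=0.5)
```

The scale parameter trades off fidelity to the classifier signal against the unguided model estimate; its value here is arbitrary.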
-