-
Publication number: US20190108400A1
Publication date: 2019-04-11
Application number: US16152755
Filing date: 2018-10-05
Applicant: QUALCOMM Incorporated
Inventor: Victor Augusto ESCORCIA, Mihir JAIN, Amirhossein HABIBIAN, Cornelis Gerardus Maria SNOEK
Abstract: A method for generating action proposals in a sequence of frames comprises determining, at each frame of the sequence of frames, at least one possible action location for a type of actor to be detected. The method also expands, for each frame of the sequence of frames, the at least one possible action location to neighboring regions in neighboring frames from a given frame to identify a similar location between the given frame and each one of the neighboring frames. The method further comprises associating a most similar possible action location over the sequence of frames to generate the action proposals. The method also comprises classifying an action in the sequence of frames based on the action proposals and controlling an action of a device based on the classifying.
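Illustrative sketch (not part of the application): the association step in this abstract can be pictured as linking one candidate box per frame into a tube. The code below assumes intersection-over-union (IoU) as the similarity measure and a greedy frame-by-frame link; the names `iou` and `link_tube` are hypothetical and the abstract does not specify either choice.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tube(frames: List[List[Box]], start: Box) -> List[Box]:
    """Greedily link `start` (a box in frame 0) to the most similar
    candidate box in each following frame (assumed similarity: IoU)."""
    tube = [start]
    for candidates in frames[1:]:
        if not candidates:
            break  # no candidate locations in this frame: stop the tube
        tube.append(max(candidates, key=lambda c: iou(tube[-1], c)))
    return tube


frames = [
    [(0, 0, 10, 10)],
    [(1, 1, 11, 11), (50, 50, 60, 60)],
    [(2, 2, 12, 12), (40, 40, 50, 50)],
]
tube = link_tube(frames, frames[0][0])
```

Here `tube` follows the slowly drifting box and ignores the distant candidates, which is the intuition behind associating the most similar location over the sequence.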
-
Publication number: US20170262705A1
Publication date: 2017-09-14
Application number: US15267621
Filing date: 2016-09-16
Applicant: QUALCOMM Incorporated
Inventor: Zhenyang LI, Efstratios GAVVES, Mihir JAIN, Cornelis Gerardus Maria SNOEK
CPC classification number: G06K9/00718, G06K9/00342, G06K9/6269, G06N3/0445, G06N3/0454
Abstract: A method of predicting action labels for a video stream includes receiving the video stream and calculating an optical flow of consecutive frames of the video stream. An attention map is generated from the current frame of the video stream and the calculated optical flow. An action label is predicted for the current frame based on the optical flow, a previous hidden state and the attention map.
-
Publication number: US20220318553A1
Publication date: 2022-10-06
Application number: US17219460
Filing date: 2021-03-31
Applicant: QUALCOMM Incorporated
Inventor: Haitam BEN YAHIA, Amir GHODRATI, Mihir JAIN, Amirhossein HABIBIAN
Abstract: Systems and techniques are provided for performing holistic video understanding. For example, a process can include obtaining a first video and determining, using a machine learning model decision engine, a first machine learning model from a set of machine learning models to use for processing at least a portion of the first video. The first machine learning model can be determined based on one or more characteristics of at least the portion of the first video. The process can include processing at least the portion of the first video using the first machine learning model.
-
Publication number: US20190108399A1
Publication date: 2019-04-11
Application number: US16152301
Filing date: 2018-10-04
Applicant: QUALCOMM Incorporated
Inventor: Victor Augusto ESCORCIA, Mihir JAIN, Amirhossein HABIBIAN, Cornelis Gerardus Maria SNOEK
IPC: G06K9/00
Abstract: A method for processing a sequence of frames includes receiving a sequence of frames and multiple action proposals for the sequence of frames. The method also includes generating a representation of the sequence of frames and pooling the representation around each of the action proposals. The method further includes classifying the action proposals based on the pooled representations and controlling a device based on the classifying.
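Illustrative sketch (not part of the application): "pooling the representation around each of the action proposals" can be pictured as averaging per-frame features inside each proposal. The code below assumes proposals are temporal segments given as half-open frame ranges and assumes mean pooling; the name `pool_proposal` is hypothetical and the abstract fixes neither the proposal shape nor the pooling operator.

```python
from typing import List


def pool_proposal(features: List[List[float]], start: int, end: int) -> List[float]:
    """Mean-pool per-frame feature vectors over frames [start, end)."""
    segment = features[start:end]
    if not segment:
        raise ValueError("proposal covers no frames")
    dim = len(segment[0])
    return [sum(f[d] for f in segment) / len(segment) for d in range(dim)]


# One 2-D feature vector per frame (toy values).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
proposals = [(0, 2), (1, 3)]
pooled = [pool_proposal(features, s, e) for s, e in proposals]
```

Each pooled vector would then be passed to a classifier, one per proposal.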
-
Publication number: US20240303987A1
Publication date: 2024-09-12
Application number: US18360741
Filing date: 2023-07-27
Applicant: QUALCOMM Incorporated
Inventor: Juntae LEE, Mihir JAIN, Sungrack YUN
IPC: G06V20/40, G06F16/732, G06F16/735, G06F16/75
CPC classification number: G06V20/48, G06F16/7328, G06F16/735, G06F16/75, G06V20/41, G06V10/82
Abstract: Aspects of the disclosure are directed to an apparatus configured to perform common-action localization. In certain aspects, the apparatus may receive a query video comprising a plurality of frames, wherein a first query proposal is determined based on a subset of frames of the plurality of frames, the first query proposal indicative of an action depicted on the subset of frames. In certain aspects, the apparatus may determine a first attendance for a first support video of a plurality of support videos. In certain aspects, the apparatus may determine a second attendance for a second support video of the plurality of support videos after computing the first attendance.
-
Publication number: US20220101087A1
Publication date: 2022-03-31
Application number: US17405879
Filing date: 2021-08-18
Applicant: QUALCOMM Incorporated
Inventor: Juntae LEE, Mihir JAIN, Sungrack YUN, Hyoungwoo PARK, Kyu Woong HWANG
Abstract: A method performed by an artificial neural network (ANN) includes determining, at a first stage of a multi-stage cross-attention model of the ANN, a first cross-correlation between a first representation of each modality of a number of modalities associated with a sequence of inputs. The method still further includes determining, at each second stage of one or more second stages of the multi-stage cross-attention model, a second cross-correlation between first attended representations of each modality. The method also includes generating a concatenated feature representation associated with a final second stage of the one or more second stages based on the second cross-correlation associated with the final second stage, the first attended representation of each modality, and the first representation of each modality. The method further includes determining a probability distribution between a set of background actions and a set of foreground actions from the concatenated feature representation. The method still further includes localizing an action in the sequence of inputs based on the probability distribution.
-
Publication number: US20170262996A1
Publication date: 2017-09-14
Application number: US15250755
Filing date: 2016-08-29
Applicant: QUALCOMM Incorporated
Inventor: Mihir JAIN, Zhenyang LI, Efstratios GAVVES, Cornelis Gerardus Maria SNOEK
CPC classification number: G06T7/0087, G06K9/00718, G06K9/3216, G06K9/3241, G06K9/40, G06K9/4671, G06K9/628, G06K2009/00738, G06N3/0445, G06N3/0454, G06T7/143, G06T2207/10016, G06T2210/12
Abstract: A method generates bounding-boxes within frames of a sequence of frames. The bounding-boxes may be generated via a recurrent neural network (RNN) such as a long short-term memory (LSTM) network. The method includes receiving the sequence of frames and generating an attention feature map for each frame of the sequence of frames. Each attention feature map indicates at least one potential moving object. The method also includes up-sampling each attention feature map to determine an attention saliency for pixels in each frame of the sequence of frames. The method further includes generating a bounding-box within each frame based on the attention saliency and temporally smoothing multiple bounding-boxes along the sequence of frames to obtain a smooth sequence of bounding-boxes. The method still further includes localizing an action location within each frame based on the smooth sequence of bounding-boxes.
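Illustrative sketch (not part of the application): the temporal smoothing step in this abstract can be pictured as filtering box coordinates along the frame axis. The code below assumes a centered moving average with a window that is truncated at the sequence ends; the name `smooth_boxes` and the `window` parameter are hypothetical and the abstract does not state which smoothing filter is used.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def smooth_boxes(boxes: List[Box], window: int = 3) -> List[Box]:
    """Smooth per-frame boxes with a centered moving average.

    Near the start and end of the sequence the window is truncated
    to the frames that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(boxes)):
        lo = max(0, i - half)
        hi = min(len(boxes), i + half + 1)
        neighbors = boxes[lo:hi]
        smoothed.append(
            tuple(sum(b[k] for b in neighbors) / len(neighbors) for k in range(4))
        )
    return smoothed


raw = [(0, 0, 10, 10), (3, 3, 13, 13), (6, 6, 16, 16)]
smooth = smooth_boxes(raw, window=3)
```

A larger `window` gives a steadier box sequence at the cost of lagging behind fast motion.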
-
Publication number: US20170262995A1
Publication date: 2017-09-14
Application number: US15249280
Filing date: 2016-08-26
Applicant: QUALCOMM Incorporated
Inventor: Zhenyang LI, Efstratios GAVVES, Mihir JAIN, Cornelis Gerardus Maria SNOEK
CPC classification number: G06T7/11, G06K9/00335, G06K9/00718, G06N3/0445, G06N3/0454, G06N3/08, G06T7/0081, G06T2207/10004, G06T2207/20084
Abstract: A method of processing data within a convolutional attention recurrent neural network (RNN) includes generating a current multi-dimensional attention map. The current multi-dimensional attention map indicates areas of interest in a first frame from a sequence of spatio-temporal data. The method further includes receiving a multi-dimensional feature map. The method also includes convolving the current multi-dimensional attention map and the multi-dimensional feature map to obtain a multi-dimensional hidden state and a next multi-dimensional attention map. The method identifies a class of interest in the first frame based on the multi-dimensional hidden state and training data.