-
Publication No.: US11902548B2
Publication date: 2024-02-13
Application No.: US17203613
Filing date: 2021-03-16
Applicant: Deepak Sridhar, Niamul Quader, Srikanth Muralidharan, Yaoxin Li, Juwei Lu, Peng Dai
Inventor: Deepak Sridhar, Niamul Quader, Srikanth Muralidharan, Yaoxin Li, Juwei Lu, Peng Dai
Abstract: Systems, methods, and computer media of processing a video are disclosed. An example method may include: receiving a plurality of video frames of a video; generating a plurality of first input features based on the plurality of video frames; generating a plurality of second input features based on reversing a temporal order of the plurality of first input features; generating a first set of joint attention features based on the plurality of first input features; generating a second set of joint attention features based on the plurality of second input features; and concatenating the first set of joint attention features and the second set of joint attention features to generate a final set of joint attention features.
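The steps listed in this abstract (reverse the temporal order, compute attention features on both orderings, concatenate) can be illustrated with a minimal sketch. It is not the patented method: `causal_attention` is a hypothetical stand-in for the joint attention step, chosen so that the forward pass sees only past frames and the reversed pass sees only future frames, and both re-aligning the reversed output and concatenating along the feature dimension are assumptions the abstract does not specify.

```python
import math

def causal_attention(features):
    """Stand-in attention (assumption): step t is a softmax-weighted mix of steps 0..t."""
    outputs = []
    for t, query in enumerate(features):
        context = features[: t + 1]
        scores = [sum(q * k for q, k in zip(query, key)) for key in context]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, context)) / total
            for d in range(len(query))
        ])
    return outputs

def bidirectional_attention_features(first_inputs):
    """Concatenate attention features from the original and reversed orderings."""
    second_inputs = first_inputs[::-1]            # reversed temporal order
    first_set = causal_attention(first_inputs)
    second_set = causal_attention(second_inputs)
    realigned = second_set[::-1]                  # assumption: restore original order
    # Assumption: concatenate per time step along the feature dimension.
    return [fwd + bwd for fwd, bwd in zip(first_set, realigned)]

# Three toy per-frame feature vectors (hypothetical values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
final = bidirectional_attention_features(frames)
```

Each row of `final` has twice the input feature width: the first half summarizes the frames up to that time step, the second half the frames from that time step onward.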
-
Publication No.: US20210142106A1
Publication date: 2021-05-13
Application No.: US17095257
Filing date: 2020-11-11
Applicant: Niamul QUADER, Md Ibrahim KHALIL, Juwei LU, Peng DAI, Wei LI
Inventor: Niamul QUADER, Md Ibrahim KHALIL, Juwei LU, Peng DAI, Wei LI
Abstract: Methods and systems for updating the weights of a set of convolution kernels of a convolutional layer of a neural network are described. A set of convolution kernels having attention-infused weights is generated by using an attention mechanism based on characteristics of the weights. For example, a set of location-based attention multipliers is applied to weights in the set of convolution kernels, a magnitude-based attention function is applied to the weights in the set of convolution kernels, or both. An output activation map is generated using the set of convolution kernels with attention-infused weights. A loss for the neural network is computed, and the gradient is back propagated to update the attention-infused weights of the convolution kernels.
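As a rough illustration of applying both kinds of attention to a kernel's weights, the sketch below multiplies each weight by a per-location multiplier and then rescales it by its relative magnitude. The function name `infuse_attention` and the specific magnitude function (`|w| / max|w|`) are assumptions made for the example; the abstract does not state which function is used, and the loss and backpropagation steps are omitted.

```python
def infuse_attention(kernel, location_multipliers, eps=1e-8):
    """Return attention-infused weights for one 2-D kernel (illustrative only)."""
    # Location-based attention: one multiplier per kernel position.
    located = [
        [w * m for w, m in zip(weight_row, mult_row)]
        for weight_row, mult_row in zip(kernel, location_multipliers)
    ]
    # Magnitude-based attention (hypothetical choice): scale each weight by
    # its magnitude relative to the largest magnitude in the kernel.
    peak = max(abs(w) for row in located for w in row) + eps
    return [[w * (abs(w) / peak) for w in row] for row in located]

kernel = [[1.0, -2.0], [0.5, 0.0]]
multipliers = [[1.0, 1.0], [2.0, 1.0]]
infused = infuse_attention(kernel, multipliers)
```

With this choice, large weights are kept almost unchanged while small weights are pushed further toward zero.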
-
Publication No.: US11669743B2
Publication date: 2023-06-06
Application No.: US16874478
Filing date: 2020-05-14
Applicant: Niamul Quader, Juwei Lu, Peng Dai, Wei Li
Inventor: Niamul Quader, Juwei Lu, Peng Dai, Wei Li
IPC classification: H04N21/462, G06K9/62, H04N21/466, G06N3/08, G06V20/40, G06N3/04, H04N21/4402, G06F9/50
CPC classification: H04N21/4621, G06F9/5055, G06K9/6277, G06N3/0454, G06N3/08, G06V20/41, H04N21/440227, H04N21/4666, G06V20/44
Abstract: An adaptive action recognizer for video that performs multiscale spatiotemporal decomposition of video to generate lower complexity video. The adaptive action recognizer has a number of processing pathways, one for each level of video complexity with each processing pathway having a different computational cost. The adaptive action recognizer applies a decision making scheme that encourages using low average computational costs while retaining high accuracy.
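One simple way to trade average cost against accuracy across pathways of different cost is an early-exit cascade, sketched below. This is an assumed decision scheme, not necessarily the one claimed: `recognize`, the confidence threshold, and the toy pathway callables are all hypothetical, and each pathway is assumed to handle its own level of the decomposed video internally.

```python
def recognize(video, pathways, threshold=0.8):
    """Try pathways from cheapest to most expensive; stop once one is confident.

    `pathways` is a list of (cost, classify) pairs sorted by ascending cost,
    where classify(video) returns (label, confidence).
    """
    total_cost = 0
    label = None
    for cost, classify in pathways:
        label, confidence = classify(video)
        total_cost += cost
        if confidence >= threshold:
            break  # confident enough, skip the costlier pathways
    return label, total_cost

# Hypothetical pathways with fixed outputs, for illustration only.
pathways = [
    (1, lambda v: ("walk", 0.6)),    # lowest-complexity video, cheapest model
    (4, lambda v: ("run", 0.9)),
    (16, lambda v: ("run", 0.99)),   # full-complexity video, costliest model
]
result = recognize(None, pathways)
```

Easy clips exit at a cheap pathway, so the expensive pathway only runs on inputs the cheaper ones are unsure about.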
-
Publication No.: US11698926B2
Publication date: 2023-07-11
Application No.: US17524862
Filing date: 2021-11-12
Applicant: Arnab Kumar Mondal, Deepak Sridhar, Niamul Quader, Juwei Lu, Peng Dai, Chao Xing
Inventor: Arnab Kumar Mondal, Deepak Sridhar, Niamul Quader, Juwei Lu, Peng Dai, Chao Xing
IPC classification: G06F16/30, G06F16/732, G06N3/04, G06F16/783, G06V20/40
CPC classification: G06F16/7343, G06F16/783, G06N3/04, G06V20/40
Abstract: Methods and systems are described for performing video retrieval together with video grounding. A word-based query for a video is received and encoded into a query representation using a trained query encoder. One or more similar video representations are identified, from a plurality of video representations, that are similar to the query representation. Each similar video representation represents a respective relevant video. A grounding is generated for each relevant video by forward propagating each respective similar video representation together with the query representation through a trained grounding module. The relevant videos or identifiers of the relevant videos are outputted together with the grounding generated for each relevant video.
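The retrieve-then-ground flow in this abstract can be sketched as follows, assuming the query has already been encoded into a vector. Cosine similarity, the `top_k` cutoff, and the names `retrieve_and_ground` and `stub_grounding` are assumptions for illustration; the trained query encoder and grounding module are replaced by placeholders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_and_ground(query_repr, video_reprs, grounding_module, top_k=2):
    """Return (video_id, grounding) pairs for the top_k most similar videos."""
    ranked = sorted(
        video_reprs,
        key=lambda vid: cosine(query_repr, video_reprs[vid]),
        reverse=True,
    )
    # Each retrieved representation is passed, with the query representation,
    # through the grounding module.
    return [
        (vid, grounding_module(video_reprs[vid], query_repr))
        for vid in ranked[:top_k]
    ]

def stub_grounding(video_repr, query_repr):
    """Placeholder for a trained grounding module: returns a fixed (start, end)."""
    return (0.0, 1.0)

# Hypothetical precomputed video representations keyed by identifier.
video_reprs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
results = retrieve_and_ground([1.0, 0.0], video_reprs, stub_grounding)
```

The output pairs each video identifier with its grounding, mirroring the abstract's final step of outputting both together.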
-
Publication No.: US20230153352A1
Publication date: 2023-05-18
Application No.: US17524862
Filing date: 2021-11-12
Applicant: Arnab Kumar MONDAL, Deepak SRIDHAR, Niamul QUADER, Juwei LU, Peng DAI, Chao XING
Inventor: Arnab Kumar MONDAL, Deepak SRIDHAR, Niamul QUADER, Juwei LU, Peng DAI, Chao XING
IPC classification: G06F16/732, G06F16/783, G06K9/00, G06N3/04
CPC classification: G06F16/7343, G06F16/783, G06K9/00711, G06N3/04
Abstract: Methods and systems are described for performing video retrieval together with video grounding. A word-based query for a video is received and encoded into a query representation using a trained query encoder. One or more similar video representations are identified, from a plurality of video representations, that are similar to the query representation. Each similar video representation represents a respective relevant video. A grounding is generated for each relevant video by forward propagating each respective similar video representation together with the query representation through a trained grounding module. The relevant videos or identifiers of the relevant videos are outputted together with the grounding generated for each relevant video.
-
Publication No.: US20220303560A1
Publication date: 2022-09-22
Application No.: US17203613
Filing date: 2021-03-16
Applicant: Deepak SRIDHAR, Niamul QUADER, Srikanth MURALIDHARAN, Yaoxin LI, Juwei LU, Peng DAI
Inventor: Deepak SRIDHAR, Niamul QUADER, Srikanth MURALIDHARAN, Yaoxin LI, Juwei LU, Peng DAI
Abstract: Systems, methods, and computer media of processing a video are disclosed. An example method may include: receiving a plurality of video frames of a video; generating a plurality of first input features based on the plurality of video frames; generating a plurality of second input features based on reversing a temporal order of the plurality of first input features; generating a first set of joint attention features based on the plurality of first input features; generating a second set of joint attention features based on the plurality of second input features; and concatenating the first set of joint attention features and the second set of joint attention features to generate a final set of joint attention features.
-
Publication No.: US20220114424A1
Publication date: 2022-04-14
Application No.: US17066220
Filing date: 2020-10-08
Applicant: Niamul QUADER, Md Ibrahim KHALIL, Juwei LU, Peng DAI, Wei LI
Inventor: Niamul QUADER, Md Ibrahim KHALIL, Juwei LU, Peng DAI, Wei LI
Abstract: Methods, processing units and media for multi-bandwidth separated feature extraction convolution in a neural network are described. A convolution block splits input channels of an activation map into multiple branches, each branch undergoing convolution at a different bandwidth by using down-sampling of the inputs. The outputs are concatenated by up-sampling the outputs of the low-bandwidth branches using pixel shuffling. The concatenation operation may be a shuffled concatenation operation that preserves separated multi-bandwidth feature information for use by subsequent layers of the neural network. Embodiments are described which apply frequency-based and magnitude-based attention to the weights of the convolution kernels based on the frequency band locations of the weights.
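The pixel-shuffling step mentioned in this abstract trades channels for spatial resolution. The sketch below shows only that up-sampling operation, using the common sub-pixel layout in which `r * r` low-resolution channels are interleaved into one plane that is `r` times larger in each dimension. The function name and the list-of-lists representation are assumptions, and the channel split, per-branch convolution, and shuffled concatenation are not shown.

```python
def pixel_shuffle(channels, r):
    """Rearrange r*r planes of size H x W into one plane of size (H*r) x (W*r)."""
    if len(channels) != r * r:
        raise ValueError("expected exactly r * r input channels")
    height, width = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (width * r) for _ in range(height * r)]
    for c, plane in enumerate(channels):
        i, j = divmod(c, r)  # row and column offset inside each r x r output cell
        for h in range(height):
            for w in range(width):
                out[h * r + i][w * r + j] = plane[h][w]
    return out

# Four 1x1 channels become a single 2x2 plane.
low_res = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
high_res = pixel_shuffle(low_res, 2)
```

A low-bandwidth branch that produces `r * r` times as many channels at `1/r` resolution can be brought back to the full spatial size this way before concatenation.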