-
Publication No.: US20220303560A1
Publication Date: 2022-09-22
Application No.: US17203613
Filing Date: 2021-03-16
Applicants: Deepak SRIDHAR, Niamul QUADER, Srikanth MURALIDHARAN, Yaoxin LI, Juwei LU, Peng DAI
Inventors: Deepak SRIDHAR, Niamul QUADER, Srikanth MURALIDHARAN, Yaoxin LI, Juwei LU, Peng DAI
Abstract: Systems, methods, and computer media for processing a video are disclosed. An example method may include: receiving a plurality of video frames of a video; generating a plurality of first input features based on the plurality of video frames; generating a plurality of second input features based on reversing a temporal order of the plurality of first input features; generating a first set of joint attention features based on the plurality of first input features; generating a second set of joint attention features based on the plurality of second input features; and concatenating the first set of joint attention features and the second set of joint attention features to generate a final set of joint attention features.
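The forward/reversed pipeline in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: `causal_mean_attention` is a hypothetical, order-sensitive stand-in for the joint attention step (the abstract does not specify it), and re-reversing the second set so that time steps line up before concatenation is also an assumption.

```python
from typing import Callable, List

Vector = List[float]


def causal_mean_attention(features: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for joint attention: output t is the mean of inputs 0..t."""
    if not features:
        return []
    outputs: List[Vector] = []
    running = [0.0] * len(features[0])
    for t, feat in enumerate(features, start=1):
        running = [r + x for r, x in zip(running, feat)]
        outputs.append([r / t for r in running])
    return outputs


def bidirectional_features(
    first_inputs: List[Vector],
    attend: Callable[[List[Vector]], List[Vector]] = causal_mean_attention,
) -> List[Vector]:
    # Second input features: the first input features in reversed temporal order.
    second_inputs = first_inputs[::-1]
    first_set = attend(first_inputs)
    second_set = attend(second_inputs)
    # Assumption: flip the second set back so index t refers to the same frame.
    aligned_second = second_set[::-1]
    # Concatenate the two sets along the feature dimension, per time step.
    return [f + b for f, b in zip(first_set, aligned_second)]


frames = [[1.0], [3.0], [5.0]]  # one 1-D feature per frame, illustrative values
final = bidirectional_features(frames)
```

Each output vector holds a forward-context half and a backward-context half, which is the usual motivation for processing a sequence in both directions.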
-
Publication No.: US20230153352A1
Publication Date: 2023-05-18
Application No.: US17524862
Filing Date: 2021-11-12
Applicants: Arnab Kumar MONDAL, Deepak SRIDHAR, Niamul QUADER, Juwei LU, Peng DAI, Chao XING
Inventors: Arnab Kumar MONDAL, Deepak SRIDHAR, Niamul QUADER, Juwei LU, Peng DAI, Chao XING
IPC Classification: G06F16/732, G06F16/783, G06K9/00, G06N3/04
CPC Classification: G06F16/7343, G06F16/783, G06K9/00711, G06N3/04
Abstract: Methods and systems are described for performing video retrieval together with video grounding. A word-based query for a video is received and encoded into a query representation using a trained query encoder. One or more video representations that are similar to the query representation are identified from a plurality of video representations. Each similar video representation represents a respective relevant video. A grounding is generated for each relevant video by forward propagating each respective similar video representation together with the query representation through a trained grounding module. The relevant videos or identifiers of the relevant videos are outputted together with the grounding generated for each relevant video.
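The retrieve-then-ground flow described above might look roughly like the sketch below. It assumes the query has already been encoded into a vector, uses cosine similarity as the similarity measure (the abstract does not name one), and takes the grounding module as a caller-supplied function; `retrieve`, `retrieve_with_grounding`, and `dummy_ground` are hypothetical names.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Span = Tuple[float, float]  # assumed grounding output: (start_sec, end_sec)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_repr: Vector, video_reprs: Dict[str, Vector], top_k: int = 2):
    """Return the top_k (video_id, similarity) pairs, most similar first."""
    scored = [(vid, cosine(query_repr, rep)) for vid, rep in video_reprs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def retrieve_with_grounding(
    query_repr: Vector,
    video_reprs: Dict[str, Vector],
    ground: Callable[[Vector, Vector], Span],
    top_k: int = 2,
) -> List[Tuple[str, Span]]:
    # Each retrieved representation is passed, with the query, to the grounding step.
    return [
        (vid, ground(video_reprs[vid], query_repr))
        for vid, _ in retrieve(query_repr, video_reprs, top_k)
    ]


def dummy_ground(video_repr: Vector, query_repr: Vector) -> Span:
    # Placeholder for a trained grounding module; returns a fixed span.
    return (0.0, 1.0)


library = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = retrieve_with_grounding([1.0, 0.1], library, dummy_ground)
```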
-
Publication No.: US20210142106A1
Publication Date: 2021-05-13
Application No.: US17095257
Filing Date: 2020-11-11
Applicants: Niamul QUADER, Md Ibrahim KHALIL, Juwei LU, Peng DAI, Wei LI
Inventors: Niamul QUADER, Md Ibrahim KHALIL, Juwei LU, Peng DAI, Wei LI
Abstract: Methods and systems for updating the weights of a set of convolution kernels of a convolutional layer of a neural network are described. A set of convolution kernels having attention-infused weights is generated by using an attention mechanism based on characteristics of the weights. For example, a set of location-based attention multipliers is applied to weights in the set of convolution kernels, a magnitude-based attention function is applied to the weights in the set of convolution kernels, or both. An output activation map is generated using the set of convolution kernels with attention-infused weights. A loss for the neural network is computed, and the gradient is backpropagated to update the attention-infused weights of the convolution kernels.
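As a rough illustration of the two weight-attention options named above, the sketch below applies a grid of location-based multipliers and then a magnitude-based function to a single 2D kernel. Both the multiplier values and the magnitude function (scaling each weight by its magnitude relative to the largest magnitude in the kernel) are illustrative assumptions, not the functions claimed in the application; the convolution, loss, and backpropagation steps are omitted.

```python
from typing import List

Kernel = List[List[float]]


def apply_location_attention(kernel: Kernel, multipliers: Kernel) -> Kernel:
    """Scale each weight by the multiplier at the same kernel position."""
    return [
        [w * m for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(kernel, multipliers)
    ]


def apply_magnitude_attention(kernel: Kernel) -> Kernel:
    """Hypothetical magnitude-based function: w * (|w| / max|w|).

    Large-magnitude weights are kept almost unchanged; small ones are suppressed.
    """
    largest = max(abs(w) for row in kernel for w in row)
    if largest == 0.0:
        return [row[:] for row in kernel]
    return [[w * (abs(w) / largest) for w in row] for row in kernel]


kernel = [[1.0, -2.0], [0.5, 4.0]]
multipliers = [[1.0, 0.5], [2.0, 1.0]]  # illustrative values only

located = apply_location_attention(kernel, multipliers)
infused = apply_magnitude_attention(located)
```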
-
Publication No.: US20220114424A1
Publication Date: 2022-04-14
Application No.: US17066220
Filing Date: 2020-10-08
Applicants: Niamul QUADER, Md Ibrahim KHALIL, Juwei LU, Peng DAI, Wei LI
Inventors: Niamul QUADER, Md Ibrahim KHALIL, Juwei LU, Peng DAI, Wei LI
Abstract: Methods, processing units and media for multi-bandwidth separated feature extraction convolution in a neural network are described. A convolution block splits input channels of an activation map into multiple branches, each branch undergoing convolution at a different bandwidth by using down-sampling of the inputs. The outputs are concatenated by up-sampling the outputs of the low-bandwidth branches using pixel shuffling. The concatenation operation may be a shuffled concatenation operation that preserves separated multi-bandwidth feature information for use by subsequent layers of the neural network. Embodiments are described which apply frequency-based and magnitude-based attention to the weights of the convolution kernels based on the frequency band locations of the weights.
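The pixel-shuffling up-sampling mentioned above can be illustrated with a small pure-Python sketch. It follows the common sub-pixel convention, in which r*r input channels are interleaved into one output channel of r times the height and width; whether the application uses exactly this channel ordering is an assumption.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]


def pixel_shuffle(x: Tensor3, r: int) -> Tensor3:
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r)."""
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    channels_out = channels_in // (r * r)
    out = [
        [[0.0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                # Input channel c*r*r + i*r + j fills sub-pixel offset (i, j).
                src = x[c * r * r + i * r + j]
                for row in range(height):
                    for col in range(width):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


low_res = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]  # 4 channels of size 1x1
high_res = pixel_shuffle(low_res, 2)            # 1 channel of size 2x2
```

Because the operation only rearranges values, a low-bandwidth branch can be brought back to full resolution without interpolation.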
-
Publication No.: US20240054757A1
Publication Date: 2024-02-15
Application No.: US18327384
Filing Date: 2023-06-01
Applicants: Yanhui GUO, Deepak SRIDHAR, Peng DAI, Juwei LU
Inventors: Yanhui GUO, Deepak SRIDHAR, Peng DAI, Juwei LU
CPC Classification: G06V10/62, G06V10/24, G06V10/44, G06V10/764, G06V10/806, G06V10/82
Abstract: Systems and methods for temporal action localization of video data are described. A feature representation extracted from video data has a temporal dimension and a spatial dimension. The feature representation is self-aligned in the spatial dimension. Spatial multi-sampling is performed to obtain a plurality of sparse samples of the self-aligned representation along the spatial dimension, and the multi-sampled representation is fused with the self-aligned representation. Attention-based context information aggregation is applied on the fused representation to obtain a spatially refined representation. Local temporal information aggregation is applied on the self-aligned representation to obtain a temporally refined representation. Action localization is performed on a concatenation of the spatially refined representation and the temporally refined representation.
-
Publication No.: US20230419733A1
Publication Date: 2023-12-28
Application No.: US17846770
Filing Date: 2022-06-22
Applicants: Yannick VERDIE, Zi Hao YANG, Deepak SRIDHAR, Juwei LU
Inventors: Yannick VERDIE, Zi Hao YANG, Deepak SRIDHAR, Juwei LU
CPC Classification: G06V40/28, G06V20/64, G06V30/1456, G06T7/246, G06T7/73, G06T2207/30196
Abstract: Methods and devices are described for computer vision-based gesture detection. From a frame of image data, extracted locations of keypoints of a detected hand are obtained. The extracted locations are normalized to obtain normalized features. The normalized features are processed using a trained decision tree ensemble to generate a probability of a valid gesture for the detected hand. The generated probability is compared with a defined decision threshold to generate a binary classification to classify the detected hand as a valid gesture or invalid gesture.
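The normalize, score, and threshold steps above can be sketched as below. The normalization scheme (translate to the first keypoint, assumed to be the wrist, then scale by the largest coordinate extent), the two hand-written stumps standing in for a trained decision tree ensemble, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Tree = Callable[[Sequence[float]], float]  # maps features to a probability


def normalize_keypoints(keypoints: Sequence[Point]) -> List[float]:
    """Translate to the first keypoint and scale so coordinates lie in [-1, 1]."""
    x0, y0 = keypoints[0]
    shifted = [(x - x0, y - y0) for x, y in keypoints]
    scale = max(max(abs(x), abs(y)) for x, y in shifted) or 1.0
    return [value / scale for point in shifted for value in point]


def ensemble_probability(features: Sequence[float], trees: Sequence[Tree]) -> float:
    """Average the per-tree probabilities (one common way to combine an ensemble)."""
    return sum(tree(features) for tree in trees) / len(trees)


def is_valid_gesture(
    keypoints: Sequence[Point], trees: Sequence[Tree], threshold: float = 0.5
) -> bool:
    features = normalize_keypoints(keypoints)
    return ensemble_probability(features, trees) >= threshold


# Hypothetical stumps standing in for trained decision trees.
stumps: List[Tree] = [
    lambda f: 0.9 if f[3] < -0.5 else 0.1,
    lambda f: 0.8 if f[4] > 0.5 else 0.2,
]

hand = [(10.0, 10.0), (10.0, 0.0), (12.0, 6.0)]  # illustrative pixel coordinates
probability = ensemble_probability(normalize_keypoints(hand), stumps)
valid = is_valid_gesture(hand, stumps)
```

Normalizing first makes the features independent of where the hand appears in the frame and how large it is, so a single threshold can be applied across frames.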
-
Publication No.: US20210279595A1
Publication Date: 2021-09-09
Application No.: US16810524
Filing Date: 2020-03-05
Applicants: Deepak SRIDHAR, Juwei LU
Inventors: Deepak SRIDHAR, Juwei LU
Abstract: Methods, devices and processor-readable media for an integrated teacher-student machine learning system are described. One or more teacher-student modules are trained as part of the teacher neural network training. Each teacher-student module uses a portion of the teacher neural network to generate an intermediate feature map, then provides the intermediate feature map to a student sub-network to generate inferences. The student sub-network may use a feature enhancement block to map the intermediate feature map to a subsequent feature map. A compression block may be used to compress intermediate feature map data for transmission in some embodiments.
-
Publication No.: US20240193866A1
Publication Date: 2024-06-13
Application No.: US18078832
Filing Date: 2022-12-09
Applicants: Yannick VERDIE, Zihao YANG, Deepak SRIDHAR, Steven George MCDONAGH, Juwei LU
Inventors: Yannick VERDIE, Zihao YANG, Deepak SRIDHAR, Steven George MCDONAGH, Juwei LU
Abstract: Methods and systems for estimation of a 3D hand pose are disclosed. A 2D image containing a detected hand is processed using a U-net network to obtain a global feature vector and a heatmap for the keypoints of the hand. Information from the global feature vector and the heatmap is concatenated to obtain a set of input tokens that are processed using a transformer encoder to obtain a first set of 2D keypoints representing estimated 2D locations of the keypoints in a first view. The first set of 2D keypoints is input as a query to a transformer decoder to obtain a second set of 2D keypoints representing estimated 2D locations of the keypoints in a second view. The first and second sets of 2D keypoints are aggregated to output a set of estimated 3D keypoints.
-
Publication No.: US20220300823A1
Publication Date: 2022-09-22
Application No.: US17204670
Filing Date: 2021-03-17
Applicants: Hanwen LIANG, Peng DAI, Qiong ZHANG, Juwei LU
Inventors: Hanwen LIANG, Peng DAI, Qiong ZHANG, Juwei LU
Abstract: Methods, systems, and media for training deep neural networks for cross-domain few-shot classification are described. The methods train an autoencoder comprising an encoder and a decoder of a deep neural network. The training of the autoencoder comprises two training stages. For each iteration in the first training stage, a batch of data samples is sampled from the source dataset and fed to the encoder to generate a plurality of source feature maps; a first training stage loss is then determined and used to update the autoencoder's parameters. For each iteration in the second training stage, the novel dataset is split into a support set and a query set. The support set is fed to the encoder to determine a prototype for each class label. The query set is also fed to the encoder to calculate a query set metric classification loss. The query set metric classification loss is used to update the autoencoder's parameters.
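The second training stage described above (prototypes from the support set, a metric classification loss on the query set) can be sketched as follows. The sketch assumes the inputs are already encoder embeddings, takes each prototype as the mean embedding of its class, and uses cross-entropy over negative squared Euclidean distances; these are common prototypical-network choices and are assumptions here, since the abstract does not specify the metric. The parameter update itself is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Labeled = Tuple[Vector, str]  # (embedding, class label)


def class_prototypes(support: Sequence[Labeled]) -> Dict[str, Vector]:
    """Mean embedding per class label (assumed prototype definition)."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for embedding, label in support:
        acc = sums.setdefault(label, [0.0] * len(embedding))
        for k, value in enumerate(embedding):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def query_metric_loss(query: Sequence[Labeled], prototypes: Dict[str, Vector]) -> float:
    """Mean cross-entropy, with logits = negative squared distance to each prototype."""
    total = 0.0
    for embedding, label in query:
        logits = {
            cls: -sum((a - b) ** 2 for a, b in zip(embedding, proto))
            for cls, proto in prototypes.items()
        }
        peak = max(logits.values())
        # Numerically stable log-sum-exp over the class logits.
        log_z = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += log_z - logits[label]
    return total / len(query)


support_set = [([0.0, 0.0], "a"), ([2.0, 0.0], "a"), ([10.0, 10.0], "b")]
query_set = [([1.0, 0.0], "a")]

prototypes = class_prototypes(support_set)
loss = query_metric_loss(query_set, prototypes)
```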
-
Publication No.: US20210191975A1
Publication Date: 2021-06-24
Application No.: US16722363
Filing Date: 2019-12-20
Applicants: Juwei LU, Sayem Mohammad SIAM, Peng DAI, Wei LI, Jin TANG
Inventors: Juwei LU, Sayem Mohammad SIAM, Peng DAI, Wei LI, Jin TANG
IPC Classification: G06F16/71, G06F3/0488, G06F3/0482, G06F16/783
Abstract: Methods and systems for managing an image collection are described. Metadata associated with a captured image includes data identifying each human in the captured image. A linkage score may be generated, representing a relationship between first and second identified humans in the captured image. Records in an image collection database are updated to include the generated linkage score. The linkage information may be used to render a graphical user interface (GUI) for navigating the image collection.
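One simple way to picture a linkage score is as a co-occurrence count: how many images in the collection contain both people. The sketch below uses that definition purely for illustration; the abstract does not say how the score is computed, and `linkage_scores` and the person identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def linkage_scores(images: Iterable[Set[str]]) -> "Counter[Tuple[str, str]]":
    """Count, for each pair of identified people, the images in which both appear."""
    scores: "Counter[Tuple[str, str]]" = Counter()
    for people in images:
        # Sort so each pair has one canonical key regardless of detection order.
        for pair in combinations(sorted(people), 2):
            scores[pair] += 1
    return scores


# Each set holds the identifiers found in one captured image's metadata.
collection = [{"ann", "bob"}, {"ann", "bob", "cy"}, {"cy"}]
scores = linkage_scores(collection)
```

A GUI could then, for example, list the people most strongly linked to a selected person by sorting these scores.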