-
Publication No.: US12299082B2
Publication date: 2025-05-13
Application No.: US18599029
Filing date: 2024-03-07
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Gaurav Mittal, Nikolaos Karianakis, Victor Manuel Fragoso Rojas, Mei Chen, Jedrzej Jakub Kozerawski
IPC: G06F18/2431, G06N3/04, G06N3/08
Abstract: A method of balancing a dataset for a machine learning model includes identifying confusing classes of few-shot classes for a machine learning model during validation. One of the confusing classes and an image from one of the few-shot classes are selected. An image perturbation is computed such that the selected image is classified as the selected confusing class. The selected image is modified with the computed perturbation. The modified selected image is added to a batch for training the machine learning model.
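The perturbation step in this abstract can be illustrated with a minimal sketch. It assumes a toy linear classifier and a plain iterative gradient step; the abstract names neither, and the function names, step size, and iteration cap are hypothetical. How the modified image is labeled in the batch is not stated in the abstract, so the sketch leaves labels out.

```python
def predict(weights, x):
    """Return the index of the class with the highest linear score."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def targeted_perturbation(weights, x, target, step=0.1, max_iters=100):
    """Search for a perturbation that moves x toward the target class.

    Assumption: a linear model, so the gradient of
    (score_target - score_current) with respect to the input is just
    the difference of the two weight rows.
    Returns the last perturbation tried if max_iters is reached.
    """
    delta = [0.0] * len(x)
    for _ in range(max_iters):
        perturbed = [a + d for a, d in zip(x, delta)]
        current = predict(weights, perturbed)
        if current == target:
            break
        grad = [wt - wc for wt, wc in zip(weights[target], weights[current])]
        delta = [d + step * g for d, g in zip(delta, grad)]
    return delta


# Toy setup: class 0 plays the few-shot class, class 1 the confusing class.
weights = [[1.0, 0.0], [0.0, 1.0]]
image = [1.0, 0.2]            # initially classified as class 0
confusing_class = 1

delta = targeted_perturbation(weights, image, confusing_class)
modified = [a + d for a, d in zip(image, delta)]

training_batch = []
training_batch.append(modified)   # modified image joins the training batch
```

In a real image model the gradient would come from backpropagation through the network rather than from a weight difference.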
-
Publication No.: US11895343B2
Publication date: 2024-02-06
Application No.: US17852310
Filing date: 2022-06-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Gaurav Mittal, Ye Yu, Mei Chen, Junwen Chen
IPC: H04N21/23, H04N21/234, G06V20/40, G06T7/246
CPC classification number: H04N21/23418, G06T7/246, G06V20/46, G06T2207/10021
Abstract: Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
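The flow described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: frames are small feature vectors, the attention weight for a historical frame is a sigmoid of its dot product with the current frame, and the final prediction is a linear scorer over averaged features. None of these specifics come from the abstract, and all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def gated_history_prediction(frames, num_present, class_weights):
    # Group frames: the last num_present are "present", the rest "historical".
    history, present = frames[:-num_present], frames[-num_present:]
    current = present[-1]

    # Assumed gate: similarity to the current frame, squashed to (0, 1).
    gates = [sigmoid(dot(h, current)) for h in history]

    # Weight each historical frame by its attention weight.
    weighted_history = [[g * v for v in h] for g, h in zip(gates, history)]

    # Combine weighted history and present frames into one feature vector.
    features = mean_vector(weighted_history) + mean_vector(present)

    # Assumed linear scorer: one weight row per action class.
    scores = [dot(w, features) for w in class_weights]
    action = max(range(len(scores)), key=scores.__getitem__)
    return gates, action


frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]
class_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
gates, action = gated_history_prediction(frames, 2, class_weights)
```

Here the first historical frame resembles the current frame more than the second does, so it receives the larger gate value.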
-
Publication No.: US12192543B2
Publication date: 2025-01-07
Application No.: US18393664
Filing date: 2023-12-21
Applicant: Microsoft Technology Licensing, LLC
Inventor: Gaurav Mittal, Ye Yu, Mei Chen, Junwen Chen
IPC: H04N21/23, G06T7/246, G06V20/40, H04N21/234
Abstract: Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
-
Publication No.: US11544561B2
Publication date: 2023-01-03
Application No.: US16875782
Filing date: 2020-05-15
Applicant: Microsoft Technology Licensing, LLC
Inventor: Gaurav Mittal, Victor Manuel Fragoso Rojas, Nikolaos Karianakis, Mei Chen, Chang Liu
Abstract: Providing a task-aware recommendation of hyperparameter configurations for a neural network architecture. First, a joint space of tasks and hyperparameter configurations is constructed using a plurality of tasks (each of which corresponds to a dataset) and a plurality of hyperparameter configurations. The joint space is used as training data to train and optimize a performance prediction network, such that for a given unseen task corresponding to one of the plurality of tasks and a given hyperparameter configuration corresponding to one of the plurality of hyperparameter configurations, the performance prediction network is configured to predict performance that is to be achieved for the unseen task using the hyperparameter configuration.
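The recommendation idea can be illustrated with a small sketch. It is not the patented network: a nearest-neighbor lookup stands in for the performance prediction network, and the task features, configuration encoding, and accuracy values are made-up illustrative data.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_performance(joint_space, task_features, config, k=1):
    """Stand-in predictor: average the recorded performance of the k
    joint-space rows closest to the (task, configuration) query."""
    query = task_features + config
    nearest = sorted(
        joint_space,
        key=lambda row: squared_distance(row[0] + row[1], query),
    )[:k]
    return sum(row[2] for row in nearest) / len(nearest)


def recommend(joint_space, task_features, candidates):
    """Return the candidate configuration with the best predicted performance."""
    return max(
        candidates,
        key=lambda c: predict_performance(joint_space, task_features, c),
    )


# Joint space rows: (task features, hyperparameter configuration, performance).
# Hypothetical encoding: config = [learning rate, scaled batch size].
joint_space = [
    ([0.1, 0.2], [0.01, 0.5], 0.70),
    ([0.1, 0.2], [0.10, 0.5], 0.85),
    ([0.9, 0.8], [0.01, 0.5], 0.90),
    ([0.9, 0.8], [0.10, 0.5], 0.60),
]
candidates = [[0.01, 0.5], [0.10, 0.5]]

new_task = [0.2, 0.2]   # resembles the first task in the joint space
best = recommend(joint_space, new_task, candidates)
```

Because the new task lies close to the first task, the sketch recommends the configuration that worked best for that task.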
-
Publication No.: US12087043B2
Publication date: 2024-09-10
Application No.: US17535517
Filing date: 2021-11-24
Applicant: Microsoft Technology Licensing, LLC
Inventor: Gaurav Mittal, Ye Yu, Mei Chen, Jay Sanjay Patravali
IPC: G06K9/00, G06F16/73, G06F16/75, G06N20/00, G06V10/764, G06V10/774
CPC classification number: G06V10/7753, G06F16/73, G06F16/75, G06N20/00, G06V10/764, G06V10/7747
Abstract: The disclosure herein describes preparing and using a cross-attention model for action recognition using pre-trained encoders and novel class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes are generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Then, support video segments are obtained, wherein each support video segment is associated with video classes. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes. A query video segment is obtained and classified as a video class using the fine-tuned cross-attention module.
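The final classification step, in which a query segment is matched against labeled support segments, can be sketched roughly as below. The sketch assumes each segment is already encoded as one vector (for example, appearance and action features concatenated) and uses plain scaled dot-product attention; the abstract does not give the module's internals, so every detail here is an assumption.

```python
import math


def softmax(scores):
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_query(query, support):
    """Attend from the query to each support embedding and sum the
    attention mass per class. support is a list of (embedding, label)."""
    scale = math.sqrt(len(query))
    similarities = [
        sum(q * s for q, s in zip(query, emb)) / scale for emb, _ in support
    ]
    attention = softmax(similarities)

    class_mass = {}
    for weight, (_, label) in zip(attention, support):
        class_mass[label] = class_mass.get(label, 0.0) + weight
    return max(class_mass, key=class_mass.get), class_mass


# Hypothetical embeddings and class names.
support = [
    ([1.0, 0.0, 0.5, 0.0], "jump"),
    ([0.9, 0.1, 0.4, 0.0], "jump"),
    ([0.0, 1.0, 0.0, 0.8], "run"),
]
query = [1.0, 0.0, 0.6, 0.0]
label, class_mass = classify_query(query, support)
```

Summing attention per class lets several support segments of the same class reinforce one another.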
-
Publication No.: US11960574B2
Publication date: 2024-04-16
Application No.: US17361146
Filing date: 2021-06-28
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Gaurav Mittal, Nikolaos Karianakis, Victor Manuel Fragoso Rojas, Mei Chen, Jedrzej Jakub Kozerawski
IPC: G06F18/2431, G06N3/04, G06N3/08
CPC classification number: G06F18/2431, G06N3/04, G06N3/08
Abstract: A method of balancing a dataset for a machine learning model includes identifying confusing classes of few-shot classes for a machine learning model during validation. One of the confusing classes and an image from one of the few-shot classes are selected. An image perturbation is computed such that the selected image is classified as the selected confusing class. The selected image is modified with the computed perturbation. The modified selected image is added to a batch for training the machine learning model.
-
Publication No.: US11238885B2
Publication date: 2022-02-01
Application No.: US16173491
Filing date: 2018-10-29
Applicant: Microsoft Technology Licensing, LLC
Inventor: Gaurav Mittal, Baoyuan Wang
Abstract: A computer-implemented technique for animating a visual representation of a face based on spoken words of a speaker is described herein. A computing device receives an audio sequence comprising content features reflective of spoken words uttered by a speaker. The computing device generates latent content variables and latent style variables based upon the audio sequence. The latent content variables are used to synchronize movement of lips on the visual representation to the spoken words uttered by the speaker. The latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words and are used to synchronize movement of full facial features of the visual representation to the spoken words uttered by the speaker. The computing device causes the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.