Systems and Methods for Improved Video Understanding

    Publication No.: US20240428587A1

    Publication Date: 2024-12-26

    Application No.: US18827133

    Filing Date: 2024-09-06

    Applicant: Google LLC

    Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.
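
The abstract does not say how the video tokens are extracted. As a minimal sketch of one possibility, the snippet below cuts a single-channel video into non-overlapping spatiotemporal blocks ("tubelets") and flattens each block into a token, so that every token carries information across both time and space. The function name `extract_tubelet_tokens`, the tubelet scheme itself, and the toy input are illustrative assumptions, not details taken from the patent; the transformer encoder and classification head are omitted.

```python
from typing import List

Video = List[List[List[float]]]  # indexed [frame][row][column], single channel


def extract_tubelet_tokens(video: Video, t: int, h: int, w: int) -> List[List[float]]:
    """Split a video into non-overlapping t x h x w tubelets and flatten each one.

    Hypothetical helper: the abstract does not specify the extraction scheme.
    """
    frames, rows, cols = len(video), len(video[0]), len(video[0][0])
    if frames % t or rows % h or cols % w:
        raise ValueError("video dimensions must be divisible by the tubelet size")
    tokens = []
    for f0 in range(0, frames, t):
        for r0 in range(0, rows, h):
            for c0 in range(0, cols, w):
                # Each token mixes pixels across time (f) and space (r, c).
                token = [
                    video[f][r][c]
                    for f in range(f0, f0 + t)
                    for r in range(r0, r0 + h)
                    for c in range(c0, c0 + w)
                ]
                tokens.append(token)
    return tokens


# Toy input: 4 frames of 4x4 pixels, each pixel holding its frame index.
video = [[[float(f)] * 4 for _ in range(4)] for f in range(4)]
tokens = extract_tubelet_tokens(video, t=2, h=2, w=2)
```

In a full pipeline, each flattened token would typically be linearly projected and given a positional embedding before being passed to the encoder; those steps are outside this sketch.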

    Systems and methods for improved video understanding

    Publication No.: US12112538B2

    Publication Date: 2024-10-08

    Application No.: US17370522

    Filing Date: 2021-07-08

    Applicant: Google LLC

    CPC classification number: G06V20/41 G06N20/00 G06V20/46 G06V20/49

    Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

    Systems And Methods For Improved Video Understanding

    Publication No.: US20230017072A1

    Publication Date: 2023-01-19

    Application No.: US17370522

    Filing Date: 2021-07-08

    Applicant: Google LLC

    Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of video tokens from the video data, the plurality of video tokens comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of video tokens as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

    Pretraining Already-Pretrained Models for Diverse Downstream Tasks

    Publication No.: US20240256964A1

    Publication Date: 2024-08-01

    Application No.: US18424031

    Filing Date: 2024-01-26

    Applicant: Google LLC

    CPC classification number: G06N20/00 G06F7/483

    Abstract: An example method includes obtaining a pretrained machine-learned model that was initially pretrained using a pretraining dataset and further pretraining the model by generating, using a pretraining objective framework, a plurality of corrupted training examples from one or more training examples obtained from the pretraining dataset. A first set of one or more training examples can be corrupted according to a first set of configuration parameters of the pretraining objective framework. A second set can be corrupted according to a second set of configuration parameters of the pretraining objective framework. The example method includes inputting the plurality of corrupted training examples into the model; obtaining, from the model, a plurality of outputs respectively generated by the model based on the plurality of corrupted training examples; and updating one or more parameters of the model based on an evaluation of the plurality of outputs.
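
The idea of corrupting two sets of training examples under two different configurations can be illustrated with a small sketch. The snippet below assumes, purely for illustration, that the corruption is span masking controlled by a span length and a corruption rate, with masked spans replaced by sentinel markers named `<extra_N>`. The abstract names none of these specifics; `corrupt`, `CONFIG_A`, `CONFIG_B`, and the sentinel format are hypothetical. The model forward pass and parameter update are omitted.

```python
import random
from typing import List, Tuple


def corrupt(tokens: List[str], span_length: int, corruption_rate: float,
            rng: random.Random) -> Tuple[List[str], List[str]]:
    """Replace random spans with sentinels; return (corrupted input, target)."""
    n_to_mask = max(1, round(len(tokens) * corruption_rate))
    masked = set()
    while len(masked) < n_to_mask:
        start = rng.randrange(len(tokens))
        for i in range(start, min(start + span_length, len(tokens))):
            if len(masked) < n_to_mask:
                masked.add(i)
    inputs, targets, sentinel = [], [], 0
    prev_masked = False
    for i, tok in enumerate(tokens):
        if i in masked:
            if not prev_masked:
                # Start of a new masked span: emit one sentinel on both sides.
                marker = f"<extra_{sentinel}>"
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)
            prev_masked = True
        else:
            inputs.append(tok)
            prev_masked = False
    return inputs, targets


# Two hypothetical configurations of the same corruption framework.
CONFIG_A = {"span_length": 1, "corruption_rate": 0.2}  # short spans, light masking
CONFIG_B = {"span_length": 4, "corruption_rate": 0.5}  # long spans, heavy masking

rng = random.Random(0)
example = "the quick brown fox jumps over the lazy dog today".split()
first_set = [corrupt(example, rng=rng, **CONFIG_A)]
second_set = [corrupt(example, rng=rng, **CONFIG_B)]
corrupted_batch = first_set + second_set  # would be fed to the model together
```

Mixing both sets in one batch exposes the model to more than one corruption regime during the further pretraining step, which is the behavior the abstract describes at a high level.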

    Machine-Learned Attention Models Featuring Echo-Attention Layers

    Publication No.: US20220245432A1

    Publication Date: 2022-08-04

    Application No.: US17592174

    Filing Date: 2022-02-03

    Applicant: Google LLC

    Abstract: The present disclosure provides echo-attention layers, a new efficient method for increasing the expressiveness of self-attention layers without incurring significant parameter or training time costs. One intuition behind the proposed method is to learn to echo, i.e., attend once and then get N echo-ed attentions for free (or at a relatively cheap cost). As compared to stacking new layers, the proposed echoed attentions are targeted at providing similar representation power at a better cost efficiency.
