Patent search ap:("GOOGLE LLC") AND inv:"Mostafa Dehghani" Page 1

1.

发明申请
Systems and Methods for Improved Video Understanding 有权

公开(公告)号：US20240428586A1

公开(公告)日：2024-12-26

申请号：US18827088

申请日：2024-09-06

Applicant: Google LLC

Inventor： Anurag Arnab , Mostafa Dehghani , Georg Heigold , Chen Sun , Mario Lucic , Cordelia Luise Schmid

IPC: G06V20/40 , G06N20/00

Abstract: A computer-implemented method for classifying video data with improved accuracy includes obtaining, by a computing system comprising one or more computing devices, video data comprising a plurality of video frames; extracting, by the computing system, a plurality of spatiotemporal representations from the video data, the plurality of spatiotemporal representations comprising a representation of spatiotemporal information in the video data; providing, by the computing system, the plurality of spatiotemporal representations as input to a video understanding model, the video understanding model comprising a video transformer encoder model; and receiving, by the computing system, a classification output from the video understanding model.

2.

发明公开
ACTION LOCALIZATION IN VIDEOS USING LEARNED QUERIES 审中-公开

公开(公告)号：US20240346824A1

公开(公告)日：2024-10-17

申请号：US18634794

申请日：2024-04-12

Applicant: Google LLC

Inventor： Alexey Alexeevich Gritsenko , Xuehan Xiong , Josip Djolonga , Mostafa Dehghani , Chen Sun , Mario Lucic , Cordelia Luise Schmid , Anurag Arnab

IPC: G06V20/40 , G06T7/73 , G06V10/62 , G06V10/764 , G06V10/77 , G06V10/774 , G06V10/776 , G06V10/82

CPC classification number: G06V20/46 , G06T7/73 , G06V10/62 , G06V10/764 , G06V10/7715 , G06V10/774 , G06V10/776 , G06V10/82 , G06T2207/10016 , G06T2207/20081 , G06T2207/20084

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing action localization on an input video. In particular, a system maintains a set of query vectors and uses the input video and the set of query vectors to generate an action localization output for the input video. The action localization output includes, for each of one or more agents depicted in the video, data specifying, for each of one or more video frames in the video, a respective bounding box in the video frame that depicts the agent and a respective action from a set of actions that is being performed by the agent in the video frame.

3.

发明公开
TRAINING ULTRA-LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS 审中-公开

公开(公告)号：US20240256835A1

公开(公告)日：2024-08-01

申请号：US18424420

申请日：2024-01-26

Applicant: Google LLC

Inventor： Mostafa Dehghani , Josip Djolonga , Jonathan Heek , Basil Mustafa , Piotr Michal Padlewski , Justin Morgan Gilmer , Neil Matthew Tinmouth Houlsby

IPC: G06N3/0455 , G06N3/088

CPC classification number: G06N3/0455 , G06N3/088

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing an input through each of a plurality of layers of a neural network to generate an output using a plurality of hardware accelerators. The plurality of layers comprise a fully connected layer having a plurality of parameters arranged in a row dimension and a column dimension. One of the methods comprises: generating a plurality of parameter blocks by partitioning the plurality of parameters along the row dimension and the column dimension; determining a ratio of a number of parameters along the row dimension relative to a number of parameters along the column dimension; and determining whether to use row sharding or column sharding with the plurality of hardware accelerators to calculate an output for the fully connected layer and then calculating the output for the fully connected layer using either row sharding or column sharding.

4.

发明申请
UNIVERSAL TRANSFORMERS 审中-公开

公开(公告)号：US20190354567A1

公开(公告)日：2019-11-21

申请号：US16417587

申请日：2019-05-20

Applicant: Google LLC

Inventor： Mostafa Dehghani , Stephan Gouws , Oriol Vinyals , Jakob D. Uszkoreit , Lukasz Mieczyslaw Kaiser

IPC: G06F17/14 , G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a sequence to sequence model that is recurrent in depth while employing self-attention to combine information from different parts of sequences.

5.

发明公开
Neural Network Architectures with Multiple Normalization Layers for Machine Vision 审中-公开

公开(公告)号：US20240257511A1

公开(公告)日：2024-08-01

申请号：US18419170

申请日：2024-01-22

Applicant: Google LLC

Inventor： Manoj Kumar Sivaraj , Neil Matthew Tinmouth Houlsby , Mostafa Dehghani

IPC: G06V10/82 , G06V10/26

CPC classification number: G06V10/82 , G06V10/26

Abstract: One example aspect of the present disclosure is directed to a neural network for machine vision. The neural network may include a stem block that includes a set of stem layers. The neural network may additionally include a visual transformer block. The set of stem layers may include a patch layer, a first normalization layer, an embedding layer, and a second normalization layer. The patch layer subdivides an input image into a set of image patches. The first normalization layer generates a set of normalized image patches by performing a first normalization process on each image patch of the set of image patches. The patch layer feeds forward to the first normalization layer. The embedding layer generates a set of vector embeddings. Each vector embedding of the set of embedding vectors is a projection of a corresponding normalized image patch from the set of normalized image patches onto a visual token. The first normalization layer feeds forward to the embedding layer. The second normalization layer generates a set of normalized vector embeddings by performing a second normalization process on each vector embedding of the set of vector embeddings. The embedding layer feeds forward to the second normalization layer. The transformer block enables one or more machine vision tasks for the input image based on the set of normalized vectors. The second normalization layer feeds forward to the transformer block.

6.

发明公开
UNIVERSAL TRANSFORMERS 审中-公开

公开(公告)号：US20240143691A1

公开(公告)日：2024-05-02

申请号：US18544245

申请日：2023-12-18

Applicant: Google LLC

Inventor： Mostafa Dehghani , Stephan Gouws , Oriol Vinyals , Jakob D. Uszkoreit , Lukasz Mieczyslaw Kaiser

IPC: G06F17/14 , G06N3/04

CPC classification number: G06F17/14 , G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a sequence-to-sequence model that is recurrent in depth while employing self-attention to combine information from different parts of sequences.

7.

发明公开
Using Chains of Thought to Prompt Machine-Learned Models Pre-Trained on Diversified Objectives 审中-公开

公开(公告)号：US20230244938A1

公开(公告)日：2023-08-03

申请号：US18160776

申请日：2023-01-27

Applicant: Google LLC

Inventor： Jason Weng Wei , Dengyong Zhou , Xuezhi Wang , Dale Eric Schuurmans , Quoc V. Le , Maarten Paul Bosma , Ed Huai-Hsin Chi , Olivier Jean Andrè Bousquet , Le Hou , Charles Aloysius Sutton , Nathanael Martin Schärli , Nathan Kemp Sekiguchi Scales , Augustus Quadrozzi Odena , Sharan Ajit Narang , Guy Gur-Ari Krakover , Aakanksha Chowdhery , David Martin Dohan , Aitor Lewkowycz , Henryk Michalewski , Jiageng Luan , David J. Bieber , Jacob Austin , Anders Johan Andreassen , Maxwell Isaac Nye , Yi Tay , Mostafa Dehghani

IPC: G06N3/08

CPC classification number: G06N3/08

Abstract: An example method for pretraining a machine-learned model is provided. The example method includes obtaining a plurality of different combinations of configuration parameters of a pretraining objective framework. The example method includes generating, using the pretraining objective framework, a plurality of corrupted training examples from one or more training examples, wherein the plurality of corrupted training examples are respectively generated according to the plurality of different combinations. The example method includes inputting the plurality of corrupted training examples into the machine-learned model, wherein the machine-learned model is configured to generate uncorrupted subportions corresponding to corrupted subportions of the corrupted training examples. The example method includes obtaining, from the machine-learned model, a plurality of outputs respectively generated by the machine-learned model based on the plurality of corrupted training examples. The example method includes updating one or more parameters of the machine-learned model based on an evaluation of the plurality of outputs.

8.

发明申请
Machine-Learned Attention Models Featuring Omnidirectional Processing 有权

公开(公告)号：US20220245428A1

公开(公告)日：2022-08-04

申请号：US17592796

申请日：2022-02-04

Applicant: Google LLC

Inventor： Yi Tay , Da-Cheng Juan , Dara Bahri , Donald Arthur Metzler, JR. , Jai Prakash Gupta , Mostafa Dehghani , Phillip Pham , Vamsi Krishna Aribandi , Zhen Qin

IPC: G06N3/04 , G06N3/10

Abstract: Provided are machine-learned attention models that feature omnidirectional processing, example implementations of which can be referred to as Omnidirectional Representations from Transformers (OMNINET). In example models described in the present disclosure, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in some or all of the other tokens across the entire network.

9.

发明申请
PROCESSING IMAGES USING SELF-ATTENTION BASED NEURAL NETWORKS 有权

公开(公告)号：US20220108478A1

公开(公告)日：2022-04-07

申请号：US17492537

申请日：2021-10-01

Applicant: Google LLC

Inventor： Neil Matthew Tinmouth Houlsby , Sylvain Gelly , Jakob D. Uszkoreit , Xiaohua Zhai , Georg Heigold , Lucas Klaus Beyer , Alexander Kolesnikov , Matthias Johannes Lorenz Minderer , Dirk Weissenborn , Mostafa Dehghani , Alexey Dosovitskiy , Thomas Unterthiner

IPC: G06T7/00 , G06N3/04 , G06K9/62 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

10.

发明申请
PROCESSING IMAGES USING SELF-ATTENTION BASED NEURAL NETWORKS 有权

公开(公告)号：US20250005798A1

公开(公告)日：2025-01-02

申请号：US18883946

申请日：2024-09-12

Applicant: Google LLC

Inventor： Neil Matthew Tinmouth Houlsby , Sylvain Gelly , Jakob D. Uszkoreit , Xiaohua Zhai , Georg Heigold , Lucas Klaus Beyer , Alexander Kolesnikov , Matthias Johannes Lorenz Minderer , Dirk Weissenborn , Mostafa Dehghani , Alexey Dosovitskiy , Thomas Unterthiner

IPC: G06T7/00 , G06F18/24 , G06N3/045 , G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using self-attention based neural networks. One of the methods includes obtaining one or more images comprising a plurality of pixels; determining, for each image of the one or more images, a plurality of image patches of the image, wherein each image patch comprises a different subset of the pixels of the image; processing, for each image of the one or more images, the corresponding plurality of image patches to generate an input sequence comprising a respective input element at each of a plurality of input positions, wherein a plurality of the input elements correspond to respective different image patches; and processing the input sequences using a neural network to generate a network output that characterizes the one or more images, wherein the neural network comprises one or more self-attention neural network layers.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification