Patent search ap:("Google LLC") AND inv:"Zirui Wang" Page 1

1.

Invention application
VIDEO-TEXT MODELING WITH ZERO-SHOT TRANSFER FROM CONTRASTIVE CAPTIONERS (status: granted)

Publication number: US20250124708A1

Publication date: 2025-04-17

Application number: US18694604

Application date: 2023-12-08

Applicant: Google LLC

Inventor: Shen Yan, Tao Zhu, Zirui Wang, Yuan Cao, Jiahui Yu

IPC: G06V20/40, G06F16/583

Abstract: Provided is an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. Some example implementations include a model which can be referred to as VideoCoCa. Example implementations reuse a pretrained image-text contrastive captioner (CoCa) model and adapt it to video-text tasks with little or minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, aspects of the present disclosure leverage findings that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to “flattened frame embeddings”, yielding a strong zero-shot transfer baseline for many video-text tasks.
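The "flattened frame embeddings" idea in this abstract can be illustrated with a minimal sketch. It assumes a single-head dot-product attention pooler with no learned projections, one query for the contrastive pooler and several for the generative pooler; the function names, query values, and toy embeddings are all hypothetical and are not taken from the patent.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def flatten_frames(frame_embeddings):
    """Concatenate per-frame token embeddings (T, N, D) into one (T*N, D) sequence."""
    return [token for frame in frame_embeddings for token in frame]

def attentional_pool(queries, tokens):
    """Each query attends over all tokens and returns a weighted average of them.
    Simplification (assumption): no key/value projections, single head."""
    dim = len(tokens[0])
    pooled = []
    for q in queries:
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        pooled.append([sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)])
    return pooled

# Toy input: 2 frames, 3 tokens per frame, 4-dim embeddings (made-up values).
video = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]],
]
contrastive_query = [[0.5, 0.5, 0.5, 0.5]]                          # one query -> one video embedding
generative_queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]   # several queries -> decoder memory

tokens = flatten_frames(video)                      # 6 tokens, frame boundaries dropped
video_embedding = attentional_pool(contrastive_query, tokens)
caption_memory = attentional_pool(generative_queries, tokens)
```

Because the pooler only sees a sequence of tokens, it does not need to change when the sequence grows from one image's tokens to the tokens of several frames, which is consistent with the abstract's claim that the pooling layers adapt without a new fusion module.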

2.

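Invention publication
GENERATING LABELED TRAINING DATA USING A PRE-TRAINED LANGUAGE MODEL NEURAL NETWORK (status: under examination, published)

Publication number: US20230196105A1

Publication date: 2023-06-22

Application number: US18082934

Application date: 2022-12-16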

Applicant: Google LLC

Inventor: Zirui Wang, Wei Yu, Orhan Firat, Yuan Cao

IPC: G06N3/08

CPC classification number: G06N3/08

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating labeled training data using a pre-trained language model neural network. In particular, the language model neural network can generate the text input in a new labeled training example from an input sequence that includes (i) one or more context inputs and (ii) a text label that identifies the ground truth category for the new labeled training example.
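A rough sketch of the data flow this abstract describes: an input sequence is assembled from context inputs plus a text label, and the language model's output becomes the text of a new example carrying that label. The prompt layout (`Label:` / `Text:`), the function names, and `toy_language_model` are hypothetical stand-ins; a real system would call a pre-trained language model instead.

```python
from typing import Callable, List, Tuple

def build_input_sequence(context_inputs: List[str], text_label: str) -> str:
    """Join the context inputs and end with the label for the example to be generated.
    The 'Label:/Text:' layout is an assumed format, not one stated in the abstract."""
    parts = list(context_inputs)
    parts.append(f"Label: {text_label}\nText:")
    return "\n\n".join(parts)

def generate_labeled_example(
    language_model: Callable[[str], str],
    context_inputs: List[str],
    text_label: str,
) -> Tuple[str, str]:
    """Return (generated text input, label) as one new labeled training example."""
    prompt = build_input_sequence(context_inputs, text_label)
    generated_text = language_model(prompt).strip()
    return generated_text, text_label

def toy_language_model(prompt: str) -> str:
    """Placeholder model: echoes the requested label so the sketch runs offline."""
    last_label = prompt.rsplit("Label: ", 1)[1].split("\n", 1)[0]
    return f" a made-up {last_label} review"

context = ["Label: positive\nText: Loved it."]
example_text, example_label = generate_labeled_example(toy_language_model, context, "negative")
```

The label is chosen first and the text is generated to match it, so the resulting pair needs no separate annotation step.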

3.

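Invention publication
CONTRASTIVE CAPTIONING NEURAL NETWORKS (status: under examination, published)

Publication number: US20230351149A1

Publication date: 2023-11-02

Application number: US18141340

Application date: 2023-04-28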

Applicant: Google LLC

Inventor: Jiahui Yu, Zirui Wang, Vijay Vasudevan, Ho Man Yeung, Seyed Mojtaba Seyedhosseini Tarzjani, Yonghui Wu

IPC: G06N3/04

CPC classification number: G06N3/04

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing multi-modal inputs using contrastive captioning neural networks.

4.

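Invention publication
Systems and Methods for Pretraining Image Processing Models (status: under examination, published)

Publication number: US20230281400A1

Publication date: 2023-09-07

Application number: US17685774

Application date: 2022-03-03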

Applicant: Google LLC

Inventor: Zirui Wang, Jiahui Yu, Yuan Cao, Wei Yu, Zihang Dai

IPC: G06F40/58, G06F40/284, G06V30/10, G06V10/766

CPC classification number: G06F40/58, G06F40/284, G06V10/766, G06V30/10

Abstract: Example embodiments of the present disclosure relate to systems and methods for pretraining image-processing models on weakly-supervised image-text pairs. The pretraining can include receiving a training sequence for the machine-learned image-processing model. The training sequence can include text tokens and image tokens. A prefix sequence can contain the image tokens. A remainder sequence can include a remainder set of the text tokens. The pretraining can include determining, using the prefix sequence as an input to the machine-learned image-processing model, an objective based on recovery of the remainder sequence. The pretraining can include updating one or more learnable parameters of the machine-learned image-processing model based on the objective.
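The prefix/remainder objective described in this abstract can be sketched as follows. The sketch assumes the prefix holds all image tokens plus an optional leading slice of the text tokens, and that the objective is the average negative log-likelihood of the remainder under teacher forcing; `prefix_text_len`, `uniform_model`, and the token values are hypothetical, and the parameter-update step is omitted.

```python
import math

VOCAB_SIZE = 5  # assumed toy vocabulary size

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def split_training_sequence(image_tokens, text_tokens, prefix_text_len):
    """Prefix = image tokens + first `prefix_text_len` text tokens; remainder = the rest."""
    prefix = list(image_tokens) + list(text_tokens[:prefix_text_len])
    remainder = list(text_tokens[prefix_text_len:])
    return prefix, remainder

def remainder_loss(model, prefix, remainder):
    """Average negative log-likelihood of the remainder tokens given the prefix."""
    total = 0.0
    decoded = []
    for target in remainder:
        logits = model(prefix, decoded)       # scores for the next text token
        total -= log_softmax(logits)[target]
        decoded.append(target)                # teacher forcing: feed the true token
    return total / len(remainder)

def uniform_model(prefix, decoded):
    """Placeholder model that assigns equal score to every vocabulary entry."""
    return [0.0] * VOCAB_SIZE

image_tokens = ["img0", "img1"]               # stand-ins for image patch tokens
text_tokens = [1, 2, 3, 4]
prefix, remainder = split_training_sequence(image_tokens, text_tokens, prefix_text_len=1)
loss = remainder_loss(uniform_model, prefix, remainder)
```

Only the remainder contributes to the loss; the prefix is conditioning input, which is what lets image tokens sit in the sequence without needing a prediction target of their own.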

5.

Invention application
Vector-Quantized Image Modeling (status: granted)

Publication number: US20240404238A1

Publication date: 2024-12-05

Application number: US18698997

Application date: 2022-10-05

Applicant: Google LLC

Inventor: Jiahui Yu, Vijay Vasudevan, Alexander Yeong-Shiuh Ku, Yonghui Wu, Jason Michael Baldridge, Yuanzhong Xu, Jing Yu Koh, Thang Minh Luong, Gunjan Baid, Zirui Wang, Han Zhang, Xin Li

IPC: G06V10/28, G06F40/284, G06V10/764, G06V10/766, G06V10/82

Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pre-training a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
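To make "rasterized image tokens" concrete, here is a minimal sketch of the two generic steps involved: mapping a grid of latent vectors to discrete codebook indices, then flattening the grid row by row into a sequence for next-token prediction. It uses plain squared-Euclidean nearest-neighbor lookup, which is an assumption; the abstract's specific codebook improvements are not reproduced, and the codebook, latent values, and function names are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def quantize_grid(latent_grid, codebook):
    """Map an H x W grid of latent vectors to an H x W grid of token ids."""
    return [[nearest_code(v, codebook) for v in row] for row in latent_grid]

def rasterize(token_grid):
    """Flatten the token grid in raster order: left to right, top to bottom."""
    return [token for row in token_grid for token in row]

def next_token_pairs(tokens):
    """(context, target) pairs for autoregressive training on the token sequence."""
    return [(tokens[:i], tokens[i]) for i in range(len(tokens))]

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]           # 3 toy codes, 2-dim each
latent_grid = [
    [[0.1, 0.0], [0.9, 0.1]],
    [[0.2, 0.8], [0.0, 0.1]],
]                                                          # 2 x 2 grid of encoder outputs

token_grid = quantize_grid(latent_grid, codebook)          # [[0, 1], [2, 0]]
image_tokens = rasterize(token_grid)                       # [0, 1, 2, 0]
training_pairs = next_token_pairs(image_tokens)
```

Once the image is a flat sequence of integer tokens, the same autoregressive Transformer training used for text applies to it directly.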

6.

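Invention publication
Vector-Quantized Image Modeling (status: under examination, published)

Publication number: US20240112088A1

Publication date: 2024-04-04

Application number: US18520083

Application date: 2023-11-27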

Applicant: Google LLC

Inventor: Jiahui Yu, Xin Li, Han Zhang, Vijay Vasudevan, Alexander Yeong-Shiuh Ku, Jason Michael Baldridge, Yuanzhong Xu, Jing Yu Koh, Thang Minh Luong, Gunjan Baid, Zirui Wang, Yonghui Wu

IPC: G06N20/00

CPC classification number: G06N20/00

Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pretraining a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
