Patent search ap:("Google LLC") AND inv:"Jing Yu Koh" Page 1

1.

发明申请
Cross-Modal Contrastive Learning for Text-to-Image Generation based on Machine Learning Models 有权

公开(公告)号：US20230081171A1

公开(公告)日：2023-03-16

申请号：US17467628

申请日：2021-09-07

Applicant: Google LLC

Inventor： Han Zhang , Jing Yu Koh , Jason Michael Baldridge , Yinfei Yang , Honglak Lee

IPC: G06T11/00 , G06K9/62 , G10L15/26 , G06N3/08

Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.

2.

发明公开
Cross-Modal Contrastive Learning for Text-to-Image Generation based on Machine Learning Models 审中-公开

公开(公告)号：US20240362830A1

公开(公告)日：2024-10-31

申请号：US18770154

申请日：2024-07-11

Applicant: Google LLC

Inventor： Han Zhang , Jing Yu Koh , Jason Michael Baldridge , Yinfei Yang , Honglak Lee

IPC: G06T11/00 , G06F18/214 , G06F18/22 , G06N3/08 , G10L15/26

CPC classification number: G06T11/00 , G06F18/2148 , G06F18/22 , G06N3/08 , G10L15/26

Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.

3.

发明授权
Systems and methods for generating predicted visual observations of an environment using machine learned models 有权

公开(公告)号：US12014446B2

公开(公告)日：2024-06-18

申请号：US17409249

申请日：2021-08-23

Applicant: Google LLC

Inventor： Jing Yu Koh , Honglak Lee , Yinfei Yang , Jason Michael Baldridge , Peter James Anderson

IPC: G06T11/00 , G06F18/213 , G06N3/045 , G06N3/08 , G06T7/10 , G06T15/00 , G06T15/08

CPC classification number: G06T11/00 , G06F18/213 , G06N3/045 , G06N3/08 , G06T7/10 , G06T15/00 , G06T15/08 , G06T2207/10028 , G06T2207/20081

Abstract: A computing system for generating predicted images along a trajectory of unseen viewpoints. The system can obtain one or more spatial observations of an environment that may be captured from one or more previous camera poses. The system can generate a three-dimensional point cloud for the environment from the one or more spatial observations and the one or more previous camera poses. The system can project the three-dimensional point cloud into two-dimensional space to form one or more guidance spatial observations. The system can process the one or more guidance spatial observations with a machine-learned spatial observation prediction model to generate one or more predicted spatial observations. The system can process the one or more predicted spatial observations and image data with a machine-learned image prediction model to generate one or more predicted images from the target camera pose. The system can output the one or more predicted images.

4.

发明申请
Systems And Methods For Generating Predicted Visual Observations Of An Environment Using Machine Learned Models 有权

公开(公告)号：US20230072293A1

公开(公告)日：2023-03-09

申请号：US17409249

申请日：2021-08-23

Applicant: Google LLC

Inventor： Jing Yu Koh , Honglak Lee , Yinfei Yang , Jason Michael Baldridge , Peter James Anderson

IPC: G06T11/00 , G06T15/00 , G06T15/08 , G06K9/62 , G06T7/10 , G06N3/08 , G06N3/04

Abstract: A computing system for generating predicted images along a trajectory of unseen viewpoints. The system can obtain one or more spatial observations of an environment that may be captured from one or more previous camera poses. The system can generate a three-dimensional point cloud for the environment from the one or more spatial observations and the one or more previous camera poses. The system can project the three-dimensional point cloud into two-dimensional space to form one or more guidance spatial observations. The system can process the one or more guidance spatial observations with a machine-learned spatial observation prediction model to generate one or more predicted spatial observations. The system can process the one or more predicted spatial observations and image data with a machine-learned image prediction model to generate one or more predicted images from the target camera pose. The system can output the one or more predicted images.

5.

发明申请
Vector-Quantized Image Modeling 有权

公开(公告)号：US20240404238A1

公开(公告)日：2024-12-05

申请号：US18698997

申请日：2022-10-05

Applicant: Google LLC

Inventor： Jiahui Yu , Vijay Vasudevan , Alexander Yeong-Shiuh Ku , Yonghui Wu , Jason Michael Baldridge , Yuanzhong Xu , Jing Yu Koh , Thang Minh Luong , Gunjan Baid , Zirui Wang , Han Zhang , Xin Li

IPC: G06V10/28 , G06F40/284 , G06V10/764 , G06V10/766 , G06V10/82

Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pre-training a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.

6.

发明授权
Cross-modal contrastive learning for text-to-image generation based on machine learning models 有权

公开(公告)号：US12067646B2

公开(公告)日：2024-08-20

申请号：US17467628

申请日：2021-09-07

Applicant: Google LLC

Inventor： Han Zhang , Jing Yu Koh , Jason Michael Baldridge , Yinfei Yang , Honglak Lee

IPC: G06T11/00 , G06F18/214 , G06F18/22 , G06N3/08 , G10L15/26

CPC classification number: G06T11/00 , G06F18/2148 , G06F18/22 , G06N3/08 , G10L15/26

Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.

7.

发明公开
Vector-Quantized Image Modeling 审中-公开

公开(公告)号：US20240112088A1

公开(公告)日：2024-04-04

申请号：US18520083

申请日：2023-11-27

Applicant: Google LLC

Inventor： Jiahui Yu , Xin Li , Han Zhang , Vijay Vasudevan , Alexander Yeong-Shiuh Ku , Jason Michael Baldridge , Yuanzhong Xu , Jing Yu Koh , Thang Minh Luong , Gunjan Baid , Zirui Wang , Yonghui Wu

IPC: G06N20/00

CPC classification number: G06N20/00

Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pretraining a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification