-
1.
Publication No.: US12067646B2
Publication date: 2024-08-20
Application No.: US17467628
Filing date: 2021-09-07
Applicant: Google LLC
Inventor: Han Zhang , Jing Yu Koh , Jason Michael Baldridge , Yinfei Yang , Honglak Lee
IPC: G06T11/00 , G06F18/214 , G06F18/22 , G06N3/08 , G10L15/26
CPC classification number: G06T11/00 , G06F18/2148 , G06F18/22 , G06N3/08 , G10L15/26
Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.
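Note: the attract/repel training objective in the abstract above can be illustrated with a contrastive loss. The sketch below is not taken from the patent: it assumes an InfoNCE-style loss (a commonly used lower bound on mutual information), cosine similarity, and made-up 2-D embeddings. The function names and the temperature value are hypothetical. The same function could be applied to image-to-image pairs as well as text-to-image pairs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss, where candidates[i] is the match for anchors[i].

    Matching pairs are pulled together (attract) and every other pairing
    in the batch is pushed apart (repel). The temperature is an assumption.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

# Hypothetical embeddings: two captions and two generated images.
text = [[1.0, 0.0], [0.0, 1.0]]
image_matched = [[0.9, 0.1], [0.1, 0.9]]   # image i agrees with caption i
image_swapped = [[0.1, 0.9], [0.9, 0.1]]   # image i agrees with the other caption

loss_matched = info_nce(text, image_matched)
loss_swapped = info_nce(text, image_swapped)
print(loss_matched < loss_swapped)
```

The loss is lower when each caption sits closest to its own image, which is the behavior the abstract describes as attraction and repulsion.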
-
2.
Publication No.: US20240362830A1
Publication date: 2024-10-31
Application No.: US18770154
Filing date: 2024-07-11
Applicant: Google LLC
Inventor: Han Zhang , Jing Yu Koh , Jason Michael Baldridge , Yinfei Yang , Honglak Lee
IPC: G06T11/00 , G06F18/214 , G06F18/22 , G06N3/08 , G10L15/26
CPC classification number: G06T11/00 , G06F18/2148 , G06F18/22 , G06N3/08 , G10L15/26
Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.
-
3.
Publication No.: US20240370487A1
Publication date: 2024-11-07
Application No.: US18253859
Filing date: 2022-11-04
Applicant: Google LLC
Inventor: Severin Heiniger , Balint Miklos , Yun-Hsuan Sung , Zhen Li , Yinfei Yang , Chao Jia
IPC: G06F16/538 , G06F16/55 , G06N3/084
Abstract: Systems and methods of the present disclosure are directed to a computer-implemented method for machine-learned multimodal search refinement. The method includes obtaining a query image embedding for a query image and a textual query refinement associated with the query image. The method includes processing the query image embedding and the textual query refinement with a machine-learned query refinement model to obtain a refined query image embedding that incorporates the textual query refinement. The method includes evaluating a loss function that evaluates a distance between the refined query image embedding and an embedding for a ground truth image within an image embedding space. The method includes modifying value(s) of parameter(s) of the machine-learned query refinement model based on the loss function.
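Note: the training loop described in the abstract above (refine the query embedding, measure its distance to the ground-truth embedding, update the parameters) can be sketched in a few lines. Everything below is an illustrative assumption rather than the patent's model: the refinement is a simple additive linear map, the distance is squared Euclidean, the update is one step of plain gradient descent, and all names and values are hypothetical.

```python
def refine(query, text, weights):
    """Hypothetical refinement model: query + W @ text."""
    return [q + sum(w * t for w, t in zip(row, text))
            for q, row in zip(query, weights)]

def squared_distance(a, b):
    """Squared Euclidean distance in the image embedding space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def training_step(query, text, target, weights, learning_rate=0.1):
    """One gradient-descent step on the distance loss; returns new weights."""
    refined = refine(query, text, weights)
    residual = [r - t for r, t in zip(refined, target)]
    # d(loss)/dW[i][j] = 2 * residual[i] * text[j]
    return [[w - learning_rate * 2.0 * res * t for w, t in zip(row, text)]
            for row, res in zip(weights, residual)]

# Made-up 2-D embeddings for a query image, a textual refinement,
# and the ground-truth image the refined query should land on.
query = [1.0, 0.0]
text = [0.0, 1.0]
target = [1.0, 1.0]
weights = [[0.0, 0.0], [0.0, 0.0]]

loss_before = squared_distance(refine(query, text, weights), target)
weights = training_step(query, text, target, weights)
loss_after = squared_distance(refine(query, text, weights), target)
print(loss_before, loss_after)
```

A real system would use a learned neural refinement model and an automatic differentiation library; the hand-written gradient here only makes the update rule explicit.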
-
4.
Publication No.: US20230081171A1
Publication date: 2023-03-16
Application No.: US17467628
Filing date: 2021-09-07
Applicant: Google LLC
Inventor: Han Zhang , Jing Yu Koh , Jason Michael Baldridge , Yinfei Yang , Honglak Lee
Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.
-
5.
Publication No.: US20220198144A1
Publication date: 2022-06-23
Application No.: US17127734
Filing date: 2020-12-18
Applicant: Google LLC
Inventor: Yinfei Yang , Ziyi Yang , Daniel Matthew Cer
IPC: G06F40/284 , G06N20/00 , G06N3/04
Abstract: The present disclosure provides a novel sentence-level representation learning method Conditional Masked Language Modeling (CMLM) for training on large scale unlabeled corpora. CMLM outperforms the previous state-of-the-art English sentence embedding models, including those trained with (semi-)supervised signals. For multilingual representations learning, it is shown that co-training CMLM with bitext retrieval and cross-lingual NLI fine-tuning achieves state-of-the-art performance. It is also shown that multilingual representations have the same language bias and principal component removal (PCR) can eliminate the bias by separating language identity information from semantics.
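Note: principal component removal (PCR), mentioned in the abstract above, can be illustrated on toy data. The sketch below is an assumption-laden illustration, not the patent's procedure: it assumes the language-identity signal lies along the single top principal direction of the embeddings, finds that direction with power iteration, and subtracts each vector's projection onto it. The data and function names are hypothetical.

```python
import math

def top_component(vectors, iterations=100):
    """Top principal direction of the mean-centered vectors (power iteration)."""
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    direction = [1.0] * dim
    for _ in range(iterations):
        # Multiply by the (unnormalized) covariance: sum over x of (x . d) x
        updated = [0.0] * dim
        for x in centered:
            scale = sum(a * b for a, b in zip(x, direction))
            for j in range(dim):
                updated[j] += scale * x[j]
        norm = math.sqrt(sum(a * a for a in updated))
        direction = [a / norm for a in updated]
    return direction

def remove_component(vector, direction):
    """Subtract the projection of vector onto the unit-length direction."""
    scale = sum(a * b for a, b in zip(vector, direction))
    return [a - scale * d for a, d in zip(vector, direction)]

# Toy embeddings: the first coordinate plays the role of language identity
# (large variance), the second plays the role of semantics (small variance).
embeddings = [[5.0, 0.1], [5.0, -0.1], [-5.0, 0.1], [-5.0, -0.1]]
direction = top_component(embeddings)
cleaned = [remove_component(v, direction) for v in embeddings]
print(cleaned[0])
```

After removal, vectors that differed only in the high-variance "language" coordinate coincide, while the "semantic" coordinate is preserved. In practice more than one component may need to be removed, and a linear algebra library would replace the hand-written power iteration.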
-
6.
Publication No.: US12014446B2
Publication date: 2024-06-18
Application No.: US17409249
Filing date: 2021-08-23
Applicant: Google LLC
Inventor: Jing Yu Koh , Honglak Lee , Yinfei Yang , Jason Michael Baldridge , Peter James Anderson
CPC classification number: G06T11/00 , G06F18/213 , G06N3/045 , G06N3/08 , G06T7/10 , G06T15/00 , G06T15/08 , G06T2207/10028 , G06T2207/20081
Abstract: A computing system for generating predicted images along a trajectory of unseen viewpoints. The system can obtain one or more spatial observations of an environment that may be captured from one or more previous camera poses. The system can generate a three-dimensional point cloud for the environment from the one or more spatial observations and the one or more previous camera poses. The system can project the three-dimensional point cloud into two-dimensional space to form one or more guidance spatial observations. The system can process the one or more guidance spatial observations with a machine-learned spatial observation prediction model to generate one or more predicted spatial observations. The system can process the one or more predicted spatial observations and image data with a machine-learned image prediction model to generate one or more predicted images from the target camera pose. The system can output the one or more predicted images.
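Note: the point-cloud steps in the abstract above (lift observations into 3D, then project the cloud into 2D for a new camera pose) can be sketched with a pinhole camera model. This is a simplified illustration under stated assumptions, not the patent's system: the camera intrinsics are made up, the source camera sits at the origin, and the target camera differs by translation only (no rotation). All function names are hypothetical.

```python
def unproject(u, v, depth, focal, cx, cy):
    """Lift pixel (u, v) with known depth into a 3D point in the source camera frame."""
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    return (x, y, depth)

def to_camera_frame(point, camera_position):
    """Express a point relative to a camera that is translated but not rotated."""
    return tuple(p - c for p, c in zip(point, camera_position))

def project(point, focal, cx, cy):
    """Pinhole projection to pixel coordinates; None if the point is behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)

# Assumed intrinsics: focal length 100 px, principal point at (50, 50).
focal, cx, cy = 100.0, 50.0, 50.0

# Two pixels observed at depth 2.0 from the source camera at the origin.
cloud = [unproject(50.0, 50.0, 2.0, focal, cx, cy),
         unproject(60.0, 50.0, 2.0, focal, cx, cy)]

# Target camera has moved 1.0 unit forward along the viewing axis.
target_position = (0.0, 0.0, 1.0)
guidance = [project(to_camera_frame(p, target_position), focal, cx, cy)
            for p in cloud]
print(guidance)
```

Moving the camera forward pushes the off-center point further from the image center, as expected. The reprojected pixels play the role of the sparse "guidance" observations that a learned model would then complete.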
-
7.
Publication No.: US11769011B2
Publication date: 2023-09-26
Application No.: US17127734
Filing date: 2020-12-18
Applicant: Google LLC
Inventor: Yinfei Yang , Ziyi Yang , Daniel Matthew Cer
IPC: G06F40/284 , G06N3/04 , G06N20/00
CPC classification number: G06F40/284 , G06N3/04 , G06N20/00
Abstract: The present disclosure provides a novel sentence-level representation learning method Conditional Masked Language Modeling (CMLM) for training on large scale unlabeled corpora. CMLM outperforms the previous state-of-the-art English sentence embedding models, including those trained with (semi-)supervised signals. For multilingual representations learning, it is shown that co-training CMLM with bitext retrieval and cross-lingual natural language inference (NLI) fine-tuning achieves state-of-the-art performance. It is also shown that multilingual representations have the same language bias and principal component removal (PCR) can eliminate the bias by separating language identity information from semantics.
-
8.
Publication No.: US20230072293A1
Publication date: 2023-03-09
Application No.: US17409249
Filing date: 2021-08-23
Applicant: Google LLC
Inventor: Jing Yu Koh , Honglak Lee , Yinfei Yang , Jason Michael Baldridge , Peter James Anderson
Abstract: A computing system for generating predicted images along a trajectory of unseen viewpoints. The system can obtain one or more spatial observations of an environment that may be captured from one or more previous camera poses. The system can generate a three-dimensional point cloud for the environment from the one or more spatial observations and the one or more previous camera poses. The system can project the three-dimensional point cloud into two-dimensional space to form one or more guidance spatial observations. The system can process the one or more guidance spatial observations with a machine-learned spatial observation prediction model to generate one or more predicted spatial observations. The system can process the one or more predicted spatial observations and image data with a machine-learned image prediction model to generate one or more predicted images from the target camera pose. The system can output the one or more predicted images.