-
Publication number: US20240257550A1
Publication date: 2024-08-01
Application number: US18686233
Filing date: 2022-08-25
Applicant: Google LLC
Inventor: Henri Rebecq , Federico Tombari , Diego Martin Arroyo
IPC: G06V30/416 , G06V10/44 , G06V10/82 , G06V30/412
CPC classification number: G06V30/416 , G06V10/44 , G06V10/82 , G06V30/412
Abstract: A method including receiving an image representing a document including a plurality of layout components, identifying textual information associated with the plurality of layout components, identifying visual information associated with the plurality of layout components, combining the textual information with the visual information, and predicting a reading order of the plurality of layout components based on the combined textual information and visual information using a self-attention encoder/decoder.
-
Publication number: US20220292717A1
Publication date: 2022-09-15
Application number: US17636250
Filing date: 2019-09-13
Applicant: Google LLC
Inventor: David Joseph Tan , Federico Tombari
Abstract: Example embodiments allow for fast, efficient detection and pose estimation of objects based on point clouds, depth images/maps, or other depth information about a scene that may contain the objects. Embodiments include translating and rotating the depth image to bring individual points of the depth image to a standard orientation and location so as to improve performance when an object is near the periphery of the field of view. Some disclosed embodiments include applying a random forest to perform pose estimation. By using the decision trees or other fast methods, it can be advantageous to perform pose estimation a plurality of times prior to identifying whether a particular object is actually present in a scene. Prospective pose estimates can be combined with models of the objects in order to evaluate whether the object is present in the scene.
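The translate-and-rotate step described in this abstract can be illustrated with a minimal sketch. It assumes points are given as (x, y, z) camera coordinates and that the "standard orientation and location" means the chosen anchor point lies on the optical (+z) axis and is then moved to the origin. The function names `rotate_to_axis` and `canonicalize` are hypothetical and not taken from the patent; the rotation uses the standard Rodrigues formula.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotate_to_axis(anchor):
    """Return a function that rotates vectors so `anchor` points along +z."""
    norm = math.sqrt(dot(anchor, anchor))
    a = tuple(c / norm for c in anchor)
    z = (0.0, 0.0, 1.0)
    axis = cross(a, z)
    s = math.sqrt(dot(axis, axis))  # sine of the rotation angle
    c = dot(a, z)                   # cosine of the rotation angle
    if s < 1e-12:
        # Anchor is already on the z axis: identity, or a half turn about x.
        if c > 0:
            return lambda v: tuple(v)
        return lambda v: (v[0], -v[1], -v[2])
    k = tuple(x / s for x in axis)  # unit rotation axis

    def rotate(v):
        # Rodrigues formula: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
        kxv = cross(k, v)
        kdv = dot(k, v)
        return tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1 - c)
                     for i in range(3))

    return rotate

def canonicalize(points, anchor):
    """Rotate all points so `anchor` lies on +z, then translate it to the origin."""
    rotate = rotate_to_axis(anchor)
    origin = rotate(anchor)
    return [tuple(r - o for r, o in zip(rotate(p), origin)) for p in points]
```

Because the transform is rigid, distances between points are unchanged; only the viewpoint is normalized, so a downstream estimator (a random forest in the abstract) would see a point near the periphery the same way as one near the image center.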
-
Publication number: US20210358095A1
Publication date: 2021-11-18
Application number: US17052049
Filing date: 2020-02-05
Applicant: Google LLC
Inventor: Diego Martin Arroyo , Federico Tombari , Alessio Tonioni
Abstract: A computer-implemented method to perform image-to-image translation. The method can include obtaining one or more machine-learned generator models. The one or more machine-learned generator models can be configured to receive an input image and a user-specified conditioning vector that parameterizes one or more desired values for one or more defined characteristics of an output image. The one or more machine-learned generator models can be configured to perform, based at least in part on the user-specified conditioning vector, one or more transformations on the input image to generate the output image with the one or more desired values for the one or more defined characteristics. The method can include receiving the input image and the user-specified conditioning vector. The method can include generating, using the machine-learned generator model, an output image having the one or more desired values for the one or more characteristics.
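One common way to feed a conditioning vector to an image generator is to broadcast it over the image as extra constant channels. The sketch below shows only that input-preparation step, under the assumption that the image is a nested list of shape height x width x channels. The abstract does not say how the conditioning vector is consumed, so this is an illustrative assumption, and `tile_condition` is a hypothetical name.

```python
def tile_condition(image, condition):
    """Append each conditioning value as a constant extra channel at every pixel.

    image: nested list, height x width x channels
    condition: flat list of user-specified values (e.g. desired characteristics)
    """
    return [[list(pixel) + list(condition) for pixel in row] for row in image]

# A 2x2 RGB image and a two-value conditioning vector (values are made up).
image = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
         [[0.7, 0.8, 0.9], [1.0, 0.0, 0.5]]]
conditioned = tile_condition(image, [0.25, 1.0])
```

A generator model would then take `conditioned` (here with five channels per pixel) as its input in place of the raw image.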
-
Publication number: US12236639B2
Publication date: 2025-02-25
Application number: US17636250
Filing date: 2019-09-13
Applicant: Google LLC
Inventor: David Joseph Tan , Federico Tombari
Abstract: Example embodiments allow for fast, efficient detection and pose estimation of objects based on point clouds, depth images/maps, or other depth information about a scene that may contain the objects. Embodiments include translating and rotating the depth image to bring individual points of the depth image to a standard orientation and location so as to improve performance when an object is near the periphery of the field of view. Some disclosed embodiments include applying a random forest to perform pose estimation. By using the decision trees or other fast methods, it can be advantageous to perform pose estimation a plurality of times prior to identifying whether a particular object is actually present in a scene. Prospective pose estimates can be combined with models of the objects in order to evaluate whether the object is present in the scene.
-
Publication number: US20230122207A1
Publication date: 2023-04-20
Application number: US17909545
Filing date: 2021-03-05
Applicant: Google LLC
Inventor: Mattia Segù , Federico Tombari , Alessio Tonioni
IPC: G06N3/045 , G06N3/0464 , G06N3/048 , G06N3/08
Abstract: Generally, the present disclosure is directed to systems and methods that leverage batch normalization statistics as a way to generalize across domains. In particular, example implementations of the present disclosure can generate different representations for different domains by collecting independent batch normalization statistics, which can then be used to map between domains in a shared latent space. At test or inference time, samples from an unknown test or target domain can be projected into the same shared latent space. The domain of the target sample can therefore be expressed as a linear combination of the known ones, with the combination being weighted based on respective distances between batch normalization statistics in the latent space. This same mapping strategy can be applied at both training and test time to learn both a latent representation and a powerful but light-weight ensemble model that operates within such latent space.
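The distance-weighted combination described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes statistics are per-channel means and variances, uses Euclidean distance between them, and uses normalized inverse-distance weights. The abstract specifies neither the metric nor the weighting function, and all function names are hypothetical.

```python
import math

def bn_stats(features):
    """Per-channel mean and variance over a batch of feature vectors."""
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    variances = [sum((f[d] - means[d]) ** 2 for f in features) / n
                 for d in range(dims)]
    return means, variances

def stats_distance(a, b):
    """Euclidean distance between two (means, variances) pairs (assumed metric)."""
    (mean_a, var_a), (mean_b, var_b) = a, b
    return math.sqrt(sum((x - y) ** 2
                         for x, y in zip(mean_a + var_a, mean_b + var_b)))

def domain_weights(target_stats, domain_stats, eps=1e-8):
    """Inverse-distance weights over known domains, normalized to sum to 1."""
    inverse = [1.0 / (stats_distance(target_stats, s) + eps)
               for s in domain_stats]
    total = sum(inverse)
    return [w / total for w in inverse]

def ensemble_predict(weights, per_domain_predictions):
    """Linear combination of per-domain predictions using the domain weights."""
    size = len(per_domain_predictions[0])
    return [sum(w * p[k] for w, p in zip(weights, per_domain_predictions))
            for k in range(size)]

# Toy data: two known domains and one target batch (values are made up).
domain_a = bn_stats([[0.0, 0.0], [2.0, 2.0]])
domain_b = bn_stats([[10.0, 10.0], [12.0, 12.0]])
target = bn_stats([[1.0, 1.0], [3.0, 3.0]])
weights = domain_weights(target, [domain_a, domain_b])
```

Known domains whose statistics lie closer to the target's receive larger weights, so their domain-specific predictions dominate the ensemble output.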
-
Publication number: US11599980B2
Publication date: 2023-03-07
Application number: US17052049
Filing date: 2020-02-05
Applicant: Google LLC
Inventor: Diego Martin Arroyo , Federico Tombari , Alessio Tonioni
Abstract: A computer-implemented method to perform image-to-image translation. The method can include obtaining one or more machine-learned generator models. The one or more machine-learned generator models can be configured to receive an input image and a user-specified conditioning vector that parameterizes one or more desired values for one or more defined characteristics of an output image. The one or more machine-learned generator models can be configured to perform, based at least in part on the user-specified conditioning vector, one or more transformations on the input image to generate the output image with the one or more desired values for the one or more defined characteristics. The method can include receiving the input image and the user-specified conditioning vector. The method can include generating, using the machine-learned generator model, an output image having the one or more desired values for the one or more characteristics.
-
Publication number: US20240202878A1
Publication date: 2024-06-20
Application number: US18390566
Filing date: 2023-12-20
Applicant: Google LLC
Inventor: Diego Martin Arroyo , Alessio Tonioni , Federico Tombari
CPC classification number: G06T5/50 , G06N3/08 , G06T2207/20081 , G06T2207/20084 , G06T2207/20092
Abstract: A computer-implemented method to perform image-to-image translation. The method can include obtaining one or more machine-learned generator models. The one or more machine-learned generator models can be configured to receive an input image and a user-specified conditioning vector that parameterizes one or more desired values for one or more defined characteristics of an output image. The one or more machine-learned generator models can be configured to perform, based at least in part on the user-specified conditioning vector, one or more transformations on the input image to generate the output image with the one or more desired values for the one or more defined characteristics. The method can include receiving the input image and the user-specified conditioning vector. The method can include generating, using the machine-learned generator model, an output image having the one or more desired values for the one or more characteristics.
-
Publication number: US20240152546A1
Publication date: 2024-05-09
Application number: US18502688
Filing date: 2023-11-06
Applicant: Google LLC
Inventor: David Trotter Oleson , Sofie Hauge Katan , Nils Grimsmo , Mailys Claire Gabrielle Robin , Federico Tombari
IPC: G06F16/532 , G06F16/953
CPC classification number: G06F16/532 , G06F16/953
Abstract: Methods and systems for returning search results based on diagrams as search inputs are disclosed herein. One method can include receiving a search request from a user, the search request including an image that depicts a diagram with at least one associated question, and processing the search request using a diagram parsing model to obtain a formal language representation of the diagram. The method can also include providing the formal language representation of the diagram to a search engine as a search query, and receiving, as a search result to the search query, at least one solution to the at least one associated question of the diagram.
-
Publication number: US20250166379A1
Publication date: 2025-05-22
Application number: US18949777
Filing date: 2024-11-15
Applicant: Google LLC
Inventor: Alessio Tonioni , Bruno Korbar , Federico Tombari , Andrew Zisserman , Yongqin Xian
Abstract: Methods, systems, and apparatus for video understanding. In one aspect, a conditioned resampler model receives video features of multiple video frames of a video processed by a visual encoder and token embeddings for a specified task. The conditioned resampler model generates conditioned resampler embeddings according to the specified task in response to the video features and token embeddings provided as input. The conditioned resampler embeddings are provided to a large language model as input. The large language model generates, in response to the input conditioned resampler embeddings, a text response to the specified task.
-
Publication number: US11908115B2
Publication date: 2024-02-20
Application number: US18161415
Filing date: 2023-01-30
Applicant: Google LLC
Inventor: Diego Martin Arroyo , Alessio Tonioni , Federico Tombari
CPC classification number: G06T5/50 , G06N3/08 , G06T2207/20081 , G06T2207/20084 , G06T2207/20092
Abstract: A computer-implemented method to perform image-to-image translation. The method can include obtaining one or more machine-learned generator models. The one or more machine-learned generator models can be configured to receive an input image and a user-specified conditioning vector that parameterizes one or more desired values for one or more defined characteristics of an output image. The one or more machine-learned generator models can be configured to perform, based at least in part on the user-specified conditioning vector, one or more transformations on the input image to generate the output image with the one or more desired values for the one or more defined characteristics. The method can include receiving the input image and the user-specified conditioning vector. The method can include generating, using the machine-learned generator model, an output image having the one or more desired values for the one or more characteristics.