-
Publication No.: US20240394284A1
Publication Date: 2024-11-28
Application No.: US18386803
Application Date: 2023-11-03
Applicant: Google LLC
Inventor: Daniel Vlasic, Yiming Gu, Daniel Hernandez Diaz, Ilaï Deutel, Xi Xiong, Tianli Yu, Joseph Pagadora, Mingyang Ling, Jill Daley, Guolong Su
IPC: G06F16/332, G06F40/40, G06T7/11, G06V30/414
Abstract: An aspect of the disclosed technology is a system and process that answer a document query as text and also provide the location in an image where the answer text is detected. In one aspect of the disclosed technology, a machine learning model combines vision and language features for joint learning.
-
Publication No.: US20240153297A1
Publication Date: 2024-05-09
Application No.: US18501982
Application Date: 2023-11-03
Applicant: Google LLC
Inventor: Zizhao Zhang, Zifeng Wang, Vincent Perot, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Tomas Jon Pfister
IPC: G06V30/24, G06F16/21, G06V30/19, G06V30/412
CPC classification number: G06V30/24, G06F16/211, G06V30/19147, G06V30/412
Abstract: A method for extracting entities comprises obtaining a document that includes a series of textual fields that includes a plurality of entities. Each entity represents information associated with a predefined category. The method includes generating, using the document, a series of tokens representing the series of textual fields. The method includes generating an entity prompt that includes the series of tokens and one of the plurality of entities and generating a schema prompt that includes a schema associated with the document. The method includes generating a model query that includes the entity prompt and the schema prompt and determining, using an entity extraction model and the model query, a location of the one of the plurality of entities among the series of tokens. The method includes extracting, from the document, the one of the plurality of entities using the location of the one of the plurality of entities.
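The flow described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the whitespace tokenizer, the prompt string layout, the helper names (`tokenize`, `build_model_query`, `extract_entity`), and the hard-coded span that stands in for the entity extraction model's output are all assumptions made for illustration.

```python
from typing import List, Tuple


def tokenize(text_fields: List[str]) -> List[str]:
    """Flatten the textual fields into one token series (whitespace split is an assumption)."""
    return [tok for field in text_fields for tok in field.split()]


def build_model_query(tokens: List[str], entity: str, schema: List[str]) -> str:
    """Combine a schema prompt and an entity prompt into one model query (hypothetical layout)."""
    schema_prompt = "schema: " + ", ".join(schema)
    entity_prompt = "entity: " + entity + " | tokens: " + " ".join(tokens)
    return schema_prompt + "\n" + entity_prompt


def extract_entity(tokens: List[str], location: Tuple[int, int]) -> str:
    """Return the tokens in the half-open span [start, end) given as the entity's location."""
    start, end = location
    return " ".join(tokens[start:end])


fields = ["Invoice No: 12345", "Total: $99.00"]
tokens = tokenize(fields)
query = build_model_query(tokens, "invoice_id", ["invoice_id", "total_amount"])

# A real entity extraction model would predict this span from `query`;
# it is hard-coded here because no model is part of this sketch.
predicted_location = (2, 3)
invoice_id = extract_entity(tokens, predicted_location)
```

Returning a location rather than free text means the extracted value is always copied from the document's own tokens.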
-
Publication No.: US20250068847A1
Publication Date: 2025-02-27
Application No.: US18453236
Application Date: 2023-08-21
Applicant: Google LLC
Inventor: Vincent Perot, Florian Luisier, Kai Kang, Ramya Sree Boppana, Jiaqi Mu, Xiaoyu Sun, Carl Elie Saroufim, Guolong Su, Hao Zhang, Nikolay Alexeevich Glushnev, Nan Hua, Yun-Hsuan Sung, Michael Yiupun Kwong
IPC: G06F40/295, G06V30/19
Abstract: Systems and methods for performing document entity extraction are described herein. The method can include receiving an inference document and a target schema. The method can also include generating one or more document inputs from the inference document and one or more schema inputs from the target schema. The method can further include, for each combination of the document input and schema input, obtaining one or more extraction inputs by generating a respective extraction input based on the combination, providing the respective extraction input to the machine-learned model, and receiving a respective output of the machine-learned model based on the respective extraction input. The method can also include validating the extracted entity data based on reference spatial locations and inference spatial locations and outputting the validated extracted entity data.
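The "for each combination of the document input and schema input" step can be sketched as a Cartesian product over document chunks and schema chunks. This is only an illustration under stated assumptions: the chunk sizes, the dictionary shape of the extraction input, the helper names, and the line-parsing `toy_model` that stands in for the machine-learned model are all hypothetical. The spatial-location validation step is not sketched, because the abstract does not say how the comparison is made.

```python
from itertools import product
from typing import Callable, Dict, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_extraction(
    pages: List[str],
    schema_fields: List[str],
    model: Callable[[Dict[str, List[str]]], Dict[str, str]],
    pages_per_input: int = 1,
    fields_per_input: int = 2,
) -> List[Dict[str, str]]:
    """Build one extraction input per (document input, schema input) pair and query the model."""
    document_inputs = chunk(pages, pages_per_input)
    schema_inputs = chunk(schema_fields, fields_per_input)
    outputs = []
    for doc_input, schema_input in product(document_inputs, schema_inputs):
        extraction_input = {"document": doc_input, "schema": schema_input}
        outputs.append(model(extraction_input))
    return outputs


def toy_model(extraction_input: Dict[str, List[str]]) -> Dict[str, str]:
    """Stand-in for the machine-learned model: reads 'key: value' lines for requested fields."""
    found = {}
    for page in extraction_input["document"]:
        for line in page.splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() in extraction_input["schema"]:
                found[key.strip()] = value.strip()
    return found


pages = ["name: Ada\ndate: 2023-08-21", "total: 42"]
schema = ["name", "date", "total"]
outputs = run_extraction(pages, schema, toy_model)
```

With two document inputs and two schema inputs, the model is queried four times, and each output covers only the fields named in its own schema input.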
-
Publication No.: US20240354504A1
Publication Date: 2024-10-24
Application No.: US18684557
Application Date: 2021-08-25
Applicant: Google LLC
Inventor: Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister
IPC: G06F40/284, G06V30/10, G06V30/416
CPC classification number: G06F40/284, G06V30/10, G06V30/416
Abstract: Systems and methods for providing a structure-aware sequence model that can interpret a document's text without first inferring the proper reading order of the document. In some examples, the model may use a graph convolutional network to generate contextualized “supertoken” embeddings for each token, which are then fed to a transformer that employs a sparse attention paradigm in which attention weights for at least some supertokens are modified based on differences between predicted and actual values of the order and distance between the attender and attendee supertokens.
-
Publication No.: US20230274143A1
Publication Date: 2023-08-31
Application No.: US18173985
Application Date: 2023-02-24
Applicant: Google LLC
Inventor: Zizhao Zhang, Zifeng Wang, Chen-Yu Lee, Ruoxi Sun, Sayna Ebrahimi, Xiaoqi Ren, Guolong Su, Vincent Perot, Tomas Pfister, Han Zhang
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: A method for rehearsal-free continual learning includes obtaining a set of training samples where each training sample in the set of training samples is associated with a respective task of a plurality of different tasks. The method includes obtaining a task-invariant prompt representative of learned knowledge common to each respective task of the plurality of different tasks. The method includes, for each respective task of the plurality of different tasks, obtaining a respective task-specific prompt representative of learned knowledge specific to the respective task. The method includes, during each of one or more training iterations, for each respective training sample in the set of training samples, selecting the respective task-specific prompt representative of the respective task of the respective training sample and training a model using the task-invariant prompt and the selected respective task-specific prompt.
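The prompt-selection step in this abstract can be pictured with a small sketch. It assumes prompts are short vectors that are simply prepended to a sample's features; the prompt values, the task names, and the `build_prompted_input` helper are hypothetical, and the training update itself is omitted because the abstract does not specify it.

```python
from typing import Dict, List, Tuple

# One prompt shared by every task (illustrative values).
task_invariant_prompt: List[float] = [0.1, 0.2]

# One prompt per task, looked up by the task id attached to each sample.
task_specific_prompts: Dict[str, List[float]] = {
    "task_a": [1.0, 0.0],
    "task_b": [0.0, 1.0],
}


def build_prompted_input(sample_features: List[float], task_id: str) -> List[float]:
    """Select the sample's task-specific prompt and prepend both prompts to its features."""
    selected_prompt = task_specific_prompts[task_id]
    return task_invariant_prompt + selected_prompt + sample_features


# Each training sample carries the id of the task it belongs to.
training_samples: List[Tuple[List[float], str]] = [
    ([0.5], "task_a"),
    ([0.7], "task_b"),
]

prompted_inputs = [
    build_prompted_input(features, task_id) for features, task_id in training_samples
]
```

In this sketch every sample sees the shared prompt, while only samples of a given task see that task's own prompt.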