-
Publication No.: US20250022301A1
Publication Date: 2025-01-16
Application No.: US18772414
Application Date: 2024-07-15
Applicant: Google LLC
Inventor: Shangbang Long , Siyang Qin , Yasuhisa Fujii , Alessandro Bissacco , Michail Raptis
IPC: G06V30/20 , G06V30/14 , G06V30/166 , G06V30/18 , G06V30/19
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting text instances of arbitrary shapes, sizes, and locations. In one aspect, a method comprises processing an image depicting one or more text instances, generating a respective prediction for each character in a sequence of characters that are predicted to be depicted in the text instance, the respective prediction comprising a respective character class to which the predicted character belongs, the respective character class selected from a set that includes printable character classes and a space character class and a bounding box that contains the character within the image, and grouping the sequence of characters into a plurality of words based on locations of characters that are predicted to belong to the space character class.
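Illustrative sketch (not taken from the patent): the grouping step described in the abstract above could look roughly like the following, assuming each per-character prediction is a class label plus an axis-aligned bounding box, and that the space character class is represented by the label " ". The names CharPrediction, union_box, and group_into_words are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), an assumed layout

SPACE = " "  # assumed label for the space character class


@dataclass
class CharPrediction:
    char_class: str  # a printable character class or SPACE
    box: Box         # bounding box containing the character in the image


def union_box(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box that covers every box in the list."""
    x0, y0, x1, y1 = zip(*boxes)
    return (min(x0), min(y0), max(x1), max(y1))


def group_into_words(chars: List[CharPrediction]) -> List[Tuple[str, Box]]:
    """Split a predicted character sequence into words at space predictions."""
    words: List[List[CharPrediction]] = []
    current: List[CharPrediction] = []
    for c in chars:
        if c.char_class == SPACE:
            # A predicted space closes the current word, if one is open.
            if current:
                words.append(current)
                current = []
        else:
            current.append(c)
    if current:
        words.append(current)
    return [
        ("".join(c.char_class for c in w), union_box([c.box for c in w]))
        for w in words
    ]
```

Each returned word pairs its text with the union of its character boxes; the space predictions themselves are used only as separators and are dropped from the output.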
-
Publication No.: US20240354504A1
Publication Date: 2024-10-24
Application No.: US18684557
Application Date: 2021-08-25
Applicant: Google LLC
Inventor: Chen-Yu Lee , Chun-Liang Li , Timothy Dozat , Vincent Perot , Guolong Su , Nan Hua , Joshua Ainslie , Renshen Wang , Yasuhisa Fujii , Tomas Pfister
IPC: G06F40/284 , G06V30/10 , G06V30/416
CPC classification number: G06F40/284 , G06V30/10 , G06V30/416
Abstract: Systems and methods for providing a structure-aware sequence model that can interpret a document's text without first inferring the proper reading order of the document. In some examples, the model may use a graph convolutional network to generate contextualized “supertoken” embeddings for each token, which are then fed to a transformer that employs a sparse attention paradigm in which attention weights for at least some supertokens are modified based on differences between predicted and actual values of the order and distance between the attender and attendee supertokens.
-
Publication No.: US20250036886A1
Publication Date: 2025-01-30
Application No.: US18766812
Application Date: 2024-07-09
Applicant: Google LLC
Inventor: Chen-Yu Lee , Alexander Ratner , Tomas Pfister , Chun-Liang Li , Yasuhisa Fujii , Ranjay Krishna , Cheng-Yu Hsieh , Si-An Chen
IPC: G06F40/40 , G06N3/0475
Abstract: Using a large language model to comply with a user request. The large language model receives tool documentation for each of one or more tools, and analyzes the tool documentation for each of the one or more tools to determine, for each tool, one or more tasks that the tool is operable to perform. Upon receiving a request from a user, the large language model generates a plan for complying with the request by using one or more of the tools, the plan including performance of one or more of the tasks.
-
Publication No.: US20240062560A1
Publication Date: 2024-02-22
Application No.: US17901617
Application Date: 2022-09-01
Applicant: Google LLC
Inventor: Shangbang Long , Siyang Qin , Dmitry Panteleev , Alessandro Bissacco , Yasuhisa Fujii , Michail Raptis
IPC: G06V20/62 , G06V30/414 , G06V10/82 , G06V30/14
CPC classification number: G06V20/63 , G06V30/414 , G06V10/82 , G06V30/1448
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly performing text detection and layout analysis. In one aspect, a method comprises processing the image and a set of object queries to generate an encoded representation of the image and an encoded representation of the set of object queries; processing the encoded representation of the image and the encoded representation of the set of object queries to generate a set of text detection masks; processing the encoded representation of the set of object queries to generate layout relevance measures; processing the encoded representation of the set of object queries to generate textness scores for the text detection masks; generating a text detection output that defines respective areas of the image that include text items; and generating a layout analysis output that defines clusters of respective areas of the image identified by the text detection masks.
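Illustrative sketch (not taken from the patent): the abstract above does not say how textness scores and layout relevance measures are turned into the two outputs. One simple possibility, shown here purely as an assumption, is to keep masks whose textness score passes a threshold and then cluster the kept masks as connected components of a thresholded pairwise relevance matrix. The function name cluster_text_masks, both thresholds, and the pairwise-matrix form of the relevance measures are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_text_masks(
    textness: Sequence[float],
    relevance: Sequence[Sequence[float]],
    textness_thresh: float = 0.5,   # assumed cutoff, not from the source
    relevance_thresh: float = 0.5,  # assumed cutoff, not from the source
) -> Tuple[List[int], List[List[int]]]:
    """Return (indices of masks kept as text, clusters of those indices)."""
    # Text detection output: masks whose textness score passes the cutoff.
    keep = [i for i, score in enumerate(textness) if score >= textness_thresh]

    # Union-find over the kept masks.
    parent: Dict[int, int] = {i: i for i in keep}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Layout analysis output: merge masks whose pairwise relevance is high.
    for pos, a in enumerate(keep):
        for b in keep[pos + 1:]:
            if relevance[a][b] >= relevance_thresh:
                parent[find(a)] = find(b)

    clusters: Dict[int, List[int]] = {}
    for i in keep:
        clusters.setdefault(find(i), []).append(i)
    return keep, sorted(clusters.values())
```

With four masks where mask 2 has low textness and only masks 0 and 1 are mutually relevant, this yields two clusters: one containing masks 0 and 1, and one containing mask 3 alone.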
-