Patent search ap:("Google LLC") AND inv:"Yasuhisa Fujii" Page 1

1.

Invention application
JOINT TEXT SPOTTING AND LAYOUT ANALYSIS (in force)

Publication number: US20250022301A1

Publication date: 2025-01-16

Application number: US18772414

Filing date: 2024-07-15

Applicant: Google LLC

Inventor: Shangbang Long, Siyang Qin, Yasuhisa Fujii, Alessandro Bissacco, Michail Raptis

IPC: G06V30/20, G06V30/14, G06V30/166, G06V30/18, G06V30/19

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting text instances of arbitrary shapes, sizes, and locations. In one aspect, a method comprises processing an image depicting one or more text instances, generating a respective prediction for each character in a sequence of characters that are predicted to be depicted in the text instance, the respective prediction comprising a respective character class to which the predicted character belongs, the respective character class selected from a set that includes printable character classes and a space character class and a bounding box that contains the character within the image, and grouping the sequence of characters into a plurality of words based on locations of characters that are predicted to belong to the space character class.
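
The word-grouping step in this abstract can be illustrated with a minimal sketch. It assumes the per-character predictions are already available as (character class, bounding box) pairs in reading order; the names `CharPrediction`, `Word`, `union_box`, and `group_into_words` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in image coordinates; an assumed box format.
Box = Tuple[float, float, float, float]

SPACE = " "  # stand-in for the space character class


@dataclass
class CharPrediction:
    char: str  # predicted character class (printable or space)
    box: Box   # bounding box containing the character


@dataclass
class Word:
    text: str
    box: Box


def union_box(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box that covers every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def group_into_words(chars: List[CharPrediction]) -> List[Word]:
    """Split a character sequence into words at predicted space characters."""
    words: List[Word] = []
    current: List[CharPrediction] = []

    def flush() -> None:
        # Emit the characters collected so far as one word, if any.
        if current:
            text = "".join(c.char for c in current)
            words.append(Word(text, union_box([c.box for c in current])))
            current.clear()

    for pred in chars:
        if pred.char == SPACE:
            flush()  # a space closes the word in progress
        else:
            current.append(pred)
    flush()  # close the final word
    return words
```

In this sketch each word box is simply the union of its character boxes, and consecutive spaces produce no empty words. The patent may derive word geometry differently.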

2.

Invention publication
STRUCTURAL ENCODING AND ATTENTION PARADIGMS FOR SEQUENCE MODELING (under examination, published)

Publication number: US20240354504A1

Publication date: 2024-10-24

Application number: US18684557

Filing date: 2021-08-25

Applicant: Google LLC

Inventor: Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister

IPC: G06F40/284, G06V30/10, G06V30/416

CPC classification number: G06F40/284, G06V30/10, G06V30/416

Abstract: Systems and methods for providing a structure-aware sequence model that can interpret a document's text without first inferring the proper reading order of the document. In some examples, the model may use a graph convolutional network to generate contextualized “supertoken” embeddings for each token, which are then fed to a transformer that employs a sparse attention paradigm in which attention weights for at least some supertokens are modified based on differences between predicted and actual values of the order and distance between the attender and attendee supertokens.
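
The idea of modifying attention weights by an order and distance mismatch can be sketched as follows. This is only one plausible reading of the abstract: it assumes the modification is a penalty subtracted from the attention logits before the softmax, and that predicted and actual order and distance values are given as pairwise matrices. The function names and the `strength` parameter are hypothetical, and the sparsity pattern and graph convolutional network mentioned in the abstract are not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def penalized_attention(
    logits: Matrix,
    pred_order: Matrix,
    true_order: Matrix,
    pred_dist: Matrix,
    true_dist: Matrix,
    strength: float = 1.0,  # assumed scaling factor for the penalty
) -> Matrix:
    """Down-weight attender/attendee pairs whose predicted order and
    distance disagree with the actual values, then normalize each row."""
    n = len(logits)
    weights: Matrix = []
    for i in range(n):          # i indexes the attender
        row = []
        for j in range(n):      # j indexes the attendee
            mismatch = (
                abs(pred_order[i][j] - true_order[i][j])
                + abs(pred_dist[i][j] - true_dist[i][j])
            )
            row.append(logits[i][j] - strength * mismatch)
        weights.append(softmax(row))
    return weights
```

With equal logits, a pair whose predicted order or distance disagrees with the actual value receives less attention than a pair that agrees, which is the qualitative behavior the abstract describes.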

3.

Invention application
Tool Documentation Enables Zero-Shot Tool-Usage With Large Language Models (in force)

Publication number: US20250036886A1

Publication date: 2025-01-30

Application number: US18766812

Filing date: 2024-07-09

Applicant: Google LLC

Inventor: Chen-Yu Lee, Alexander Ratner, Tomas Pfister, Chun-Liang Li, Yasuhisa Fujii, Ranjay Krishna, Cheng-Yu Hsieh, Si-An Chen

IPC: G06F40/40, G06N3/0475

Abstract: Using a large language model to comply with a user request. The large language model receives tool documentation for each of one or more tools, and analyzes the tool documentation for each of the one or more tools to determine, for each tool, one or more tasks that the tool is operable to perform. Upon receiving a request from a user, the large language model generates a plan for complying with the request by using one or more of the tools, the plan including performance of one or more of the tasks.
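
As a rough illustration of the inputs this abstract describes, the sketch below assembles tool documentation and a user request into a single planning prompt. It does not call any model. The function name `build_planning_prompt`, the prompt wording, and the dictionary layout are assumptions made for illustration and are not taken from the patent.

```python
from typing import Dict


def build_planning_prompt(tool_docs: Dict[str, str], request: str) -> str:
    """Combine per-tool documentation and a user request into one prompt.

    tool_docs maps a tool name to its documentation text (assumed layout).
    """
    sections = ["You may use the tools documented below."]
    for name, doc in tool_docs.items():
        # Each tool is presented by name followed by its documentation.
        sections.append(f"Tool: {name}\nDocumentation:\n{doc.strip()}")
    sections.append(f"User request: {request.strip()}")
    sections.append(
        "Write a step-by-step plan that names the tool used in each step."
    )
    return "\n\n".join(sections)
```

The returned string would then be passed to whichever large language model is in use. Because the tools are described only by their documentation, no worked demonstrations are needed, which matches the zero-shot setting named in the title.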

4.

Invention publication
UNIFIED SCENE TEXT DETECTION AND LAYOUT ANALYSIS (under examination, published)

Publication number: US20240062560A1

Publication date: 2024-02-22

Application number: US17901617

Filing date: 2022-09-01

Applicant: Google LLC

Inventor: Shangbang Long, Siyang Qin, Dmitry Panteleev, Alessandro Bissacco, Yasuhisa Fujii, Michail Raptis

IPC: G06V20/62, G06V30/414, G06V10/82, G06V30/14

CPC classification number: G06V20/63, G06V30/414, G06V10/82, G06V30/1448

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly performing text detection and layout analysis. In one aspect, a method comprises processing the image and a set of object queries to generate an encoded representation of the image and an encoded representation of the set of object queries; processing the encoded representation of the image and the encoded representation of the set of object queries to generate a set of text detection masks; processing the encoded representation of the set of object queries to generate layout relevance measures; processing the encoded representation of the set of object queries to generate textness scores for the text detection masks; generating a text detection output that defines respective areas of the image that include text items; and generating a layout analysis output that defines clusters of respective areas of the image identified by the text detection masks.
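
The last two steps of this abstract, selecting text masks by textness score and clustering them by layout relevance, can be sketched as below. The sketch assumes one textness score per mask and a symmetric pairwise relevance matrix, both compared against fixed thresholds, and it uses union-find to form clusters. The function names and thresholds are hypothetical, and the patent may combine the scores in a different way.

```python
from typing import Dict, List


def select_text_masks(textness: List[float], threshold: float = 0.5) -> List[int]:
    """Indices of masks whose textness score marks them as text."""
    return [i for i, score in enumerate(textness) if score >= threshold]


def cluster_by_relevance(
    kept: List[int],
    relevance: List[List[float]],
    threshold: float = 0.5,
) -> List[List[int]]:
    """Group kept masks into layout clusters: two masks share a cluster
    when they are linked by a chain of pairs with high relevance."""
    parent: Dict[int, int] = {i: i for i in kept}

    def find(i: int) -> int:
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(kept):
        for b in kept[pos + 1:]:
            if relevance[a][b] >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[int, List[int]] = {}
    for i in kept:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Here the kept indices play the role of the text detection output, and each returned cluster plays the role of one layout group such as a paragraph.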
