Publication No.: US20240046686A1
Publication Date: 2024-02-08
Application No.: US17817058
Application Date: 2022-08-03
Applicant: Google LLC
Inventor: Tianjun Ye, Younghwan Jung, Xiaoqi Ren, Wael Farhan, Tianjun Fu, Nikolaos Kofinas, Nikolay Alexeevich Glushnev, Matthew Eastberg Persons, Xiao Liu, Evan S. Huang, Emmanouil Koukoumidis, Bhavishya Mittal
IPC: G06V30/418, G06V30/19, G06V30/412, G06V30/414, G06V30/18
CPC classification number: G06V30/418, G06V30/19107, G06V30/412, G06V30/19147, G06V30/1918, G06V30/414, G06V30/18152
Abstract: A method for document extraction includes receiving, from a user device associated with a user, an annotated document that includes one or more fields. Each respective field of the one or more fields of the annotated document is labeled by a respective annotation. The method includes clustering, using a template matching algorithm, the annotated document into a cluster and inducing, using the annotated document, a document template for the cluster. The method includes receiving, from the user device, an unannotated document including the one or more fields. The method includes clustering, using the template matching algorithm, the unannotated document into the cluster and, in response to clustering the unannotated document into the cluster, extracting, using the document template, the one or more fields.
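The flow described in the abstract (cluster an annotated document, induce a template for its cluster, cluster an unannotated document, then extract its fields with that template) can be illustrated with a minimal sketch. The abstract does not specify the template matching algorithm or the form of the template, so everything below is an assumption made for illustration: documents are lists of positioned tokens, the layout signature treats words ending in ":" as fixed labels, clusters are matched by Jaccard similarity, and the template stores one (x, y) position per annotated field. The names `Token`, `Cluster`, `layout_signature`, `assign_cluster`, `induce_template`, and `extract` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass(frozen=True)
class Token:
    text: str
    x: float
    y: float


@dataclass
class Cluster:
    signature: frozenset                           # layout signature of the first document
    template: dict = field(default_factory=dict)   # field name -> (x, y) of its value


def layout_signature(tokens):
    # Assumption: words ending in ":" are fixed labels of the layout.
    # Positions are rounded to the nearest 10 units to tolerate small shifts.
    return frozenset(
        (t.text, round(t.x, -1), round(t.y, -1))
        for t in tokens
        if t.text.endswith(":")
    )


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_cluster(clusters, tokens, threshold=0.5):
    # Stand-in for the template matching step: pick the most similar
    # existing cluster, or start a new one if none is similar enough.
    sig = layout_signature(tokens)
    best = max(clusters, key=lambda c: jaccard(c.signature, sig), default=None)
    if best is not None and jaccard(best.signature, sig) >= threshold:
        return best
    new_cluster = Cluster(signature=sig)
    clusters.append(new_cluster)
    return new_cluster


def induce_template(cluster, tokens, annotations):
    # annotations maps a field name to the index of the token holding its value.
    for name, idx in annotations.items():
        cluster.template[name] = (tokens[idx].x, tokens[idx].y)


def extract(cluster, tokens):
    # For each templated field, take the token nearest to the stored position.
    result = {}
    for name, (fx, fy) in cluster.template.items():
        nearest = min(tokens, key=lambda t: hypot(t.x - fx, t.y - fy))
        result[name] = nearest.text
    return result


clusters = []

# Annotated document: the user labels token 1 and token 3 as field values.
annotated = [
    Token("Invoice:", 10, 10), Token("A-17", 60, 10),
    Token("Total:", 10, 40), Token("$120.00", 60, 40),
]
c = assign_cluster(clusters, annotated)
induce_template(c, annotated, {"invoice_id": 1, "total": 3})

# Unannotated document with the same layout, slightly shifted.
unannotated = [
    Token("Invoice:", 11, 10), Token("B-42", 61, 10),
    Token("Total:", 10, 41), Token("$75.50", 60, 41),
]
c2 = assign_cluster(clusters, unannotated)
fields = extract(c2, unannotated) if c2.template else {}
print(fields)
```

In this sketch the unannotated document lands in the same cluster as the annotated one, so extraction reuses the induced template without any further labeling. A document with a different layout would start a new cluster with an empty template and would need its own annotated example first.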