Publication No.: US20230385298A1
Publication Date: 2023-11-30
Application No.: US18203096
Filing Date: 2023-05-30
Applicant: Hank AI, Inc.
Inventors: Sergey A. Razin, Jack Neil, Samuel Hartzog, Stéphane Charette
IPC: G06F16/25, G06F16/93, G06V30/416, G06V30/14
CPC classification number: G06F16/254, G06F16/256, G06V30/1444, G06V30/416, G06F16/93
Abstract: Embodiments of the innovation relate to a data extraction device comprising a controller having a processor and memory. The controller is configured to: receive an unstructured data file comprising a set of documents; apply the unstructured data file to a document identification model to identify a data element identifier and an associated data element in each document of the set of documents; apply an optical character recognition engine to the identified data element identifier and associated identified data element to generate a structured data element identifier and an associated structured data element, the structured data element identifier and the associated structured data element being configured as machine-identifiable characters; embed the structured data element identifier and associated structured data element as metadata with the unstructured data file; and store the unstructured data file and metadata in a database.
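The abstract describes a pipeline of five steps: receive a multi-document file, locate identifier/element pairs with a document identification model, convert those regions to machine-identifiable characters with OCR, embed the results as metadata, and store file plus metadata. The Python sketch below illustrates only that flow, under loudly stated assumptions: the document identification model and the OCR engine are replaced by hypothetical stubs (`identify_data_elements`, `ocr`) that operate on plain text rather than scanned images, `UnstructuredFile` is an invented container, and an in-memory SQLite database stands in for "a database". None of these names or choices come from the patent.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class UnstructuredFile:
    """Hypothetical container: a file holding a set of documents plus embedded metadata."""
    name: str
    documents: list[str]
    metadata: dict = field(default_factory=dict)


def identify_data_elements(document: str) -> list[tuple[str, str]]:
    """Hypothetical stub for the document identification model.

    A real model would locate identifier/element regions in a scanned page;
    here each "Label: value" line is treated as one identifier/element pair.
    """
    regions = []
    for line in document.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            regions.append((label, value))
    return regions


def ocr(region: str) -> str:
    """Hypothetical stub for the OCR engine: collapses whitespace into clean characters."""
    return " ".join(region.split())


def extract(file: UnstructuredFile) -> UnstructuredFile:
    """Run identification and OCR on every document, then embed the results as metadata."""
    structured = []
    for doc_index, document in enumerate(file.documents):
        for id_region, element_region in identify_data_elements(document):
            structured.append({
                "document": doc_index,
                "identifier": ocr(id_region),
                "element": ocr(element_region),
            })
    # The original documents are left untouched; structured pairs ride along as metadata.
    file.metadata["structured_data"] = structured
    return file


def store(conn: sqlite3.Connection, file: UnstructuredFile) -> None:
    """Persist the unstructured content and its metadata side by side (SQLite is an assumption)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, content TEXT, metadata TEXT)"
    )
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?)",
        (file.name, json.dumps(file.documents), json.dumps(file.metadata)),
    )
    conn.commit()


# Usage with invented sample content.
conn = sqlite3.connect(":memory:")
sample = UnstructuredFile("scan_001.pdf", ["Patient Name:  Jane Doe\nDate: 2023-05-30", "Code: 00790"])
store(conn, extract(sample))
row = conn.execute("SELECT metadata FROM files WHERE name = ?", ("scan_001.pdf",)).fetchone()
print(json.loads(row[0])["structured_data"][0])
# {'document': 0, 'identifier': 'Patient Name', 'element': 'Jane Doe'}
```

Keeping the original content and the extracted pairs in separate columns mirrors the abstract's wording, which stores "the unstructured data file and metadata" rather than replacing the file with structured output; a real system would likely store image bytes or a file reference instead of JSON-encoded text.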