Publication No.: US11314819B2
Publication Date: 2022-04-26
Application No.: US16697964
Filing Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jared Lee Katzman, Nithin Kunala, Bing Xiang, Krishnakumar Rajagopalan, Andrew M. Grant
IPC: G06F16/93, G06F9/54, G06F16/31, G06F16/951
Abstract: Techniques for intaking one or more documents are described. An exemplary method includes receiving an ingestion request to ingest a document; extracting text from the document; pre-processing the extracted text to generate pre-processed text that is predictable and analyzable; generating an index entry for the extracted text, the index entry to map the extracted text to a reserved field of a plurality of reserved fields; and storing the extracted text, index entry, and pre-processed text in at least one data storage location.
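The sequence of steps in the abstract (receive request, extract text, pre-process, generate an index entry mapped to a reserved field, store) can be illustrated with a minimal Python sketch. This is not the patented implementation. The reserved field names, the `IngestionRequest` and `Store` classes, the tag-stripping extractor, and the lowercase tokenizer are all hypothetical stand-ins chosen for illustration, and the in-memory dictionaries merely stand in for "at least one data storage location".

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical reserved fields; the abstract does not enumerate them.
RESERVED_FIELDS = ("title", "body")


@dataclass
class IngestionRequest:
    doc_id: str
    raw: str                      # document content, assumed here to be simple markup
    target_field: str = "body"    # reserved field the extracted text should map to


@dataclass
class Store:
    # In-memory stand-ins for the data storage location(s).
    extracted: Dict[str, str] = field(default_factory=dict)
    index: Dict[str, dict] = field(default_factory=dict)
    preprocessed: Dict[str, List[str]] = field(default_factory=dict)


def extract_text(raw: str) -> str:
    """Strip markup tags and collapse whitespace (assumed extractor)."""
    without_tags = re.sub(r"<[^>]+>", " ", raw)
    return " ".join(without_tags.split())


def preprocess(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens (assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ingest(request: IngestionRequest, store: Store) -> dict:
    """Run the extract, pre-process, index, and store steps for one document."""
    if request.target_field not in RESERVED_FIELDS:
        raise ValueError(f"unknown reserved field: {request.target_field}")
    text = extract_text(request.raw)
    tokens = preprocess(text)
    # The index entry maps the extracted text (by document id) to a reserved field.
    entry = {"doc_id": request.doc_id, "field": request.target_field}
    store.extracted[request.doc_id] = text
    store.index[request.doc_id] = entry
    store.preprocessed[request.doc_id] = tokens
    return entry
```

A call such as `ingest(IngestionRequest("d1", "<p>Hello,  World!</p>"), Store())` would store the extracted text, the index entry, and the token list under the same document id, so that each stored artifact can be looked up from the others.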