Patent search ap:("Oracle International Corporation") AND inv:"Katharine D'Orazio"

1.

Invention application
LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING (Status: in force)

Publication (announcement) No.: US20240420496A1

Publication (announcement) date: 2024-12-19

Application No.: US18210498

Filing date: 2023-06-15

Applicant: Oracle International Corporation

Inventor: Zheng Wang, Tao Sheng, Yazhe Hu, Mengqing Guo, Liyu Gong, Jun Qian, Katharine D'Orazio

IPC: G06V30/413, G06V30/19, G06V30/412, G06V30/416

Abstract: Techniques for layout-aware multi-modal networks for document understanding are provided. In one technique, word data representations that were generated based on words that were extracted from an image of a document are identified. Based on the image, table features of one or more tables in the document are determined. One or more table data representations that were generated based on the table features are identified. The word data representations and the one or more table data representations are input into a machine-learned model to generate a document data representation for the document. A task is performed based on the document data representation. In a related technique, instead of the one or more table data representations, one or more layout data representations that were generated based on a set of layout features, of the document, that was determined based on the image are identified and input into the machine-learned model.
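Illustration (not part of the patent record): the data flow in the abstract, in which word data representations and table data representations are fed together into a model that yields a single document data representation, can be sketched roughly as follows. The function name `fuse_representations`, the toy two-dimensional vectors, and the use of mean pooling in place of the machine-learned model are all assumptions made for this example; the abstract does not say how the model combines its inputs.

```python
from typing import List

Vector = List[float]


def fuse_representations(word_reprs: List[Vector],
                         table_reprs: List[Vector]) -> Vector:
    """Combine word and table representations into one document vector.

    Hypothetical stand-in for the machine-learned model: the two input
    sequences are concatenated and mean-pooled per dimension.
    """
    sequence = list(word_reprs) + list(table_reprs)
    if not sequence:
        raise ValueError("at least one representation is required")
    dim = len(sequence[0])
    if any(len(vec) != dim for vec in sequence):
        raise ValueError("all representations must share one dimension")
    # Average each dimension across every word and table vector.
    return [sum(vec[i] for vec in sequence) / len(sequence)
            for i in range(dim)]


# Toy inputs: two word vectors and one table vector (made-up values).
word_reprs = [[1.0, 0.0], [0.0, 1.0]]
table_reprs = [[2.0, 2.0]]
doc_repr = fuse_representations(word_reprs, table_reprs)
```

A downstream task, such as document classification, would then take `doc_repr` as its input. For the related technique in the abstract, layout data representations would be passed in place of `table_reprs`.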
