Publication No.: US20240420496A1
Publication Date: 2024-12-19
Application No.: US18210498
Filing Date: 2023-06-15
Applicant: Oracle International Corporation
Inventors: Zheng Wang, Tao Sheng, Yazhe Hu, Mengqing Guo, Liyu Gong, Jun Qian, Katharine D'Orazio
IPC: G06V30/413, G06V30/19, G06V30/412, G06V30/416
Abstract: Techniques for layout-aware multi-modal networks for document understanding are provided. In one technique, word data representations, generated from words extracted from an image of a document, are identified. Table features of one or more tables in the document are determined from the image, and one or more table data representations generated from those table features are identified. The word data representations and the one or more table data representations are input into a machine-learned model to generate a document data representation for the document, and a task is performed based on that document data representation. In a related technique, the one or more table data representations are replaced by one or more layout data representations, which are generated from a set of layout features of the document determined from the image, and these are input into the machine-learned model instead.
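
The data flow in the abstract can be illustrated with a minimal sketch. It is not the claimed model: the function names, the modality flag, and the mean-pooling step are all assumptions made for illustration, and mean pooling merely stands in for the machine-learned model, whose architecture the abstract does not specify. The sketch only shows word and table representations being combined into one sequence and reduced to a single document representation.

```python
from typing import List

Vector = List[float]


def tag(vectors: List[Vector], modality_flag: float) -> List[Vector]:
    """Append a modality flag (hypothetical) so word and table inputs stay distinguishable."""
    return [list(v) + [modality_flag] for v in vectors]


def document_representation(word_reprs: List[Vector],
                            table_reprs: List[Vector]) -> Vector:
    """Placeholder for the machine-learned model: mean-pool the joint sequence.

    A real system would use a trained network here; averaging is only an
    assumption that keeps the example self-contained.
    """
    sequence = tag(word_reprs, 0.0) + tag(table_reprs, 1.0)
    if not sequence:
        raise ValueError("need at least one input representation")
    dim = len(sequence[0])
    if any(len(v) != dim for v in sequence):
        raise ValueError("all representations must share one dimension")
    return [sum(v[i] for v in sequence) / len(sequence) for i in range(dim)]


# Toy inputs: three word representations and one table representation.
word_reprs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
table_reprs = [[2.0, 2.0]]

doc_repr = document_representation(word_reprs, table_reprs)
print(doc_repr)  # [1.0, 1.0, 0.25]
```

For the related technique, the same sketch would apply with layout representations passed in place of `table_reprs`; a downstream task would then take `doc_repr` as its input.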