LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING

    Publication (Announcement) No.: US20240420496A1

    Publication (Announcement) Date: 2024-12-19

    Application No.: US18210498

    Filing Date: 2023-06-15

    Abstract: Techniques for layout-aware multi-modal networks for document understanding are provided. In one technique, word data representations that were generated based on words extracted from an image of a document are identified. Based on the image, table features of one or more tables in the document are determined. One or more table data representations that were generated based on the table features are identified. The word data representations and the one or more table data representations are input into a machine-learned model to generate a document data representation for the document. A task is performed based on the document data representation. In a related technique, the one or more table data representations are replaced by one or more layout data representations: a set of layout features of the document is determined based on the image, and the layout data representations generated from those features are identified and input into the machine-learned model.
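
The data flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the names `Token`, `build_input_sequence`, and `toy_document_model` are hypothetical, the vectors are made-up toy values, and mean pooling merely stands in for the machine-learned model, whose architecture the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class Token:
    """One input element: a data representation plus the modality it came from."""
    vector: Vector
    modality: str  # "word" or "table" (hypothetical tags, not from the patent)


def build_input_sequence(
    word_reprs: Sequence[Vector], table_reprs: Sequence[Vector]
) -> List[Token]:
    """Combine word and table data representations into one model input."""
    tokens = [Token(list(v), "word") for v in word_reprs]
    tokens += [Token(list(v), "table") for v in table_reprs]
    return tokens


def toy_document_model(tokens: Sequence[Token]) -> Vector:
    """Stand-in for the machine-learned model (assumption: plain mean pooling).

    Returns a single vector that plays the role of the document data
    representation.
    """
    if not tokens:
        raise ValueError("at least one representation is required")
    dim = len(tokens[0].vector)
    sums = [0.0] * dim
    for token in tokens:
        if len(token.vector) != dim:
            raise ValueError("all representations must share one dimension")
        for i, value in enumerate(token.vector):
            sums[i] += value
    return [s / len(tokens) for s in sums]


# Toy inputs: two word representations and one table representation.
word_reprs = [[1.0, 0.0], [0.0, 1.0]]
table_reprs = [[2.0, 2.0]]

tokens = build_input_sequence(word_reprs, table_reprs)
doc_repr = toy_document_model(tokens)
```

In the related technique, the same sketch would apply with layout data representations passed in place of `table_reprs`. A downstream task (for example, a classifier) would then consume `doc_repr`.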
