Searchable data structure for electronic documents

    Publication No.: US12032605B2

    Publication date: 2024-07-09

    Application No.: US18054787

    Application date: 2022-11-11

    Inventor: William McNeill

    Abstract: A method includes obtaining, at a device, a hierarchical structure representing a graphical layout of content items of an electronic document, the content items including at least text. The method also includes generating a word embedding representing a word of the electronic document. The method further includes determining position information of a location of the word in the electronic document. The method also includes determining a descriptor that indicates a relationship of the location to the hierarchical structure. The method further includes providing input data to a machine learning model to generate a semantic region category label of a semantic region of the electronic document. The semantic region includes the word. The input data includes the word embedding, the position information, and the descriptor.
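
The input-assembly steps of the method can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (`LayoutNode`, `descriptor_for`, `toy_embedding`, `build_input`) is hypothetical. It assumes the hierarchical structure is a tree of bounding boxes, that the descriptor is the path of node kinds from the root down to the deepest node containing the word's location, and that position information is the location normalized to the page size. The hash-based embedding is only a deterministic stand-in for a real word embedding, and the machine learning model that would consume the input and produce the semantic region category label is left out.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class LayoutNode:
    """One node of the (assumed) layout hierarchy."""
    kind: str                      # e.g. "page", "table", "row", "cell"
    bbox: tuple                    # (x0, y0, x1, y1) in page coordinates
    children: list = field(default_factory=list)

    def contains(self, x, y):
        x0, y0, x1, y1 = self.bbox
        return x0 <= x <= x1 and y0 <= y <= y1


def descriptor_for(root, x, y):
    """Path of node kinds from the root to the deepest node containing (x, y)."""
    path = []
    node = root
    while node is not None and node.contains(x, y):
        path.append(node.kind)
        # Descend into the first child that contains the point, if any.
        node = next((c for c in node.children if c.contains(x, y)), None)
    return "/".join(path)


def toy_embedding(word, dim=8):
    """Hash-based stand-in for a word embedding (an assumption, not a trained model)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def build_input(root, word, x, y):
    """Assemble the three inputs named in the abstract for one word."""
    x0, y0, x1, y1 = root.bbox
    position = ((x - x0) / (x1 - x0), (y - y0) / (y1 - y0))
    return {
        "embedding": toy_embedding(word),
        "position": position,
        "descriptor": descriptor_for(root, x, y),
    }


# A hypothetical page with one paragraph and a one-row, two-cell table.
page = LayoutNode("page", (0, 0, 600, 800), [
    LayoutNode("paragraph", (50, 50, 550, 200)),
    LayoutNode("table", (50, 300, 550, 500), [
        LayoutNode("row", (50, 300, 550, 350), [
            LayoutNode("cell", (50, 300, 300, 350)),
            LayoutNode("cell", (300, 300, 550, 350)),
        ]),
    ]),
])

features = build_input(page, "Total", 120, 320)
print(features["descriptor"])  # page/table/row/cell
```

In this sketch the word "Total" at location (120, 320) falls inside the first table cell, so its descriptor records that it sits in a cell of a row of a table. A classifier receiving the embedding, the normalized position, and this descriptor could then use the layout context, for example to distinguish a table region from a body-text region.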