Searchable data structure for electronic documents

    Publication No.: US12032605B2

    Publication date: 2024-07-09

    Application No.: US18054787

    Filing date: 2022-11-11

    Inventor: William McNeill

    Abstract: A method includes obtaining, at a device, a hierarchical structure representing a graphical layout of content items of an electronic document, the content items including at least text. The method also includes generating a word embedding representing a word of the electronic document. The method further includes determining position information of a location of the word in the electronic document. The method also includes determining a descriptor that indicates a relationship of the location to the hierarchical structure. The method further includes providing input data to a machine learning model to generate a semantic region category label of a semantic region of the electronic document. The semantic region includes the word. The input data includes the word embedding, the position information, and the descriptor.
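
The abstract above names three inputs to the model: a word embedding, position information, and a descriptor tying the word's location to the layout hierarchy. The following is a minimal sketch of how such an input vector might be assembled. Everything in it is an assumption made for illustration and is not taken from the patent: the names `WordFeatures`, `hierarchy_descriptor`, and `build_input`, the encoding of the descriptor as a padded path of child indices, and the toy vector sizes. The machine learning model itself is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordFeatures:
    """Per-word inputs named in the abstract (all encodings are hypothetical)."""
    embedding: List[float]   # word embedding, here a toy 3-dimensional vector
    position: List[float]    # location of the word, e.g. normalized (page, x, y)
    descriptor: List[float]  # relationship of the location to the layout hierarchy


def hierarchy_descriptor(path: List[int], max_depth: int = 4) -> List[float]:
    """Encode the path from the root of the layout tree to the node holding
    the word as a fixed-length vector. Each element is a child index; unused
    levels are padded with -1.0. This encoding is an assumption."""
    trimmed = path[:max_depth]
    padding = [-1.0] * (max_depth - len(trimmed))
    return [float(i) for i in trimmed] + padding


def build_input(features: WordFeatures) -> List[float]:
    """Concatenate the three feature groups into one model input vector."""
    return features.embedding + features.position + features.descriptor


# A word found in the third child of the first child of the root node.
word = WordFeatures(
    embedding=[0.12, -0.40, 0.88],
    position=[0.0, 0.25, 0.10],
    descriptor=hierarchy_descriptor([0, 2]),
)
model_input = build_input(word)
```

Plain concatenation is only one option. A real system could just as well feed the three groups to separate input layers.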

    METHOD AND SYSTEM FOR SUGGESTING REVISIONS TO AN ELECTRONIC DOCUMENT

    Publication No.: US20240126989A1

    Publication date: 2024-04-18

    Application No.: US18330484

    Filing date: 2023-06-07

    Applicant: BLACKBOILER, INC.

    Abstract: A method for suggesting revisions to a document-under-analysis from a seed database, the seed database including a plurality of original texts each respectively associated with one of a plurality of final texts, the method for suggesting revisions including selecting a statement-under-analysis (“SUA”), selecting a first original text of the plurality of original texts, determining a first edit-type classification of the first original text with respect to its associated final text, generating a first similarity score for the first original text based on the first edit-type classification, the first similarity score representing a degree of similarity between the SUA and the first original text, selecting a second original text of the plurality of original texts, determining a second edit-type classification of the second original text with respect to its associated final text, generating a second similarity score for the second original text based on the second edit-type classification, the second similarity score representing a degree of similarity between the SUA and the second original text, selecting a candidate original text from one of the first original text and the second original text, and creating an edited SUA (“ESUA”) by modifying a copy of the first SUA consistent with a first candidate final text associated with the first candidate original text.
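
The sequence in the abstract above (classify the edit type of each seed pair, score each original text against the SUA based on that classification, pick a candidate) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the edit-type labels, the rule that the edit type selects which similarity metric is used, and the function names `edit_type`, `similarity`, and `best_candidate` are all hypothetical. Only `difflib.SequenceMatcher` is a real standard-library API. Creating the ESUA from the candidate's final text is not shown.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def edit_type(original: str, final: str) -> str:
    """Classify how `final` differs from `original` at the word level.
    The four labels are assumptions chosen for this sketch."""
    a, b = original.split(), final.split()
    opcodes = SequenceMatcher(None, a, b).get_opcodes()
    tags = {tag for tag, *_ in opcodes if tag != "equal"}
    if not tags:
        return "unchanged"
    if tags == {"insert"}:
        return "insertion"
    if tags == {"delete"}:
        return "deletion"
    return "mixed"


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, ignoring case and word order."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def similarity(sua: str, original: str, kind: str) -> float:
    """Score the SUA against an original text. Assumption: the edit type
    selects the metric (order-insensitive for insertions, order-sensitive
    otherwise)."""
    if kind == "insertion":
        return jaccard(sua, original)
    return SequenceMatcher(None, sua.lower().split(), original.lower().split()).ratio()


def best_candidate(sua: str, seed: List[Tuple[str, str]]) -> Tuple[str, str, float]:
    """Return (original, final, score) for the highest-scoring seed pair."""
    scored = [(o, f, similarity(sua, o, edit_type(o, f))) for o, f in seed]
    return max(scored, key=lambda item: item[2])


seed = [
    ("the fee is due", "the fee is due promptly"),  # an insertion
    ("pay all fees now", "pay fees now"),           # a deletion
]
candidate = best_candidate("the fee is due today", seed)
```

Here `candidate[1]` is the final text that an ESUA step would then use as its model for modifying a copy of the SUA.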

    DEVICE DEPENDENT RENDERING OF PDF CONTENT INCLUDING MULTIPLE ARTICLES AND A TABLE OF CONTENTS

    Publication No.: US20240104290A1

    Publication date: 2024-03-28

    Application No.: US18374565

    Filing date: 2023-09-28

    Applicant: ISSUU, INC.

    Abstract: The technology disclosed relates to systems and methods for device-dependent display of an article from a PDF file that has multiple articles and a table of contents to the articles. The system can use a library to render the article from the PDF file. The rendering can include bounding boxes positioned at on-page coordinates that can include one or more images and multiple text blocks of glyphs. The system can detect at least one table in the PDF file that includes page numbers and multiple columns. The system includes logic to partition a contiguous sequence of text representing the table into text blocks of entries and columns. The system includes logic to merge multiple text blocks that align horizontally with a single page number into a single text block. The table of contents is displayed in a device-dependent format including the entries from the merged text blocks.
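
The merge step in the abstract above, combining text blocks that align horizontally with a single page number, can be sketched as follows. The `Block` type, the vertical-overlap test used to decide alignment, and the rule that a block made only of digits is a page number are assumptions made for this illustration and are not taken from the application. Coordinates are assumed to grow downward (`top < bottom`), whereas native PDF coordinates grow upward, so a real pipeline would need to convert them first.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A text block with a simplified bounding box (hypothetical type)."""
    text: str
    x: float       # left edge
    top: float     # upper edge, y grows downward in this sketch
    bottom: float  # lower edge


def is_page_number(block: Block) -> bool:
    """Assumption: a block made only of digits is a page number."""
    return block.text.strip().isdigit()


def overlaps_vertically(a: Block, b: Block) -> bool:
    """Two blocks share a row if their vertical ranges intersect."""
    return a.top < b.bottom and b.top < a.bottom


def merge_rows(blocks: List[Block]) -> List[Tuple[str, int]]:
    """For each page-number block, join the entry blocks on the same row
    (left to right) into one text and pair it with the page number."""
    entries = [b for b in blocks if not is_page_number(b)]
    numbers = sorted((b for b in blocks if is_page_number(b)), key=lambda b: b.top)
    merged = []
    for number in numbers:
        row = sorted(
            (e for e in entries if overlaps_vertically(e, number)),
            key=lambda e: e.x,
        )
        merged.append((" ".join(e.text for e in row), int(number.text)))
    return merged


blocks = [
    Block("Summer", 10, 100, 112),
    Block("Travel Guide", 60, 101, 113),
    Block("12", 300, 100, 112),
    Block("Recipes", 10, 120, 132),
    Block("34", 300, 120, 132),
]
toc = merge_rows(blocks)
# toc == [("Summer Travel Guide", 12), ("Recipes", 34)]
```

Returning plain `(entry, page)` pairs keeps the result independent of layout, so each device can format the table of contents in its own way.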