TRAINING AND APPLYING STRUCTURED DATA EXTRACTION MODELS

    Publication No.: US20220327284A1

    Publication Date: 2022-10-13

    Application No.: US17849162

    Filing Date: 2022-06-24

    Abstract: A computer system for extracting structured data from unstructured or semi-structured text in an electronic document, the system comprising: a graphical user interface configured to present to a user a graphical view of a document for use in training multiple data extraction models for the document, each data extraction model associated with a user-defined question; a user input component configured to enable the user to highlight portions of the document; the system configured to present, in association with each highlighted portion, an interactive user entry object which presents a menu of question types to the user in a manner that enables the user to select one of the question types, and a field for receiving from the user a question identifier in the form of human-readable text, wherein the question identifier and question type selected by the user are used for selecting a data extraction model, and wherein the highlighted portion of the document associated with the question identifier is used to train the selected data extraction model.
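The selection step in this abstract (a question identifier plus a question type picks a model, and the highlighted span becomes a training example for it) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `ExtractionModel` and `ModelRegistry` classes, the `record_highlight` helper, the use of character offsets for highlights, and the sample document text. No actual model training is performed; the sketch only shows how highlights could be routed to per-question models.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionModel:
    """Hypothetical stand-in for one trainable data extraction model."""
    question_id: str
    question_type: str
    examples: list = field(default_factory=list)

    def add_example(self, document: str, start: int, end: int) -> None:
        # Store the highlighted span as a labelled training example.
        self.examples.append(
            {"start": start, "end": end, "text": document[start:end]}
        )


class ModelRegistry:
    """Maps (question identifier, question type) to a model (assumed design)."""

    def __init__(self) -> None:
        self._models = {}

    def select(self, question_id: str, question_type: str) -> ExtractionModel:
        key = (question_id, question_type)
        if key not in self._models:
            # First highlight for this question: create a fresh model.
            self._models[key] = ExtractionModel(question_id, question_type)
        return self._models[key]


def record_highlight(registry, document, start, end, question_id, question_type):
    """Route one user highlight to the model selected by its question."""
    model = registry.select(question_id, question_type)
    model.add_example(document, start, end)
    return model


# Illustrative data only.
document = "This Agreement is effective as of 1 March 2021 between Acme Ltd and Beta Inc."
registry = ModelRegistry()
start = document.index("1 March 2021")
model = record_highlight(
    registry, document, start, start + len("1 March 2021"),
    question_id="Effective date", question_type="date",
)
print(model.examples[0]["text"])
```

Keying the registry on both the identifier and the type means two questions with the same label but different types would get separate models; whether the patented system behaves that way is not stated in the abstract.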

    SYSTEMS AND METHODS FOR GENERATING CONTEXTUAL TABLE EMBEDDINGS FOR TABULAR DATA

    Publication No.: US20240242024A1

    Publication Date: 2024-07-18

    Application No.: US18179767

    Filing Date: 2023-03-07

    Abstract: Systems and methods for generating contextual table embeddings for tabular data are disclosed. In one embodiment, a method may include: receiving, by a table embedding computer program, an input table comprising a plurality of cells; separating, by the table embedding computer program, the cells in the input table by data type, wherein the data type comprises a text data type or a numeric data type; embedding, by the table embedding computer program, the data type in each cell of the input table; enhancing, by the table embedding computer program, the cells of the input table based on a position and/or the data type; generating, by the table embedding computer program, contextual embeddings for the input table using an encoder of a table transformer; and generating, by the table embedding computer program, a table summary for the contextual embeddings using a decoder of the table transformer.
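The first steps of this method (separating cells by data type and attaching position information) can be sketched in a few lines. This is only an illustration under stated assumptions: the functions `cell_data_type`, `annotate_table`, and `split_by_type` are hypothetical names, the numeric test (a string that parses as a float once thousands separators are removed) is a guess at one reasonable rule, and the encoder and decoder of the table transformer are not modelled at all.

```python
def cell_data_type(value: str) -> str:
    """Classify a cell as 'numeric' or 'text', the two types named in the abstract."""
    try:
        # Assumed rule: numeric if it parses as a float without thousands separators.
        float(value.replace(",", ""))
        return "numeric"
    except ValueError:
        return "text"


def annotate_table(table):
    """Attach row/column position and data type to every cell."""
    annotated = []
    for r, row in enumerate(table):
        for c, value in enumerate(row):
            annotated.append(
                {"row": r, "col": c, "value": value, "type": cell_data_type(value)}
            )
    return annotated


def split_by_type(cells):
    """Separate annotated cells into text and numeric groups."""
    groups = {"text": [], "numeric": []}
    for cell in cells:
        groups[cell["type"]].append(cell)
    return groups


# Illustrative data only.
table = [["Region", "Revenue"], ["EMEA", "1,250.5"], ["APAC", "980"]]
cells = annotate_table(table)
groups = split_by_type(cells)
print([c["value"] for c in groups["numeric"]])  # ['1,250.5', '980']
```

In a full pipeline, the `type`, `row`, and `col` fields would presumably feed type and position embeddings before the encoder; that part is left out because the abstract gives no detail on it.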

    LINGUISTICALLY-DRIVEN AUTOMATED TEXT FORMATTING

    Publication No.: US20230351090A1

    Publication Date: 2023-11-02

    Application No.: US18346609

    Filing Date: 2023-07-03

    Abstract: Systems and techniques for linguistically-driven automated text formatting are described herein. Data representing the linguistic structure of input text may be received from Natural Language Processing (NLP) Services, including but not limited to constituents, dependencies, and coreference relationships. A text model of the input text may be built using the linguistic components and relationships. Cascade rules may be applied to the text model to generate a cascaded text data structure. Cascaded data may be displayed on a range of media, including a phone, tablet, laptop, monitor, and VR/AR devices. Cascaded data may be presented in dual-screen formats to promote more accurate and efficient reading comprehension, greater ease in teaching native and foreign language grammatical structures, and tools for remediation of reading-related disabilities.
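As a rough illustration of applying a cascade rule to a constituent structure, the sketch below places each phrase on its own line and indents it by its nesting depth. The abstract does not state the actual cascade rules, so this single depth-based rule is an assumption, as are the `cascade` function, the `(label, content)` tuple representation, and the hand-written parse used as input (a real system would obtain constituents from an NLP service).

```python
def cascade(constituents, depth=0, indent="  "):
    """Return one output line per leaf phrase, indented by nesting depth.

    `constituents` is a list of (label, content) pairs, where content is
    either the phrase text (a leaf) or a nested list of constituents.
    """
    lines = []
    for label, content in constituents:
        if isinstance(content, str):
            # Leaf phrase: emit it at the current indentation level.
            lines.append(indent * depth + content)
        else:
            # Nested constituent: its children cascade one level deeper.
            lines.extend(cascade(content, depth + 1, indent))
    return lines


# Hand-written toy parse, for illustration only.
sentence = [
    ("NP", "The committee"),
    ("VP", [
        ("V", "approved"),
        ("NP", "the proposal"),
        ("PP", "after a long debate"),
    ]),
]

for line in cascade(sentence):
    print(line)
```

The labels are carried along but unused here; a fuller rule set could choose different line breaks or indentation per constituent type, or use dependency and coreference information as the abstract mentions.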

    LINGUISTICALLY-DRIVEN AUTOMATED TEXT FORMATTING

    Publication No.: US11734491B2

    Publication Date: 2023-08-22

    Application No.: US17453763

    Filing Date: 2021-11-05

    Abstract: Systems and techniques for linguistically-driven automated text formatting are described herein. Data representing the linguistic structure of input text may be received from Natural Language Processing (NLP) Services, including but not limited to constituents, dependencies, and coreference relationships. A text model of the input text may be built using the linguistic components and relationships. Cascade rules may be applied to the text model to generate a cascaded text data structure. Cascaded data may be displayed on a range of media, including a phone, tablet, laptop, monitor, and VR/AR devices. Cascaded data may be presented in dual-screen formats to promote more accurate and efficient reading comprehension, greater ease in teaching native and foreign language grammatical structures, and tools for remediation of reading-related disabilities.