Text feature guided visual based document classifier

    Publication number: US11720605B1

    Publication date: 2023-08-08

    Application number: US17876069

    Filing date: 2022-07-28

    Applicant: Intuit Inc.

    Abstract: A visual-based classification model influenced by text features as a result of the outputs of a text-based classification model is disclosed. A system receives one or more documents to be classified based on one or more visual features and provides the one or more documents to a student classification model, which is a visual-based classification model. The system also classifies, by the student classification model, the one or more documents into one or more document types based on one or more visual features. The one or more visual features are generated by the student classification model that is trained based on important text identified by a teacher classification model for the one or more document types, with the teacher classification model being a text-based classification model. Generating training data and training the student classification model based on the training data are also described.
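
One way to picture how "important text identified by a teacher classification model" could become training data for a visual model is to turn the teacher's important words into an image-sized mask. The sketch below is illustrative only and is not taken from the patent: the `Word` type, the `important_text_mask` function, and the assumption that OCR supplies word bounding boxes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Word:
    """Hypothetical OCR word with a half-open pixel box [x0, x1) x [y0, y1)."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def important_text_mask(words: List[Word], important_terms: Set[str],
                        width: int, height: int) -> List[List[int]]:
    """Mark pixels covered by words the teacher flagged as important.

    Assumption: `important_terms` holds lowercase words that a text-based
    teacher model considers important for the document's type.
    """
    mask = [[0] * width for _ in range(height)]
    for word in words:
        if word.text.lower() not in important_terms:
            continue
        # Clip the box to the image so out-of-range boxes are harmless.
        for y in range(max(0, word.y0), min(height, word.y1)):
            for x in range(max(0, word.x0), min(width, word.x1)):
                mask[y][x] = 1
    return mask


# Tiny 4x2 "image": only the word "Invoice" is flagged by the teacher.
example_words = [Word("Invoice", 0, 0, 3, 1), Word("hello", 0, 1, 2, 2)]
example_mask = important_text_mask(example_words, {"invoice"}, 4, 2)
```

Under this assumption, a mask like `example_mask` could be paired with the page image and its document type as one training example for the student model; the patent abstract does not state how the training data is actually encoded.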

    Framework agnostic summarization of multi-channel communication

    Publication number: US11989500B2

    Publication date: 2024-05-21

    Application number: US18453772

    Filing date: 2023-08-22

    Applicant: INTUIT INC.

    Abstract: Aspects of the present disclosure provide techniques for improved automated parsing and display of electronic documents. Embodiments include identifying a set of topics in a first electronic document based on one or more rules related to one or more keywords in the first electronic document. Embodiments include providing one or more inputs to a machine learning model based on the set of topics and a second electronic document related to the first electronic document. Embodiments include receiving, from the machine learning model in response to the one or more inputs, one or more outputs related to formatting the second electronic document for display. Embodiments include generating a formatted version of the first electronic document based on the set of topics and generating a formatted version of the second electronic document based on the one or more outputs.
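
The rule-based step in this abstract (identifying topics from keywords in the first document, then formatting that document based on the topics) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the `TOPIC_RULES` table, both function names, and the "Topics:" header format are hypothetical, and the machine learning model that formats the second document is left out because the abstract does not describe it.

```python
import re
from typing import Dict, List

# Hypothetical keyword rules: topic name -> keywords that signal it.
TOPIC_RULES: Dict[str, List[str]] = {
    "billing": ["invoice", "refund", "charge"],
    "login": ["password", "sign in", "locked"],
}


def identify_topics(document: str, rules: Dict[str, List[str]]) -> List[str]:
    """Return every topic with at least one keyword present as a whole word."""
    text = document.lower()
    topics = []
    for topic, keywords in rules.items():
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            topics.append(topic)
    return topics


def format_with_topics(document: str, topics: List[str]) -> str:
    """Prefix the document with a topic header (an assumed display format)."""
    header = "Topics: " + (", ".join(topics) if topics else "none")
    return header + "\n" + document


message = "I need a refund, and my password no longer works."
message_topics = identify_topics(message, TOPIC_RULES)
formatted_message = format_with_topics(message, message_topics)
```

In this sketch the topics found in the first document would also be passed, together with the related second document, to whatever model produces the second document's formatting.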

    Embedding performance optimization through use of a summary model

    Publication number: US12099539B2

    Publication date: 2024-09-24

    Application number: US17647607

    Filing date: 2022-01-11

    Applicant: INTUIT INC.

    Abstract: Aspects of the present disclosure provide techniques for improved text classification. Embodiments include providing, based on a text string, one or more first inputs to a summary model. Embodiments include determining, based on one or more first outputs from the summary model in response to the one or more first inputs, a summarized version of the text string. In some embodiments the summarized version of the text string comprises a number of tokens that is less than or equal to a maximum number of input tokens for a machine learning model. Embodiments include providing, based on the summarized version of the text string, one or more second inputs to the machine learning model. Embodiments include determining one or more attributes of the text string based on one or more second outputs received from the machine learning model in response to the one or more second inputs.
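
The two-stage flow in this abstract (summarize first, then pass the summary to a model with a token limit) might look like the sketch below. Everything named here is an assumption for illustration: whitespace splitting stands in for a real tokenizer, `toy_summarize` and `toy_classify` stand in for the summary model and the machine learning model, and the truncation fallback is an added safeguard that the abstract does not mention.

```python
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Whitespace tokenizer, a stand-in for the model's real tokenizer."""
    return text.split()


def summarize_for_model(text: str, summarize: Callable[[str, int], str],
                        max_tokens: int) -> str:
    """Summarize `text`, then guarantee the result fits in `max_tokens`."""
    summary = summarize(text, max_tokens)
    tokens = tokenize(summary)
    if len(tokens) > max_tokens:
        # Assumed fallback: hard-truncate if the summarizer overshoots.
        summary = " ".join(tokens[:max_tokens])
    return summary


def toy_summarize(text: str, max_tokens: int) -> str:
    """Keep leading sentences while they fit (placeholder summary model)."""
    kept: List[str] = []
    for sentence in text.split(". "):
        words = sentence.split()
        if len(kept) + len(words) > max_tokens:
            break
        kept.extend(words)
    return " ".join(kept)


def toy_classify(text: str) -> Dict[str, bool]:
    """Placeholder for the downstream model that returns text attributes."""
    return {"mentions_invoice": "invoice" in text.lower()}


long_text = "Invoice overdue for account 42. " + "Please remit payment soon. " * 50
short_text = summarize_for_model(long_text, toy_summarize, max_tokens=8)
attributes = toy_classify(short_text)
```

The point of the design is that the downstream model only ever sees input within its limit, so long strings are condensed rather than cut off at an arbitrary position whenever the summarizer behaves.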

    TEXT FEATURE GUIDED VISUAL BASED DOCUMENT CLASSIFIER

    Publication number: US20240037125A1

    Publication date: 2024-02-01

    Application number: US18211127

    Filing date: 2023-06-16

    Applicant: Intuit Inc.

    Abstract: A visual-based classification model influenced by text features as a result of the outputs of a text-based classification model is disclosed. A system receives one or more documents to be classified based on one or more visual features and provides the one or more documents to a student classification model, which is a visual-based classification model. The system also classifies, by the student classification model, the one or more documents into one or more document types based on one or more visual features. The one or more visual features are generated by the student classification model that is trained based on important text identified by a teacher classification model for the one or more document types, with the teacher classification model being a text-based classification model. Generating training data and training the student classification model based on the training data are also described.

    Framework agnostic summarization of multi-channel communication

    Publication number: US11783112B1

    Publication date: 2023-10-10

    Application number: US17937086

    Filing date: 2022-09-30

    Applicant: INTUIT INC.

    Abstract: Aspects of the present disclosure provide techniques for improved automated parsing and display of electronic documents. Embodiments include identifying a set of topics in a first electronic document based on one or more rules related to one or more keywords in the first electronic document. Embodiments include providing one or more inputs to a machine learning model based on the set of topics and a second electronic document related to the first electronic document. Embodiments include receiving, from the machine learning model in response to the one or more inputs, one or more outputs related to formatting the second electronic document for display. Embodiments include generating a formatted version of the first electronic document based on the set of topics and generating a formatted version of the second electronic document based on the one or more outputs.