MACHINE LEARNING BASED INFORMATION EXTRACTION

    Publication number: US20240062568A1

    Publication date: 2024-02-22

    Application number: US17889640

    Application date: 2022-08-17

    Applicant: SAP SE

    CPC classification number: G06V30/19153

    Abstract: Computer-readable media, methods, and systems are disclosed for applying machine learning mechanisms to classify and validate documents based on expense rule sets and external data validation services. Document images associated with expenses are received in connection with a reimbursable event. For each received document image, data associated with the document image is transmitted to an optical character recognition (OCR) image processor that can recognize contents and associated coordinates. OCR data is received and transmitted to a text tokenizer. Tokenized text corresponding to expense details is received, and the tokenized text and coordinates are sent to a text feature generator. Text feature vectors are received and transmitted to a document classifier, and a document classification is received. Document fields are extracted, and based thereon the document is validated and a corresponding reimbursement instruction is generated.
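
    The stages named in the abstract (OCR output, tokenizer, feature generator, classifier, field extraction, validation) can be illustrated with a minimal Python sketch. This is not the applicant's implementation. Every name here (OcrWord, KEYWORDS, LIMITS, process, and the helper functions) is hypothetical, the OCR output is assumed to be already available as words with coordinates, a simple keyword rule stands in for the trained document classifier, and the external data validation services are left out.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One recognized word with its (assumed normalized) page coordinates."""
    text: str
    x: float
    y: float


# Hypothetical vocabulary and per-type spending limits, for illustration only.
KEYWORDS = ("total", "hotel", "taxi", "fare", "room")
LIMITS = {"hotel_invoice": 300.0, "taxi_receipt": 80.0}


def tokenize(words):
    """Lowercase each OCR word, drop colons at its ends, keep coordinates."""
    return [(w.text.lower().strip(":"), w.x, w.y) for w in words if w.text.strip()]


def featurize(tokens):
    """Build a feature vector: keyword counts plus mean vertical position."""
    counts = [float(sum(1 for t, _, _ in tokens if t == k)) for k in KEYWORDS]
    mean_y = sum(y for _, _, y in tokens) / len(tokens) if tokens else 0.0
    return counts + [mean_y]


def classify(features):
    """Toy rule-based stand-in for a trained document classifier."""
    scores = {
        "hotel_invoice": features[1] + features[4],  # "hotel" + "room"
        "taxi_receipt": features[2] + features[3],   # "taxi" + "fare"
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract_fields(tokens):
    """Treat the token that follows 'total' as the expense amount."""
    for i, (text, _, _) in enumerate(tokens[:-1]):
        if text == "total":
            try:
                return {"amount": float(tokens[i + 1][0])}
            except ValueError:
                continue
    return {}


def process(words):
    """Run tokenize -> featurize -> classify -> extract -> validate."""
    tokens = tokenize(words)
    doc_type = classify(featurize(tokens))
    fields = extract_fields(tokens)
    amount = fields.get("amount")
    # Validation against a hypothetical expense rule set (the LIMITS table).
    valid = amount is not None and doc_type in LIMITS and amount <= LIMITS[doc_type]
    return {"doc_type": doc_type, "fields": fields, "reimburse": valid}
```

    Under these assumptions, calling process on the OCR words of a taxi receipt returns the predicted document type, the extracted amount, and a flag indicating whether a reimbursement instruction would be issued. In a real system the classifier would be a trained model and the rule check would also consult external validation services.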
