MACHINE LEARNING TECHNIQUES FOR IDENTIFYING LOGICAL SECTIONS IN UNSTRUCTURED DATA

    Publication No.: US20220156489A1

    Publication Date: 2022-05-19

    Application No.: US16951983

    Filing Date: 2020-11-18

    Applicant: Adobe Inc.

    Abstract: Methods and systems disclosed herein relate generally to using machine-learning techniques to generate section identifiers for one or more sections of unstructured or unformatted text data. A document-processing application identifies, with a feature-prediction layer of a machine-learning model, a feature representation that represents a semantic structure of a text section within an unformatted and unstructured document. The document-processing application then generates, with a sequence-prediction layer of the machine-learning model, a section identifier (e.g., heading, body, list) for the corresponding text section by applying the sequence-prediction layer to the feature representation and using contextual information of neighboring text sections.
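
The two-stage flow described in the abstract can be illustrated with a small sketch. It is not the model disclosed in the application: hand-written surface cues stand in for the learned feature-prediction layer, and Viterbi decoding over a hand-set transition table stands in for the sequence-prediction layer. The names `feature_scores`, `label_sections`, and `TRANSITIONS`, and every numeric score, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

LABELS = ["heading", "body", "list"]

# Hypothetical transition scores (assumption, not from the source): they encode
# context such as "a heading is usually followed by body text".
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("heading", "heading"): -1.0,
    ("heading", "body"): 0.5,
    ("list", "list"): 0.5,
}


def feature_scores(section: str) -> Dict[str, float]:
    """Stand-in for the feature-prediction layer: score each label from surface cues."""
    text = section.strip()
    words = text.split()
    is_bullet = text.startswith(("-", "*", "•")) or bool(
        words and words[0].rstrip(".)").isdigit()
    )
    is_short = len(words) <= 6
    ends_with_period = text.endswith(".")
    bullet_penalty = -2.0 if is_bullet else 0.0
    return {
        "heading": (1.0 if is_short else -1.0)
        + (-1.0 if ends_with_period else 0.5)
        + bullet_penalty,
        "body": (-0.5 if is_short else 1.0)
        + (0.5 if ends_with_period else -0.5)
        + bullet_penalty,
        "list": 2.0 if is_bullet else -1.0,
    }


def label_sections(sections: List[str]) -> List[str]:
    """Stand-in for the sequence-prediction layer: Viterbi decoding over all sections."""
    if not sections:
        return []
    emissions = [feature_scores(s) for s in sections]
    best = [dict(emissions[0])]  # best[i][label] = best path score ending in label
    back: List[Dict[str, str]] = []  # back[i][label] = previous label on that path
    for em in emissions[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in LABELS:
            prev = max(
                LABELS,
                key=lambda p: best[-1][p] + TRANSITIONS.get((p, cur), 0.0),
            )
            scores[cur] = best[-1][prev] + TRANSITIONS.get((prev, cur), 0.0) + em[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    doc = [
        "Introduction",
        "This document describes the quarterly results for the whole team in detail.",
        "- first item",
        "- second item",
    ]
    print(label_sections(doc))
```

Decoding the whole sequence at once, rather than labeling each section independently, is what lets the label of one section depend on its neighbors, which is the role the abstract assigns to the sequence-prediction layer.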
