Using deep learning techniques to determine the contextual reading order in a form document

    Publication No.: US10423828B2

    Publication Date: 2019-09-24

    Application No.: US15843953

    Application Date: 2017-12-15

    Applicant: Adobe Inc.

    Abstract: Techniques for determining reading order in a document. A current labeled text run (R1), RIGHT text run (R2) and DOWN text run (R3) are generated. The R1 labeled text run is processed by a first LSTM, the R2 labeled text run is processed by a second LSTM, and the R3 labeled text run is processed by a third LSTM, wherein each of the LSTMs generates a respective internal representation (R1′, R2′ and R3′). Deep learning tools other than LSTMs can be used, as will be appreciated. The respective internal representations R1′, R2′ and R3′ are concatenated or otherwise combined into a vector or tensor representation and provided to a classifier network that generates a predicted label for a next text run as RIGHT, DOWN or EOS in the reading order of the document.
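The data flow in this abstract (three LSTM encoders, concatenation, a three-way classifier) can be illustrated with a minimal, untrained sketch in plain Python. Everything below is an assumption made for illustration and is not taken from the patent: the character-level one-hot input, the hidden size, the random weights, the omission of bias terms, the single linear layer standing in for the classifier network, and the names `TinyLSTM` and `predict_next`.

```python
import math
import random

LABELS = ["RIGHT", "DOWN", "EOS"]
VOCAB = 128  # assumed: characters are bucketed by code point modulo 128


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(ch):
    v = [0.0] * VOCAB
    v[ord(ch) % VOCAB] = 1.0
    return v


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TinyLSTM:
    """Hypothetical character-level LSTM with random weights and no biases."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One matrix per gate (input, forget, output, candidate), applied to [x; h].
        self.gates = [
            rand_matrix(hidden_size, input_size + hidden_size, rng) for _ in range(4)
        ]

    def encode(self, text):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for ch in text:
            xh = one_hot(ch) + h
            i = [sigmoid(z) for z in matvec(self.gates[0], xh)]
            f = [sigmoid(z) for z in matvec(self.gates[1], xh)]
            o = [sigmoid(z) for z in matvec(self.gates[2], xh)]
            g = [math.tanh(z) for z in matvec(self.gates[3], xh)]
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h  # final hidden state serves as the internal representation


def predict_next(r1, r2, r3, encoders, classifier):
    """Encode R1, R2, R3 separately, concatenate R1', R2', R3', then classify."""
    combined = []
    for encoder, run in zip(encoders, (r1, r2, r3)):
        combined.extend(encoder.encode(run))
    probs = softmax(matvec(classifier, combined))
    return LABELS[probs.index(max(probs))], probs


rng = random.Random(0)
HIDDEN = 8  # assumed hidden size
encoders = [TinyLSTM(VOCAB, HIDDEN, rng) for _ in range(3)]
classifier = rand_matrix(len(LABELS), 3 * HIDDEN, rng)  # stand-in for the classifier network

label, probs = predict_next("First Name", "Last Name", "Address", encoders, classifier)
print(label, probs)
```

Because the weights are random, the printed label carries no meaning; the sketch only shows how the three representations are produced independently and combined before classification.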

    Caption Association Techniques
    3.
    Patent application

    Publication No.: US20190286691A1

    Publication Date: 2019-09-19

    Application No.: US15925059

    Application Date: 2018-03-19

    Applicant: Adobe Inc.

    Abstract: Caption association techniques as part of digital content creation by a computing device are described. The computing device is configured to extract text features and bounding boxes from an input document. These text features and bounding boxes are processed to reduce a number of possible search spaces. The processing may involve generating and utilizing a language model that captures the semantic meaning of the text features to identify and filter static text, and may involve identifying and filtering inline captions. A number of bounding boxes are identified for a potential caption. The potential caption and corresponding identified bounding boxes are concatenated into a vector. The concatenated vector is used to identify relationships among the bounding boxes to determine a single bounding box associated with the caption. The determined association is utilized to generate an output digital document that includes a structured association between the caption and a data entry field.
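The candidate-and-concatenate step in this abstract can be sketched as follows. The abstract does not say how candidate bounding boxes are chosen or how the model scores them, so several pieces here are assumptions made for illustration: candidates are the `k` boxes nearest to the caption by center distance, the text features are a placeholder list of floats, and `nearest_scorer` is a hypothetical stand-in for the trained model that would consume the concatenated vector.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float
    y: float
    w: float
    h: float

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def distance(a, b):
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)


def candidate_boxes(caption_box, field_boxes, k=3):
    """Assumed heuristic: keep the k fields closest to the caption."""
    return sorted(field_boxes, key=lambda b: distance(caption_box, b))[:k]


def build_vector(text_features, caption_box, candidates):
    """Concatenate caption features, caption box, and candidate boxes."""
    vec = list(text_features)
    vec += [caption_box.x, caption_box.y, caption_box.w, caption_box.h]
    for b in candidates:
        vec += [b.x, b.y, b.w, b.h]
    return vec


def nearest_scorer(vector, n_text):
    """Hypothetical stand-in for a trained model: one score per candidate."""
    cx, cy, cw, ch = vector[n_text:n_text + 4]
    cap_x, cap_y = cx + cw / 2, cy + ch / 2
    rest = vector[n_text + 4:]
    scores = []
    for i in range(0, len(rest), 4):
        x, y, w, h = rest[i:i + 4]
        scores.append(-math.hypot(x + w / 2 - cap_x, y + h / 2 - cap_y))
    return scores


def associate(text_features, caption_box, field_boxes, scorer=nearest_scorer, k=3):
    """Return the single candidate box with the highest score."""
    candidates = candidate_boxes(caption_box, field_boxes, k)
    vector = build_vector(text_features, caption_box, candidates)
    scores = scorer(vector, len(text_features))
    return candidates[scores.index(max(scores))], vector


caption = Box(10, 10, 60, 12)
fields = [
    Box(80, 10, 120, 12),
    Box(80, 40, 120, 12),
    Box(10, 200, 120, 12),
    Box(300, 300, 50, 12),
]
chosen, vector = associate([0.2, 0.7], caption, fields)
print(chosen)
```

Passing the scorer as a parameter keeps the candidate selection and vector layout separate from whatever model actually ranks the candidates.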

    Caption association techniques
    4.
    Granted patent

    Publication No.: US10915701B2

    Publication Date: 2021-02-09

    Application No.: US15925059

    Application Date: 2018-03-19

    Applicant: Adobe Inc.

    Abstract: Caption association techniques as part of digital content creation by a computing device are described. The computing device is configured to extract text features and bounding boxes from an input document. These text features and bounding boxes are processed to reduce a number of possible search spaces. The processing may involve generating and utilizing a language model that captures the semantic meaning of the text features to identify and filter static text, and may involve identifying and filtering inline captions. A number of bounding boxes are identified for a potential caption. The potential caption and corresponding identified bounding boxes are concatenated into a vector. The concatenated vector is used to identify relationships among the bounding boxes to determine a single bounding box associated with the caption. The determined association is utilized to generate an output digital document that includes a structured association between the caption and a data entry field.
