Systems for Generating Indications of Relationships between Electronic Documents

    Publication number: US20230162518A1

    Publication date: 2023-05-25

    Application number: US17534744

    Application date: 2021-11-24

    Applicant: Adobe Inc.

    CPC classification number: G06V30/413 G06V30/274 G06V30/414 G06V30/418

    Abstract: In implementations of systems for generating indications of relationships between electronic documents, a processing device implements a relationship system to segment text of electronic documents included in a document corpus into segments. The relationship system determines a subset of the electronic documents that includes electronic document pairs having a number of similar segments that is greater than a threshold number. The similar segments are identified using locality sensitive hashing. The electronic document pairs are classified as related documents or unrelated documents using a machine learning model that receives a pair of electronic documents as an input and generates an indication of a classification for the pair of electronic documents as an output. Indications of relationships between particular electronic documents included in the subset are generated based at least partially on the electronic document pairs that are classified as related documents.
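
The pipeline described in this abstract (segment, hash, count similar segments per document pair, then classify) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the sentence-based segmentation, the MinHash-with-banding form of locality sensitive hashing, the constants `NUM_HASHES` and `BANDS`, and every function name are assumptions made for the example. The machine learning model is represented by a caller-supplied `classifier` callable, which is purely hypothetical here.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 16  # MinHash signature length (assumed value)
BANDS = 8        # number of LSH bands; each band covers NUM_HASHES // BANDS rows


def segment(text):
    """Split text into lowercase sentence-like segments (assumed segmentation rule)."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]


def shingles(seg, k=2):
    """Return the set of k-word shingles of a segment."""
    words = seg.split()
    if len(words) < k:
        return {seg}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(seg):
    """MinHash signature: the minimum hash of the shingle set under each seed."""
    sh = shingles(seg)
    return tuple(min(_hash(seed, t) for t in sh) for seed in range(NUM_HASHES))


def candidate_pairs(corpus, threshold):
    """Map each document pair to its count of LSH-similar segment pairs,
    keeping only pairs whose count is greater than the threshold."""
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)  # (band index, band values) -> {(doc_id, seg_idx)}
    for doc_id, text in corpus.items():
        for seg_idx, seg in enumerate(segment(text)):
            sig = minhash(seg)
            for b in range(BANDS):
                band = sig[b * rows:(b + 1) * rows]
                buckets[(b, band)].add((doc_id, seg_idx))
    similar = defaultdict(set)  # (doc_a, doc_b) -> {(seg_a, seg_b)}
    for members in buckets.values():
        for (doc_a, seg_a), (doc_b, seg_b) in combinations(sorted(members), 2):
            if doc_a != doc_b:
                similar[(doc_a, doc_b)].add((seg_a, seg_b))
    return {pair: len(s) for pair, s in similar.items() if len(s) > threshold}


def related_pairs(corpus, threshold, classifier):
    """Keep candidate pairs that the (hypothetical) classifier labels as related."""
    return [
        pair
        for pair in candidate_pairs(corpus, threshold)
        if classifier(corpus[pair[0]], corpus[pair[1]])
    ]
```

A possible usage, with a trivial stand-in for the classifier:

```python
corpus = {
    "a": "The contract starts in May. Payment is due in thirty days. Alice signed first.",
    "b": "The contract starts in May. Payment is due in thirty days. Bob reviewed it later.",
    "c": "Weather was sunny on Tuesday. Birds sang loudly outside.",
}
print(related_pairs(corpus, threshold=1, classifier=lambda x, y: True))
```

Only documents `a` and `b` share more than one similar segment, so only that pair reaches the classifier; the LSH step avoids comparing every segment of every document against every other.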

    Exploiting Domain-Specific Language Characteristics for Language Model Pretraining

    Publication number: US20240303496A1

    Publication date: 2024-09-12

    Application number: US18181044

    Application date: 2023-03-09

    Applicant: ADOBE INC.

    CPC classification number: G06N3/0895 G06F40/279

    Abstract: A method, apparatus, non-transitory computer readable medium, and system of training a domain-specific language model are described. One or more aspects of the method, apparatus, non-transitory computer readable medium, and system include obtaining domain-specific training data including a plurality of domain-specific documents having a document structure corresponding to a domain, and obtaining domain-agnostic training data including a plurality of documents outside of the domain. The domain-specific training data and the domain-agnostic training data are used to train a language model to perform a domain-specific task based on the domain-specific training data and to perform a domain agnostic task based on the domain-agnostic training data.
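
One way to picture training on both data sources is a schedule that alternates batches between the domain-specific task and the domain-agnostic task. The sketch below is an assumption made for illustration only: the abstract does not state how batches are ordered or mixed, and the names `batches`, `mixed_schedule`, and the two task labels are hypothetical. A real training loop would compute a task-appropriate loss for each yielded batch and update the shared language model.

```python
from itertools import zip_longest


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def mixed_schedule(domain_docs, general_docs, batch_size=2):
    """Yield (task_name, batch) pairs, alternating between the two data sources.

    Alternation is an assumed mixing strategy; once one source runs out,
    the remaining batches of the other source are yielded on their own.
    """
    paired = zip_longest(
        batches(domain_docs, batch_size),
        batches(general_docs, batch_size),
    )
    for domain_batch, general_batch in paired:
        if domain_batch is not None:
            yield ("domain_specific_task", domain_batch)
        if general_batch is not None:
            yield ("domain_agnostic_task", general_batch)


for task, batch in mixed_schedule(["d1", "d2", "d3"], ["g1", "g2"]):
    print(task, batch)
```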

    Systems for generating indications of relationships between electronic documents

    Publication number: US12198459B2

    Publication date: 2025-01-14

    Application number: US17534744

    Application date: 2021-11-24

    Applicant: Adobe Inc.

    Abstract: In implementations of systems for generating indications of relationships between electronic documents, a processing device implements a relationship system to segment text of electronic documents included in a document corpus into segments. The relationship system determines a subset of the electronic documents that includes electronic document pairs having a number of similar segments that is greater than a threshold number. The similar segments are identified using locality sensitive hashing. The electronic document pairs are classified as related documents or unrelated documents using a machine learning model that receives a pair of electronic documents as an input and generates an indication of a classification for the pair of electronic documents as an output. Indications of relationships between particular electronic documents included in the subset are generated based at least partially on the electronic document pairs that are classified as related documents.
