-
Publication number: US20230162518A1
Publication date: 2023-05-25
Application number: US17534744
Application date: 2021-11-24
Applicant: Adobe Inc.
Inventor: Natwar Modani, Vaidehi Ramesh Patil, Inderjeet Jayakumar Nair, Gaurav Verma, Anurag Maurya, Anirudh Kanfade
IPC: G06V30/413, G06V30/262, G06V30/414, G06V30/418
CPC classification number: G06V30/413, G06V30/274, G06V30/414, G06V30/418
Abstract: In implementations of systems for generating indications of relationships between electronic documents, a processing device implements a relationship system to segment text of electronic documents included in a document corpus into segments. The relationship system determines a subset of the electronic documents that includes electronic document pairs having a number of similar segments that is greater than a threshold number. The similar segments are identified using locality sensitive hashing. The electronic document pairs are classified as related documents or unrelated documents using a machine learning model that receives a pair of electronic documents as an input and generates an indication of a classification for the pair of electronic documents as an output. Indications of relationships between particular electronic documents included in the subset are generated based at least partially on the electronic document pairs that are classified as related documents.
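The candidate-selection step in this abstract (segment the text, find similar segments with locality sensitive hashing, keep document pairs whose count of similar segments exceeds a threshold) can be illustrated with a minimal sketch. It assumes MinHash signatures with banding as the LSH scheme, word 3-grams as shingles, and a corpus that is already segmented; the abstract specifies none of these. All names (`shingles`, `minhash`, `candidate_document_pairs`) and the constants are hypothetical, and the subsequent machine learning classification of pairs is not shown.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32              # signature length (assumed value)
BANDS = 8                    # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS   # rows per band


def shingles(segment, k=3):
    """Return the set of word k-grams in a segment (assumed shingling scheme)."""
    words = segment.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Seeded 64-bit hash; each seed acts as one MinHash hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(segment):
    """MinHash signature: the minimum hash of the shingle set under each seed."""
    tokens = shingles(segment)
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES))


def candidate_document_pairs(corpus, threshold):
    """corpus maps doc_id -> list of segments.

    Returns the document pairs that share more than `threshold` similar
    segment pairs, where "similar" means colliding in at least one LSH band.
    """
    buckets = defaultdict(set)
    for doc_id, segments in corpus.items():
        for idx, segment in enumerate(segments):
            signature = minhash(segment)
            for band in range(BANDS):
                key = (band, signature[band * ROWS:(band + 1) * ROWS])
                buckets[key].add((doc_id, idx))

    # Collect distinct cross-document segment pairs that share a bucket.
    similar = set()
    for members in buckets.values():
        for (doc_a, i), (doc_b, j) in combinations(sorted(members), 2):
            if doc_a != doc_b:
                similar.add(((doc_a, i), (doc_b, j)))

    counts = defaultdict(int)
    for (doc_a, _), (doc_b, _) in similar:
        counts[(doc_a, doc_b)] += 1
    return {pair for pair, n in counts.items() if n > threshold}
```

In this sketch, the returned pairs would be the inputs passed on to a pairwise classifier; raising `threshold` or `ROWS` makes candidate selection stricter.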
-
Publication number: US20240303496A1
Publication date: 2024-09-12
Application number: US18181044
Application date: 2023-03-09
Applicant: ADOBE INC.
Inventor: Inderjeet Jayakumar Nair, Natwar Modani
IPC: G06N3/0895, G06F40/279
CPC classification number: G06N3/0895, G06F40/279
Abstract: A method, apparatus, non-transitory computer readable medium, and system of training a domain-specific language model are described. One or more aspects of the method, apparatus, non-transitory computer readable medium, and system include obtaining domain-specific training data including a plurality of domain-specific documents having a document structure corresponding to a domain, and obtaining domain-agnostic training data including a plurality of documents outside of the domain. The domain-specific training data and the domain-agnostic training data are used to train a language model to perform a domain-specific task based on the domain-specific training data and to perform a domain-agnostic task based on the domain-agnostic training data.
-
Publication number: US12198459B2
Publication date: 2025-01-14
Application number: US17534744
Application date: 2021-11-24
Applicant: Adobe Inc.
Inventor: Natwar Modani, Vaidehi Ramesh Patil, Inderjeet Jayakumar Nair, Gaurav Verma, Anurag Maurya, Anirudh Kanfade
IPC: G06K9/34, G06V30/19, G06V30/262, G06V30/413, G06V30/414, G06V30/418
Abstract: In implementations of systems for generating indications of relationships between electronic documents, a processing device implements a relationship system to segment text of electronic documents included in a document corpus into segments. The relationship system determines a subset of the electronic documents that includes electronic document pairs having a number of similar segments that is greater than a threshold number. The similar segments are identified using locality sensitive hashing. The electronic document pairs are classified as related documents or unrelated documents using a machine learning model that receives a pair of electronic documents as an input and generates an indication of a classification for the pair of electronic documents as an output. Indications of relationships between particular electronic documents included in the subset are generated based at least partially on the electronic document pairs that are classified as related documents.
-
Publication number: US20230186667A1
Publication date: 2023-06-15
Application number: US17549270
Application date: 2021-12-13
Applicant: ADOBE INC.
Inventor: Navita Goyal, Ani Nenkova Nenkova, Natwar Modani, Ayush Maheshwari, Inderjeet Jayakumar Nair
IPC: G06V30/413, G06V30/416, G06V10/26, G06V10/74, G06N20/00
CPC classification number: G06V30/413, G06N20/00, G06V10/26, G06V10/761, G06V30/416
Abstract: Techniques described herein are directed to assisting review of documents. In one embodiment, one or more text segments and one or more subjects in a document are identified. A text segment in the document is associated with a corresponding subject identified in the document. The text segment is classified with a content type value corresponding to a relation of the text segment to the corresponding subject. Thereafter, information is provided for the text segment associated with the corresponding subject for display on a user interface. Such information can include a representation of the content type value for the text segment.