Publication No.: US11868715B2
Publication Date: 2024-01-09
Application No.: US17140360
Filing Date: 2021-01-04
Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: Dnyanesh G. Rajpathak, Ravi S. Sambangi, Xinli Wang
Abstract: A system processes unstructured data to identify a plurality of subsets of text in a set of text in the unstructured data. For a subset from the plurality of subsets, the system determines probabilities based on a position of the subset in the set of text, a part of speech (POS) of each word in the subset, and POSs of one or more words on the left-hand and right-hand sides of the subset, the number of the one or more words being selected based on a length of the set of text. The system generates a feature vector for the subset, the feature vector including the probabilities and additional features of the subset, and classifies, using a classifier, the subset into one of a plurality of classes based on the feature vector, the plurality of classes representing an ontology of a domain of knowledge.
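Illustrative sketch (not part of the patent record): the pipeline described in the abstract can be pictured roughly as follows. Everything concrete here is an assumption made for illustration only: the function names, the rule that picks one or two context words depending on text length, the three position buckets, the use of per-class relative frequencies as the "probabilities", the subset length as the only additional feature, the summed-score stand-in classifier, and the class names "part", "symptom", and "action". The patent's actual method may differ in every one of these details.

```python
from collections import Counter

def context_window(num_tokens):
    """Hypothetical rule: use more context words for longer texts."""
    return 1 if num_tokens < 6 else 2

def position_bucket(start, num_tokens):
    """Coarse position of the subset in the text (assumed three buckets)."""
    third = start * 3 // max(num_tokens, 1)
    return ("start", "middle", "end")[min(third, 2)]

def extract_keys(pos_tags, start, end):
    """Position, subset POS pattern, and left/right context POS for tokens[start:end]."""
    n = len(pos_tags)
    k = context_window(n)  # number of context words depends on text length
    return {
        "position": position_bucket(start, n),
        "pos": tuple(pos_tags[start:end]),
        "left": tuple(pos_tags[max(0, start - k):start]),
        "right": tuple(pos_tags[end:end + k]),
    }

def estimate_tables(labelled):
    """Count how often each key occurs with each class in labelled examples."""
    counts, totals = {}, Counter()
    for pos_tags, start, end, label in labelled:
        totals[label] += 1
        for name, key in extract_keys(pos_tags, start, end).items():
            counts.setdefault((name, label), Counter())[key] += 1
    return counts, totals

def feature_vector(pos_tags, start, end, counts, totals, classes):
    """Per-class relative frequencies plus one additional feature (subset length)."""
    keys = extract_keys(pos_tags, start, end)
    vec = []
    for label in classes:
        for name in ("position", "pos", "left", "right"):
            table = counts.get((name, label), Counter())
            vec.append(table[keys[name]] / totals[label] if totals[label] else 0.0)
    vec.append(float(end - start))
    return vec

def classify(vec, classes):
    """Stand-in classifier (assumption): class with the highest summed probability."""
    scores = [sum(vec[i * 4:(i + 1) * 4]) for i in range(len(classes))]
    return classes[scores.index(max(scores))]

# Toy labelled data: (POS tags of the text, subset start, subset end, class).
CLASSES = ["part", "symptom", "action"]
LABELLED = [
    (["NN", "NN", "VBZ", "JJ"], 0, 2, "part"),     # "brake pedal" feels soft
    (["NN", "NN", "VBZ", "JJ"], 3, 4, "symptom"),  # brake pedal feels "soft"
    (["VBD", "DT", "NN", "NN"], 2, 4, "part"),     # replaced the "fuel pump"
    (["VBD", "DT", "NN", "NN"], 0, 1, "action"),   # "replaced" the fuel pump
]
counts, totals = estimate_tables(LABELLED)

# New text "engine mount is loose", tagged NN NN VBZ JJ.
tags = ["NN", "NN", "VBZ", "JJ"]
vec_part = feature_vector(tags, 0, 2, counts, totals, CLASSES)
vec_symptom = feature_vector(tags, 3, 4, counts, totals, CLASSES)
print(classify(vec_part, CLASSES), classify(vec_symptom, CLASSES))
```

In a realistic setting the summed-score rule would presumably be replaced by a trained classifier operating on the same feature vector, and the POS tags would come from a tagger rather than being written by hand.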