-
Publication No.: US20180032897A1
Publication Date: 2018-02-01
Application No.: US15219401
Filing Date: 2016-07-26
Applicant: International Business Machines Corporation
Inventor: Feng Cao, Boliang Chen, Zheng Yu
Abstract: Embedding representation for a document is generated based on clustering words in the document. Representative clusters are selected and a weighted sum of the embeddings of the words in the selected clusters is determined as a document embedding. Documents are labeled based on document embeddings. A machine learning algorithm is trained using the documents. The machine learning algorithm predicts a label of a given document based on the given document's document embedding.
-
Publication No.: US10762439B2
Publication Date: 2020-09-01
Application No.: US15219401
Filing Date: 2016-07-26
Applicant: International Business Machines Corporation
Inventor: Feng Cao, Boliang Chen, Zheng Yu
Abstract: Embedding representation for a document is generated based on clustering words in the document. Representative clusters are selected and a weighted sum of the embeddings of the words in the selected clusters is determined as a document embedding. Documents are labeled based on document embeddings. A machine learning algorithm is trained using the documents. The machine learning algorithm predicts a label of a given document based on the given document's document embedding.
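Illustration: the pipeline summarized in the abstract (cluster the words, select representative clusters, take a weighted sum of their word embeddings, then train a predictor on the resulting document embeddings) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The abstract does not say how clusters are selected, how words are weighted, or which learning algorithm is used, so the choices here are hypothetical: plain k-means, selection of the clusters with the most word occurrences, weights proportional to word frequency, and a nearest-centroid classifier as a stand-in learner. The toy word vectors and the hand-assigned labels are also invented for the example.

```python
import math
from collections import Counter


def kmeans(vectors, k, iters=20):
    """Plain k-means, seeded with the first k vectors so runs are deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members (if it has any).
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def document_embedding(words, word_vectors, k=2, top=1):
    """Cluster a document's words, keep the `top` clusters, return a weighted sum."""
    counts = Counter(w for w in words if w in word_vectors)
    if not counts:
        raise ValueError("no known words in document")
    vocab = sorted(counts)
    vectors = [word_vectors[w] for w in vocab]
    assign = kmeans(vectors, min(k, len(vocab)))
    # Assumption: a cluster is "representative" if it covers many word occurrences.
    mass = Counter()
    for w, a in zip(vocab, assign):
        mass[a] += counts[w]
    selected = {c for c, _ in mass.most_common(top)}
    chosen = [w for w, a in zip(vocab, assign) if a in selected]
    # Assumption: weights are relative word frequencies within the chosen clusters.
    total = sum(counts[w] for w in chosen)
    dim = len(vectors[0])
    return [sum(counts[w] / total * word_vectors[w][i] for w in chosen)
            for i in range(dim)]


def train_nearest_centroid(embeddings, labels):
    """Stand-in learner: store the mean document embedding for each label."""
    groups = {}
    for e, y in zip(embeddings, labels):
        groups.setdefault(y, []).append(e)
    return {y: [sum(col) / len(es) for col in zip(*es)] for y, es in groups.items()}


def predict(model, embedding):
    """Return the label whose stored mean embedding is closest."""
    return min(model, key=lambda y: math.dist(embedding, model[y]))


# Toy 2-D word vectors (hypothetical values, for illustration only).
word_vectors = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1], "team": [1.0, 0.1],
    "stock": [-1.0, 0.0], "market": [-0.9, 0.1], "the": [0.0, 1.0],
}
docs = [["goal", "match", "goal", "the"], ["stock", "market", "stock", "the"]]
labels = ["sports", "finance"]  # hand-assigned here for simplicity
train_embeddings = [document_embedding(d, word_vectors) for d in docs]
model = train_nearest_centroid(train_embeddings, labels)
predicted = predict(model, document_embedding(["team", "goal", "the"], word_vectors))
```

In this toy setup the stop word "the" falls into its own small cluster and is dropped by the selection step, so each document embedding is driven only by its topical words. A real system would presumably use pretrained word embeddings and a stronger classifier.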
-