Publication No.: US20230351212A1
Publication date: 2023-11-02
Application No.: US17837233
Filing date: 2022-06-10
Applicant: ZHEJIANG LAB
Inventor: Hongsheng WANG, Qing LIAO, Hujun BAO, Guang CHEN
IPC: G06N5/02
CPC classification number: G06N5/022
Abstract: The disclosure provides a semi-supervised method and apparatus for public opinion text analysis. The semi-supervised method includes: acquiring a public opinion data set and preprocessing it; applying a data augmentation algorithm to the preprocessed samples to generate data-augmented samples; generating category labels for the unlabeled samples in the data set by unsupervised extraction and clustering; calculating similarities in the word-vector latent semantic space and performing a linear interpolation operation to generate similarity interpolation samples from the operation result; constructing a final training sample set; inputting the final training sample set into a pre-trained language model and training it in a semi-supervised manner to obtain a classification model; and predicting on a test set with the classification model to obtain a classification result.
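The similarity-and-interpolation step of the abstract can be illustrated with a minimal sketch. The abstract does not say how similarities select the pairs to interpolate, nor what interpolation coefficient is used, so everything here is an assumption made for illustration: cosine similarity as the similarity measure, pairing each sample with its single most similar neighbour, a fixed coefficient `lam`, interpolating the soft labels along with the vectors (a mixup-style choice), and the hypothetical function names `cosine_similarity`, `interpolate`, and `similarity_interpolation_samples`. It is not the method claimed in the application.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def interpolate(a: Vector, b: Vector, lam: float) -> List[float]:
    """Element-wise linear interpolation: lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def similarity_interpolation_samples(
    embeddings: Sequence[Vector],
    labels: Sequence[Vector],
    lam: float = 0.7,
) -> List[Tuple[List[float], List[float]]]:
    """Pair each sample with its most similar other sample and interpolate both.

    Assumption: `embeddings` are sentence-level vectors in the latent semantic
    space and `labels` are one-hot or soft label vectors of the same length.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")

    samples = []
    for i, emb_i in enumerate(embeddings):
        best_j, best_sim = None, -2.0  # cosine similarity is always >= -1
        for j, emb_j in enumerate(embeddings):
            if j == i:
                continue
            sim = cosine_similarity(emb_i, emb_j)
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is None:  # only one sample: nothing to pair with
            continue
        mixed_vector = interpolate(emb_i, embeddings[best_j], lam)
        mixed_label = interpolate(labels[i], labels[best_j], lam)
        samples.append((mixed_vector, mixed_label))
    return samples


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings of public opinion texts.
    toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    toy_labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for vector, label in similarity_interpolation_samples(toy_embeddings, toy_labels, lam=0.5):
        print(vector, label)
```

Under these assumptions, the interpolated pairs would be added to the labeled, augmented, and cluster-labeled samples when the final training sample set is built. In practice the vectors would come from the pre-trained language model rather than from hand-written lists.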