Publication (Announcement) No.: US20230048031A1
Publication (Announcement) Date: 2023-02-16
Application No.: US17964165
Application Date: 2022-10-12
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
IPC: G06N3/08, G06F40/279, G06F40/103
Abstract: A data processing method and apparatus are provided, relating to the field of artificial intelligence and specifically to natural language processing. The method includes: determining original text samples, on which masking processing has not been performed; and performing mask processing on the original text samples to obtain mask training samples. The mask processing makes the mask proportions of the mask training samples unfixed, and each mask training sample is used to train a pretrained language model (PLM). Training the PLM with mask training samples whose mask proportions are unfixed enhances the mode diversity of the PLM's training samples. As a result, the features learned by the PLM are also diversified, which improves the generalization capability of the PLM and the natural language understanding capability of the trained PLM.
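As an illustration only, the core idea of "unfixed mask proportions" can be sketched in a few lines: instead of masking a constant share of tokens (as in the common 15% setting), a fresh proportion is drawn for every sample. This is a minimal sketch under stated assumptions, not the method claimed in the patent. The uniform distribution over [0.1, 0.4], whitespace tokenization, the `[MASK]` symbol, and the function names `mask_with_random_proportion` and `build_mask_training_samples` are all hypothetical choices made for this example.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical mask symbol; real PLM tokenizers define their own.
MASK_TOKEN = "[MASK]"


def mask_with_random_proportion(
    tokens: List[str],
    low: float = 0.1,
    high: float = 0.4,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[int], float]:
    """Mask one tokenized sample using a proportion drawn per sample.

    ASSUMPTION: the proportion is drawn uniformly from [low, high];
    the patent abstract only states that proportions are unfixed.
    """
    if not 0.0 < low <= high <= 1.0:
        raise ValueError("require 0 < low <= high <= 1")
    rng = rng or random.Random()
    ratio = rng.uniform(low, high)
    # Round to a whole number of tokens; mask at least one if any exist.
    n_mask = max(1, round(ratio * len(tokens))) if tokens else 0
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK_TOKEN
    return masked, positions, ratio


def build_mask_training_samples(
    texts: List[str], low: float = 0.1, high: float = 0.4, seed: int = 0
) -> List[Dict]:
    """Turn unmasked original texts into mask training samples."""
    rng = random.Random(seed)
    samples = []
    for text in texts:
        tokens = text.split()  # ASSUMPTION: whitespace tokenization
        masked, positions, ratio = mask_with_random_proportion(
            tokens, low, high, rng
        )
        samples.append(
            {
                "input": masked,                         # model input
                "labels": [tokens[i] for i in positions],  # tokens to predict
                "positions": positions,
                "ratio": ratio,                          # differs per sample
            }
        )
    return samples


if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog"] * 3
    for s in build_mask_training_samples(corpus, seed=1):
        print(f"{s['ratio']:.2f}", " ".join(s["input"]))
```

Because the ratio is resampled for each sample, the same sentence can appear with different numbers of masked tokens, which is one simple way to obtain the training-sample diversity the abstract describes. Other distributions, or span-level rather than token-level masking, would fit the same description equally well.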