METHOD FOR IDENTIFYING NOISE SAMPLES, ELECTRONIC DEVICE, AND STORAGE MEDIUM

    Publication No.: US20230023789A1

    Publication Date: 2023-01-26

    Application No.: US17956558

    Filing Date: 2022-09-29

    Abstract: A method for identifying noise samples includes: obtaining an original sample set; obtaining a target sample set by adding masks to the original training corpora in the original sample set according to a preset adjustment rule; performing mask prediction on a plurality of target training corpora in the target sample set using a pre-trained language model to obtain a first mask prediction character corresponding to each target training corpus; matching the first mask prediction character corresponding to each target training corpus against a preset condition; and determining, as noise samples, the original training corpora in the original sample set that correspond to the target training corpora whose first mask prediction characters do not match the preset condition.
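
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The abstract does not specify the adjustment rule, the language model, or the preset condition, so everything concrete here is an assumption made for illustration: the (text, label) sample layout, the prompt appended by `add_mask`, the keyword-based `toy_predict_mask` standing in for a pre-trained language model, and the `label_matches` condition that compares the predicted word with the sample's label.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"

# Hypothetical sample layout: (text, label). The abstract does not define it.
Sample = Tuple[str, str]


def add_mask(sample: Sample) -> str:
    """Hypothetical adjustment rule: append a prompt that contains a mask."""
    text, _label = sample
    return f"{text} Overall it was {MASK}."


def toy_predict_mask(masked_text: str) -> str:
    """Toy stand-in for a pre-trained language model's mask prediction."""
    positive_cues = ("love", "great", "excellent")
    lowered = masked_text.lower()
    return "good" if any(cue in lowered for cue in positive_cues) else "bad"


def label_matches(prediction: str, sample: Sample) -> bool:
    """Hypothetical preset condition: the prediction equals the label."""
    return prediction == sample[1]


def identify_noise_samples(
    original_set: List[Sample],
    adjust: Callable[[Sample], str],
    predict: Callable[[str], str],
    matches: Callable[[str, Sample], bool],
) -> List[Sample]:
    # Build the target sample set by masking each original corpus.
    target_set = [adjust(sample) for sample in original_set]
    # Obtain one mask prediction per target training corpus.
    predictions = [predict(target) for target in target_set]
    # Originals whose prediction fails the condition are flagged as noise.
    return [
        sample
        for sample, prediction in zip(original_set, predictions)
        if not matches(prediction, sample)
    ]


original_set = [
    ("I love this phone.", "good"),
    ("The battery died in a day.", "bad"),
    ("Great screen and excellent sound.", "bad"),  # label disagrees with text
]
noise = identify_noise_samples(
    original_set, add_mask, toy_predict_mask, label_matches
)
print(noise)
```

With these toy inputs, only the third sample is flagged, because the stand-in predictor fills the mask with "good" while the label is "bad". In practice the `predict` callable would wrap a real masked language model, and the condition would be whatever the preset rule defines.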
