Invention Grant
- Patent Title: Corpus cleaning method and corpus entry system
- Application No.: US16886826
- Application Date: 2020-05-29
- Publication No.: US11580299B2
- Publication Date: 2023-02-14
- Inventor: Li Ma, Youjun Xiong
- Applicant: UBTECH ROBOTICS CORP LTD
- Applicant Address: CN Shenzhen
- Assignee: UBTECH ROBOTICS CORP LTD
- Current Assignee: UBTECH ROBOTICS CORP LTD
- Current Assignee Address: CN Shenzhen
- Priority: CN201911379646.4 20191227
- Main IPC: G06F40/242
- IPC: G06F40/242; G06N20/00; G06F40/289; G06N7/00

Abstract:
The present disclosure provides a corpus cleaning method and a corpus entry system. The method includes: obtaining an input utterance; generating a predicted value of an information amount of each word in the input utterance according to the context of the input utterance using a pre-trained general model; and determining redundant words according to the predicted value of the information amount of each word, and determining whether to remove the redundant words from the input utterance. In such a manner, the objectivity and accuracy of corpus cleaning can be improved.
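The three steps named in the abstract (obtain an utterance, predict a per-word information amount, then decide whether to remove low-information words) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the 0.2 threshold, and the `auto_remove` flag are hypothetical, and `toy_information_model` is a fixed filler-word lookup standing in for the pre-trained general model, which in the disclosure scores words according to the context of the utterance.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the pre-trained general model. A real model
# would score each word from the context of the whole utterance; this toy
# version ignores context and only recognises a fixed set of filler words.
FILLER_WORDS = {"um", "uh", "please", "well", "like"}


def toy_information_model(words: List[str]) -> List[float]:
    """Return one score in [0, 1] per word; higher means more informative."""
    return [0.05 if w.lower() in FILLER_WORDS else 0.9 for w in words]


def clean_utterance(
    utterance: str,
    model: Callable[[List[str]], List[float]],
    threshold: float = 0.2,      # assumed cut-off, not taken from the patent
    auto_remove: bool = True,    # assumed switch for the removal decision
) -> Tuple[str, List[str]]:
    """Return (possibly cleaned utterance, list of redundant words)."""
    words = utterance.split()
    scores = model(words)

    # Words whose predicted information amount falls below the threshold
    # are treated as redundant.
    redundant = [w for w, s in zip(words, scores) if s < threshold]

    # The removal itself is a separate decision: with auto_remove disabled,
    # the utterance is returned unchanged and the caller can review the list.
    if not auto_remove:
        return utterance, redundant

    kept = [w for w, s in zip(words, scores) if s >= threshold]
    return " ".join(kept), redundant


if __name__ == "__main__":
    cleaned, removed = clean_utterance(
        "um please play some music", toy_information_model
    )
    print(cleaned)
    print(removed)
```

Keeping the scoring function as a parameter means the toy lookup could be swapped for any model that returns one score per word, without changing the thresholding or removal logic.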
Public/Granted literature
- US20210200948A1 CORPUS CLEANING METHOD AND CORPUS ENTRY SYSTEM Public/Granted day: 2021-07-01