Corpus cleaning method and corpus entry system
Abstract:
The present disclosure provides a corpus cleaning method and a corpus entry system. The method includes: obtaining an input utterance; using a pre-trained general model to generate, according to the context of the input utterance, a predicted value of the information amount of each word in the input utterance; determining redundant words according to the predicted values; and determining whether to remove the redundant words from the input utterance. In this manner, the objectivity and accuracy of corpus cleaning can be improved.
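The steps in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The names `toy_information_model`, `find_redundant_words`, and `clean_utterance`, the 0.5 threshold, and the `remove` flag are all assumptions made for illustration. In particular, the toy scorer is a fixed filler-word lookup that ignores context, standing in for the pre-trained general model, which the abstract says predicts each word's information amount from the utterance context.

```python
from typing import Callable, List, Tuple

# A function that maps the words of an utterance to one predicted
# information amount per word.
Predictor = Callable[[List[str]], List[float]]


def toy_information_model(words: List[str]) -> List[float]:
    """Hypothetical stand-in for the pre-trained general model.

    Assumption: a fixed filler-word list gets a low score and every other
    word gets a high score. A real model would use the utterance context.
    """
    filler = {"um", "uh", "well", "please", "like"}
    return [0.1 if w.lower() in filler else 0.9 for w in words]


def find_redundant_words(
    words: List[str], predict: Predictor, threshold: float = 0.5
) -> List[Tuple[int, str, float]]:
    """Return (index, word, score) for words scored below the threshold."""
    scores = predict(words)
    return [
        (i, w, s)
        for i, (w, s) in enumerate(zip(words, scores))
        if s < threshold
    ]


def clean_utterance(
    utterance: str,
    predict: Predictor = toy_information_model,
    threshold: float = 0.5,
    remove: bool = True,
) -> str:
    """Drop redundant words from the utterance if removal is approved.

    Assumption: the decision on whether to remove is reduced to a boolean
    flag here; the abstract does not say how that decision is made.
    """
    words = utterance.split()
    redundant = find_redundant_words(words, predict, threshold)
    if not remove or not redundant:
        return utterance
    drop = {i for i, _, _ in redundant}
    return " ".join(w for i, w in enumerate(words) if i not in drop)


if __name__ == "__main__":
    print(clean_utterance("um please book a flight"))
```

Keeping detection (`find_redundant_words`) separate from removal (`clean_utterance`) mirrors the two determinations named in the abstract: first which words are redundant, then whether to remove them.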