-
Publication No.: US20230161977A1
Publication date: 2023-05-25
Application No.: US17535365
Filing date: 2021-11-24
Applicant: Beijing Youzhuju Network Technology Co. Ltd.
Inventor: Jingjing Xu, Chun Gan, Hao Zhou, Lei Li, Zaixiang Zheng
IPC: G06F40/58, G06F40/284, G06F40/237
CPC classification number: G06F40/58, G06F40/284, G06F40/237
Abstract: Implementations of the present disclosure relate to methods, devices, and computer program products for generating a destination vocabulary from a source vocabulary. In a method, a group of candidate vocabularies are determined from the source vocabulary based on a corpus, a size of a candidate vocabulary in the group of candidate vocabularies being different from a size of the source vocabulary. A group of marginal scores are obtained for the group of candidate vocabularies, respectively, a marginal score in the group of marginal scores being obtained for the candidate vocabulary based on a corpus entropy of the candidate vocabulary and a size of the candidate vocabulary. The destination vocabulary is selected from the group of candidate vocabularies based on the group of marginal scores. With these implementations, both of the corpus entropy and the vocabulary size are considered in the vocabulary generation, and thus a balance may be achieved therebetween, which may increase the performance of the generated vocabulary.
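Illustrative sketch (not part of the patent record): the abstract says each candidate vocabulary receives a marginal score derived from its corpus entropy and its size, and that the destination vocabulary is chosen from those scores. The abstract does not give a formula, so the code below assumes one plausible form: the negative change in entropy relative to the source vocabulary, divided by the change in size. The function names, the choice of Shannon entropy over token counts, and the toy counts are all hypothetical.

```python
import math


def corpus_entropy(token_counts):
    """Shannon entropy (bits) of the token frequency distribution.

    token_counts maps each vocabulary token to how often it occurs when the
    corpus is segmented with that vocabulary (assumed input format).
    """
    total = sum(token_counts.values())
    return -sum(
        (c / total) * math.log2(c / total)
        for c in token_counts.values()
        if c > 0
    )


def marginal_score(candidate_counts, source_counts):
    """Assumed score: entropy reduction per unit of vocabulary-size change."""
    size_delta = len(candidate_counts) - len(source_counts)
    if size_delta == 0:
        # The abstract requires candidate sizes to differ from the source size.
        raise ValueError("candidate size must differ from source size")
    entropy_delta = corpus_entropy(candidate_counts) - corpus_entropy(source_counts)
    return -entropy_delta / size_delta


def select_destination(source_counts, candidates):
    """Return the name of the candidate with the highest marginal score."""
    return max(
        candidates,
        key=lambda name: marginal_score(candidates[name], source_counts),
    )


# Toy data: token counts under the source vocabulary and three candidates.
source = {"a": 1, "b": 1}
candidates = {
    "A": {"a": 1, "b": 1, "c": 1, "d": 1},  # larger, higher entropy
    "B": {"x": 8},                          # smaller, zero entropy
    "C": {"a": 14, "b": 1, "c": 1},         # larger, lower entropy
}
best = select_destination(source, candidates)
```

Under this assumed scoring rule, a candidate is favored when it lowers corpus entropy at a small cost in vocabulary size, which is one way to realize the balance between entropy and size that the abstract describes.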
-
Publication No.: US12112139B2
Publication date: 2024-10-08
Application No.: US17535365
Filing date: 2021-11-24
Applicant: Beijing Youzhuju Network Technology Co. Ltd.
Inventor: Jingjing Xu, Chun Gan, Hao Zhou, Lei Li, Zaixiang Zheng
IPC: G06F40/58, G06F40/237, G06F40/284
CPC classification number: G06F40/58, G06F40/237, G06F40/284
Abstract: Implementations of the present disclosure relate to methods, devices, and computer program products for generating a destination vocabulary from a source vocabulary. In a method, a group of candidate vocabularies are determined from the source vocabulary based on a corpus, a size of a candidate vocabulary in the group of candidate vocabularies being different from a size of the source vocabulary. A group of marginal scores are obtained for the group of candidate vocabularies, respectively, a marginal score in the group of marginal scores being obtained for the candidate vocabulary based on a corpus entropy of the candidate vocabulary and a size of the candidate vocabulary. The destination vocabulary is selected from the group of candidate vocabularies based on the group of marginal scores. With these implementations, both of the corpus entropy and the vocabulary size are considered in the vocabulary generation, and thus a balance may be achieved therebetween, which may increase the performance of the generated vocabulary.
-