Method and System for Adaptively Acquiring Word Library Domains Based on Historical Data and Machine Learning

Granted invention patent

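CN108255956B Method and System for Adaptively Acquiring Word Library Domains Based on Historical Data and Machine Learning (legal status: in force; pledged)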


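Patent title: Method and System for Adaptively Acquiring Word Library Domains Based on Historical Data and Machine Learning
Application No.: CN201711391038.6

Filing date: 2017-12-21
Publication (announcement) No.: CN108255956B

Publication (announcement) date: 2020-04-03
Inventors: Cai Jinsong (蔡劲松), Su Shaowei (苏少炜), Chen Xiaoliang (陈孝良), Feng Dahang (冯大航), Chang Le (常乐)
Applicant: Beijing SoundAI Technology Co., Ltd. (北京声智科技有限公司)
Applicant address: Room 2022, 2nd Floor, Building 27, No. 25 North Third Ring West Road, Haidian District, Beijing
Patentee: Beijing SoundAI Technology Co., Ltd. (北京声智科技有限公司)
Current patentee: Beijing SoundAI Technology Co., Ltd. (北京声智科技有限公司)
Current patentee address: Room 2022, 2nd Floor, Building 27, No. 25 North Third Ring West Road, Haidian District, Beijing
Patent agency: China Science Patent & Trademark Agent Ltd. (中科专利商标代理有限责任公司)
Agent: Ren Yan (任岩)
Main classification: G06F16/332
IPC classifications: G06F16/332; G06F40/211; G06F40/30; G10L15/08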

Abstract:

The disclosure provides a method for adaptively acquiring the domain of a speech word library based on historical data and machine learning. The method comprises: step S1, performing semantic-plane sentence-pattern classification on a speech recognition result to find the verb core of the speech instruction and the verb arguments related to it; step S2, extracting the verb arguments from the speech instruction and selecting several word libraries by combining machine learning with the user's historical data; step S3, performing syntactic-plane word segmentation in each selected word library using natural language processing methods, evaluating the combined results across the word library domains, taking the domain with the highest evaluation score as the optimal result, outputting that result, and updating the user's historical data at the same time; and step S4, combining the optimal result with pragmatic-plane sentence-type analysis to determine the final word library domain. By combining how the user has used word libraries in the past with machine learning, the corresponding domain is acquired adaptively from the user's historical data, which greatly increases flexibility and accuracy.
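The abstract names the steps but not the concrete selection model, segmentation method, or evaluation score. The following is a minimal sketch of steps S2 and S3 only, under stated assumptions: the "machine learning" component is replaced by a simple frequency prior over the user's past domains, segmentation uses forward maximum matching, and the evaluation score is the fraction of characters covered by lexicon words. Steps S1 and S4 are not modeled; the verb arguments are supplied by hand as if S1 had produced them. `LEXICONS`, `fmm_segment`, `coverage`, `select_domains`, and `best_domain` are hypothetical names, and the lexicon contents are invented for illustration.

```python
from collections import Counter

# Hypothetical domain word libraries; the patent does not publish its lexicons.
LEXICONS = {
    "music": {"播放", "周杰伦", "稻香", "歌曲"},
    "video": {"播放", "电影", "稻香村"},
    "weather": {"北京", "天气", "明天"},
}


def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position, else one character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def coverage(text, lexicon):
    """Assumed evaluation score: fraction of characters covered by lexicon words."""
    hits = sum(len(t) for t in fmm_segment(text, lexicon) if t in lexicon)
    return hits / len(text) if text else 0.0


def select_domains(arguments, history, k=2, alpha=1.0):
    """Step S2 (assumed form): rank domains by verb-argument overlap plus a history-frequency prior."""
    total = sum(history.values())

    def score(domain):
        overlap = sum(arg in LEXICONS[domain] for arg in arguments)
        prior = history[domain] / total if total else 0.0
        return overlap + alpha * prior

    return sorted(LEXICONS, key=score, reverse=True)[:k]


def best_domain(text, arguments, history, k=2):
    """Step S3: segment with each candidate lexicon, keep the top scorer, update history."""
    candidates = select_domains(arguments, history, k)
    scores = {d: coverage(text, LEXICONS[d]) for d in candidates}
    # Ties on the evaluation score fall back to the user's usage count.
    best = max(candidates, key=lambda d: (scores[d], history[d]))
    history[best] += 1  # feed the choice back into the user's history
    return best, scores


history = Counter({"music": 5, "video": 1})
# Verb arguments are assumed to come from step S1 (verb core "播放").
best, scores = best_domain("播放周杰伦的稻香", ["周杰伦", "稻香"], history)
print(best, scores)
```

Here both the argument overlap and the history prior favor `music`, whose lexicon segments the instruction as 播放 / 周杰伦 / 的 / 稻香 and covers 7 of its 8 characters, so it wins and its history count rises from 5 to 6. A trained classifier could replace the frequency prior, and a statistical segmenter could replace forward maximum matching, without changing the select, evaluate, and update control flow.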

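Publication/grant documents

CN108255956A Method and System for Adaptively Acquiring a Word Library Based on Historical Data and Machine Learning; publication/grant date: 2018-07-06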


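IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor
G06F16/30	. of unstructured textual data (document management systems, see G06F 16/93)
G06F16/33	.. Querying
G06F16/332	... Query formulation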