Streaming End-to-End Speech Recognition Method and Apparatus, and Electronic Device

Invention Application

WO2021218843A1 Streaming End-to-End Speech Recognition Method and Apparatus, and Electronic Device (status: under examination, published)


Patent Title: STREAMING END-TO-END SPEECH RECOGNITION METHOD AND APPARATUS, AND ELECTRONIC DEVICE
Application No.: PCT/CN2021/089556
Application Date: 2021-04-25
Publication No.: WO2021218843A1
Publication Date: 2021-11-04
Inventors: 张仕良 (Zhang Shiliang), 高志付 (Gao Zhifu)
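Applicant: 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Applicant Address: Fourth Floor, One Capital Place, P.O. Box 847, Grand Cayman, Cayman Islands
Assignee: 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Current Assignee: 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Current Assignee Address: Fourth Floor, One Capital Place, P.O. Box 847, Grand Cayman, Cayman Islands
Agency: 北京三友知识产权代理有限公司 (Beijing Sanyou Intellectual Property Agency Ltd.)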
Priority: CN202010366907.5 2020-04-30
Main IPC: G10L15/20
IPC: G10L15/20

Abstract:

A streaming end-to-end speech recognition method and apparatus, and an electronic device. The method comprises: extracting acoustic features from a received speech stream frame by frame and encoding them (S301); dividing the encoded frames into blocks and predicting, for each block, the number of activation points it contains whose encodings need to be output (S302); and determining, from the prediction result, the positions of the activation points that need to be decoded and output, so that a decoder can decode at those positions and output a recognition result (S303). The method can improve the robustness of a streaming end-to-end speech recognition system to noise, thereby improving system performance and accuracy.
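The abstract describes steps S302 and S303 only at a high level. The Python sketch below illustrates one possible reading under assumptions that do not come from the patent: the encoder of S301 is abstracted away, and each encoded frame is represented only by a hypothetical activation score in [0, 1]; the per-block count is "predicted" by rounding the sum of scores (a stand-in for whatever learned predictor the patent actually uses); and the activation positions are taken to be the highest-scoring frames in the block. All function names and rules here are illustrative.

```python
from typing import List, Sequence


def split_into_blocks(scores: Sequence[float], block_size: int) -> List[List[float]]:
    """Group per-frame scores of already-encoded frames into fixed-size blocks.

    The final block may be shorter if the stream length is not a multiple of block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(scores[i:i + block_size]) for i in range(0, len(scores), block_size)]


def predict_count(block: Sequence[float]) -> int:
    """Hypothetical stand-in for a learned count predictor (assumption, not the patent's).

    Rounds the sum of per-frame activation scores, clamped to [0, len(block)].
    """
    return max(0, min(len(block), int(round(sum(block)))))


def locate_activations(block: Sequence[float], count: int, offset: int) -> List[int]:
    """Return global frame indices of the `count` highest-scoring frames in a block.

    Ties are broken by the earlier frame index; the result is in time order.
    """
    ranked = sorted(range(len(block)), key=lambda i: (-block[i], i))
    return sorted(offset + i for i in ranked[:count])


def stream_activation_positions(scores: Sequence[float], block_size: int) -> List[int]:
    """Handle blocks one at a time in arrival order, as a streaming system would."""
    positions: List[int] = []
    for b, block in enumerate(split_into_blocks(scores, block_size)):
        count = predict_count(block)  # S302: how many activation points in this block
        positions.extend(locate_activations(block, count, offset=b * block_size))  # S303
    return positions


if __name__ == "__main__":
    # Made-up per-frame activation scores for 10 encoded frames.
    frame_scores = [0.1, 0.6, 0.3, 0.2, 0.9, 0.1, 0.0, 0.4, 0.8, 0.9]
    print(stream_activation_positions(frame_scores, block_size=4))  # prints [1, 4, 8, 9]
```

A decoder would then run only at the returned frame indices, producing one output step per activation point. In this sketch, each block's positions depend only on that block's scores, so the lookahead needed before emitting output is bounded by `block_size` frames.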

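IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/20	. Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence)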