System and method for rescoring N-best hypotheses of an automatic speech recognition system

Invention Grant

US07761296B1 System and method for rescoring N-best hypotheses of an automatic speech recognition system (Expired)



Patent Title: System and method for rescoring N-best hypotheses of an automatic speech recognition system
Application No.: US09286099

Application Date: 1999-04-02
Publication No.: US07761296B1

Publication Date: 2010-07-20
Inventor: Raimo Bakis, Ellen M. Eide
Applicant: Raimo Bakis, Ellen M. Eide
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agency: F. Chau & Associates, LLC
Main IPC: G10L17/00
IPC: G10L17/00; G10L15/00


Abstract:

A system and method for rescoring the N-best hypotheses from an automatic speech recognition system by comparing an original speech waveform to synthetic speech waveforms that are generated for each text sequence of the N-best hypotheses. A distance is calculated from the original speech waveform to each of the synthesized waveforms, and the text associated with the synthesized waveform that is determined to be closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to a corresponding text sequence on a phoneme level. The mean of the feature vectors which align to each phoneme is computed for the original waveform as well as for each of the synthesized hypotheses. The distance of a synthesized hypothesis to the original speech signal is then computed as the sum over all phonemes in the hypothesis of the Euclidean distance between the means of the feature vectors of the frames aligning to that phoneme for the original and the synthesized signals. The text of the hypothesis which is closest under the above metric to the original waveform is chosen as the final system output.
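The distance metric described in the abstract can be illustrated with a minimal sketch. It assumes that feature extraction, speech synthesis, and phoneme-level alignment have already been done elsewhere, and that every phoneme position has at least one aligned frame in both signals. The function names (`phoneme_means`, `hypothesis_distance`, `rescore`) and the data layout are hypothetical and are not taken from the patent.

```python
import math
from collections import defaultdict


def phoneme_means(frames, alignment):
    """Average the feature vectors of the frames aligned to each phoneme.

    frames:    list of feature vectors (lists of floats), one per frame
    alignment: list of phoneme positions, one per frame (assumed layout)
    """
    sums = {}
    counts = defaultdict(int)
    for vec, idx in zip(frames, alignment):
        if idx not in sums:
            sums[idx] = [0.0] * len(vec)
        sums[idx] = [s + v for s, v in zip(sums[idx], vec)]
        counts[idx] += 1
    return {idx: [s / counts[idx] for s in total] for idx, total in sums.items()}


def hypothesis_distance(orig_frames, orig_align, synth_frames, synth_align):
    """Sum, over the phonemes of one hypothesis, of the Euclidean distance
    between the per-phoneme mean vectors of the original and synthesized
    signals. Both alignments refer to the same hypothesis text."""
    orig_means = phoneme_means(orig_frames, orig_align)
    synth_means = phoneme_means(synth_frames, synth_align)
    return sum(math.dist(orig_means[p], synth_means[p]) for p in synth_means)


def rescore(orig_frames, candidates):
    """Return the text of the hypothesis closest to the original signal.

    candidates: list of (text, orig_align, synth_frames, synth_align),
    where orig_align is the original signal aligned to that text.
    """
    best = min(
        candidates,
        key=lambda c: hypothesis_distance(orig_frames, c[1], c[2], c[3]),
    )
    return best[0]
```

Because the original waveform is aligned to each hypothesis text separately, each candidate in this sketch carries its own alignment of the original frames.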



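IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L17/00	Speaker identification or verification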