Coupled hidden Markov model (CHMM) for continuous audiovisual speech recognition
    1.
    Granted patent
    Coupled hidden Markov model (CHMM) for continuous audiovisual speech recognition (status: in force)

    Publication number: US07454342B2

    Publication date: 2008-11-18

    Application number: US10392709

    Filing date: 2003-03-19

    IPC classification: G10L15/14

    Abstract: Method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream, and performing continuous speech recognition based on the audio and video data streams using a plurality of hidden Markov models (HMMs), a node of each of the HMMs at a time slot being subject to one or more nodes of related HMMs at a preceding time slot. Other methods and apparatuses are also described.
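The coupling described in the abstract, where each chain's node at one time slot depends on the nodes of the related chains at the preceding slot, can be illustrated with a minimal sketch. It assumes two chains (audio and video) with discrete observation symbols; the function name `chmm_forward` and the parameter layout are hypothetical and are not taken from the patent.

```python
from itertools import product


def chmm_forward(init, trans, emit, obs):
    """Forward likelihood of a two-chain coupled HMM (illustrative sketch).

    init[c][i]            : P(chain c starts in state i), c = 0 (audio) or 1 (video)
    trans[c][(ia, iv)][j] : P(chain c enters state j | previous audio state ia,
                              previous video state iv)  <- the coupling
    emit[c][i][o]         : P(chain c emits symbol o | state i)
    obs                   : list of (audio_symbol, video_symbol), one per time slot
    """
    states = list(product(range(len(init[0])), range(len(init[1]))))

    # Initial slot: the two chains start independently.
    oa, ov = obs[0]
    alpha = {
        (i, j): init[0][i] * emit[0][i][oa] * init[1][j] * emit[1][j][ov]
        for i, j in states
    }

    # Later slots: each chain's next state is conditioned on BOTH previous states.
    for oa, ov in obs[1:]:
        new_alpha = {}
        for i, j in states:
            reach = sum(alpha[p] * trans[0][p][i] * trans[1][p][j] for p in states)
            new_alpha[(i, j)] = reach * emit[0][i][oa] * emit[1][j][ov]
        alpha = new_alpha

    return sum(alpha.values())
```

The joint state space grows as the product of the per-chain state counts, so this brute-force recursion is only practical for small models; it is meant to show the dependency structure, not an efficient decoder.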

    High-order entropy error functions for neural classifiers
    4.
    Granted patent
    High-order entropy error functions for neural classifiers (status: no longer in force)

    Publication number: US07346497B2

    Publication date: 2008-03-18

    Application number: US10332651

    Filing date: 2001-05-08

    Applicant: Xiaobo Pi, Ying Jia

    Inventors: Xiaobo Pi, Ying Jia

    IPC classification: G10L11/00

    Abstract: An automatic speech recognition system comprising a speech decoder to resolve phone- and word-level information, and a vector generator to generate information vectors on which a neural network classifier (ANN) bases a confidence measure. An error signal is designed that is not subject to false saturation or over-specialization. The error signal is integrated into an error function, which is back-propagated through the ANN.
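The abstract does not give the high-order entropy error function itself, so the sketch below does not reproduce it. It only illustrates the saturation problem the abstract refers to, using two textbook error functions for a single sigmoid output: with squared error the back-propagated signal carries a factor y(1 - y) that vanishes when the output is confidently wrong, whereas with the standard cross-entropy error the signal is simply y - t. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation of the output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def squared_error_signal(z, target):
    """dE/dz for E = 0.5 * (y - t)^2 with y = sigmoid(z)."""
    y = sigmoid(z)
    return (y - target) * y * (1.0 - y)


def cross_entropy_signal(z, target):
    """dE/dz for E = -[t * ln(y) + (1 - t) * ln(1 - y)] with y = sigmoid(z)."""
    return sigmoid(z) - target
```

For a confidently wrong output (for example a pre-activation of 10 with target 0), the squared-error signal is close to zero although the output is almost entirely wrong, while the cross-entropy signal stays close to 1.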

    High-order entropy error functions for neural classifiers
    5.
    Patent application
    High-order entropy error functions for neural classifiers (status: no longer in force)

    Publication number: US20050015251A1

    Publication date: 2005-01-20

    Application number: US10332651

    Filing date: 2001-05-08

    Applicant: Xiaobo Pi, Ying Jia

    Inventors: Xiaobo Pi, Ying Jia

    Abstract: An automatic speech recognition system comprising a speech decoder to resolve phone- and word-level information, and a vector generator to generate information vectors on which a neural network classifier (ANN) bases a confidence measure. An error signal is designed that is not subject to false saturation or over-specialization. The error signal is integrated into an error function, which is back-propagated through the ANN.

    Voice barge-in in telephony speech recognition
    6.
    Granted patent
    Voice barge-in in telephony speech recognition (status: in force)

    Publication number: US07437286B2

    Publication date: 2008-10-14

    Application number: US10204034

    Filing date: 2000-12-27

    Applicant: Xiaobo Pi, Ying Jia

    Inventors: Xiaobo Pi, Ying Jia

    IPC classification: G10L11/02

    Abstract: An interactive voice response system is described that supports full-duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria, such as frame energy magnitude and duration thresholds, to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. To improve spectrum subtraction, the time delay between the echo-dirtied speech and the prompt echo may also be estimated.
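The speech detection criteria named in the abstract (frame energy magnitude and duration thresholds) can be sketched as follows. This is a minimal illustration under assumed parameters; the function name `detect_speech`, the frame length, and the threshold values are hypothetical and do not come from the patent.

```python
def detect_speech(samples, frame_len=160, energy_threshold=1e6, min_frames=3):
    """Return (start_frame, end_frame) spans judged to be speech.

    A frame counts as active when its energy (sum of squared samples) reaches
    energy_threshold; a run of active frames counts as speech only when it
    lasts at least min_frames frames (the duration threshold).
    """
    energies = [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

    spans = []
    start = None
    for idx, energy in enumerate(energies):
        if energy >= energy_threshold:
            if start is None:
                start = idx          # a run of active frames begins
        else:
            if start is not None and idx - start >= min_frames:
                spans.append((start, idx))
            start = None             # run ended (kept only if long enough)

    # Close a run that is still open at the end of the signal.
    if start is not None and len(energies) - start >= min_frames:
        spans.append((start, len(energies)))
    return spans
```

The duration threshold is what keeps short clicks or echo bursts from being treated as a barge-in: a single loud frame is ignored unless the activity persists.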

    Voice barge-in in telephony speech recognition

    Publication number: US08473290B2

    Publication date: 2013-06-25

    Application number: US12197801

    Filing date: 2008-08-25

    Applicant: Xiaobo Pi, Ying Jia

    Inventors: Xiaobo Pi, Ying Jia

    IPC classification: G10L11/02

    Abstract: An interactive voice response system is described that supports full-duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria, such as frame energy magnitude and duration thresholds, to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. To improve spectrum subtraction, the time delay between the echo-dirtied speech and the prompt echo may also be estimated.
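The two echo-handling steps in the abstract, estimating the delay between the prompt and its echo and then subtracting the prompt echo spectrum, might be sketched as below. The abstract does not say how either step is carried out, so this uses a plain time-domain cross-correlation for the delay and a floored magnitude subtraction for the spectrum; both choices, and the names `estimate_delay` and `subtract_spectrum`, are assumptions for illustration.

```python
def estimate_delay(recorded, prompt, max_delay):
    """Return the lag (in samples) at which the prompt best matches the recording.

    Uses an unnormalized cross-correlation over lags 0..max_delay.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(
            p * recorded[i + lag]
            for i, p in enumerate(prompt)
            if i + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def subtract_spectrum(recorded_mag, echo_mag, floor=0.0):
    """Subtract the echo magnitude spectrum bin by bin, clamping at floor."""
    return [max(r - e, floor) for r, e in zip(recorded_mag, echo_mag)]
```

Aligning the prompt with the recording before subtraction matters because a misaligned echo spectrum would be subtracted from the wrong frames, removing speech energy rather than echo.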

    VOICE BARGE-IN IN TELEPHONY SPEECH RECOGNITION
    8.
    Patent application
    VOICE BARGE-IN IN TELEPHONY SPEECH RECOGNITION (status: in force)

    Publication number: US20080310601A1

    Publication date: 2008-12-18

    Application number: US12197801

    Filing date: 2008-08-25

    Applicant: Xiaobo Pi, Ying Jia

    Inventors: Xiaobo Pi, Ying Jia

    IPC classification: H04M1/64

    Abstract: An interactive voice response system is described that supports full-duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria, such as frame energy magnitude and duration thresholds, to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. To improve spectrum subtraction, the time delay between the echo-dirtied speech and the prompt echo may also be estimated.

    Method and apparatus for rejection of speech recognition results in accordance with confidence level
    9.
    Granted patent
    Method and apparatus for rejection of speech recognition results in accordance with confidence level (status: no longer in force)

    Publication number: US07072750B2

    Publication date: 2006-07-04

    Application number: US10332650

    Filing date: 2001-05-08

    Applicant: Xiaobo Pi, Ying Jia

    Inventors: Xiaobo Pi, Ying Jia

    IPC classification: G06F7/00

    CPC classification: G10L15/14

    Abstract: An automatic speech recognition system for continuous speech recognition of vocabulary words for an auto-attendant system providing hands-free telephone calling and utilizing a vocabulary comprising numbers or names of people to be called, using known techniques for automatic speech recognition models of word sequencing, resulting in high confidence levels of recognition.
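The title of this record refers to rejecting recognition results according to a confidence level, but the abstract does not describe how the confidence is computed. The sketch below therefore only shows the rejection step itself, assuming each hypothesis already carries a confidence score in [0, 1]; the function name `filter_by_confidence`, the threshold, and the example phrases are hypothetical.

```python
def filter_by_confidence(hypotheses, threshold=0.5):
    """Split (text, confidence) pairs into accepted and rejected texts.

    A hypothesis is accepted when its confidence reaches the threshold;
    otherwise it is rejected (for example, so the caller can be re-prompted).
    """
    accepted, rejected = [], []
    for text, confidence in hypotheses:
        if confidence >= threshold:
            accepted.append(text)
        else:
            rejected.append(text)
    return accepted, rejected
```

In an auto-attendant setting, raising the threshold trades more re-prompts for fewer calls placed to the wrong person.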
