AUTOMATIC SPEECH RECOGNITION BASED UPON INFORMATION RETRIEVAL METHODS
    1.
    Patent application
    Status: Under examination (published)

    Publication No.: US20110224982A1

    Publication date: 2011-09-15

    Application No.: US12722556

    Filing date: 2010-03-12

    IPC classification: G10L15/02

    CPC classification: G10L15/08 G10L2015/025

    Abstract: Described is a technology in which information retrieval (IR) techniques are used in an automatic speech recognition (ASR) system. Acoustic units (e.g., phones, syllables, multi-phone units, words and/or phrases) are decoded, and features are found from those acoustic units. The features are then used with IR techniques (e.g., TF-IDF based retrieval) to obtain a target output (a word or words). Also described is the use of IR techniques to provide a full large vocabulary continuous speech (LVCSR) recognizer.
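
    The retrieval step in this abstract can be illustrated with a minimal sketch. It assumes that each vocabulary word is treated as a "document" whose terms are phone unigrams and bigrams, and that decoded phones form the query; the toy lexicon, the function names, and the smoothed IDF formula are illustrative assumptions, not details taken from the patent.

```python
import math
from collections import Counter

def ngrams(units, n=2):
    """Return unigram and n-gram features of a unit sequence as strings."""
    feats = list(units)
    feats += ["_".join(units[i:i + n]) for i in range(len(units) - n + 1)]
    return feats

def build_index(lexicon):
    """Build one TF-IDF vector per vocabulary word from its phone sequence."""
    docs = {word: Counter(ngrams(phones)) for word, phones in lexicon.items()}
    df = Counter(f for tf in docs.values() for f in tf)
    # Assumed IDF variant: log(N / df) + 1, so no feature gets zero weight.
    idf = {f: math.log(len(docs) / df[f]) + 1.0 for f in df}
    vectors = {w: {f: c * idf[f] for f, c in tf.items()} for w, tf in docs.items()}
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(f, 0.0) for f, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(decoded_units, vectors, idf):
    """Return the vocabulary word whose TF-IDF vector best matches the query."""
    tf = Counter(ngrams(decoded_units))
    query = {f: c * idf[f] for f, c in tf.items() if f in idf}
    return max(vectors, key=lambda w: cosine(query, vectors[w]))

# Hypothetical three-word lexicon with made-up phone strings.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "dog": ["d", "ao", "g"],
}
vectors, idf = build_index(LEXICON)
best = retrieve(["k", "ae", "t"], vectors, idf)
noisy = retrieve(["d", "ao", "k"], vectors, idf)  # last phone misrecognized
```

    Because matching is done on overlapping features rather than on an exact phone string, the query with one misrecognized phone still ranks the intended word highest in this toy setup.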

    Utilizing features generated from phonic units in speech recognition
    2.
    Granted patent
    Status: In force

    Publication No.: US08401852B2

    Publication date: 2013-03-19

    Application No.: US12626943

    Filing date: 2009-11-30

    IPC classification: G10L15/04

    CPC classification: G10L15/10 G10L15/02

    Abstract: A computer-implemented speech recognition system described herein includes a receiver component that receives a plurality of detected units of an audio signal, wherein the audio signal comprises a speech utterance of an individual. A selector component selects a subset of the plurality of detected units that correspond to a particular time-span. A generator component generates at least one feature with respect to the particular time-span, wherein the at least one feature is one of an existence feature, an expectation feature, or an edit distance feature. Additionally, a statistical speech recognition model outputs at least one word that corresponds to the particular time-span based at least in part upon the at least one feature generated by the feature generator component.
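
    The abstract names three feature types without defining them here, so the following is only a sketch under stated assumptions: an existence feature marks whether a unit occurs in the time-span, an expectation feature compares detected units with the units expected from a candidate word's pronunciation, and an edit distance feature is the Levenshtein distance between the two sequences. All function names, time stamps, and phone labels are hypothetical.

```python
def select_span(detections, start, end):
    """Selector: keep units whose time stamp falls inside [start, end)."""
    return [unit for unit, t in detections if start <= t < end]

def existence_features(span_units, inventory):
    """1 if the unit was detected anywhere in the span, else 0."""
    return {u: int(u in span_units) for u in inventory}

def expectation_features(span_units, expected_units):
    """Count agreements and disagreements with the expected pronunciation."""
    detected, expected = set(span_units), set(expected_units)
    return {
        "expected_and_detected": len(expected & detected),
        "expected_but_missing": len(expected - detected),
        "detected_but_unexpected": len(detected - expected),
    }

def edit_distance(a, b):
    """Levenshtein distance between two unit sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

# Hypothetical detector output: (unit, time in seconds).
detections = [("k", 0.10), ("ae", 0.18), ("p", 0.27), ("d", 0.55)]
span = select_span(detections, 0.0, 0.4)
expected = ["k", "ae", "t"]  # assumed pronunciation of the candidate word
features = {
    "existence": existence_features(span, ["k", "ae", "t", "p"]),
    "expectation": expectation_features(span, expected),
    "edit_distance": edit_distance(span, expected),
}
```

    In a full system these values would be passed to the statistical recognition model as inputs for the given time-span; the sketch stops at feature generation.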

    FEATURES FOR UTILIZATION IN SPEECH RECOGNITION
    3.
    Patent application
    Status: In force

    Publication No.: US20110131046A1

    Publication date: 2011-06-02

    Application No.: US12626943

    Filing date: 2009-11-30

    IPC classification: G10L15/04

    CPC classification: G10L15/10 G10L15/02

    Abstract: A computer-implemented speech recognition system described herein includes a receiver component that receives a plurality of detected units of an audio signal, wherein the audio signal comprises a speech utterance of an individual. A selector component selects a subset of the plurality of detected units that correspond to a particular time-span. A generator component generates at least one feature with respect to the particular time-span, wherein the at least one feature is one of an existence feature, an expectation feature, or an edit distance feature. Additionally, a statistical speech recognition model outputs at least one word that corresponds to the particular time-span based at least in part upon the at least one feature generated by the feature generator component.

    Structured models of repetition for speech recognition
    4.
    Granted patent
    Status: In force

    Publication No.: US08965765B2

    Publication date: 2015-02-24

    Application No.: US12233826

    Filing date: 2008-09-19

    IPC classification: G10L15/00 G10L15/18

    CPC classification: G10L15/1822

    Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.
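
    A joint analysis over two utterances can be sketched with a small generative model. The sketch assumes a uniform prior over database entries, represents the acoustic evidence only through n-best log scores from a recognizer, and uses made-up probabilities for the exact, truncation, and extension transformations (spelling is omitted); none of these values or names come from the patent.

```python
import math

# Assumed probabilities that an utterance is a given transformation of the entry.
TRANSFORM_LOGP = {
    "exact": math.log(0.6),
    "truncation": math.log(0.25),
    "extension": math.log(0.15),
}

def contains(long_seq, short_seq):
    """True if short_seq occurs as a contiguous run inside long_seq."""
    n = len(short_seq)
    return any(long_seq[i:i + n] == short_seq
               for i in range(len(long_seq) - n + 1))

def relation(entry, hyp):
    """Classify how a hypothesis relates to an entry, or return None."""
    if hyp == entry:
        return "exact"
    if len(hyp) < len(entry) and contains(entry, hyp):
        return "truncation"
    if len(hyp) > len(entry) and contains(hyp, entry):
        return "extension"
    return None

def utterance_logp(entry, nbest):
    """Log of the summed score of all hypotheses the entry could have produced."""
    terms = [logp + TRANSFORM_LOGP[relation(entry, hyp)]
             for hyp, logp in nbest.items() if relation(entry, hyp)]
    if not terms:
        return float("-inf")
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def best_entry(entries, nbest_first, nbest_second):
    """Pick the entry that maximizes the joint score over both utterances."""
    return max(entries, key=lambda e: utterance_logp(e, nbest_first)
               + utterance_logp(e, nbest_second))

# Hypothetical listings and n-best lists (word tuple -> log score).
entries = [("joes", "pizza"), ("joes", "plaza"), ("moes", "pizza")]
first = {("joes", "plaza"): math.log(0.5), ("joes", "pizza"): math.log(0.4)}
second = {("pizza",): math.log(0.7), ("plaza",): math.log(0.3)}
choice = best_entry(entries, first, second)
```

    In this toy case the first utterance alone favors one listing, while the truncated repetition shifts the joint score to a different one, which is the kind of effect the abstract attributes to modeling the two utterances together.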

    STRUCTURED MODELS OF REPITITION FOR SPEECH RECOGNITION
    6.
    Patent application
    Status: In force

    Publication No.: US20100076765A1

    Publication date: 2010-03-25

    Application No.: US12233826

    Filing date: 2008-09-19

    IPC classification: G10L15/00

    CPC classification: G10L15/1822

    Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.

    Sensor array beamformer post-processor
    8.
    Granted patent
    Status: In force

    Publication No.: US09054764B2

    Publication date: 2015-06-09

    Application No.: US13187235

    Filing date: 2011-07-20

    IPC classification: H04R3/00 H04B7/08

    CPC classification: H04B7/0854

    Abstract: A novel beamforming post-processor technique with enhanced noise suppression capability. The present beamforming post-processor technique is a non-linear post-processing technique for sensor arrays (e.g., microphone arrays) which improves the directivity and signal separation capabilities. The technique works in so-called instantaneous direction of arrival space, estimates the probability for sound coming from a given incident angle or look-up direction, and applies a time-varying, gain-based spatio-temporal filter for suppressing sounds coming from directions other than the sound source direction, resulting in minimal artifacts and musical noise.
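
    The idea of a direction-dependent gain can be sketched for a single frequency bin of a two-microphone array. The sketch assumes a far-field plane wave, uses the inter-microphone phase difference as the instantaneous direction-of-arrival cue, scores it with an unnormalized Gaussian (a stand-in for a proper probability estimate), and smooths the resulting gain over time with a floor. The spacing, `sigma`, smoothing constant, and floor are illustrative assumptions, not values from the patent.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def expected_phase_diff(freq_hz, spacing_m, angle_rad):
    """Phase lead of mic 1 over mic 2 for a plane wave from angle_rad."""
    return 2.0 * math.pi * freq_hz * spacing_m * math.sin(angle_rad) / SPEED_OF_SOUND

def wrap(phase):
    """Wrap a phase value into [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi

def direction_score(x1, x2, freq_hz, spacing_m, look_angle_rad, sigma=0.3):
    """Score in (0, 1]: how well the observed phase difference fits the look direction."""
    observed = cmath.phase(x1 * x2.conjugate())
    err = wrap(observed - expected_phase_diff(freq_hz, spacing_m, look_angle_rad))
    return math.exp(-0.5 * (err / sigma) ** 2)

def smooth_gain(score, prev_gain, smoothing=0.7, floor=0.1):
    """One-pole temporal smoothing of the gain, with a lower bound on the target gain."""
    return smoothing * prev_gain + (1.0 - smoothing) * max(score, floor)

# One 1 kHz bin, 10 cm spacing, look direction at broadside (0 rad).
f, d = 1000.0, 0.1
x1 = 1 + 0j
target_x2 = x1  # broadside source: both microphones see the same phase
delay = d * math.sin(math.radians(60)) / SPEED_OF_SOUND
interferer_x2 = x1 * cmath.exp(-2j * math.pi * f * delay)  # source at 60 degrees

p_target = direction_score(x1, target_x2, f, d, 0.0)
p_interferer = direction_score(x1, interferer_x2, f, d, 0.0)
gain = smooth_gain(p_interferer, prev_gain=1.0)
beam_out = 0.5 * (x1 + interferer_x2)  # simple delay-and-sum output for this bin
filtered = gain * beam_out
```

    The gain is applied on top of the beamformer output, so it acts as a post-filter: bins dominated by off-axis sound are attenuated gradually over successive frames rather than switched off abruptly.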

    Dual-band speech encoding
    9.
    Granted patent
    Status: In force

    Publication No.: US08818797B2

    Publication date: 2014-08-26

    Application No.: US12978197

    Filing date: 2010-12-23

    IPC classification: G10L21/00

    Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
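
    The sequence of steps in this abstract can be sketched as a small server-side handler. The abstract does not say how the second feature type is estimated, so the affine map below is purely a placeholder, and the recognizer and transport are stub callables; every name here is hypothetical.

```python
def affine_estimator(weights, bias):
    """Return a placeholder estimator mapping first-type frames to second-type frames."""
    def estimate(frames):
        return [[sum(w * x for w, x in zip(row, frame)) + b
                 for row, b in zip(weights, bias)]
                for frame in frames]
    return estimate

def handle_request(first_type_frames, estimator, recognizer, send):
    """Estimate second-type features, recognize, and return results to the sender."""
    estimate = estimator(first_type_frames)  # estimate of the second feature type
    results = recognizer(estimate)           # recognizer only sees the estimate
    send(results)                            # transmit results to the remote entity
    return results

# Stubs standing in for a real recognizer and a real network connection.
sent = []
estimator = affine_estimator(weights=[[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
                             bias=[0.0, 0.0, 0.5])
recognizer = lambda frames: {"text": "hello", "num_frames": len(frames)}
results = handle_request([[1.0, 2.0], [3.0, 4.0]], estimator, recognizer, sent.append)
```

    The point of the structure is that the remote entity only has to transmit one feature type; the conversion to whatever the recognizer expects happens on the receiving side.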

    Robust adaptive beamforming with enhanced noise suppression
    10.
    Granted patent
    Status: In force

    Publication No.: US08818002B2

    Publication date: 2014-08-26

    Application No.: US13187618

    Filing date: 2011-07-21

    Abstract: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment, the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals and isotropic ambient noise.
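
    One plausible way to fold a presence probability into an adaptive blocking filter is to scale the adaptation step by it, so the filter learns the target path only while the target is likely active. The sketch below does this with a single real-valued NLMS tap per channel; the step size, the signals, and the update rule itself are assumptions for illustration and are not claimed to be the patented method.

```python
import math

def blocking_step(beam_ref, mic, weight, presence_prob, mu=0.5, eps=1e-8):
    """One NLMS update of a one-tap blocking filter for one microphone channel.

    The filter predicts the target component of `mic` from the fixed-beamformer
    output `beam_ref`; the residual is the noise reference with the target removed.
    """
    residual = mic - weight * beam_ref
    step = mu * presence_prob / (beam_ref * beam_ref + eps)  # probability-scaled step
    return weight + step * beam_ref * residual, residual

def run(beam, mic, probs, weight=0.0):
    """Run the blocking filter over a sequence of samples."""
    residuals = []
    for b, m, p in zip(beam, mic, probs):
        weight, r = blocking_step(b, m, weight, p)
        residuals.append(r)
    return weight, residuals

# Toy target-only signal: the microphone sees 0.8 times the beamformer output.
beam = [1.5 + math.sin(0.3 * n) for n in range(60)]
mic = [0.8 * b for b in beam]

w_on, res_on = run(beam, mic, [1.0] * 60)   # target judged present: filter adapts
w_off, res_off = run(beam, mic, [0.0] * 60)  # target judged absent: filter is frozen
```

    With the probability at 1 the tap converges to the true target gain and the residual goes to zero, which is what a blocking matrix should produce for target-only input; with the probability at 0 the tap never moves, so interference heard during target absence cannot corrupt it.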
