-
Publication No.: US20110238416A1
Publication date: 2011-09-29
Application No.: US12730270
Filing date: 2010-03-24
IPC classification: G10L15/20
CPC classification: G10L15/20
Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
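The spline idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: it assumes log-spectral features where the noisy feature is `y = x + log(1 + exp(n - x))`, uses synthetic data, and the function names, knot placement, and noise level are all hypothetical. A continuous linear spline over `u = n - x` is fitted by least squares, and a residual variance is kept per segment to stand in for the regression-error variances.

```python
import numpy as np

def fit_linear_spline(u, r, knots):
    """Least-squares fit of a continuous linear spline r ~ s(u).

    Returns the spline values at the knots and one residual variance
    per segment (a stand-in for the regression-error variance).
    """
    # Hat-function basis: column k is the spline that is 1 at knot k, 0 at the others.
    basis = np.stack([np.interp(u, knots, e) for e in np.eye(len(knots))], axis=1)
    values, *_ = np.linalg.lstsq(basis, r, rcond=None)
    resid = r - basis @ values
    # Segment index of each sample, used to pool residuals per segment.
    seg = np.clip(np.searchsorted(knots, u, side="right") - 1, 0, len(knots) - 2)
    variances = np.array([resid[seg == k].var() for k in range(len(knots) - 1)])
    return values, variances

def predict_noisy(x, n, knots, values):
    """Predict the noisy feature from clean speech x and noise n."""
    return x + np.interp(n - x, knots, values)

# Synthetic training data (assumption: log-spectral mismatch plus small Gaussian error).
rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, 5000)   # clean speech features
n = rng.uniform(-5.0, 5.0, 5000)   # noise features
y = x + np.log1p(np.exp(n - x)) + rng.normal(0.0, 0.05, 5000)

knots = np.linspace(-10.0, 10.0, 9)
values, variances = fit_linear_spline(n - x, y - x, knots)
y_hat = predict_noisy(x, n, knots, values)
rmse = float(np.sqrt(np.mean((y_hat - y) ** 2)))
print(f"RMSE of spline prediction: {rmse:.3f}")
```

Fitting the spline values at the knots, rather than an independent line per segment, keeps the approximation continuous across segment boundaries.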
-
Publication No.: US08700394B2
Publication date: 2014-04-15
Application No.: US12730270
Filing date: 2010-03-24
CPC classification: G10L15/20
Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
-
Publication No.: US09009039B2
Publication date: 2015-04-14
Application No.: US12483262
Filing date: 2009-06-12
CPC classification: G10L15/063 , G10L15/144 , G10L15/20
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
-
Publication No.: US20130253930A1
Publication date: 2013-09-26
Application No.: US13427907
Filing date: 2012-03-23
IPC classification: G10L15/00
CPC classification: G10L15/063 , G10L15/07 , G10L15/20
Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
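The two-stage selection and application of transforms described in this abstract can be sketched as below. This is a hedged illustration only: the abstract does not name the variability sources, so "speaker" and "environment" are assumptions, as are the affine form (matrix plus bias), the dictionary layout, and every name and number in the code.

```python
import numpy as np

def adapt_features(features, first_set, second_set, first_value, second_value):
    """Apply two cascaded transforms, each selected by one variability source.

    features: array of shape (frames, dims).
    first_set / second_set: dict mapping a source value to a (matrix, bias) pair.
    """
    a1, b1 = first_set[first_value]      # select by the first variability source
    a2, b2 = second_set[second_value]    # select by the second variability source
    intermediate = features @ a1.T + b1  # intermediate transformed speech data
    return intermediate @ a2.T + b2      # transformed speech data

# Hypothetical transform sets (values chosen only for illustration).
speaker_transforms = {
    "speaker_a": (np.diag([1.0, 2.0]), np.array([0.5, 0.0])),
    "speaker_b": (np.diag([0.5, 1.0]), np.array([0.0, 0.0])),
}
environment_transforms = {
    "quiet": (np.eye(2), np.zeros(2)),
    "car": (np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([0.0, -1.0])),
}

frames = np.array([[1.0, 1.0], [2.0, 0.0]])
adapted = adapt_features(
    frames, speaker_transforms, environment_transforms, "speaker_a", "car"
)
print(adapted)
```

Keeping one set per variability source means the number of stored transforms grows with the sum, not the product, of the number of values each source can take.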
-
Publication No.: US09984678B2
Publication date: 2018-05-29
Application No.: US13427907
Filing date: 2012-03-23
CPC classification: G10L15/063 , G10L15/07 , G10L15/20
Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
-
Publication No.: US20100318354A1
Publication date: 2010-12-16
Application No.: US12483262
Filing date: 2009-06-12
CPC classification: G10L15/063 , G10L15/144 , G10L15/20
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
-
Publication No.: US09218412B2
Publication date: 2015-12-22
Application No.: US11746847
Filing date: 2007-05-10
Applicant: Ye-Yi Wang , Dong Yu , Yun-Cheng Ju , Alejandro Acero , Geoffrey G. Zweig
Inventor: Ye-Yi Wang , Dong Yu , Yun-Cheng Ju , Alejandro Acero , Geoffrey G. Zweig
IPC classification: G06F7/00 , G06F17/30 , G06F3/06 , G10L15/187 , G10L15/197
CPC classification: G06F17/30663 , G06F3/0641 , G06F17/3069 , G10L15/187 , G10L15/197
Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.
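As a rough illustration of Tf/Idf search over short listings, the sketch below ranks listings by cosine similarity between Tf/Idf vectors. It is a generic textbook formulation, not the patent's specific algorithm; the example listings, the whitespace tokenizer, the `log(N/df) + 1` weighting, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def build_index(listings):
    """Compute IDF weights and one Tf/Idf vector per listing."""
    docs = [Counter(text.lower().split()) for text in listings]
    doc_freq = Counter(term for doc in docs for term in doc)
    # The "+ 1" keeps terms that occur in every listing from vanishing (an assumption).
    idf = {t: math.log(len(docs) / df) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, listings, idf, vectors):
    """Return the best-matching listing and its score."""
    counts = Counter(query.lower().split())
    query_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(query_vec, vec) for vec in vectors]
    best = max(range(len(listings)), key=scores.__getitem__)
    return listings[best], scores[best]

# Hypothetical business listings.
listings = ["Joe's Pizza Seattle", "Seattle Public Library", "Joe's Auto Repair"]
idf, vectors = build_index(listings)
result, score = search("joe's pizza", listings, idf, vectors)
print(result, round(score, 3))
```

Because listings are only a few words long, term frequency is almost always 1, so the ranking is driven mainly by the IDF weights and the length normalization.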
-
Publication No.: US09054764B2
Publication date: 2015-06-09
Application No.: US13187235
Filing date: 2011-07-20
Applicant: Ivan Tashev , Alejandro Acero
Inventor: Ivan Tashev , Alejandro Acero
CPC classification: H04B7/0854
Abstract: A novel beamforming post-processor technique with enhanced noise suppression capability. The present beamforming post-processor technique is a non-linear post-processing technique for sensor arrays (e.g., microphone arrays) which improves the directivity and signal separation capabilities. The technique works in so-called instantaneous direction of arrival space, estimates the probability for sound coming from a given incident angle or look-up direction and applies a time-varying, gain-based, spatio-temporal filter for suppressing sounds coming from directions other than the sound source direction, resulting in minimal artifacts and musical noise.
-
Publication No.: US08818797B2
Publication date: 2014-08-26
Application No.: US12978197
Filing date: 2010-12-23
IPC classification: G10L21/00
CPC classification: G10L19/005 , G10L15/02 , G10L19/20 , G10L21/038 , G10L2019/0001
Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
-
Publication No.: US08818002B2
Publication date: 2014-08-26
Application No.: US13187618
Filing date: 2011-07-21
Applicant: Ivan Tashev , Alejandro Acero , Byung-Jun Yoon
Inventor: Ivan Tashev , Alejandro Acero , Byung-Jun Yoon
CPC classification: G01S3/86 , H04B7/0854 , H04R3/005 , H04R2430/20
Abstract: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals and isotropic ambient noise.