Patent search: ap:("Michael Lewis Seltzer" OR "Kaustubh Prakash Kalgaonkar" OR "Alejandro Acero") AND inv:"Alejandro Acero" (page 1)

1.

Patent application
Acoustic Model Adaptation Using Splines (in force)

Publication No.: US20110238416A1

Publication date: 2011-09-29

Application No.: US12730270

Filing date: 2010-03-24

Applicants: Michael Lewis Seltzer , Kaustubh Prakash Kalgaonkar , Alejandro Acero

Inventors: Michael Lewis Seltzer , Kaustubh Prakash Kalgaonkar , Alejandro Acero

IPC classification: G10L15/20

CPC classification: G10L15/20

Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.

2.

Granted patent
Acoustic model adaptation using splines (in force)

Publication No.: US08700394B2

Publication date: 2014-04-15

Application No.: US12730270

Filing date: 2010-03-24

Applicants: Michael Lewis Seltzer , Kaustubh Prakash Kalgaonkar , Alejandro Acero

Inventors: Michael Lewis Seltzer , Kaustubh Prakash Kalgaonkar , Alejandro Acero

IPC classification: G10L15/20 , G10L15/00 , G10L15/04 , G10L15/28

CPC classification: G10L15/20

Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.

3.

Granted patent
Noise adaptive training for speech recognition (in force)

Publication No.: US09009039B2

Publication date: 2015-04-14

Application No.: US12483262

Filing date: 2009-06-12

Applicants: Michael Lewis Seltzer , James Garnet Droppo , Ozlem Kalinli , Alejandro Acero

Inventors: Michael Lewis Seltzer , James Garnet Droppo , Ozlem Kalinli , Alejandro Acero

IPC classification: G10L15/20 , G10L15/06 , G10L15/14

CPC classification: G10L15/063 , G10L15/144 , G10L15/20

Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.

4.

Patent application
FACTORED TRANSFORMS FOR SEPARABLE ADAPTATION OF ACOUSTIC MODELS (in force)

Publication No.: US20130253930A1

Publication date: 2013-09-26

Application No.: US13427907

Filing date: 2012-03-23

Applicants: Michael Lewis Seltzer , Alejandro Acero

Inventors: Michael Lewis Seltzer , Alejandro Acero

IPC classification: G10L15/00

CPC classification: G10L15/063 , G10L15/07 , G10L15/20

Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
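
The cascade described in this abstract (select one transform per variability source, apply the first, then apply the second to the intermediate result) can be illustrated with a minimal Python sketch. It is not taken from the patent: the affine form `A x + b`, the choice of "environment" and "speaker" as the two variability sources, the dictionary keys, and all numeric values are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Affine = Tuple[Matrix, Vector]  # (A, b), representing x -> A x + b


def apply_affine(transform: Affine, x: Vector) -> Vector:
    """Return A x + b for a single feature vector x."""
    A, b = transform
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def adapt_features(x: Vector,
                   first_value: str,
                   second_value: str,
                   first_set: Dict[str, Affine],
                   second_set: Dict[str, Affine]) -> Vector:
    """Select one transform per variability source and apply them in sequence."""
    first = first_set[first_value]        # chosen by the first source's value
    second = second_set[second_value]     # chosen by the second source's value
    intermediate = apply_affine(first, x)      # intermediate transformed data
    return apply_affine(second, intermediate)  # final transformed data


# Hypothetical transform sets; keys and numbers are invented for this sketch.
ENVIRONMENT_TRANSFORMS: Dict[str, Affine] = {
    "car": ([[1.0, 0.0], [0.0, 1.0]], [-0.5, 0.25]),
    "office": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
}
SPEAKER_TRANSFORMS: Dict[str, Affine] = {
    "speaker_a": ([[2.0, 0.0], [0.0, 0.5]], [0.0, 0.0]),
    "speaker_b": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
}

features = [1.0, 2.0]
adapted = adapt_features(features, "car", "speaker_a",
                         ENVIRONMENT_TRANSFORMS, SPEAKER_TRANSFORMS)
print(adapted)
```

Keeping the two sets separate means that, in this sketch, a new environment and speaker pairing needs no new transform: the existing entries are simply combined at lookup time.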

5.

Granted patent
Factored transforms for separable adaptation of acoustic models (in force)

Publication No.: US09984678B2

Publication date: 2018-05-29

Application No.: US13427907

Filing date: 2012-03-23

Applicants: Michael Lewis Seltzer , Alejandro Acero

Inventors: Michael Lewis Seltzer , Alejandro Acero

IPC classification: G10L15/06 , G10L15/20 , G10L15/07

CPC classification: G10L15/063 , G10L15/07 , G10L15/20

Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.

6.

Patent application
NOISE ADAPTIVE TRAINING FOR SPEECH RECOGNITION (in force)

Publication No.: US20100318354A1

Publication date: 2010-12-16

Application No.: US12483262

Filing date: 2009-06-12

Applicants: Michael Lewis Seltzer , James Garnet Droppo , Ozlem Kalinli , Alejandro Acero

Inventors: Michael Lewis Seltzer , James Garnet Droppo , Ozlem Kalinli , Alejandro Acero

IPC classification: G10L15/20 , G10L15/00 , G10L15/14

CPC classification: G10L15/063 , G10L15/144 , G10L15/20

Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.

7.

Granted patent
Searching a database of listings (in force)

Publication No.: US09218412B2

Publication date: 2015-12-22

Application No.: US11746847

Filing date: 2007-05-10

Applicants: Ye-Yi Wang , Dong Yu , Yun-Cheng Ju , Alejandro Acero , Geoffrey G. Zweig

Inventors: Ye-Yi Wang , Dong Yu , Yun-Cheng Ju , Alejandro Acero , Geoffrey G. Zweig

IPC classification: G06F7/00 , G06F17/30 , G06F3/06 , G10L15/187 , G10L15/197

CPC classification: G06F17/30663 , G06F3/0641 , G06F17/3069 , G10L15/187 , G10L15/197

Abstract: A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm.
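
As a rough illustration of Tf/Idf ranking over short listings, the following Python sketch scores each listing against a query by cosine similarity of Tf/Idf vectors. It is a generic textbook formulation, not the patented method: the smoothed Idf formula, the whitespace tokenizer, the cosine scoring, and the sample listings are all assumptions chosen for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(listings: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(listings)
    df: Counter = Counter()
    for listing in listings:
        df.update(set(tokenize(listing)))  # count each term once per listing
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Weight each known term by its raw count times its Idf."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query: str, listings: List[str]) -> List[Tuple[float, str]]:
    """Return (score, listing) pairs sorted from best to worst match."""
    idf = build_idf(listings)
    q = tfidf_vector(query, idf)
    scored = [(cosine(q, tfidf_vector(item, idf)), item) for item in listings]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical listings, invented for this sketch.
LISTINGS = ["Joe's Pizza Seattle", "Seattle Public Library", "Pizza Palace Bellevue"]
results = search("pizza seattle", LISTINGS)
for score, listing in results:
    print(f"{score:.3f}  {listing}")
```

Because listings are only a few words long, term counts are almost always 0 or 1, so in this sketch the Idf weights and the length normalization in the cosine do most of the ranking work.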

8.

Granted patent
Sensor array beamformer post-processor (in force)

Publication No.: US09054764B2

Publication date: 2015-06-09

Application No.: US13187235

Filing date: 2011-07-20

Applicants: Ivan Tashev , Alejandro Acero

Inventors: Ivan Tashev , Alejandro Acero

IPC classification: H04R3/00 , H04B7/08

CPC classification: H04B7/0854

Abstract: A novel beamforming post-processor technique with enhanced noise suppression capability. The present beamforming post-processor technique is a non-linear post-processing technique for sensor arrays (e.g., microphone arrays) which improves the directivity and signal separation capabilities. The technique works in so-called instantaneous direction of arrival space, estimates the probability for sound coming from a given incident angle or look-up direction and applies a time-varying, gain based, spatio-temporal filter for suppressing sounds coming from directions other than the sound source direction, resulting in minimal artifacts and musical noise.

9.

Granted patent
Dual-band speech encoding (in force)

Publication No.: US08818797B2

Publication date: 2014-08-26

Application No.: US12978197

Filing date: 2010-12-23

Applicants: Alejandro Acero , James G. Droppo, III , Michael L. Seltzer

Inventors: Alejandro Acero , James G. Droppo, III , Michael L. Seltzer

IPC classification: G10L21/00

CPC classification: G10L19/005 , G10L15/02 , G10L19/20 , G10L21/038 , G10L2019/0001

Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.

10.

Granted patent
Robust adaptive beamforming with enhanced noise suppression (in force)

Publication No.: US08818002B2

Publication date: 2014-08-26

Application No.: US13187618

Filing date: 2011-07-21

Applicants: Ivan Tashev , Alejandro Acero , Byung-Jun Yoon

Inventors: Ivan Tashev , Alejandro Acero , Byung-Jun Yoon

IPC classification: H04B15/00 , G01S3/86 , H04R3/00 , H04B7/08

CPC classification: G01S3/86 , H04B7/0854 , H04R3/005 , H04R2430/20

Abstract: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals as well as isotropic ambient noise.
