Speech recognition with non-linear noise reduction on Mel-frequency cepstra
    81.
    Type: Granted patent
    Status: In force

    Publication number: US08306817B2

    Publication date: 2012-11-06

    Application number: US11970537

    Filing date: 2008-01-08

    IPC classification: G10L15/00

    Abstract: In an automatic speech recognition system, a feature extractor extracts features from a speech signal, and speech is recognized by the automatic speech recognition system based on the extracted features. Noise reduction as part of the feature extractor is provided by feature enhancement in which feature-domain noise reduction in the form of Mel-frequency cepstra is provided based on the minimum mean square error criterion. Specifically, the devised method takes into account the random phase between the clean speech and the mixing noise. The feature-domain noise reduction is performed in a dimension-wise fashion to the individual dimensions of the feature vectors input to the automatic speech recognition system, in order to perform environment-robust speech recognition.


    SYSTEM AND METHOD FOR EFFICIENT LASER PROCESSING OF A MOVING WEB-BASED MATERIAL
    82.
    Type: Patent application
    Status: In force

    Publication number: US20110015927A1

    Publication date: 2011-01-20

    Application number: US12884434

    Filing date: 2010-09-17

    IPC classification: G10L15/26

    Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.


    HIGH PERFORMANCE HMM ADAPTATION WITH JOINT COMPENSATION OF ADDITIVE AND CONVOLUTIVE DISTORTIONS
    83.
    Type: Patent application
    Status: In force

    Publication number: US20090144059A1

    Publication date: 2009-06-04

    Application number: US11949044

    Filing date: 2007-12-03

    IPC classification: G10L15/14

    CPC classification: G10L15/20 G10L15/142

    Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.
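The compensation step described in this abstract can be illustrated with a first-order vector Taylor series style update. The abstract gives no equations, so the distortion model below (y = x + h + log(1 + exp(n - x - h)) in the log-spectral domain), the diagonal covariances, and the function name `adapt_gaussian` are assumptions chosen for illustration, not the patented method. Working in the log-spectral domain also sidesteps the cepstral transform a real system would need.

```python
import math

def adapt_gaussian(mu_x, var_x, mu_n, var_n, mu_h):
    """Adapt one diagonal Gaussian to additive noise and channel distortion.

    All vectors are per-dimension lists in the log-spectral domain (assumption).
    mu_x/var_x: clean-speech mean and variance
    mu_n/var_n: noise mean and variance
    mu_h:       channel mean
    Returns (adapted mean, adapted variance, per-dimension gain).
    """
    mu_y, var_y, gains = [], [], []
    for mx, vx, mn, vn, mh in zip(mu_x, var_x, mu_n, var_n, mu_h):
        offset = math.exp(mn - mx - mh)
        # Gain = dy/dx at the expansion point; it depends on this Gaussian's
        # mean, which is the role a "Gaussian dependent" term plays here.
        g = 1.0 / (1.0 + offset)
        # Distortion model evaluated at the means.
        mu_y.append(mx + mh + math.log1p(offset))
        # First-order propagation of the speech and noise variances.
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
        gains.append(g)
    return mu_y, var_y, gains
```

When the noise mean is far below the speech mean, the gain approaches 1 and the adapted mean reduces to the clean mean shifted by the channel mean. In an adaptation loop of the kind the abstract outlines, the noise and channel means would be re-estimated from decoded data and this step repeated.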


    Time synchronous decoding for long-span hidden trajectory model
    84.
    Type: Patent application
    Status: In force

    Publication number: US20070198266A1

    Publication date: 2007-08-23

    Application number: US11356905

    Filing date: 2006-02-17

    IPC classification: G10L15/28

    CPC classification: G10L15/08

    Abstract: A time-synchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, hypotheses are represented as traces that include an indication of a current frame, previous frames and future frames. Each frame can include an associated linguistic unit such as a phone or units that are derived from a phone. Additionally, pruning strategies can be applied to speed up the search. Further, word-ending recombination methods are developed to speed up the computation. These methods can effectively deal with an exponentially increased search space.
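As a rough illustration of time-synchronous search with pruning and recombination, here is a standard beam-pruned Viterbi pass. It is a simplified stand-in: it does not model the long-span traces the abstract describes, and every name in it (`time_synchronous_search`, `prune`, the score dictionaries) is hypothetical.

```python
def prune(hyps, beam):
    """Keep only hypotheses whose score is within `beam` of the best one."""
    best = max(score for score, _ in hyps.values())
    return {u: h for u, h in hyps.items() if h[0] >= best - beam}

def time_synchronous_search(frames, log_trans, log_init, beam):
    """Advance all hypotheses one frame at a time.

    frames:    list of dicts, unit -> acoustic log-likelihood for that frame
    log_trans: dict, unit -> dict of next unit -> transition log-probability
    log_init:  dict, unit -> initial log-probability
    Returns (best score, best unit sequence).
    """
    hyps = {u: (log_init[u] + frames[0][u], [u]) for u in log_init}
    for obs in frames[1:]:
        hyps = prune(hyps, beam)
        extended = {}
        for u, (score, path) in hyps.items():
            for v, lt in log_trans.get(u, {}).items():
                cand = score + lt + obs[v]
                # Recombination: keep only the best hypothesis ending in v.
                if v not in extended or cand > extended[v][0]:
                    extended[v] = (cand, path + [v])
        hyps = extended
    return max(hyps.values(), key=lambda h: h[0])

frames = [{"a": -1.0, "b": -2.0}, {"a": -3.0, "b": -1.0}, {"a": -1.0, "b": -3.0}]
log_trans = {"a": {"a": -0.5, "b": -1.0}, "b": {"a": -1.0, "b": -0.5}}
log_init = {"a": 0.0, "b": 0.0}
best_score, best_path = time_synchronous_search(frames, log_trans, log_init, beam=1.0)
# best_score == -5.0, best_path == ["a", "b", "a"]
```

A narrower beam discards more hypotheses per frame, trading search accuracy for speed.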


    Parameter learning in a hidden trajectory model
    85.
    Type: Patent application
    Status: In force

    Publication number: US20070198260A1

    Publication date: 2007-08-23

    Application number: US11356898

    Filing date: 2006-02-17

    IPC classification: G10L15/00

    CPC classification: G10L15/063 G10L2015/025

    Abstract: Parameters for distributions of a hidden trajectory model including means and variances are estimated using an acoustic likelihood function for observation vectors as an objective function for optimization. The estimation includes only acoustic data and not any intermediate estimate on hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
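To make the idea of gradient ascent on a likelihood concrete, the toy sketch below estimates the mean and variance of a one-dimensional Gaussian by climbing its average log-likelihood. This is not the hidden trajectory model: the single-Gaussian setup, the log-variance parameterization, and the name `fit_gaussian_by_gradient_ascent` are assumptions made for illustration.

```python
import math

def fit_gaussian_by_gradient_ascent(data, lr=0.1, steps=2000):
    """Maximize the average Gaussian log-likelihood of `data`.

    The variance is parameterized as exp(log_var) so it stays positive.
    Returns (mean, variance).
    """
    mu, log_var = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        inv_var = math.exp(-log_var)
        # d/d(mu) of the average log-likelihood.
        grad_mu = sum(x - mu for x in data) * inv_var / n
        # d/d(log_var) of the average log-likelihood.
        grad_log_var = sum(0.5 * ((x - mu) ** 2 * inv_var - 1.0) for x in data) / n
        mu += lr * grad_mu
        log_var += lr * grad_log_var
    return mu, math.exp(log_var)

mean, variance = fit_gaussian_by_gradient_ascent([1.0, 2.0, 3.0, 4.0, 5.0])
# Converges towards the maximum-likelihood values: mean 3.0, variance 2.0.
```

For a single Gaussian the maximum has a closed form, so the iteration is only a demonstration; gradient methods matter when, as in the abstract, the likelihood has no closed-form maximizer.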


    Time asynchronous decoding for long-span trajectory model
    86.
    Type: Patent application
    Status: Lapsed

    Publication number: US20070143112A1

    Publication date: 2007-06-21

    Application number: US11311951

    Filing date: 2005-12-20

    IPC classification: G10L15/18

    CPC classification: G10L15/08 G10L15/187

    Abstract: A time-asynchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, nodes and links in the lattices developed from the model are expanded via look-ahead. Heuristics as utilized by a search algorithm are estimated. Additionally, pruning strategies can be applied to speed up the search.
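A generic best-first (A*-style) search over a small lattice shows how a look-ahead heuristic and pruning can interact when hypotheses are not expanded frame by frame. The abstract does not specify the algorithm, so the lattice layout, the cost convention (negative log-probabilities, lower is better), and the names `backward_lookahead` and `best_first_search` are assumptions for this sketch.

```python
import heapq

def backward_lookahead(lattice, goal):
    """Cheapest remaining cost from every node to `goal` (the look-ahead)."""
    memo = {goal: 0.0}

    def cost(node):
        if node not in memo:
            memo[node] = min(
                (w + cost(nxt) for nxt, w, _ in lattice.get(node, [])),
                default=float("inf"),
            )
        return memo[node]

    for node in lattice:
        cost(node)
    return memo

def best_first_search(lattice, start, goal, beam=float("inf")):
    """Expand the most promising partial path first, regardless of its length.

    lattice: dict, node -> list of (next node, link cost, label)
    Returns (total cost, labels) or None if the goal is unreachable.
    """
    h = backward_lookahead(lattice, goal)
    frontier = [(h[start], 0.0, start, [])]
    while frontier:
        _, g, node, labels = heapq.heappop(frontier)
        if node == goal:
            return g, labels
        for nxt, w, label in lattice.get(node, []):
            f_next = g + w + h[nxt]
            # Pruning: drop paths estimated to be much worse than the best.
            if f_next - h[start] > beam:
                continue
            heapq.heappush(frontier, (f_next, g + w, nxt, labels + [label]))
    return None

lattice = {
    "s": [("a", 1.0, "recognize"), ("b", 0.5, "wreck a")],
    "a": [("e", 1.0, "speech")],
    "b": [("c", 1.0, "nice"), ("e", 3.0, "speech")],
    "c": [("e", 1.0, "beach")],
}
result = best_first_search(lattice, "s", "e")
# result == (2.0, ["recognize", "speech"])
```

Here the look-ahead is exact because the toy lattice is small enough to score backwards in full; a practical decoder would typically use a cheaper estimate.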


    Learning statistically characterized resonance targets in a hidden trajectory model
    87.
    Type: Patent application
    Status: In force

    Publication number: US20070143104A1

    Publication date: 2007-06-21

    Application number: US11303899

    Filing date: 2005-12-15

    IPC classification: G10L19/06

    Abstract: A statistical trajectory speech model is constructed where the targets for vocal tract resonances are represented as random vectors and where the mean vectors of the target distributions are estimated using a likelihood function for joint acoustic observation vectors. The target mean vectors can be estimated without formant data. To form the model, time-dependent filter parameter vectors based on time-dependent coarticulation parameters are constructed that are a function of the ordering and identity of the phones in the phone sequence in each speech utterance. The filter parameter vectors are also a function of the temporal extent of coarticulation and of the speaker's speaking effort.


    Automatic speech recognition learning using user corrections
    88.
    Type: Patent application
    Status: In force

    Publication number: US20050159949A1

    Publication date: 2005-07-21

    Application number: US10761451

    Filing date: 2004-01-20

    IPC classification: G10L15/22 G10L15/00

    Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.


    Sensor array beamformer post-processor
    89.
    Type: Granted patent
    Status: In force

    Publication number: US09054764B2

    Publication date: 2015-06-09

    Application number: US13187235

    Filing date: 2011-07-20

    IPC classification: H04R3/00 H04B7/08

    CPC classification: H04B7/0854

    Abstract: A novel beamforming post-processor technique with enhanced noise suppression capability. The present beamforming post-processor technique is a non-linear post-processing technique for sensor arrays (e.g., microphone arrays) which improves the directivity and signal separation capabilities. The technique works in so-called instantaneous direction of arrival space, estimates the probability for sound coming from a given incident angle or look-up direction and applies a time-varying, gain based, spatio-temporal filter for suppressing sounds coming from directions other than the sound source direction, resulting in minimal artifacts and musical noise.
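The idea of a direction-dependent, time-smoothed gain can be sketched for a two-microphone array using the inter-microphone phase difference in one frequency bin. This is a heavy simplification rather than the patented estimator: the Gaussian-shaped score, the smoothing rule, the gain floor, and all function names and default values are assumptions for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed room-temperature value

def expected_phase_diff(freq_hz, mic_spacing_m, angle_rad):
    """Phase lead of mic 1 over mic 2 for a far-field source at `angle_rad`
    from broadside."""
    return 2.0 * math.pi * freq_hz * mic_spacing_m * math.sin(angle_rad) / SPEED_OF_SOUND

def wrap(phase):
    """Wrap a phase value into [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi

def direction_score(x1, x2, freq_hz, mic_spacing_m, look_angle_rad, sigma=0.3):
    """Score in (0, 1]: how well the observed phase difference in one bin
    matches the look direction (unnormalized Gaussian, an assumption)."""
    observed = cmath.phase(x1 * x2.conjugate())
    err = wrap(observed - expected_phase_diff(freq_hz, mic_spacing_m, look_angle_rad))
    return math.exp(-0.5 * (err / sigma) ** 2)

def postfilter(beam_out, scores, prev_gains, smoothing=0.7, floor=0.1):
    """Apply a time-smoothed, floored gain per bin to the beamformer output.

    Returns (filtered bins, new gains); pass the gains back in on the next frame.
    """
    gains = [
        max(floor, smoothing * prev + (1.0 - smoothing) * score)
        for prev, score in zip(prev_gains, scores)
    ]
    return [g * b for g, b in zip(gains, beam_out)], gains
```

Smoothing the gain over time and keeping a floor above zero are common ways to limit musical noise, which is the kind of artifact the abstract says the technique minimizes.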


    Dual-band speech encoding
    90.
    Type: Granted patent
    Status: In force

    Publication number: US08818797B2

    Publication date: 2014-08-26

    Application number: US12978197

    Filing date: 2010-12-23

    IPC classification: G10L21/00

    Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
