DEEP CONVEX NETWORK WITH JOINT USE OF NONLINEAR RANDOM PROJECTION, RESTRICTED BOLTZMANN MACHINE AND BATCH-BASED PARALLELIZABLE OPTIMIZATION
    81.
    Patent application
    Legal status: In force

    Publication number: US20120254086A1

    Publication date: 2012-10-04

    Application number: US13077978

    Filing date: 2011-03-31

    IPC classification: G06N3/08

    Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce output that serves as scores to be combined with transition probabilities between states in a hidden Markov model and with language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and restricted Boltzmann machine (RBM) weights, and it stacks a lower module's output with the raw data to establish its immediately higher module. Batch-based convex optimization is performed to learn a portion of the deep convex network's weights, which makes the training suitable for parallel computation. The method can further include the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion based on a sequence rather than a set of unrelated frames.

    Noise suppressor for robust speech recognition
    82.
    Granted patent
    Legal status: In force

    Publication number: US08185389B2

    Publication date: 2012-05-22

    Application number: US12335558

    Filing date: 2008-12-16

    IPC classification: G10L15/20

    CPC classification: G10L21/0208 G10L15/20

    Abstract: Described is noise reduction technology generally for speech input in which a noise-suppression related gain value for the frame is determined based upon a noise level associated with that frame in addition to the signal to noise ratios (SNRs). In one implementation, a noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a threshold low level, and a low gain value is set or computed to accomplish large noise suppression above a threshold high noise level. A noise-power dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning parameters used in noise reduction via a step-adaptive discriminative learning algorithm.
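
    The threshold-and-interpolation rule in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it covers only the noise-level-dependent part of the gain (the SNR-dependent part is omitted), it reads "log-linear interpolation" as linear interpolation of the log gain against log noise power, and the function names, default gain limits, and smoothing factor `alpha` are all assumptions made for the example.

```python
import math


def noise_dependent_gain(noise_power, low_thresh, high_thresh,
                         max_gain=1.0, min_gain=0.1):
    """Map a frame's noise power to a suppression gain (hypothetical helper).

    Below low_thresh the gain is max_gain (little or no suppression);
    above high_thresh it is min_gain (strong suppression); in between
    the log gain is interpolated linearly against log noise power.
    """
    if noise_power <= low_thresh:
        return max_gain
    if noise_power >= high_thresh:
        return min_gain
    # Fraction of the way from the low to the high threshold, in log power.
    t = (math.log(noise_power) - math.log(low_thresh)) / (
        math.log(high_thresh) - math.log(low_thresh))
    log_gain = (1.0 - t) * math.log(max_gain) + t * math.log(min_gain)
    return math.exp(log_gain)


def smooth_gain(current_gain, previous_gain, alpha=0.5):
    """Blend the current gain with the prior frame's gain (assumed first-order smoothing)."""
    return alpha * previous_gain + (1.0 - alpha) * current_gain
```

    With thresholds of 1 and 100, a frame whose noise power is 10 sits halfway between them in the log domain, so its gain is the geometric mean of the two gain limits.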

    ROBUST ADAPTIVE BEAMFORMING WITH ENHANCED NOISE SUPPRESSION
    83.
    Patent application
    Legal status: In force

    Publication number: US20110274291A1

    Publication date: 2011-11-10

    Application number: US13187618

    Filing date: 2011-07-21

    IPC classification: H04B15/00

    Abstract: A novel adaptive beamforming technique with enhanced noise suppression capability. The technique incorporates the sound-source presence probability into an adaptive blocking matrix. In one embodiment, the sound-source presence probability is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The technique guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals and isotropic ambient noise.

    Acoustic Model Adaptation Using Splines
    84.
    Patent application
    Legal status: In force

    Publication number: US20110238416A1

    Publication date: 2011-09-29

    Application number: US12730270

    Filing date: 2010-03-24

    IPC classification: G10L15/20

    CPC classification: G10L15/20

    Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
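
    As a rough sketch of the spline idea, the code below approximates a nonlinear clean/noise/noisy relationship with a piecewise-linear function. Several things here are assumptions, not taken from the abstract: the relationship is taken to be the commonly used log-spectral form y = x + log(1 + exp(n - x)), the knot positions are arbitrary, and the knot values are sampled from the analytic function as a stand-in for the parameters that the patent learns from training data. Variance modeling and channel compensation are not shown.

```python
import bisect
import math


def mismatch(u):
    """log(1 + exp(u)), where u = noise - clean in the log-spectral domain (assumed form)."""
    return math.log1p(math.exp(u))


def make_linear_spline(knots, values):
    """Return a piecewise-linear function through the points (knots[i], values[i])."""
    def spline(u):
        # Locate the segment containing u; clamp so the end segments extrapolate.
        i = bisect.bisect_right(knots, u) - 1
        i = max(0, min(i, len(knots) - 2))
        slope = (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        return values[i] + slope * (u - knots[i])
    return spline


# Hypothetical knot placement: denser near u = 0, where the curvature is largest.
KNOTS = [-8.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 8.0]
SPLINE = make_linear_spline(KNOTS, [mismatch(k) for k in KNOTS])


def predict_noisy(clean, noise):
    """Predict a noisy log-spectral feature from clean speech and noise features."""
    return clean + SPLINE(noise - clean)
```

    Each segment is linear in u, so within a segment the predicted noisy feature is a linear function of the clean and noise features, which is what makes this kind of approximation convenient for model adaptation.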

    Pitch model for noise estimation
    85.
    Granted patent
    Legal status: In force

    Publication number: US07925502B2

    Publication date: 2011-04-12

    Application number: US11788323

    Filing date: 2007-04-19

    IPC classification: G10L11/06

    CPC classification: G10L21/02

    Abstract: Pitch is tracked for individual samples, which are taken much more frequently than an analysis frame. Speech is identified based on the tracked pitch and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying noise signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech related systems.

    USING COMBINED ANSWERS IN MACHINE-BASED EDUCATION
    86.
    Patent application
    Legal status: Under examination (published)

    Publication number: US20100311030A1

    Publication date: 2010-12-09

    Application number: US12477138

    Filing date: 2009-06-03

    IPC classification: G09B3/00

    CPC classification: G09B7/02

    Abstract: Described is a technology for learning a foreign language or other subject. Answers (e.g., translations) to questions (e.g., sentences to translate) received from learners are combined into a combined answer that serves as a representative model answer for those learners. The questions also may be provided to machine subsystems to generate machine answers, e.g., machine translators, with those machine answers used in the combined answer. The combined answer is used to evaluate each learner's individual answer. The evaluation may be used to compute profile information that is then fed back for use in selecting further questions, e.g., more difficult sentences as the learners progress. Also described is integrating the platform/technology into a web service.

    Speech index pruning
    87.
    Granted patent
    Legal status: In force

    Publication number: US07831428B2

    Publication date: 2010-11-09

    Application number: US11270673

    Filing date: 2005-11-09

    IPC classification: G10L21/00

    Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
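
    The indexing and pruning steps described here can be illustrated with a small inverted index. This is only a sketch under stated assumptions: each recognition alternative is given as a list of hypothetical `(word, position, probability)` tuples, the index layout (word mapped to postings keyed by segment and position) is invented for the example, and the pruning rule simply drops postings whose probability is below the threshold.

```python
def build_index(segment_id, alternatives):
    """Index every word of every alternative word sequence for one speech segment.

    alternatives: list of sequences, each a list of (word, position, probability).
    Returns {word: {(segment_id, position): probability}}.
    """
    index = {}
    for sequence in alternatives:
        for word, position, prob in sequence:
            # A word repeated at the same position across alternatives is stored once.
            index.setdefault(word, {})[(segment_id, position)] = prob
    return index


def prune_index(index, threshold):
    """Drop postings whose probability is below threshold, and words left with none."""
    pruned = {}
    for word, postings in index.items():
        kept = {key: p for key, p in postings.items() if p >= threshold}
        if kept:
            pruned[word] = kept
    return pruned
```

    For example, indexing the alternatives "recognize speech" and "wreck a nice beach" with low probabilities on "wreck", "a", and "nice", then pruning at 0.1, leaves only the higher-probability words in the index.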

    MAXIMUM ENTROPY MODEL WITH CONTINUOUS FEATURES
    89.
    Patent application
    Legal status: Under examination (published)

    Publication number: US20100256977A1

    Publication date: 2010-10-07

    Application number: US12416161

    Filing date: 2009-04-01

    IPC classification: G10L15/00

    Abstract: Described is a technology by which a maximum entropy (MaxEnt) model, such as one used as a classifier or in a conditional random field or hidden conditional random field that embeds the maximum entropy model, uses continuous features with continuous weights that are continuous functions of the feature values (instead of single-valued weights). The continuous weights may be approximated by a spline-based solution. In general, this converts the optimization problem into a standard log-linear optimization problem without continuous weights in a higher-dimensional space.
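
    The conversion to a higher-dimensional log-linear problem can be sketched as a feature expansion. If the weight function is approximated as a combination of knot weights, w(f) ≈ Σ_k a_k(f) w_k, then w(f)·f = Σ_k w_k·(a_k(f)·f), so the single continuous feature f becomes several expanded features a_k(f)·f with ordinary constant weights w_k. The sketch below assumes linear interpolation between knots for a_k; the abstract only says "spline-based", so the actual basis may differ, and all names are hypothetical.

```python
def hat_basis(f, knots):
    """Linear-interpolation coefficients a_k(f); non-negative and summing to 1."""
    coeffs = [0.0] * len(knots)
    if f <= knots[0]:
        coeffs[0] = 1.0
    elif f >= knots[-1]:
        coeffs[-1] = 1.0
    else:
        for k in range(len(knots) - 1):
            if knots[k] <= f <= knots[k + 1]:
                t = (f - knots[k]) / (knots[k + 1] - knots[k])
                coeffs[k], coeffs[k + 1] = 1.0 - t, t
                break
    return coeffs


def expand_feature(f, knots):
    """Replace one continuous feature f with len(knots) features a_k(f) * f."""
    return [a * f for a in hat_basis(f, knots)]


def weighted_score(f, knots, knot_weights):
    """Score contribution w(f) * f, computed with constant weights on expanded features."""
    return sum(w * g for w, g in zip(knot_weights, expand_feature(f, knots)))
```

    With knots at 0, 1, and 2 and knot weights 1.0, 3.0, and 2.0, a feature value of 0.5 gets the interpolated weight 2.0, and the expanded-feature score equals 2.0 × 0.5. The knot weights can then be trained with any standard log-linear optimizer.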

    AUDIO TRANSFORMS IN CONNECTION WITH MULTIPARTY COMMUNICATION
    90.
    Patent application
    Legal status: In force

    Publication number: US20100195812A1

    Publication date: 2010-08-05

    Application number: US12365949

    Filing date: 2009-02-05

    IPC classification: H04M3/42 G10L11/00

    Abstract: The claimed subject matter relates to an architecture that can preprocess audio portions of communications in order to enrich multiparty communication sessions or environments. In particular, the architecture can provide both a public channel for public communications that are received by substantially all connected parties and can further provide a private channel for private communications that are received by a selected subset of all connected parties. Most particularly, the architecture can apply an audio transform to communications that occur during the multiparty communication session based upon a target audience of the communication. By way of illustration, the architecture can apply a whisper transform to private communications, an emotion transform based upon relationships, an ambience or spatial transform based upon physical locations, or a pace transform based upon lack of presence.
