Parameter learning in a hidden trajectory model
    21.
    Invention Application
    Parameter learning in a hidden trajectory model (Active)

    Publication No.: US20070198260A1

    Publication Date: 2007-08-23

    Application No.: US11356898

    Filing Date: 2006-02-17

    IPC Class: G10L15/00

    CPC Class: G10L15/063 G10L2015/025

    Abstract: Parameters for the distributions of a hidden trajectory model, including means and variances, are estimated using an acoustic likelihood function for observation vectors as the objective function for optimization. The estimation uses only acoustic data and no intermediate estimates of the hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
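
    The abstract names gradient ascent on an acoustic likelihood but gives no update equations. As a minimal illustrative sketch (not the patent's hidden trajectory model, which is far richer), the snippet below estimates the mean and variance of a single Gaussian by ascending the log-likelihood of raw observations directly, with no intermediate hidden-state estimates; all names and hyperparameters are assumptions.

```python
import numpy as np

def log_likelihood(obs, mean, log_var):
    """Average Gaussian log-likelihood of 1-D observations."""
    var = np.exp(log_var)
    return np.mean(-0.5 * (np.log(2 * np.pi * var) + (obs - mean) ** 2 / var))

def gradient_ascent(obs, steps=500, lr=0.1):
    """Estimate mean/variance by ascending the likelihood of the acoustic data."""
    mean, log_var = 0.0, 0.0  # optimize log-variance so the variance stays positive
    for _ in range(steps):
        var = np.exp(log_var)
        d_mean = np.mean(obs - mean) / var                         # d(logL)/d(mean)
        d_logvar = np.mean(0.5 * ((obs - mean) ** 2 / var - 1.0))  # d(logL)/d(log var)
        mean += lr * d_mean
        log_var += lr * d_logvar
    return mean, np.exp(log_var)

rng = np.random.default_rng(0)
obs = rng.normal(loc=2.0, scale=1.5, size=4000)
m, v = gradient_ascent(obs)   # converges near the sample mean and variance
```

    The same pattern, an objective defined only on observed acoustics with analytic gradients for the distribution parameters, is what the abstract describes at model scale.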

    Time asynchronous decoding for long-span trajectory model
    22.
    Invention Application
    Time asynchronous decoding for long-span trajectory model (Expired)

    Publication No.: US20070143112A1

    Publication Date: 2007-06-21

    Application No.: US11311951

    Filing Date: 2005-12-20

    IPC Class: G10L15/18

    CPC Class: G10L15/08 G10L15/187

    Abstract: A time-asynchronous, lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, nodes and links in the lattices developed from the model are expanded via look-ahead. Heuristics used by the search algorithm are estimated. Additionally, pruning strategies can be applied to speed up the search.
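
    The abstract mentions look-ahead heuristics and pruning without algorithmic detail. Below is a hypothetical best-first search over a toy lattice in which the heuristic is an exact look-ahead (the best achievable score from each node to the end, computed by backward dynamic programming) and a beam limit prunes hypotheses; the lattice and scores are invented for illustration.

```python
import heapq

# Toy lattice: node -> list of (next_node, arc_score); higher scores are better.
# Nodes are listed in topological order, ending at "e".
LATTICE = {
    "s": [("a", 2.0), ("b", 1.0)],
    "a": [("e", 1.0)],
    "b": [("e", 3.5)],
    "e": [],
}

def backward_scores(lattice, end="e"):
    """Exact look-ahead heuristic: best score from each node to the end."""
    h = {end: 0.0}
    for node in reversed(list(lattice)):
        if node != end:
            h[node] = max(s + h[n] for n, s in lattice[node])
    return h

def lattice_search(lattice, start="s", end="e", beam=10):
    """Best-first search guided by the look-ahead heuristic, with beam pruning."""
    h = backward_scores(lattice, end)
    # heapq is a min-heap, so scores are negated; entries are
    # (-(partial + heuristic), partial_score, node, path).
    frontier = [(-(0.0 + h[start]), 0.0, start, [start])]
    while frontier:
        frontier = heapq.nsmallest(beam, frontier)  # keep only the `beam` best
        _, score, node, path = heapq.heappop(frontier)
        if node == end:
            return score, path
        for nxt, arc in lattice[node]:
            g = score + arc
            heapq.heappush(frontier, (-(g + h[nxt]), g, nxt, path + [nxt]))
    return None

best_score, best_path = lattice_search(LATTICE)   # picks s -> b -> e (score 4.5)
```

    With an exact look-ahead heuristic the first goal popped is optimal; the patent's setting replaces the toy scores with long-span trajectory-model scores, where the heuristics must be estimated rather than computed exactly.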

    Learning statistically characterized resonance targets in a hidden trajectory model
    23.
    Invention Application
    Learning statistically characterized resonance targets in a hidden trajectory model (Active)

    Publication No.: US20070143104A1

    Publication Date: 2007-06-21

    Application No.: US11303899

    Filing Date: 2005-12-15

    IPC Class: G10L19/06

    Abstract: A statistical trajectory speech model is constructed in which the targets for vocal tract resonances are represented as random vectors and the mean vectors of the target distributions are estimated using a likelihood function for joint acoustic observation vectors. The target mean vectors can be estimated without formant data. To form the model, time-dependent filter parameter vectors are constructed from time-dependent coarticulation parameters as a function of the ordering and identity of the phones in the phone sequence of each speech utterance. The filter parameter vectors are also a function of the temporal extent of coarticulation and of the speaker's speaking effort.
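
    To make the filtering idea concrete: in hidden-trajectory models of this family, a smooth resonance trajectory is obtained by filtering a per-phone target sequence with a coarticulation filter whose width sets the temporal extent of coarticulation. The sketch below uses a symmetric exponential filter; the specific filter shape, targets, and parameter names are illustrative assumptions, not the patent's learned quantities.

```python
import numpy as np

def vtr_trajectory(targets, durations, gamma=0.6, span=7):
    """Smooth a per-phone vocal-tract-resonance target sequence into a frame
    trajectory by convolving with a symmetric exponential coarticulation filter.

    targets:   one target value per phone (e.g., an F1 target in Hz)
    durations: frames per phone
    gamma:     decay controlling the temporal extent of coarticulation
    span:      one-sided filter length in frames
    """
    t = np.repeat(targets, durations).astype(float)     # frame-level target sequence
    taps = gamma ** np.abs(np.arange(-span, span + 1))  # symmetric exponential taps
    taps /= taps.sum()                                  # normalize to unit gain
    return np.convolve(np.pad(t, span, mode="edge"), taps, mode="valid")

# Hypothetical three-phone utterance with F1 targets of 500, 700, 550 Hz.
traj = vtr_trajectory([500.0, 700.0, 550.0], [10, 12, 10])
```

    The output stays within the range of the targets and bends smoothly across phone boundaries, which is exactly the behavior whose filter parameters the abstract says depend on phone identity, ordering, and speaking effort.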

    Tensor deep stacked neural network
    24.
    Granted Patent
    Tensor deep stacked neural network (Active)

    Publication No.: US09165243B2

    Publication Date: 2015-10-20

    Application No.: US13397580

    Filing Date: 2012-02-15

    IPC Class: G06N3/04 G06N3/08

    CPC Class: G06N3/04 G06N3/08

    Abstract: A tensor deep stacked neural (T-DSN) network for obtaining predictions for discriminative modeling problems. The T-DSN network and method use bilinear modeling with a tensor representation to map a hidden layer to the prediction layer. The T-DSN network is constructed by stacking blocks of a single hidden layer tensor neural network (SHLTNN) on top of each other. The single hidden layer for each block is then divided into two or more sections. In some embodiments, the hidden layer is separated into a first hidden layer section and a second hidden layer section. These multiple sections of the hidden layer are combined using a product operator to obtain an implicit hidden layer having a single section. In some embodiments the product operator is a Khatri-Rao product. A prediction is made using the implicit hidden layer and weights, and the output prediction layer is consequently obtained.
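
    The Khatri-Rao product mentioned in the abstract is the column-wise Kronecker product: per sample, the outer product of the two hidden-section vectors, flattened. A small sketch (sizes and weights are illustrative, not from the patent):

```python
import numpy as np

def khatri_rao(h1, h2):
    """Column-wise Kronecker (Khatri-Rao) product: for each sample (column),
    take the outer product of its two hidden-section vectors and flatten.
    h1: (d1, n), h2: (d2, n)  ->  (d1*d2, n)."""
    return np.einsum("in,jn->ijn", h1, h2).reshape(h1.shape[0] * h2.shape[0], -1)

rng = np.random.default_rng(1)
h1 = rng.standard_normal((3, 5))   # first hidden-layer section, 5 samples
h2 = rng.standard_normal((4, 5))   # second hidden-layer section
implicit = khatri_rao(h1, h2)      # implicit single-section hidden layer, (12, 5)

# The prediction layer is then a linear map on the implicit hidden layer.
U = rng.standard_normal((2, 12))   # upper-layer weights (illustrative sizes)
y = U @ implicit                   # (2, 5) output predictions
```

    Because every pairwise interaction between the two sections appears in the implicit layer, the linear map `U` realizes the bilinear (tensor) mapping from hidden layer to prediction layer that the abstract describes.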

    LEARNING PROCESSES FOR SINGLE HIDDEN LAYER NEURAL NETWORKS WITH LINEAR OUTPUT UNITS
    25.
    Invention Application
    LEARNING PROCESSES FOR SINGLE HIDDEN LAYER NEURAL NETWORKS WITH LINEAR OUTPUT UNITS (Active)

    Publication No.: US20120303565A1

    Publication Date: 2012-11-29

    Application No.: US13113100

    Filing Date: 2011-05-23

    Applicants: Li Deng; Dong Yu

    Inventors: Li Deng; Dong Yu

    IPC Class: G06N3/08

    Abstract: Learning processes for a single hidden layer neural network, including linear input units, nonlinear hidden units, and linear output units, calculate the lower-layer network parameter gradients by taking into consideration a solution for the upper-layer network parameters. The upper-layer network parameters are calculated by a closed-form formula given the lower-layer network parameters. An accelerated gradient algorithm can be used to update the lower-layer network parameters, and a weighted gradient can also be used. Combining these techniques yields accelerated training with faster convergence to a point with a lower error rate.
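
    With linear output units, the upper-layer closed form is the least-squares solution: given the hidden activations H and targets T, the weights U that minimize ||UH - T||² are U = T H⁺ (pseudoinverse). The sketch below demonstrates only this closed-form step under assumed shapes and random lower-layer weights; the patent's contribution of folding this solution into the lower-layer gradient is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 200))   # linear input units: 10 features, 200 samples
T = rng.standard_normal((3, 200))    # targets for 3 linear output units

W = rng.standard_normal((50, 10))    # lower-layer weights (random for illustration)
H = 1.0 / (1.0 + np.exp(-W @ X))     # nonlinear (sigmoid) hidden units

# Upper-layer weights in closed form: the least-squares solution U = T H^+,
# i.e. U minimizes ||U H - T||^2 given the lower-layer parameters.
U = T @ np.linalg.pinv(H)

err_closed = np.linalg.norm(U @ H - T)
err_random = np.linalg.norm(rng.standard_normal((3, 50)) @ H - T)
```

    Since U is the exact minimizer for any fixed lower layer, only the lower-layer weights need iterative (accelerated, optionally weighted) gradient updates, which is what makes the training fast.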

    DEEP CONVEX NETWORK WITH JOINT USE OF NONLINEAR RANDOM PROJECTION, RESTRICTED BOLTZMANN MACHINE AND BATCH-BASED PARALLELIZABLE OPTIMIZATION
    26.
    Invention Application
    DEEP CONVEX NETWORK WITH JOINT USE OF NONLINEAR RANDOM PROJECTION, RESTRICTED BOLTZMANN MACHINE AND BATCH-BASED PARALLELIZABLE OPTIMIZATION (Active)

    Publication No.: US20120254086A1

    Publication Date: 2012-10-04

    Application No.: US13077978

    Filing Date: 2011-03-31

    IPC Class: G06N3/08

    Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured, layered or hierarchical model, called a deep convex network, retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto. This layered model can produce output that serves as scores to combine with transition probabilities between states in a hidden Markov model and with language model scores to form a full speech recognizer. The method makes joint use of nonlinear random projections and RBM weights, and it stacks a lower module's output with the raw data to establish the immediately higher module. Batch-based convex optimization is performed to learn a portion of the deep convex network's weights, making the training amenable to parallel computation. The method can further include jointly optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion based on a sequence rather than a set of unrelated frames.
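
    The stacking-plus-convex-solve structure can be sketched in a few lines: each module nonlinearly projects its input (raw data stacked with the previous module's output) and learns only its linear output weights by batch least squares, which is the convex, parallelizable step. This toy version uses random projections in place of RBM-initialized weights; all sizes and data are illustrative.

```python
import numpy as np

def train_dcn(X, T, n_modules=3, n_hidden=80, seed=0):
    """Stack modules: each takes the raw data concatenated with the previous
    module's output, applies a random nonlinear projection, and learns its
    linear output weights by a convex batch least-squares solve."""
    rng = np.random.default_rng(seed)
    modules, inp, out = [], X, None
    for _ in range(n_modules):
        W = 0.5 * rng.standard_normal((n_hidden, inp.shape[0]))  # random projection
        H = np.tanh(W @ inp)                                     # nonlinear hidden layer
        U = T @ np.linalg.pinv(H)                                # convex upper-layer solve
        out = U @ H
        modules.append((W, U))
        inp = np.vstack([X, out])   # stack module output with the raw data
    return modules, out

rng = np.random.default_rng(3)
X = rng.standard_normal((8, 300))
T = np.sin(X[:2] * 2.0)             # toy regression targets, shape (2, 300)
_, out = train_dcn(X, T)
```

    Because each module's trainable step is an independent least-squares problem over a batch, modules can be fit with embarrassingly parallel linear algebra, which is the parallelizability claim in the abstract.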

    Speech-centric multimodal user interface design in mobile technology
    27.
    Granted Patent
    Speech-centric multimodal user interface design in mobile technology (Active)

    Publication No.: US08219406B2

    Publication Date: 2012-07-10

    Application No.: US11686722

    Filing Date: 2007-03-15

    Applicants: Dong Yu; Li Deng

    Inventors: Dong Yu; Li Deng

    IPC Class: G10L21/00

    Abstract: A multi-modal human computer interface (HCI) receives a plurality of available information inputs, concurrently or serially, and employs a subset of the inputs to determine or infer user intent with respect to a communication or information goal. Received inputs are individually parsed, and the parsed inputs are analyzed and optionally synthesized with respect to one another. In the event that sufficient information is not available to determine the user's intent or goal, feedback can be provided to the user in order to facilitate clarifying, confirming, or augmenting the information inputs.
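
    The parse-fuse-or-ask-for-feedback loop can be sketched with a deliberately simple, hypothetical fusion rule (summing per-modality confidences per intent and falling back to a clarification prompt below a threshold); none of these names or thresholds come from the patent.

```python
def infer_intent(inputs, threshold=0.6):
    """Fuse parsed multimodal inputs into an intent; if no intent is confident
    enough, return a clarification request instead (hypothetical scheme).

    inputs: modality -> (parsed_intent, confidence)
    """
    scores = {}
    for modality, (intent, conf) in inputs.items():
        scores[intent] = scores.get(intent, 0.0) + conf
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    if total and scores[best] / total >= threshold:
        return {"intent": best}
    # Insufficient agreement: feed back to the user to clarify or confirm.
    return {"feedback": "Did you mean '%s'? Please confirm." % best}

# Speech and touch agree -> confident; conflicting modalities -> ask for feedback.
r1 = infer_intent({"speech": ("open_map", 0.7), "touch": ("open_map", 0.9)})
r2 = infer_intent({"speech": ("open_map", 0.5), "gesture": ("call_contact", 0.5)})
```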

    Noise suppressor for robust speech recognition
    28.
    Granted Patent
    Noise suppressor for robust speech recognition (Active)

    Publication No.: US08185389B2

    Publication Date: 2012-05-22

    Application No.: US12335558

    Filing Date: 2008-12-16

    IPC Class: G10L15/20

    CPC Class: G10L21/0208 G10L15/20

    Abstract: Described is noise reduction technology, generally for speech input, in which a noise-suppression-related gain value for a frame is determined based upon the noise level associated with that frame in addition to the signal-to-noise ratios (SNRs). In one implementation, the noise reduction mechanism is based upon minimum mean-square error, Mel-frequency cepstral noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a low threshold, and a low gain value is set or computed to accomplish large noise suppression above a high noise threshold. A noise-power-dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning the parameters used in noise reduction via a step-adaptive discriminative learning algorithm.
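
    The two-threshold, log-linear gain rule and the frame-to-frame smoothing can be sketched directly; the threshold values, gain floor, and smoothing constant below are illustrative stand-ins, not the patent's learned parameters.

```python
import numpy as np

def suppression_gain(noise_power, low_db=-10.0, high_db=20.0,
                     g_high=1.0, g_low=0.1):
    """Noise-power-dependent gain: ~no suppression (gain g_high) below the low
    noise threshold, strong suppression (gain g_low) above the high threshold,
    and log-linear interpolation in between. All constants are illustrative."""
    n_db = 10.0 * np.log10(noise_power)
    if n_db <= low_db:
        return g_high
    if n_db >= high_db:
        return g_low
    frac = (n_db - low_db) / (high_db - low_db)
    # interpolate linearly in the log-gain domain
    return float(np.exp(np.log(g_high) + frac * (np.log(g_low) - np.log(g_high))))

def smooth_gain(gain, prev_gain, alpha=0.7):
    """Frame-to-frame smoothing: blend in the prior frame's gain value."""
    return alpha * prev_gain + (1.0 - alpha) * gain
```

    For example, a frame at the midpoint between the thresholds (5 dB here) receives the geometric mean of the two gain extremes, sqrt(0.1) ≈ 0.316, rather than the arithmetic mean, which is the point of interpolating in the log domain.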

    Deep-Structured Conditional Random Fields for Sequential Labeling and Classification
    29.
    Invention Application
    Deep-Structured Conditional Random Fields for Sequential Labeling and Classification (Active)

    Publication No.: US20110191274A1

    Publication Date: 2011-08-04

    Application No.: US12696051

    Filing Date: 2010-01-29

    IPC Class: G06F15/18 G06N5/02

    CPC Class: G06F15/18 G06N5/02

    Abstract: Described is a technology by which a deep-structured (multiple-layered) conditional random field model is trained and used for classification of sequential data. Sequential data is processed at each layer, from the lowest layer to a final (highest) layer, to output data in the form of conditional probabilities of classes given the sequential input data. Each higher layer jointly inputs the conditional probability data and the sequential data to output further probability data, and so forth, until the final layer, which outputs the classification data. Also described is layer-by-layer training, supervised or unsupervised. Unsupervised training may process raw features to minimize average frame-level conditional entropy while maximizing state occupation entropy, or to minimize reconstruction error. Also described is a technique for back-propagating the error information of the final layer to iteratively fine-tune the parameters of the lower layers, and joint training, including joint training via subgroups of layers.
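
    The layer-to-layer data flow, each layer emitting class conditionals that are stacked with the raw sequence features as input to the next layer, can be sketched as below. A per-frame softmax stands in for each layer's conditional distribution; a real layer here would be a linear-chain CRF with transition features, and the weights are random placeholders, not trained models.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def deep_crf_forward(X, layer_weights):
    """Data flow of a deep-structured model: every layer after the first takes
    the raw frame features stacked with the previous layer's class posteriors,
    and emits per-frame conditional class probabilities."""
    probs = None
    for k, W in enumerate(layer_weights):
        inp = X if k == 0 else np.vstack([X, probs])
        probs = softmax(W @ inp)   # (n_classes, n_frames) conditionals
    return probs

rng = np.random.default_rng(4)
n_feat, n_frames, n_classes = 6, 20, 3
Ws = [rng.standard_normal((n_classes, n_feat)),
      rng.standard_normal((n_classes, n_feat + n_classes)),
      rng.standard_normal((n_classes, n_feat + n_classes))]
probs = deep_crf_forward(rng.standard_normal((n_feat, n_frames)), Ws)
```

    Note how only the first layer's weight matrix has input width `n_feat`; all higher layers widen by `n_classes` to accept the previous layer's posteriors, which is the joint input the abstract describes.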

    Integrated speech recognition and semantic classification
    30.
    Granted Patent
    Integrated speech recognition and semantic classification (Active)

    Publication No.: US07856351B2

    Publication Date: 2010-12-21

    Application No.: US11655703

    Filing Date: 2007-01-19

    IPC Class: G06F17/27

    CPC Class: G10L15/1815

    摘要: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
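
    A minimal sketch of the joint association score, here assumed to be a weighted log-linear combination of a semantic-class score, a language-model score, and an acoustic score (the weighting scheme and all numbers are hypothetical; the patent's discriminative training would adjust the parameters to widen the target-vs-competitor margin):

```python
import math

def joint_association_score(class_score, lm_score, am_score,
                            w_class=1.0, w_lm=1.0, w_am=0.2):
    """Hypothetical weighted log-linear joint association score for a
    (semantic class, word sequence) pair given an acoustic signal:
    class_score = log-score of the class, lm_score = language-model log-score
    of the words given the class, am_score = acoustic log-score of the signal
    given the words. The weights are the trainable signal-to-class parameters."""
    return w_class * class_score + w_lm * lm_score + w_am * am_score

# Target (class, words) pair vs. a competitor pair for the same acoustic signal.
target = joint_association_score(math.log(0.4), -12.0, -50.0)
competitor = joint_association_score(math.log(0.1), -10.0, -55.0)
margin = target - competitor   # training would revise parameters to raise this
```

    Discriminative training as described in the abstract amounts to revising the weights and model scores so that this margin is positive for correct pairs across the training data, minimizing semantic classification errors.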
