Information processing device, large vocabulary continuous speech recognition method and program including hypothesis ranking
    Granted patent; status: In force

    Publication No.: US09165553B2

    Publication Date: 2015-10-20

    Application No.: US13744963

    Filing Date: 2013-01-18

    IPC Classes: G10L15/04 G10L15/10 G10L15/02

    Abstract: System and method for performing speech recognition using acoustic invariant structure for large vocabulary continuous speech. An information processing device receives sound as input and performs speech recognition. The information processing device includes: a speech recognition processing unit for outputting a speech recognition score; a structure score calculation unit for calculating, for each hypothesis, a structure score obtained by applying phoneme pair-by-pair weighting to the inter-distribution distance likelihood of every phoneme pair comprising the hypothesis and then summing the results; and a ranking unit for ranking the multiple hypotheses based on the sum of the speech recognition score and the structure score.

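    As a rough illustration of the ranking step described in the abstract, the sketch below combines a recognizer score with a structure score summed over weighted phoneme-pair distance likelihoods. The function names and the dictionary-based lookup tables are assumptions made for illustration, not the patent's implementation.

```python
from itertools import combinations

def structure_score(phonemes, pair_distance_likelihood, pair_weight):
    """Sum phoneme pair-by-pair weighted inter-distribution distance likelihoods."""
    score = 0.0
    for p, q in combinations(set(phonemes), 2):
        key = tuple(sorted((p, q)))
        # Assumed lookup tables; the patent derives these quantities from acoustic structure.
        score += pair_weight.get(key, 1.0) * pair_distance_likelihood.get(key, 0.0)
    return score

def rank_hypotheses(hypotheses, pair_distance_likelihood, pair_weight):
    """Rank hypotheses by the sum of the speech recognition score and the structure score.

    `hypotheses` is a list of (phoneme_sequence, recognition_score) tuples.
    """
    scored = [
        (rec_score + structure_score(phons, pair_distance_likelihood, pair_weight), phons)
        for phons, rec_score in hypotheses
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```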

    CORRECTING N-GRAM PROBABILITIES BY PAGE VIEW INFORMATION

    Publication No.: US20150051899A1

    Publication Date: 2015-02-19

    Application No.: US13965492

    Filing Date: 2013-08-13

    IPC Classes: G06F17/27

    CPC Classes: G06F17/27

    Abstract: Methods and a system for calculating N-gram probabilities in a language model. A method includes counting N-grams in each page of a plurality of pages or in each document of a plurality of documents to obtain respective N-gram counts therefor. The method further includes applying weights to the respective N-gram counts based on at least one of view counts and rankings to obtain weighted respective N-gram counts. The view counts and the rankings are determined with respect to the plurality of pages or the plurality of documents. The method also includes merging the weighted respective N-gram counts to obtain merged weighted respective N-gram counts for the plurality of pages or the plurality of documents. The method additionally includes calculating a respective probability for each of the N-grams based on the merged weighted respective N-gram counts.
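
    A minimal sketch of the counting, weighting, merging, and probability steps, assuming each page is a token list paired with its view count (rankings could be substituted as the weight); the maximum-likelihood normalization at the end is an illustrative choice.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield the n-grams of a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def view_weighted_ngram_probabilities(pages, n=3):
    """Count n-grams per page, weight counts by page views, merge, and estimate probabilities.

    `pages` is an iterable of (tokens, view_count) pairs.
    """
    ngram_counts, prefix_counts = Counter(), Counter()
    for tokens, view_count in pages:
        for gram in ngrams(tokens, n):
            ngram_counts[gram] += view_count        # weighted count for the full n-gram
            prefix_counts[gram[:-1]] += view_count  # weighted count for its (n-1)-gram history
    # Merged weighted counts -> conditional probabilities P(w_n | w_1..w_{n-1}).
    return {gram: c / prefix_counts[gram[:-1]] for gram, c in ngram_counts.items()}

# Example: two "pages" with very different view counts.
pages = [("the cat sat on the mat".split(), 1000), ("the dog sat on the rug".split(), 10)]
probs = view_weighted_ngram_probabilities(pages, n=2)
```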

    Customization of recurrent neural network transducers for speech recognition

    Publication No.: US11908458B2

    Publication Date: 2024-02-20

    Application No.: US17136439

    Filing Date: 2020-12-29

    Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein a prediction network of the RNN-T is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
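
    The sequence of steps in the abstract, updating the encoder on synthesized first-domain audio, then updating the prediction network on synthesized second-domain audio, then restoring the encoder, can be sketched roughly as below. The tiny stand-in modules, the random "TTS" features, and the placeholder losses are assumptions purely for illustration; a real implementation would use an actual RNN-T and its transducer loss.

```python
import copy
import torch
import torch.nn as nn

class TinyRNNT(nn.Module):
    """Toy stand-in for an RNN-T: an encoder, a prediction network, and a joint layer."""
    def __init__(self, feat_dim=80, vocab=100, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.prediction = nn.Embedding(vocab, hidden)
        self.joint = nn.Linear(hidden, vocab)

def synthesize_audio(texts, feat_dim=80, frames=50):
    """Hypothetical TTS stand-in: random features shaped like acoustic frames."""
    return torch.randn(len(texts), frames, feat_dim)

def update(module, loss):
    """One gradient step on the given sub-module only."""
    opt = torch.optim.SGD(module.parameters(), lr=1e-3)
    opt.zero_grad()
    loss.backward()
    opt.step()

model = TinyRNNT()
initial_encoder = copy.deepcopy(model.encoder.state_dict())   # remember the initial condition

# 1) Synthesize first-domain audio from first-domain text and update the encoder.
enc_out, _ = model.encoder(synthesize_audio(["first-domain sentence"]))
update(model.encoder, enc_out.pow(2).mean())                  # placeholder, not the RNN-T loss

# 2) Synthesize second-domain audio, feed the updated encoder, update the prediction network.
enc_out, _ = model.encoder(synthesize_audio(["second-domain sentence"]))
pred_out = model.prediction(torch.zeros(1, 1, dtype=torch.long))
update(model.prediction, (enc_out.mean() - pred_out.mean()).pow(2))  # placeholder loss

# 3) Restore the updated encoder to its initial condition.
model.encoder.load_state_dict(initial_encoder)
```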

    KNOWLEDGE TRANSFER BETWEEN RECURRENT NEURAL NETWORKS

    Publication No.: US20230196107A1

    Publication Date: 2023-06-22

    Application No.: US18168794

    Filing Date: 2023-02-14

    IPC Classes: G06N3/08 G06N3/044 G06N3/045

    CPC Classes: G06N3/08 G06N3/044 G06N3/045

    Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
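
    A minimal sketch of the selection-and-training idea, assuming the outputs are per-frame vectors and using cosine similarity as the similarity measure (an assumption; the abstract does not fix the measure). The resulting loss would be minimized jointly with the unidirectional model's normal training loss.

```python
import torch
import torch.nn.functional as F

def transfer_loss(bi_outputs, uni_outputs, threshold=0.5):
    """Pick, for each unidirectional output, the most similar bidirectional output and,
    when the similarity is high enough, add a term that pulls the two closer together.

    Both arguments have shape (time, dim).
    """
    loss = torch.zeros(())
    for t in range(uni_outputs.size(0)):
        sims = F.cosine_similarity(bi_outputs, uni_outputs[t].unsqueeze(0), dim=1)
        best = torch.argmax(sims)
        if sims[best] > threshold:            # select first outputs by similarity
            loss = loss + (1.0 - sims[best])  # training increases the similarity
    return loss

# Example with random stand-in outputs; gradients flow only into the unidirectional model.
bi = torch.randn(20, 16)                       # bidirectional (teacher) output sequence
uni = torch.randn(20, 16, requires_grad=True)  # unidirectional (student) output sequence
loss = transfer_loss(bi, uni, threshold=0.0)
```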

    Adaptation of model for recognition processing

    Publication No.: US11443169B2

    Publication Date: 2022-09-13

    Application No.: US15048318

    Filing Date: 2016-02-19

    Inventor: Gakuto Kurata

    IPC Classes: G06N3/04

    Abstract: A computer-implemented method for adapting a model for recognition processing to a target domain is disclosed. The method includes preparing a first distribution in relation to a part of the model, in which the first distribution is derived from data of a training domain for the model. The method also includes obtaining a second distribution in relation to the part of the model by using data of the target domain. The method further includes tuning one or more parameters of the part of the model so that the difference between the first and the second distributions becomes small.
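
    As a rough sketch of the tuning step, assume the "part of the model" is a single linear layer and that the two distributions are summarized by the mean and variance of its activations; matching first and second moments is an illustrative choice, since the abstract only requires that the difference between the distributions becomes small.

```python
import torch

def distribution_gap(first_acts, second_acts):
    """Squared difference between the first two moments of two sets of activations."""
    mean_gap = (first_acts.mean(0) - second_acts.mean(0)).pow(2).sum()
    var_gap = (first_acts.var(0) - second_acts.var(0)).pow(2).sum()
    return mean_gap + var_gap

# First distribution: activations collected on training-domain data (held fixed here).
first_acts = torch.randn(200, 32)
# The part of the model being tuned, and target-domain inputs (stand-ins for illustration).
layer = torch.nn.Linear(16, 32)
target_inputs = torch.randn(200, 16)

opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
for _ in range(200):
    gap = distribution_gap(first_acts, layer(target_inputs))   # second distribution
    opt.zero_grad()
    gap.backward()
    opt.step()
```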

    MULTI-STEP LINEAR INTERPOLATION OF LANGUAGE MODELS

    Publication No.: US20220254335A1

    Publication Date: 2022-08-11

    Application No.: US17168982

    Filing Date: 2021-02-05

    Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
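
    A compact sketch of both interpolation steps, assuming the first metric is held-out likelihood (the usual choice for EM-estimated mixture weights) and that the application-specific hyper interpolation weights are supplied externally; array shapes and names are illustrative.

```python
import numpy as np

def em_interpolation_weights(model_probs, iters=50):
    """EM re-estimation of linear-interpolation weights on held-out data.

    `model_probs` has shape (num_models, num_heldout_words): row m holds
    P_m(w_i | history_i) for every held-out word i.
    """
    m, n = model_probs.shape
    w = np.full(m, 1.0 / m)
    for _ in range(iters):
        mix = w[:, None] * model_probs                  # E-step: component responsibilities
        post = mix / mix.sum(axis=0, keepdims=True)
        w = post.sum(axis=1) / n                        # M-step: new interpolation weights
    return w

def multi_step_interpolation(model_probs, groups, hyper_weights):
    """First interpolate within each set using EM weights, then across sets
    using hyper interpolation weights chosen with an application-specific metric."""
    final = np.zeros(model_probs.shape[1])
    for name, rows in groups.items():
        w = em_interpolation_weights(model_probs[rows])
        final += hyper_weights[name] * (w @ model_probs[rows])
    return final

# Example: four language models split into two sets.
probs = np.random.rand(4, 1000) + 1e-6
mixed = multi_step_interpolation(probs, {"general": [0, 1], "in-domain": [2, 3]},
                                 {"general": 0.3, "in-domain": 0.7})
```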

    LEARNING UNPAIRED MULTIMODAL FEATURE MATCHING FOR SEMI-SUPERVISED LEARNING

    Publication No.: US20220172080A1

    Publication Date: 2022-06-02

    Application No.: US17109550

    Filing Date: 2020-12-02

    IPC Classes: G06N5/04 G06N20/00

    Abstract: A computer-implemented method is provided for learning multimodal feature matching. The method includes training an image encoder to obtain encoded images. The method further includes training a common classifier on the encoded images by using labeled images. The method also includes training a text encoder, while keeping the common classifier in a fixed configuration, by using learned text embeddings and corresponding labels for the learned text embeddings. The text encoder is further trained to match the predicted text embeddings, which are encoded by the text encoder, to a Gaussian distribution fitted on the encoded images.
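
    A toy sketch of the training stages, assuming simple linear encoders, a diagonal Gaussian fitted on the encoded images, and a moment-matching term as the distribution-matching loss; dimensions, data, and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 32
image_encoder = nn.Linear(64, dim)
text_encoder = nn.Linear(300, dim)
classifier = nn.Linear(dim, 10)            # the common classifier shared by both modalities

images, image_labels = torch.randn(500, 64), torch.randint(0, 10, (500,))
text_embs, text_labels = torch.randn(500, 300), torch.randint(0, 10, (500,))

# Stage 1: train the image encoder and the common classifier on labeled images.
opt = torch.optim.Adam(list(image_encoder.parameters()) + list(classifier.parameters()))
for _ in range(100):
    loss = F.cross_entropy(classifier(image_encoder(images)), image_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Fit a (diagonal) Gaussian on the encoded images.
with torch.no_grad():
    enc_img = image_encoder(images)
    mu, std = enc_img.mean(0), enc_img.std(0)

# Stage 2: train the text encoder with the classifier frozen, matching the fitted Gaussian.
opt = torch.optim.Adam(text_encoder.parameters())
for _ in range(100):
    encoded = text_encoder(text_embs)
    cls_loss = F.cross_entropy(classifier(encoded), text_labels)
    match_loss = (encoded.mean(0) - mu).pow(2).mean() + (encoded.std(0) - std).pow(2).mean()
    loss = cls_loss + match_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```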

    Aligning spike timing of models for machine learning

    Publication No.: US11302309B2

    Publication Date: 2022-04-12

    Application No.: US16570022

    Filing Date: 2019-09-13

    Abstract: A technique for aligning spike timing of models is disclosed. A first model having a first architecture is trained with a set of training samples. Each training sample includes an input sequence of observations and an output sequence of symbols having a different length from the input sequence. Then, one or more second models are trained with the trained first model by minimizing a guide loss jointly with a normal loss for each second model, and a sequence recognition task is performed using the one or more second models. The guide loss evaluates dissimilarity in spike timing between the trained first model and each second model being trained.
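
    A small sketch of the guide loss idea, assuming the models are CTC-style sequence classifiers whose per-frame non-blank posterior mass serves as the "spike" signal; the abstract does not specify the dissimilarity measure, so mean squared error is used here as a stand-in.

```python
import torch
import torch.nn.functional as F

def guide_loss(first_logits, second_logits, blank=0):
    """Dissimilarity in spike timing between the trained first model and a second model.

    Both inputs have shape (time, vocab); the spike signal is the per-frame
    probability mass assigned to non-blank symbols.
    """
    first_spikes = 1.0 - F.softmax(first_logits, dim=-1)[:, blank]
    second_spikes = 1.0 - F.softmax(second_logits, dim=-1)[:, blank]
    return F.mse_loss(second_spikes, first_spikes.detach())  # gradients only reach the second model

def joint_loss(normal_loss, first_logits, second_logits, alpha=0.5):
    """Minimize the guide loss jointly with the second model's normal loss."""
    return normal_loss + alpha * guide_loss(first_logits, second_logits)
```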