Customization of recurrent neural network transducers for speech recognition

    Publication No.: US11908458B2

    Publication Date: 2024-02-20

    Application No.: US17136439

    Application Date: 2020-12-29

    Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The method includes synthesizing first-domain audio data from first-domain text data and feeding the synthesized first-domain audio data into a trained encoder of the RNN-T having an initial condition, wherein the encoder is updated using the synthesized first-domain audio data and the first-domain text data. The method further includes synthesizing second-domain audio data from second-domain text data and feeding the synthesized second-domain audio data into the updated encoder of the RNN-T, wherein a prediction network of the RNN-T is updated using the synthesized second-domain audio data and the second-domain text data. The method further includes restoring the updated encoder to the initial condition.
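The control flow of this abstract can be sketched in a few lines of Python. Everything below — the `RNNT` container, `synthesize_audio`, and `update` — is an illustrative placeholder of my own, not the patented implementation; only the sequence of steps (update the encoder on first-domain synthetic audio, update the prediction network on second-domain synthetic audio through the updated encoder, then restore the encoder) follows the abstract.

```python
import copy

class RNNT:
    """Toy stand-in for an RNN transducer with two trainable parts."""
    def __init__(self):
        self.encoder = {"weights": [0.0]}      # acoustic encoder
        self.prediction = {"weights": [0.0]}   # label prediction network

def synthesize_audio(text):
    """Stand-in for a text-to-speech front end producing audio features."""
    return [float(len(tok)) for tok in text.split()]

def update(params, audio, text):
    """Stand-in for one fine-tuning pass; mutates params in place."""
    params["weights"] = [w + sum(audio) * 1e-3 for w in params["weights"]]

def customize(model, domain1_text, domain2_text):
    # Save the encoder's initial condition so it can be restored later.
    initial_encoder = copy.deepcopy(model.encoder)

    # Stage 1: synthesize first-domain audio and update the encoder.
    audio1 = synthesize_audio(domain1_text)
    update(model.encoder, audio1, domain1_text)

    # Stage 2: feed second-domain synthetic audio through the updated
    # encoder and update the prediction network.
    audio2 = synthesize_audio(domain2_text)
    update(model.prediction, audio2, domain2_text)

    # Restore the encoder to its initial condition, so only the
    # prediction network retains the second-domain adaptation.
    model.encoder = initial_encoder
    return model
```

The point of the sketch is that the encoder update is deliberately transient: it exists only so the prediction network is adapted against an in-domain encoder, and is rolled back at the end.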

    ACCURACY OF STREAMING RNN TRANSDUCER

    Publication No.: US20220093083A1

    Publication Date: 2022-03-24

    Application No.: US17031345

    Application Date: 2020-09-24

    Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model, which has a bidirectional encoder, to output the same symbols from its output probability lattice as a trained first end-to-end neural speech recognition model having a unidirectional encoder outputs from its own output probability lattice. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training it as a student, with the trained second model as the teacher, in a knowledge distillation method.
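The teacher–student step in this abstract can be illustrated with a generic soft-label distillation loss. This is a standard knowledge-distillation sketch under my own assumptions (temperature-scaled softmax, KL divergence between teacher and student output distributions), not the patent's specific lattice-level formulation; all function names are mine.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution over output symbols."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Train the unidirectional student to match the bidirectional
    teacher's softened output distribution at one lattice position."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)
```

The loss is zero exactly when the student reproduces the teacher's distribution, which is the sense in which the student learns to "output the same symbols" as the teacher.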

    MULTIPLICATIVE INTEGRATION IN NEURAL NETWORK TRANSDUCER MODELS FOR END-TO-END SPEECH RECOGNITION

    Publication No.: US20220059082A1

    Publication Date: 2022-02-24

    Application No.: US16999405

    Application Date: 2020-08-21

    Abstract: Using an encoder neural network model, an encoder vector is computed, comprising a vector representation of a current portion of input data in an input sequence. Using a prediction neural network model, a prediction vector is computed from a previous prediction vector and the previous output symbol corresponding to a previous portion of input data in the input sequence. Using a joint neural network model, a joint vector corresponding to the encoder vector and the prediction vector is computed, the joint vector multiplicatively combining each element of the encoder vector with the corresponding element of the prediction vector. Using a softmax function, the joint vector is converted to a probability distribution comprising the probability that a current output symbol corresponds to the current portion of input data in the input sequence.
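The joint step described here is easy to show directly: rather than the common additive combination of the encoder and prediction vectors, the joint vector is their element-wise (Hadamard) product, which is then normalized by a softmax. A minimal sketch, with function names of my own choosing:

```python
import math

def multiplicative_joint(enc_vec, pred_vec):
    """Element-wise product of encoder and prediction vectors."""
    return [e * p for e, p in zip(enc_vec, pred_vec)]

def softmax(vec):
    """Convert the joint vector to a probability distribution over symbols."""
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

# One decoding step: combine the two vectors, then normalize.
enc = [0.5, -1.0, 2.0]    # encoder vector for the current input portion
pred = [1.5, 0.2, -0.3]   # prediction vector from previous outputs
probs = softmax(multiplicative_joint(enc, pred))
```

A design note on the multiplicative form: the product lets either network veto a symbol (a near-zero element in one vector suppresses the corresponding element of the joint vector regardless of the other network's score), which an additive combination cannot do.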

    SOFT-FORGETTING FOR CONNECTIONIST TEMPORAL CLASSIFICATION BASED AUTOMATIC SPEECH RECOGNITION

    Publication No.: US20210065680A1

    Publication Date: 2021-03-04

    Application No.: US16551915

    Application Date: 2019-08-27

    IPC Classes: G10L15/06 G10L15/16 G10L15/05

    Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches, wherein each training batch comprises one or more blocks of information. Responsive to completion of the training of the first model, the one or more computer processors initiate training of a second model utilizing the same training batches. The one or more computer processors jitter a random block size for each block of information in each training batch for the second model, and unroll the second model over one or more non-overlapping contiguous jittered blocks of information. Responsive to the unrolling of the second model, the one or more computer processors reduce overfitting of the second model by applying twin regularization.
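Two mechanisms in this abstract lend themselves to a short sketch: splitting an utterance into non-overlapping contiguous blocks of randomly jittered size, and a twin-regularization penalty tying the second model's hidden states to the first model's. Both functions below are illustrative simplifications under my own assumptions (uniform jitter, squared-error penalty), not the patent's exact formulation.

```python
import random

def jittered_blocks(frames, base_size, jitter, rng):
    """Split a frame sequence into non-overlapping contiguous blocks
    whose sizes are jittered uniformly around base_size."""
    blocks, i = [], 0
    while i < len(frames):
        size = max(1, base_size + rng.randint(-jitter, jitter))
        blocks.append(frames[i:i + size])
        i += size
    return blocks

def twin_regularization(hidden_first, hidden_second, weight=0.01):
    """Squared-error penalty between the first (fully unrolled) model's
    hidden states and the second (block-unrolled) model's, discouraging
    the second model from overfitting to its truncated context."""
    return weight * sum(
        (a - b) ** 2 for a, b in zip(hidden_first, hidden_second)
    )
```

Because each block starts where the previous one ended, the jittered blocks always partition the utterance exactly; the randomness only moves the block boundaries between epochs.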