Customization of recurrent neural network transducers for speech recognition

    Publication No.: US11908458B2

    Publication Date: 2024-02-20

    Application No.: US17136439

    Filing Date: 2020-12-29

    Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein a prediction network of the RNN-T is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
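
    The two-stage customization flow described in this abstract can be illustrated with a short Python sketch. All names below (the RNN-T object with update_encoder and update_prediction_network methods, and tts_synthesize) are hypothetical placeholders for whatever TTS and training routines are actually used; this is a minimal sketch, not the patent's implementation.

        # Hedged sketch of the two-stage RNN-T customization flow.
        # rnnt, tts_synthesize, update_encoder, and update_prediction_network
        # are assumed/hypothetical interfaces, not the patent's actual code.
        import copy

        def customize_rnnt(rnnt, first_domain_text, second_domain_text, tts_synthesize):
            # Save the trained encoder's initial condition so it can be restored later.
            initial_encoder_state = copy.deepcopy(rnnt.encoder.state_dict())

            # Stage 1: synthesize first-domain audio and update the encoder on it.
            first_domain_audio = [tts_synthesize(t) for t in first_domain_text]
            rnnt.update_encoder(first_domain_audio, first_domain_text)

            # Stage 2: synthesize second-domain audio, feed it through the updated
            # encoder, and update the prediction network on the paired text.
            second_domain_audio = [tts_synthesize(t) for t in second_domain_text]
            rnnt.update_prediction_network(second_domain_audio, second_domain_text)

            # Restore the encoder to its initial (pre-customization) condition.
            rnnt.encoder.load_state_dict(initial_encoder_state)
            return rnnt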

    TRANSLITERATION BASED DATA AUGMENTATION FOR TRAINING MULTILINGUAL ASR ACOUSTIC MODELS IN LOW RESOURCE SETTINGS

    Publication No.: US20220122585A1

    Publication Date: 2022-04-21

    Application No.: US17073337

    Filing Date: 2020-10-17

    IPC Classification: G10L15/06 G10L15/16

    Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting includes training a multilingual network on a set of training languages with original transcribed training data to create a baseline multilingual acoustic model. Transliteration of the transcribed training data is performed by processing a plurality of multilingual data types from the set of languages through the multilingual network and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding the one or more selected portions of the transliterated data back to the original transcribed training data to update the training data. A new multilingual acoustic model is then trained through the multilingual network using the updated training data.
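
    The train-transliterate-filter-retrain loop in this abstract can be sketched as follows. The helpers train_multilingual_am, transliterate, and filter_metric are assumed stand-ins for the patent's unspecified components, and the threshold-based selection is an illustrative assumption.

        # Hedged sketch of transliteration-based data augmentation for a
        # multilingual acoustic model. Helper functions are hypothetical.
        def augment_and_retrain(training_languages, original_data,
                                train_multilingual_am, transliterate, filter_metric,
                                threshold=0.5):
            # 1. Train a baseline multilingual acoustic model on the original data.
            baseline_am = train_multilingual_am(original_data)

            # 2. Process multilingual data through the network to produce a pool
            #    of transliterated (pseudo-labeled) utterances.
            transliterated_pool = [transliterate(baseline_am, utt)
                                   for lang in training_languages
                                   for utt in original_data[lang]]

            # 3. Keep only transliterations that pass the filtering metric.
            selected = [utt for utt in transliterated_pool
                        if filter_metric(utt) >= threshold]

            # 4. Add the selected transliterations back to the original data and
            #    train a new multilingual acoustic model on the updated set.
            updated_data = dict(original_data)
            updated_data["transliterated"] = selected
            return train_multilingual_am(updated_data)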

    DATA AUGMENTATION METHOD BASED ON STOCHASTIC FEATURE MAPPING FOR AUTOMATIC SPEECH RECOGNITION

    Publication No.: US20170040016A1

    Publication Date: 2017-02-09

    Application No.: US14689730

    Filing Date: 2015-04-17

    IPC Classification: G10L15/06 G10L15/02 G10L15/16

    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker, determined from a plurality of utterances within a transcript, to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for the corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set to multiple selected target speakers in the training set using the mapping function.
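
    A minimal sketch of the feature-mapping augmentation step is given below, using a simple per-dimension linear transform as the mapping function. The linear mapping and least-squares estimation are illustrative assumptions; the patent's actual stochastic mapping function and speaker-dependent model estimation are not reproduced here.

        # Hedged sketch: map a source speaker's features toward a target
        # speaker's feature space with an assumed linear mapping function.
        import numpy as np

        def estimate_linear_mapping(source_feats, target_feats):
            """Least-squares estimate of y ~ [x, 1] @ W between paired feature frames."""
            X = np.hstack([source_feats, np.ones((len(source_feats), 1))])
            W, *_ = np.linalg.lstsq(X, target_feats, rcond=None)
            return W  # shape: (dim + 1, dim)

        def map_utterance(source_feats, W):
            """Map a source-speaker utterance into a target speaker's feature space."""
            X = np.hstack([source_feats, np.ones((len(source_feats), 1))])
            return X @ W

        # Usage idea: for each utterance in the training set, map it to several
        # selected target speakers to multiply the amount of training data.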

    Input encoding for classifier generalization

    Publication No.: US11914678B2

    Publication Date: 2024-02-27

    Application No.: US17030156

    Filing Date: 2020-09-23

    IPC Classification: G06F18/241 G06N3/08 H03M7/30

    Abstract: Techniques for classifier generalization in a supervised learning process using input encoding are provided. In one aspect, a method for classification generalization includes: encoding original input features from at least one input sample x⃗_S with a uniquely decodable code using an encoder E(·) to produce encoded input features E(x⃗_S), wherein the at least one input sample x⃗_S comprises uncoded input features; feeding the uncoded input features and the encoded input features E(x⃗_S) to a base model to build an encoded model; and learning a classification function C̃_E(·) using the encoded model, wherein the classification function C̃_E(·) learned using the encoded model is more general than one learned using the uncoded input features alone.
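
    The idea of feeding both uncoded and encoded features to a base model can be sketched as below. The encoder here is a fixed invertible (hence decodable) linear map standing in for whatever uniquely decodable code the patent actually uses; that choice, and the helper names, are assumptions for illustration only.

        # Hedged sketch: build encoded inputs [x, E(x)] for a base classifier.
        import numpy as np

        def make_encoder(dim, seed=0):
            rng = np.random.default_rng(seed)
            # A random square matrix is invertible with probability 1, so the
            # linear code E(x) = M x can be uniquely decoded; this is only a
            # stand-in for the patent's uniquely decodable code.
            M = rng.standard_normal((dim, dim))
            return lambda x: x @ M.T

        def build_encoded_inputs(X, encoder):
            # Concatenate the uncoded features with their encoded counterparts
            # E(x) before feeding them to the base model.
            return np.hstack([X, encoder(X)])

        # Usage idea with any base model, e.g. a scikit-learn classifier:
        #   E = make_encoder(X_train.shape[1])
        #   clf.fit(build_encoded_inputs(X_train, E), y_train)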