Training teacher machine learning models using lossless and lossy branches

    Publication No.: US11907845B2

    Publication Date: 2024-02-20

    Application No.: US16994656

    Filing Date: 2020-08-17

    IPC Classes: G06N3/084 G10L15/16 G06N3/045

    CPC Classes: G06N3/084 G06N3/045 G10L15/16

    Abstract: Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established: one trained on the lossless data (the lossless branch) and one trained on the lossy data (the lossy branch). The weights of the two branches are tied together. The lossy branch, now isolated from the lossless branch, generates a set of soft targets for initializing an SNN. Because the weights were tied during training, these soft targets benefit from the lossless branch's training even though the lossless branch is isolated from the lossy branch during soft-target generation.
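The tied-weight idea in the abstract above can be sketched with a toy classifier. This is a minimal illustration, not the patented implementation: the model, features, learning rate, and temperature are all hypothetical. Both "branches" share one weight matrix, so updates from the lossless data shape the parameters that the lossy branch later uses alone to emit soft targets.

```python
import math
import random

random.seed(0)

def softmax(z, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    m = max(z)
    e = [math.exp((v - m) / T) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Tied weights: both branches share this single parameter matrix, so
# gradients from the lossless branch also shape the lossy branch.
W = [[random.gauss(0, 0.1) for _ in range(4)] for _ in range(2)]  # 2 classes, 4 features

def sgd_step(x, label, lr=0.1):
    """One cross-entropy SGD update on the shared weights."""
    p = softmax(logits(W, x))
    for c in range(2):
        grad = p[c] - (1.0 if c == label else 0.0)
        for j in range(4):
            W[c][j] -= lr * grad * x[j]

# Paired training data: lossless features and a degraded (lossy) version.
lossless = [([1.0, 0.0, 0.5, 0.2], 0), ([0.1, 1.0, 0.3, 0.9], 1)]
lossy    = [([0.8, 0.1, 0.4, 0.3], 0), ([0.2, 0.8, 0.2, 0.7], 1)]

for _ in range(100):
    for (xa, ya), (xb, yb) in zip(lossless, lossy):
        sgd_step(xa, ya)  # lossless-branch update (tied weights)
        sgd_step(xb, yb)  # lossy-branch update (same weights)

# Soft-target generation uses only the lossy branch, at temperature T > 1.
soft_targets = [softmax(logits(W, x), T=2.0) for x, _ in lossy]
```

The key point is that `soft_targets` is computed from lossy inputs only, yet reflects what the shared weights learned from the lossless inputs.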

    GLOBAL NEURAL TRANSDUCER MODELS LEVERAGING SUB-TASK NETWORKS

    Publication No.: US20230153601A1

    Publication Date: 2023-05-18

    Application No.: US17526350

    Filing Date: 2021-11-15

    IPC Classes: G06N3/08 G06N3/04 G10L15/00

    CPC Classes: G06N3/08 G06N3/0454 G10L15/00

    Abstract: A computer-implemented method for training a neural transducer for speech recognition is provided. The method includes initializing the neural transducer, which has a prediction network, an encoder network, and a joint network. The method further includes expanding the prediction network into a plurality of prediction-net branches, each of which is a prediction network for a respective specific sub-task from among a plurality of specific sub-tasks. The method also includes training, by a hardware processor, the entirety of the neural transducer using training data sets for all of the plurality of specific sub-tasks. The method additionally includes obtaining a trained neural transducer by fusing the plurality of prediction-net branches.
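The abstract does not specify how the prediction-net branches are fused; element-wise parameter averaging is one plausible reading, sketched below. The branch names and parameter values are hypothetical, and real branches would be full prediction networks rather than flat parameter lists.

```python
# Hypothetical prediction-net branches, one per sub-task, represented
# here as flat parameter lists for illustration only.
branch_general = [0.2, -0.5, 1.0]
branch_names   = [0.4, -0.1, 0.6]
branch_digits  = [0.0, -0.3, 0.8]

def fuse(branches):
    """Fuse sub-task branches into a single prediction network by
    element-wise parameter averaging (an assumed fusion scheme)."""
    n = len(branches)
    return [sum(params) / n for params in zip(*branches)]

fused = fuse([branch_general, branch_names, branch_digits])
```

After fusion, the single `fused` prediction network replaces the branches, so inference runs one transducer rather than one per sub-task.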

    Soft label generation for knowledge distillation

    Publication No.: US11410029B2

    Publication Date: 2022-08-09

    Application No.: US15860097

    Filing Date: 2018-01-02

    IPC Classes: G06N3/08

    Abstract: A technique for generating soft labels for training is disclosed. A teacher model having a teacher side class set is prepared. A collection of class pairs for respective data units is obtained. Each class pair includes classes labeled to a corresponding data unit from the teacher side class set and from a student side class set different from the teacher side class set. A training input is fed into the teacher model to obtain a set of outputs for the teacher side class set. A set of soft labels for the student side class set is calculated from the set of outputs by using at least an output obtained for a class within a subset of the teacher side class set having relevance to each member of the student side class set, based at least in part on observations in the collection of class pairs.
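One way to read the abstract above: teacher outputs are aggregated per student class over the teacher classes observed to be relevant to it, then renormalized. The sketch below assumes this reading; the class names, posteriors, and relevance mapping are all hypothetical.

```python
# Teacher-side classes and one teacher output (posteriors) for a training input.
teacher_classes = ["aa_1", "aa_2", "iy_1", "iy_2", "sil"]
teacher_out = [0.30, 0.25, 0.20, 0.15, 0.10]

# Relevance derived from observed class pairs:
# student class -> relevant teacher classes (hypothetical mapping).
relevance = {"aa": ["aa_1", "aa_2"], "iy": ["iy_1", "iy_2"], "sil": ["sil"]}

def soft_labels(teacher_out, teacher_classes, relevance):
    """Sum teacher posteriors over each student class's relevant teacher
    classes, then renormalize into a student-side soft-label distribution."""
    idx = {c: i for i, c in enumerate(teacher_classes)}
    raw = {s: sum(teacher_out[idx[t]] for t in ts) for s, ts in relevance.items()}
    z = sum(raw.values())
    return {s: v / z for s, v in raw.items()}

labels = soft_labels(teacher_out, teacher_classes, relevance)
```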

    Detection of music segment in audio signal

    Publication No.: US11037583B2

    Publication Date: 2021-06-15

    Application No.: US16116042

    Filing Date: 2018-08-29

    IPC Classes: G10L25/81 G10L25/21

    Abstract: A technique for detecting a music segment in an audio signal is disclosed. A time window is set for each section of the audio signal. A maximum and a statistic of the audio signal within the time window are calculated. A density index, a measure of the statistic relative to the maximum, is computed for the section. The section is estimated to be a music segment based, at least in part, on a condition with respect to the density index.
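A minimal sketch of the density-index idea, assuming the statistic is the mean absolute amplitude and the condition is a simple threshold (both assumptions; the abstract leaves them open). Sustained music energy yields a mean close to the peak, while bursty speech with pauses yields a much lower ratio.

```python
import math

def density_index(window):
    """Statistic (here: mean absolute amplitude) relative to the window peak."""
    peak = max(abs(v) for v in window)
    if peak == 0:
        return 0.0
    mean = sum(abs(v) for v in window) / len(window)
    return mean / peak

# Synthetic signals: a sustained tone vs. short bursts with silence.
music  = [math.sin(0.3 * n) for n in range(400)]
speech = [math.sin(0.3 * n) if n % 100 < 30 else 0.0 for n in range(400)]

is_music = density_index(music) > 0.5  # hypothetical threshold
```

For a pure tone the mean absolute value is about 2/pi of the peak (~0.64), comfortably above the example threshold, while the bursty signal falls well below it.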

    Training of front-end and back-end neural networks

    Publication No.: US11003983B2

    Publication Date: 2021-05-11

    Application No.: US16670201

    Filing Date: 2019-10-31

    Inventor: Takashi Fukuda

    Abstract: A computer-implemented method for training a front-end neural network (“front-end NN”) and a back-end neural network (“back-end NN”) is provided. The method includes combining the back-end NN with the front-end NN to form a joint layer, thereby generating a combined neural network. The method also includes training the combined neural network for speech recognition with a set of utterances as training data. The joint layer has a plurality of frames, each frame having a plurality of bins, and one or more specific units in each frame are dropped during the training. Each specific unit is selected randomly or based on the bin number to which it is assigned within its frame, the specific units corresponding to one or more common frequency bands.
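The distinctive part of the abstract above is that the dropped units share the same bin numbers across frames, i.e., whole frequency bands are zeroed in the joint layer. A minimal sketch, with hypothetical frame/bin sizes and drop count:

```python
import random

def band_dropout(frames, n_drop=2, rng=random):
    """Zero out the same randomly chosen bins (frequency bands) across
    every frame of the joint layer."""
    n_bins = len(frames[0])
    drop = set(rng.sample(range(n_bins), n_drop))  # common band indices
    return [[0.0 if j in drop else v for j, v in enumerate(frame)]
            for frame in frames]

rng = random.Random(0)
frames = [[1.0] * 8 for _ in range(3)]  # 3 frames, 8 bins each
out = band_dropout(frames, n_drop=2, rng=rng)
```

Because the same `drop` set is applied to every frame, the network is trained to tolerate entire missing frequency bands rather than scattered missing units.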

    Transfer of an acoustic knowledge to a neural network

    Publication No.: US10832129B2

    Publication Date: 2020-11-10

    Application No.: US15288515

    Filing Date: 2016-10-07

    Abstract: A method for transferring the acoustic knowledge of a trained acoustic model (AM) to a neural network (NN) is provided. The method includes reading into memory the NN, the AM (trained with target-domain data), and a set of training data that includes phoneme data and is obtained from a domain different from the target domain. Training data from the set is input into the AM, and one or more posterior probabilities are calculated for the context-dependent states corresponding to the phonemes in the phoneme class of the phoneme to which each frame of the training data belongs. A posterior probability vector is generated from these posterior probabilities as a soft label for the NN. The training data is then input into the NN, and the NN is updated using the soft label.
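The soft-label step above can be sketched as masking the AM's posteriors down to the context-dependent states whose phoneme lies in the frame's phoneme class, then renormalizing. The state names, posterior values, and phoneme class below are hypothetical, and the renormalization step is an assumption.

```python
# AM posteriors over context-dependent (CD) states for one frame
# (hypothetical states and values).
cd_states = ["aa-s1", "aa-s2", "ao-s1", "ao-s2", "iy-s1"]
posteriors = [0.40, 0.20, 0.15, 0.15, 0.10]

# Phoneme class of the frame's aligned phoneme, e.g. back vowels {aa, ao}.
phoneme_class = {"aa", "ao"}

def soft_label(cd_states, posteriors, phoneme_class):
    """Keep posteriors of CD states whose phoneme is in the class; renormalize
    into a posterior probability vector usable as a soft label."""
    masked = [p if s.split("-")[0] in phoneme_class else 0.0
              for s, p in zip(cd_states, posteriors)]
    z = sum(masked)
    return [v / z for v in masked] if z > 0 else masked

label = soft_label(cd_states, posteriors, phoneme_class)
```

The resulting vector assigns probability only within the frame's phoneme class, so the NN is trained toward the AM's fine-grained state distribution rather than a one-hot target.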