NEURAL NETWORK-BASED ACOUSTIC MODEL WITH SOFTENING TARGET-LAYER

    Publication number: US20200005769A1

    Publication date: 2020-01-02

    Application number: US16019676

    Filing date: 2018-06-27

    IPC classes: G10L15/16 G10L15/06 G06N3/08

    Abstract: A method is provided for training a neural network-based (NN-based) acoustic model. The method includes receiving, by a processor, the neural network-based (NN-based) acoustic model, trained by a one-hot scheme and having an input layer, a set of middle layers, and an original output layer. At least each of the middle layers subsequent to a first one of the middle layers has trained parameters. The method further includes stacking, by the processor, a new output layer on the original output layer of the NN-based acoustic model to form a new NN-based acoustic model. The new output layer has the same size as the original output layer. The method also includes retraining, by the processor, only the new output layer and the original output layer of the new NN-based acoustic model in the one-hot scheme, with the trained parameters of the middle layers subsequent to at least the first one being fixed.
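The retraining step above can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the layer sizes, the tanh activations, and the single frozen middle layer are all hypothetical. Only the original output layer and the newly stacked output layer receive gradient updates; the middle-layer weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Pretrained model (hypothetical sizes): input -> middle -> original output.
W_mid = rng.normal(size=(8, 16))   # middle layer, frozen during retraining
W_out = rng.normal(size=(16, 10))  # original output layer, retrained
W_new = rng.normal(size=(10, 10))  # new stacked output layer (same size), retrained

x = rng.normal(size=(4, 8))        # batch of 4 feature frames
y = np.eye(10)[[1, 3, 5, 7]]       # one-hot targets

lr = 0.1
W_mid_before = W_mid.copy()
for _ in range(5):
    h = np.tanh(x @ W_mid)         # frozen middle layer (no update below)
    o = np.tanh(h @ W_out)         # original output layer (trainable)
    p = softmax(o @ W_new)         # new stacked output layer (trainable)
    # Cross-entropy gradients; note no gradient step touches W_mid.
    d_new = (p - y) / len(x)
    d_out = (d_new @ W_new.T) * (1 - o ** 2)
    W_new -= lr * (o.T @ d_new)
    W_out -= lr * (h.T @ d_out)
```

After training, the frozen middle-layer weights are bit-for-bit unchanged, which is the key property the method relies on.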

    Testing words in a pronunciation lexicon

    Publication number: US10373607B2

    Publication date: 2019-08-06

    Application number: US15621778

    Filing date: 2017-06-13

    Abstract: A method is provided for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes obtaining test sentences which can be accepted by a language model used in the ASR system. The test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether or not all or some of the words in a test sentence are present in a path of the word graph derived from that test sentence.
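The word-graph check described above can be sketched in a few lines. The representation here (a simple adjacency set over consecutive words) and the example hypotheses are assumptions for illustration, not the patent's actual data structure.

```python
from collections import defaultdict

def build_word_graph(hypotheses):
    """Build a word graph from recognized texts: consecutive words
    in each hypothesis become directed edges."""
    graph = defaultdict(set)
    for words in hypotheses:
        for a, b in zip(words, words[1:]):
            graph[a].add(b)
    return graph

def sentence_on_path(graph, sentence):
    """True if every consecutive word pair of the test sentence is an
    edge, i.e. the whole sentence lies on a path of the word graph."""
    return all(b in graph[a] for a, b in zip(sentence, sentence[1:]))

# Hypothetical recognition results for variations of one test sentence.
hyps = [["call", "john", "now"], ["call", "joan", "now"]]
g = build_word_graph(hyps)
print(sentence_on_path(g, ["call", "john", "now"]))  # True
print(sentence_on_path(g, ["call", "jane", "now"]))  # False
```

A word of the lexicon that never survives recognition in any variation leaves a gap in every path, which is what flags it for testing.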

    SOFT LABEL GENERATION FOR KNOWLEDGE DISTILLATION

    Publication number: US20190205748A1

    Publication date: 2019-07-04

    Application number: US15860097

    Filing date: 2018-01-02

    IPC classes: G06N3/08

    CPC classes: G06N3/08

    Abstract: A technique for generating soft labels for training is disclosed. In the method, a teacher model having a teacher-side class set is prepared. A collection of class pairs for respective data units is obtained. Each class pair includes classes labelled to a corresponding data unit from among the teacher-side class set and from among a student-side class set that is different from the teacher-side class set. A training input is fed into the teacher model to obtain a set of outputs for the teacher-side class set. A set of soft labels for the student-side class set is calculated from the set of outputs by using, for each member of the student-side class set, at least an output obtained for a class within a subset of the teacher-side class set having relevance to that member, based at least in part on observations in the collection of class pairs.
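The mapping from teacher-side outputs to student-side soft labels can be sketched as below. The class names, the example pair observations, and the choice to aggregate by summation are all hypothetical; the patent only requires that relevance be derived from observed class pairs.

```python
import numpy as np
from collections import defaultdict

# Hypothetical class sets: the teacher uses finer phone classes than the student.
teacher_classes = ["aa1", "aa2", "iy1", "iy2"]
student_classes = ["aa", "iy"]

# Observed class pairs (teacher label, student label) for the same data units.
pairs = [("aa1", "aa"), ("aa2", "aa"), ("iy1", "iy"), ("iy2", "iy")]

# Relevance map: which teacher classes were observed with each student class.
relevant = defaultdict(set)
for t, s in pairs:
    relevant[s].add(t)

teacher_out = np.array([0.5, 0.2, 0.2, 0.1])  # teacher posteriors for one input
t_idx = {c: i for i, c in enumerate(teacher_classes)}

# Soft label per student class: aggregate teacher outputs over relevant classes.
soft = np.array([sum(teacher_out[t_idx[t]] for t in relevant[s])
                 for s in student_classes])
soft /= soft.sum()  # renormalize to a distribution over the student class set
```

Here the two teacher variants of each phone pool into one student label, giving soft = [0.7, 0.3] for this input.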

    TRAINING OF FRONT-END AND BACK-END NEURAL NETWORKS

    Publication number: US20180053087A1

    Publication date: 2018-02-22

    Application number: US15240613

    Filing date: 2016-08-18

    Inventor: Takashi Fukuda

    IPC classes: G06N3/08 G06N3/04

    Abstract: Methods, systems, and computer programs are provided for training a front-end neural network ("front-end NN") and a back-end neural network ("back-end NN"). The method includes: combining the back-end NN with the front-end NN so that an output layer of the front-end NN is also an input layer of the back-end NN, forming a joint layer and thereby generating a combined NN; and training the combined NN for speech recognition with a set of utterances as training data, with a plurality of specific units in the joint layer being dropped during the training, the specific units corresponding to one or more common frequency bands. The front-end NN may be configured to estimate clean frequency filter bank features from noisy input features, or to estimate clean frequency filter bank features from noisy frequency filter bank input features in the same feature space.
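The joint-layer arrangement and band-wise dropping can be sketched as follows. This is a minimal NumPy forward pass under assumed sizes; the one-unit-per-band joint layer, the tanh activation, and the weights are illustrative stand-ins, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

n_bands = 8  # joint-layer units, one per filter bank frequency band (hypothetical)
W_front = rng.normal(size=(8, n_bands)) * 0.1  # front-end NN weights into the joint layer
W_back = rng.normal(size=(n_bands, 10)) * 0.1  # back-end NN weights out of the joint layer

def combined_forward(x, drop_bands=()):
    """Forward pass of the combined NN: the front-end output layer is
    also the back-end input layer (the joint layer). During training,
    units for the chosen frequency bands are dropped (zeroed)."""
    joint = np.tanh(x @ W_front)
    mask = np.ones(n_bands)
    mask[list(drop_bands)] = 0.0   # drop the units of the selected bands
    return (joint * mask) @ W_back

x = rng.normal(size=(4, 8))        # batch of 4 noisy feature frames
out = combined_forward(x, drop_bands=[2, 3])
```

Zeroing the same bands on both sides of the joint layer forces the back-end NN not to over-rely on any single frequency band, in the spirit of dropout.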

    Testing words in a pronunciation lexicon

    Publication number: US09734821B2

    Publication date: 2017-08-15

    Application number: US14755854

    Filing date: 2015-06-30

    Abstract: A method is provided for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes obtaining test sentences which can be accepted by a language model used in the ASR system. The test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether or not all or some of the words in a test sentence are present in a path of the word graph derived from that test sentence.

    DISCRIMINATIVE TRAINING OF A FEATURE-SPACE TRANSFORM

    Publication number: US20170213543A1

    Publication date: 2017-07-27

    Application number: US15004413

    Filing date: 2016-01-22

    Inventor: Takashi Fukuda

    IPC classes: G10L15/06 G10L15/183

    Abstract: A method, a system, and a computer program product are provided for discriminatively training a feature-space transform. The method includes performing feature-space discriminative training (f-DT) on an initialized feature-space transform, using manually transcribed data, to obtain a pre-stage trained feature-space transform. The method further includes performing f-DT on the pre-stage trained feature-space transform, as a newly initialized feature-space transform, using automatically transcribed data, to obtain a main-stage trained feature-space transform. The method additionally includes performing f-DT on the main-stage trained feature-space transform, as a newly initialized feature-space transform, using manually transcribed data, to obtain a post-stage trained feature-space transform.
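The pre-/main-/post-stage schedule can be sketched as below. Real f-DT optimizes a discriminative objective over recognition lattices; here a least-squares refinement of a linear feature transform on synthetic data stands in for one training stage, purely to show how each stage is initialized from the previous stage's transform.

```python
import numpy as np

rng = np.random.default_rng(2)

def f_dt(M, feats, targets, lr=0.05, steps=50):
    """One training stage (sketched as gradient-descent least-squares
    refinement of a linear feature transform; a stand-in for f-DT)."""
    for _ in range(steps):
        grad = 2 * feats.T @ (feats @ M - targets) / len(feats)
        M = M - lr * grad
    return M

d = 4
# Synthetic stand-ins for the two transcription sources.
manual_x, manual_y = rng.normal(size=(32, d)), rng.normal(size=(32, d))
auto_x, auto_y = rng.normal(size=(64, d)), rng.normal(size=(64, d))

M = np.eye(d)                       # initialized feature-space transform
M = f_dt(M, manual_x, manual_y)     # pre-stage: manually transcribed data
M = f_dt(M, auto_x, auto_y)         # main-stage: automatically transcribed data
M = f_dt(M, manual_x, manual_y)     # post-stage: manually transcribed data
```

The bracketing of the large automatically transcribed stage by two manually transcribed stages is the point of the schedule: the cheap data does the bulk of the adaptation, and the reliable data anchors the transform before and after.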