Generation of voice data as data augmentation for acoustic model training

    公开(公告)号:US10726828B2

    公开(公告)日:2020-07-28

    申请号:US15609665

    申请日:2017-05-31

    摘要: A method, computer system, and a computer program product for generating a plurality of voice data having a particular speaking style is provided. The present invention may include preparing a plurality of original voice data corresponding to at least one word or at least one phrase is prepared. The present invention may also include attenuating a low frequency component and a high frequency component in the prepared plurality of original voice data. The present invention may then include reducing power at a beginning and an end of the prepared plurality of original voice data. The present invention may further include storing a plurality of resultant voice data obtained after the attenuating and the reducing.

    Sound identification utilizing periodic indications

    公开(公告)号:US10460723B2

    公开(公告)日:2019-10-29

    申请号:US15992778

    申请日:2018-05-30

    摘要: A computer-implemented method is provided. The computer-implemented method is performed by a speech recognition system having at least a processor. The method includes estimating sound identification information from a neural network having periodic indications and components of a frequency spectrum of an audio signal data inputted thereto. The method further includes performing a speech recognition operation on the audio signal data to decode the audio signal data into a textual representation based on the estimated sound identification information. The neural network includes a plurality of fully-connected network layers having a first layer that includes a plurality of first nodes and a plurality of second nodes. The method further comprises training the neural network by initially isolating the periodic indications from the components of the frequency spectrum in the first layer by setting weights between the first nodes and a plurality of input nodes corresponding to the periodic indications to 0.

    SOUND IDENTIFICATION UTILIZING PERIODIC INDICATIONS

    公开(公告)号:US20180277104A1

    公开(公告)日:2018-09-27

    申请号:US15992778

    申请日:2018-05-30

    摘要: A computer-implemented method is provided. The computer-implemented method is performed by a speech recognition system having at least a processor. The method includes estimating sound identification information from a neural network having periodic indications and components of a frequency spectrum of an audio signal data inputted thereto. The method further includes performing a speech recognition operation on the audio signal data to decode the audio signal data into a textual representation based on the estimated sound identification information. The neural network includes a plurality of fully-connected network layers having a first layer that includes a plurality of first nodes and a plurality of second nodes. The method further comprises training the neural network by initially isolating the periodic indications from the components of the frequency spectrum in the first layer by setting weights between the first nodes and a plurality of input nodes corresponding to the periodic indications to 0.

    SOUND IDENTIFICATION UTILIZING PERIODIC INDICATIONS

    公开(公告)号:US20200058297A1

    公开(公告)日:2020-02-20

    申请号:US16665159

    申请日:2019-10-28

    摘要: A computer-implemented method is provided. The computer-implemented method is performed by a speech recognition system having at least a processor. The method further includes performing a speech recognition operation on the audio signal data to decode the audio signal data into a textual representation based on the estimated sound identification information from a neural network having periodic indications and components of a frequency spectrum of the audio signal data inputted thereto. The neural network includes a plurality of fully-connected network layers having a first layer that includes a plurality of first nodes and a plurality of second nodes. The method further comprises training the neural network by initially isolating the periodic indications from the components of the frequency spectrum in the first layer by setting weights between the first nodes and a plurality of input nodes corresponding to the periodic indications to 0.

    NEURAL NETWORK-BASED ACOUSTIC MODEL WITH SOFTENING TARGET-LAYER

    公开(公告)号:US20200005769A1

    公开(公告)日:2020-01-02

    申请号:US16019676

    申请日:2018-06-27

    IPC分类号: G10L15/16 G10L15/06 G06N3/08

    摘要: A method is provided for training a neural network-based (NN-based) acoustic model. The method includes receiving, by a processor, the neural network-based (NN-based) acoustic model, trained by a one-hot scheme and having an input layer, a set of middle layers, and an original output layer. At least each of the middle layers subsequent to a first one of the middle layers have trained parameters. The method further includes stacking, by the processor, a new output layer on the original output layer of the NN-based acoustic model to form a new NN-based acoustic model. The new output layer has a same size as the original output layer. The method also includes retraining, by the processor, only the new output layer and the original output layer of the new NN-based acoustic model in the one-hot scheme, with the trained parameters of middle layers subsequent to at least the first one being fixed.

    Testing words in a pronunciation lexicon

    公开(公告)号:US10373607B2

    公开(公告)日:2019-08-06

    申请号:US15621778

    申请日:2017-06-13

    摘要: A method, for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system, is provided. The method includes: obtaining test sentences which can be accepted by a language model used in the ASR system. The test sentences cover words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data, or a plurality of texts generated by recognizing the variation of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to each word defined in the pronunciation lexicon; and determining whether or not all or parts of words in a test sentence are present in a path of the word graph derived from the test sentence.