-
Publication No.: US20200005769A1
Publication Date: 2020-01-02
Application No.: US16019676
Filing Date: 2018-06-27
Inventors: Osamu Ichikawa, Takashi Fukuda
Abstract: A method is provided for training a neural network-based (NN-based) acoustic model. The method includes receiving, by a processor, the NN-based acoustic model, trained by a one-hot scheme and having an input layer, a set of middle layers, and an original output layer. At least each of the middle layers subsequent to a first one of the middle layers has trained parameters. The method further includes stacking, by the processor, a new output layer on the original output layer of the NN-based acoustic model to form a new NN-based acoustic model. The new output layer has the same size as the original output layer. The method also includes retraining, by the processor, only the new output layer and the original output layer of the new NN-based acoustic model in the one-hot scheme, with the trained parameters of the middle layers subsequent to at least the first one being fixed.
-
Publication No.: US10373607B2
Publication Date: 2019-08-06
Application No.: US15621778
Filing Date: 2017-06-13
Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
IPC Classification: G10L13/08, G10L15/01, G10L15/193
Abstract: A method is provided for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes obtaining test sentences that can be accepted by a language model used in the ASR system. The test sentences cover words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data, or a plurality of texts generated by recognizing the variations of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether or not all or some of the words in a test sentence are present in a path of the word graph derived from the test sentence.
-
Publication No.: US20190206394A1
Publication Date: 2019-07-04
Application No.: US15861037
Filing Date: 2018-01-03
Inventors: Osamu Ichikawa, Gakuto Kurata, Takashi Fukuda
CPC Classification: G10L15/20, G10L15/02, G10L15/063, G10L15/075, G10L25/24, G10L25/27, G10L25/51
Abstract: Acoustic change is detected by a method including preparing a first Gaussian Mixture Model (GMM) trained with first audio data of first speech sound from a speaker at a first distance from an audio interface and a second GMM generated from the first GMM using second audio data of second speech sound from the speaker at a second distance from the audio interface; calculating a first output of the first GMM and a second output of the second GMM by inputting obtained third audio data into the first GMM and the second GMM; and transmitting a notification in response to determining at least that a difference between the first output and the second output exceeds a threshold. Each Gaussian distribution of the second GMM has a mean obtained by shifting a mean of a corresponding Gaussian distribution of the first GMM by a common channel bias.
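The detection step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes one-dimensional features, takes the "output" of each GMM to be the average log-likelihood, estimates the common channel bias as the gap between the second audio data's mean and the first GMM's overall mean, and compares the signed difference (second minus first) against the threshold. All function names are hypothetical.

```python
import math
from typing import List, Tuple

# One 1-D Gaussian component as (weight, mean, variance). Real acoustic
# features are multi-dimensional; 1-D is a simplification for illustration.
Component = Tuple[float, float, float]


def avg_log_likelihood(gmm: List[Component], frames: List[float]) -> float:
    """Average log-likelihood of the frames under the mixture."""
    total = 0.0
    for x in frames:
        p = sum(
            w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in gmm
        )
        total += math.log(p)
    return total / len(frames)


def estimate_bias(first_gmm: List[Component], second_data: List[float]) -> float:
    """Assumed bias estimate: data mean minus the weighted mean of the first GMM."""
    gmm_mean = sum(w * m for w, m, _ in first_gmm)
    return sum(second_data) / len(second_data) - gmm_mean


def shift_means(gmm: List[Component], bias: float) -> List[Component]:
    """Build the second GMM: every mean is shifted by the same channel bias."""
    return [(w, m + bias, v) for w, m, v in gmm]


def acoustic_change(first_gmm, second_gmm, third_data, threshold: float) -> bool:
    """True when the second GMM explains the new audio better by more than threshold."""
    diff = avg_log_likelihood(second_gmm, third_data) - avg_log_likelihood(first_gmm, third_data)
    return diff > threshold


first = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]           # speaker at the first distance
second = shift_means(first, estimate_bias(first, [2.0, 4.0]))  # second distance
moved = acoustic_change(first, second, [2.0, 4.0], threshold=1.0)
stayed = acoustic_change(first, second, [-1.0, 1.0], threshold=1.0)
```

In this toy setup a notification would be sent only for the first check, since only that audio is better explained by the shifted model.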
-
Publication No.: US20190205748A1
Publication Date: 2019-07-04
Application No.: US15860097
Filing Date: 2018-01-02
IPC Classification: G06N3/08
CPC Classification: G06N3/08
Abstract: A technique for generating soft labels for training is disclosed. In the method, a teacher model having a teacher side class set is prepared. A collection of class pairs for respective data units is obtained. Each class pair includes classes labelled to a corresponding data unit from among the teacher side class set and from among a student side class set that is different from the teacher side class set. A training input is fed into the teacher model to obtain a set of outputs for the teacher side class set. A set of soft labels for the student side class set is calculated from the set of the outputs by using, for each member of the student side class set, at least an output obtained for a class within a subset of the teacher side class set having relevance to the member of the student side class set, based at least in part on observations in the collection of the class pairs.
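As a rough illustration of the mapping described above, the sketch below derives, for each student-side class, the subset of teacher-side classes observed alongside it in the class pairs, and then builds soft labels from the teacher outputs for that subset. The combination rule (take the maximum over the relevant teacher classes, then renormalize) is an assumption chosen for simplicity; the abstract does not specify it. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def relevant_teacher_classes(
    class_pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Map each student class to the teacher classes co-observed with it."""
    relevance: Dict[str, Set[str]] = defaultdict(set)
    for teacher_cls, student_cls in class_pairs:
        relevance[student_cls].add(teacher_cls)
    return relevance


def soft_labels(
    teacher_outputs: Dict[str, float],
    class_pairs: Iterable[Tuple[str, str]],
    student_classes: List[str],
) -> Dict[str, float]:
    """Soft labels over the student class set from teacher-side outputs."""
    relevance = relevant_teacher_classes(class_pairs)
    # Assumed rule: use the largest teacher output among the relevant classes.
    raw = {
        s: max((teacher_outputs.get(t, 0.0) for t in relevance.get(s, ())), default=0.0)
        for s in student_classes
    }
    total = sum(raw.values())
    # Renormalize so the soft labels sum to one (all zeros if nothing is relevant).
    return {s: (v / total if total > 0 else 0.0) for s, v in raw.items()}


pairs = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("a1", "A")]
labels = soft_labels({"a1": 0.6, "a2": 0.3, "b1": 0.1}, pairs, ["A", "B", "C"])
```

Student class "C" never appears in the class pairs, so it has no relevant teacher classes and receives a zero soft label under this rule.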
-
Publication No.: US10062378B1
Publication Date: 2018-08-28
Application No.: US15441973
Filing Date: 2017-02-24
Abstract: A computer-implemented method and an apparatus are provided. The method includes obtaining, by a processor, a frequency spectrum of audio signal data. The method further includes extracting, by the processor, periodic indications from the frequency spectrum. The method also includes inputting, by the processor, the periodic indications and components of the frequency spectrum into a neural network. The method additionally includes estimating, by the processor, sound identification information from the neural network.
-
Publication No.: US20180053087A1
Publication Date: 2018-02-22
Application No.: US15240613
Filing Date: 2016-08-18
Inventors: Takashi Fukuda
CPC Classification: G06N3/0454, G06N3/063, G06N3/084, G10L15/16, G10L15/20, G10L21/0208, G10L21/0232
Abstract: Methods, systems, and computer programs are provided for training a front-end neural network (“front-end NN”) and a back-end neural network (“back-end NN”). The method includes: combining the back-end NN with the front-end NN so that an output layer of the front-end NN is also an input layer of the back-end NN to form a joint layer to thereby generate a combined NN; and training the combined NN for speech recognition with a set of utterances as training data, a plurality of specific units in the joint layer being dropped during the training and the plurality of the specific units corresponding to one or more common frequency bands. The front-end NN may be configured to estimate clean frequency filter bank features from noisy input features, or to estimate clean frequency filter bank features from noisy frequency filter bank input features in the same feature space.
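The idea of dropping joint-layer units that share a frequency band can be sketched as follows. This is an illustrative assumption rather than the claimed procedure: the joint layer is modelled as a context window of frames, each holding one value per filter bank band; "dropping" is taken to mean zeroing; and bands are selected independently with a fixed probability, with no rescaling of the surviving units. The function name is hypothetical.

```python
import random
from typing import List, Set, Tuple


def drop_frequency_bands(
    joint_layer: List[List[float]], drop_prob: float, rng: random.Random
) -> Tuple[List[List[float]], Set[int]]:
    """Zero the same randomly chosen bands in every frame of the joint layer.

    joint_layer[t][b] is the activation for frame t and frequency band b.
    """
    n_bands = len(joint_layer[0])
    # Pick the bands once so the units dropped in all frames share common bands.
    dropped = {b for b in range(n_bands) if rng.random() < drop_prob}
    masked = [
        [0.0 if b in dropped else value for b, value in enumerate(frame)]
        for frame in joint_layer
    ]
    return masked, dropped


window = [[1.0, 2.0, 3.0, 4.0] for _ in range(3)]  # 3 frames x 4 bands
masked, dropped = drop_frequency_bands(window, 0.5, random.Random(0))
```

Unlike ordinary per-unit dropout, the mask here is shared across frames, so a whole band disappears from the back-end's input for that training example.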
-
Publication No.: US09734821B2
Publication Date: 2017-08-15
Application No.: US14755854
Filing Date: 2015-06-30
Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
IPC Classification: G06F17/27, G06F17/20, G10L15/01, G10L15/193, G10L13/08
CPC Classification: G10L15/01, G10L13/08, G10L15/193
Abstract: A method is provided for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes obtaining test sentences that can be accepted by a language model used in the ASR system. The test sentences cover words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data, or a plurality of texts generated by recognizing the variations of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether or not all or some of the words in a test sentence are present in a path of the word graph derived from the test sentence.
-
Publication No.: US20170213543A1
Publication Date: 2017-07-27
Application No.: US15004413
Filing Date: 2016-01-22
Inventors: Takashi Fukuda
IPC Classification: G10L15/06, G10L15/183
CPC Classification: G10L15/063, G10L15/183, G10L2015/0635
Abstract: A method, a system, and a computer program product are provided for discriminatively training a feature-space transform. The method includes performing feature-space discriminative training (f-DT) on an initialized feature-space transform, using manually transcribed data, to obtain a pre-stage trained feature-space transform. The method further includes performing f-DT on the pre-stage trained feature-space transform as a newly initialized feature-space transform, using automatically transcribed data, to obtain a main-stage trained feature-space transform. The method additionally includes performing f-DT on the main-stage trained feature-space transform as a newly initialized feature-space transform, using manually transcribed data, to obtain a post-stage trained feature-space transform.
-
Publication No.: US20240331687A1
Publication Date: 2024-10-03
Application No.: US18129030
Filing Date: 2023-03-30
Inventors: Takashi Fukuda, George Andrei Saon
Abstract: A word-level confidence score is calculated using a computerized automatic speech recognition system by computing an average of confidence levels for each character in a word and a trailing space character delineating an end of the word. The word is managed using the computerized automatic speech recognition system and using a threshold process based on the calculated word-level confidence score.
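The averaging described in this abstract can be shown with a short sketch. It assumes the recognizer emits one confidence value per output character, including the space that ends each word; the input format, the function names, and the accept/reject reading of "threshold process" are illustrative assumptions.

```python
from typing import List, Tuple


def word_confidences(chars: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Average per-character confidences over each word plus its trailing space."""
    words: List[Tuple[str, float]] = []
    letters: List[str] = []
    scores: List[float] = []
    for ch, conf in chars:
        scores.append(conf)  # the trailing space contributes to the average too
        if ch == " ":
            if letters:
                words.append(("".join(letters), sum(scores) / len(scores)))
            letters, scores = [], []
        else:
            letters.append(ch)
    if letters:  # last word when the output has no final space
        words.append(("".join(letters), sum(scores) / len(scores)))
    return words


def apply_threshold(words: List[Tuple[str, float]], threshold: float):
    """Assumed threshold process: mark each word as accepted or not."""
    return [(word, score, score >= threshold) for word, score in words]


stream = [("h", 0.9), ("i", 0.7), (" ", 0.8), ("y", 0.2), ("o", 0.4), (" ", 0.3)]
scored = word_confidences(stream)
decisions = apply_threshold(scored, threshold=0.5)
```

Including the trailing space means a word whose boundary the recognizer is unsure about is penalized even when its letters score well.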
-
Publication No.: US20240242707A1
Publication Date: 2024-07-18
Application No.: US18096308
Filing Date: 2023-01-12
Inventors: Takashi Fukuda
CPC Classification: G10L15/063, G10L25/78, G10L25/93, G10L2025/783
Abstract: Techniques are provided for training a neural transducer-based automatic speech recognition model to be robust against background additive noise, thereby reducing insertion errors. In one aspect, a method of training an automatic speech recognition model includes: generating a modified training dataset from an initial training dataset by concatenating one-word utterances with a preceding or a succeeding sentence in the initial training dataset based on a duration of silence between the one-word utterances and the preceding or the succeeding sentence; and training the automatic speech recognition model using the modified training dataset.
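The dataset modification step can be sketched as below. The sketch works on segment metadata only (start time, end time, transcript) and assumes a particular decision rule that the abstract does not spell out: a one-word utterance is joined to whichever neighbour is separated from it by the shorter silence, provided that silence is no longer than a hypothetical `max_gap` in seconds. The `Utterance` type and function name are likewise illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_one_word_utterances(utts: List[Utterance], max_gap: float = 1.0) -> List[Utterance]:
    """Concatenate one-word utterances with a neighbouring sentence.

    Assumes utts is sorted by start time. Only metadata is merged here; a real
    pipeline would also join the audio, including the silence in between.
    """
    out: List[Utterance] = []
    i = 0
    while i < len(utts):
        u = utts[i]
        if len(u.text.split()) == 1:
            gap_prev = u.start - out[-1].end if out else float("inf")
            gap_next = utts[i + 1].start - u.end if i + 1 < len(utts) else float("inf")
            if gap_prev <= max_gap and gap_prev <= gap_next:
                prev = out.pop()  # append to the preceding sentence
                out.append(Utterance(prev.start, u.end, prev.text + " " + u.text))
                i += 1
                continue
            if gap_next <= max_gap:
                nxt = utts[i + 1]  # prepend to the succeeding sentence
                out.append(Utterance(u.start, nxt.end, u.text + " " + nxt.text))
                i += 2
                continue
        out.append(u)  # multi-word, or too isolated to merge
        i += 1
    return out


merged = merge_one_word_utterances([
    Utterance(0.0, 2.0, "turn on the light"),
    Utterance(2.3, 2.8, "yes"),
    Utterance(6.0, 8.0, "what time is it"),
])
```

Here "yes" follows the first sentence after 0.3 s of silence, so it is appended to that sentence; the third utterance is left unchanged.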