NEURAL NETWORK-BASED ACOUSTIC MODEL WITH SOFTENING TARGET-LAYER

    Publication number: US20200005769A1

    Publication date: 2020-01-02

    Application number: US16019676

    Filing date: 2018-06-27

    IPC classes: G10L15/16 G10L15/06 G06N3/08

    Abstract: A method is provided for training a neural network-based (NN-based) acoustic model. The method includes receiving, by a processor, the neural network-based (NN-based) acoustic model, trained by a one-hot scheme and having an input layer, a set of middle layers, and an original output layer. At least each of the middle layers subsequent to a first one of the middle layers has trained parameters. The method further includes stacking, by the processor, a new output layer on the original output layer of the NN-based acoustic model to form a new NN-based acoustic model. The new output layer has the same size as the original output layer. The method also includes retraining, by the processor, only the new output layer and the original output layer of the new NN-based acoustic model in the one-hot scheme, with the trained parameters of the middle layers subsequent to at least the first one being fixed.
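The retraining step above can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the layer sizes, the tanh activations, and the single frozen middle layer are all hypothetical. Only the original output layer and the newly stacked output layer receive gradient updates; the middle-layer weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Pretrained model (hypothetical sizes): input -> middle -> original output.
W_mid = rng.normal(size=(8, 16))   # middle layer, frozen during retraining
W_out = rng.normal(size=(16, 10))  # original output layer, retrained
W_new = rng.normal(size=(10, 10))  # new stacked output layer (same size), retrained

x = rng.normal(size=(4, 8))        # batch of 4 feature frames
y = np.eye(10)[[1, 3, 5, 7]]       # one-hot targets

lr = 0.1
W_mid_before = W_mid.copy()
for _ in range(5):
    h = np.tanh(x @ W_mid)         # frozen middle layer (no update below)
    o = np.tanh(h @ W_out)         # original output layer (trainable)
    p = softmax(o @ W_new)         # new stacked output layer (trainable)
    # Cross-entropy gradients; note no gradient step touches W_mid.
    d_new = (p - y) / len(x)
    d_out = (d_new @ W_new.T) * (1 - o ** 2)
    W_new -= lr * (o.T @ d_new)
    W_out -= lr * (h.T @ d_out)
```

After training, the frozen middle-layer weights are bit-for-bit unchanged, which is the key property the method relies on.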

    Testing words in a pronunciation lexicon

    Publication number: US10373607B2

    Publication date: 2019-08-06

    Application number: US15621778

    Filing date: 2017-06-13

    Abstract: A method is provided for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes obtaining test sentences which can be accepted by a language model used in the ASR system. The test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether or not all or some of the words in a test sentence are present in a path of the word graph derived from that test sentence.
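The word-graph check described above can be sketched in a few lines. The representation here (a simple adjacency set over consecutive words) and the example hypotheses are assumptions for illustration, not the patent's actual data structure.

```python
from collections import defaultdict

def build_word_graph(hypotheses):
    """Build a word graph from recognized texts: consecutive words
    in each hypothesis become directed edges."""
    graph = defaultdict(set)
    for words in hypotheses:
        for a, b in zip(words, words[1:]):
            graph[a].add(b)
    return graph

def sentence_on_path(graph, sentence):
    """True if every consecutive word pair of the test sentence is an
    edge, i.e. the whole sentence lies on a path of the word graph."""
    return all(b in graph[a] for a, b in zip(sentence, sentence[1:]))

# Hypothetical recognition results for variations of one test sentence.
hyps = [["call", "john", "now"], ["call", "joan", "now"]]
g = build_word_graph(hyps)
print(sentence_on_path(g, ["call", "john", "now"]))  # True
print(sentence_on_path(g, ["call", "jane", "now"]))  # False
```

A word of the lexicon that never survives recognition in any variation leaves a gap in every path, which is what flags it for testing.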

    SOFT LABEL GENERATION FOR KNOWLEDGE DISTILLATION

    Publication number: US20190205748A1

    Publication date: 2019-07-04

    Application number: US15860097

    Filing date: 2018-01-02

    IPC classes: G06N3/08

    CPC classes: G06N3/08

    Abstract: A technique for generating soft labels for training is disclosed. In the method, a teacher model having a teacher-side class set is prepared. A collection of class pairs for respective data units is obtained. Each class pair includes classes labelled to a corresponding data unit from among the teacher-side class set and from among a student-side class set that is different from the teacher-side class set. A training input is fed into the teacher model to obtain a set of outputs for the teacher-side class set. A set of soft labels for the student-side class set is calculated from the set of outputs by using, for each member of the student-side class set, at least an output obtained for a class within a subset of the teacher-side class set having relevance to that member, based at least in part on observations in the collection of class pairs.
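The mapping from teacher-side outputs to student-side soft labels can be sketched as below. The class names, the example pair observations, and the choice to aggregate by summation are all hypothetical; the patent only requires that relevance be derived from observed class pairs.

```python
import numpy as np
from collections import defaultdict

# Hypothetical class sets: the teacher uses finer phone classes than the student.
teacher_classes = ["aa1", "aa2", "iy1", "iy2"]
student_classes = ["aa", "iy"]

# Observed class pairs (teacher label, student label) for the same data units.
pairs = [("aa1", "aa"), ("aa2", "aa"), ("iy1", "iy"), ("iy2", "iy")]

# Relevance map: which teacher classes were observed with each student class.
relevant = defaultdict(set)
for t, s in pairs:
    relevant[s].add(t)

teacher_out = np.array([0.5, 0.2, 0.2, 0.1])  # teacher posteriors for one input
t_idx = {c: i for i, c in enumerate(teacher_classes)}

# Soft label per student class: aggregate teacher outputs over relevant classes.
soft = np.array([sum(teacher_out[t_idx[t]] for t in relevant[s])
                 for s in student_classes])
soft /= soft.sum()  # renormalize to a distribution over the student class set
```

Here the two teacher variants of each phone pool into one student label, giving soft = [0.7, 0.3] for this input.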

    TRAINING OF FRONT-END AND BACK-END NEURAL NETWORKS

    Publication number: US20180053087A1

    Publication date: 2018-02-22

    Application number: US15240613

    Filing date: 2016-08-18

    Inventor: Takashi Fukuda

    IPC classes: G06N3/08 G06N3/04

    Abstract: Methods, systems, and computer programs are provided for training a front-end neural network ("front-end NN") and a back-end neural network ("back-end NN"). The method includes: combining the back-end NN with the front-end NN so that an output layer of the front-end NN is also an input layer of the back-end NN, forming a joint layer and thereby generating a combined NN; and training the combined NN for speech recognition with a set of utterances as training data, with a plurality of specific units in the joint layer being dropped during the training, the specific units corresponding to one or more common frequency bands. The front-end NN may be configured to estimate clean frequency filter bank features from noisy input features, or to estimate clean frequency filter bank features from noisy frequency filter bank input features in the same feature space.
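The joint-layer arrangement and band-wise dropping can be sketched as follows. This is a minimal NumPy forward pass under assumed sizes; the one-unit-per-band joint layer, the tanh activation, and the weights are illustrative stand-ins, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

n_bands = 8  # joint-layer units, one per filter bank frequency band (hypothetical)
W_front = rng.normal(size=(8, n_bands)) * 0.1  # front-end NN weights into the joint layer
W_back = rng.normal(size=(n_bands, 10)) * 0.1  # back-end NN weights out of the joint layer

def combined_forward(x, drop_bands=()):
    """Forward pass of the combined NN: the front-end output layer is
    also the back-end input layer (the joint layer). During training,
    units for the chosen frequency bands are dropped (zeroed)."""
    joint = np.tanh(x @ W_front)
    mask = np.ones(n_bands)
    mask[list(drop_bands)] = 0.0   # drop the units of the selected bands
    return (joint * mask) @ W_back

x = rng.normal(size=(4, 8))        # batch of 4 noisy feature frames
out = combined_forward(x, drop_bands=[2, 3])
```

Zeroing the same bands on both sides of the joint layer forces the back-end NN not to over-rely on any single frequency band, in the spirit of dropout.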

    Testing words in a pronunciation lexicon

    Publication number: US09734821B2

    Publication date: 2017-08-15

    Application number: US14755854

    Filing date: 2015-06-30

    Abstract: A method is provided for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes obtaining test sentences which can be accepted by a language model used in the ASR system. The test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph, using the plurality of texts, for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether or not all or some of the words in a test sentence are present in a path of the word graph derived from that test sentence.

    DISCRIMINATIVE TRAINING OF A FEATURE-SPACE TRANSFORM

    Publication number: US20170213543A1

    Publication date: 2017-07-27

    Application number: US15004413

    Filing date: 2016-01-22

    Inventor: Takashi Fukuda

    IPC classes: G10L15/06 G10L15/183

    Abstract: A method, a system, and a computer program product are provided for discriminatively training a feature-space transform. The method includes performing feature-space discriminative training (f-DT) on an initialized feature-space transform, using manually transcribed data, to obtain a pre-stage trained feature-space transform. The method further includes performing f-DT on the pre-stage trained feature-space transform, as a newly initialized feature-space transform, using automatically transcribed data, to obtain a main-stage trained feature-space transform. The method additionally includes performing f-DT on the main-stage trained feature-space transform, as a newly initialized feature-space transform, using manually transcribed data, to obtain a post-stage trained feature-space transform.
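The pre-/main-/post-stage schedule can be sketched as below. Real f-DT optimizes a discriminative objective over recognition lattices; here a least-squares refinement of a linear feature transform on synthetic data stands in for one training stage, purely to show how each stage is initialized from the previous stage's transform.

```python
import numpy as np

rng = np.random.default_rng(2)

def f_dt(M, feats, targets, lr=0.05, steps=50):
    """One training stage (sketched as gradient-descent least-squares
    refinement of a linear feature transform; a stand-in for f-DT)."""
    for _ in range(steps):
        grad = 2 * feats.T @ (feats @ M - targets) / len(feats)
        M = M - lr * grad
    return M

d = 4
# Synthetic stand-ins for the two transcription sources.
manual_x, manual_y = rng.normal(size=(32, d)), rng.normal(size=(32, d))
auto_x, auto_y = rng.normal(size=(64, d)), rng.normal(size=(64, d))

M = np.eye(d)                       # initialized feature-space transform
M = f_dt(M, manual_x, manual_y)     # pre-stage: manually transcribed data
M = f_dt(M, auto_x, auto_y)         # main-stage: automatically transcribed data
M = f_dt(M, manual_x, manual_y)     # post-stage: manually transcribed data
```

The bracketing of the large automatically transcribed stage by two manually transcribed stages is the point of the schedule: the cheap data does the bulk of the adaptation, and the reliable data anchors the transform before and after.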