-
Publication No.: US11907845B2
Publication Date: 2024-02-20
Application No.: US16994656
Filing Date: 2020-08-17
Inventors: Takashi Fukuda, Samuel Thomas
Abstract: Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established: one trained on the lossless data (the lossless branch) and one trained on the lossy data (the lossy branch). The weights of the two branches are tied together. The lossy branch, now isolated from the lossless branch, generates a set of soft targets for initializing an SNN. Because the branch weights were tied during training, these soft targets benefit from the lossless branch's training even though the lossless branch is isolated from the lossy branch during soft-target generation.
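The tying idea can be sketched in a few lines of pure Python. This is a minimal illustration, not the patented method: all class names and the linear-model branches are hypothetical, and the key point is only that both branches reference one shared weight list, so training the lossless branch also moves the weights the lossy branch later uses alone.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, used to produce soft targets."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

class TiedTeacher:
    """Two linear branches that share (tie) one weight matrix."""
    def __init__(self, dim, n_classes):
        # A single weight matrix serves both branches.
        self.w = [[0.0] * dim for _ in range(n_classes)]

    def logits(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def train_step(self, x_lossless, label, lr=0.1):
        """One SGD step driven by the lossless branch; the lossy
        branch sees the update too because the weights are tied."""
        probs = softmax(self.logits(x_lossless))
        for c, row in enumerate(self.w):
            err = probs[c] - (1.0 if c == label else 0.0)
            for j, xj in enumerate(x_lossless):
                row[j] -= lr * err * xj

    def soft_targets(self, x_lossy, temperature=2.0):
        """Soft targets from the lossy branch alone; the lossless
        branch is not consulted at generation time."""
        return softmax(self.logits(x_lossy), temperature)

teacher = TiedTeacher(dim=3, n_classes=2)
for _ in range(50):
    teacher.train_step([1.0, 0.0, 0.5], label=0)
targets = teacher.soft_targets([0.9, 0.1, 0.4])
```

Even though only lossless inputs drove the updates, the lossy branch's soft targets reflect them, because there is only one set of weights.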
-
Publication No.: US20240038221A1
Publication Date: 2024-02-01
Application No.: US17815798
Filing Date: 2022-07-28
Inventors: Sashi Novitasari, Takashi Fukuda, Gakuto Kurata
CPC Classification: G10L15/16, G10L25/78, G10L15/22, G10L15/063, G10L15/20
Abstract: Systems, computer-implemented methods, and computer program products are provided that facilitate multi-task training of a recurrent neural network transducer (RNN-T) using automatic speech recognition (ASR) information. According to an embodiment, a system can comprise a memory that stores computer-executable components and a processor that executes the components stored in the memory. The components can include an RNN-T that receives ASR information and a voice activity detection (VAD) model that trains the RNN-T using that information, where the RNN-T further comprises an encoder and a joint network. One or more outputs of the encoder can be integrated with the joint network together with one or more outputs of the VAD model.
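One plausible reading of the integration step is that the joint network combines encoder and prediction-network outputs as usual, while the VAD posterior biases probability mass toward the blank symbol on non-speech frames. The sketch below assumes that reading; the toy VAD, the additive combination, and the blank-bias term are all illustrative choices, not the claimed architecture.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def vad_posterior(frame):
    """Toy VAD: frame energy mapped through a sigmoid to a speech
    probability (purely illustrative)."""
    energy = sum(x * x for x in frame)
    return 1.0 / (1.0 + math.exp(-(energy - 1.0)))

def joint(enc_out, pred_out, speech_prob, blank_bias=2.0):
    """Joint network: add encoder and prediction outputs, then let
    the VAD output push mass toward the blank symbol (index 0) when
    the frame looks like non-speech."""
    logits = [e + p for e, p in zip(enc_out, pred_out)]
    logits[0] += blank_bias * (1.0 - speech_prob)
    return softmax(logits)

silence = [0.01, 0.02]   # low-energy frame
speech = [1.5, 1.2]      # high-energy frame
enc = [0.2, 0.5, 0.3]    # toy encoder output (blank, 'a', 'b')
pred = [0.1, 0.4, 0.2]   # toy prediction-network output
p_sil = joint(enc, pred, vad_posterior(silence))
p_sp = joint(enc, pred, vad_posterior(speech))
```

On the silent frame the blank symbol absorbs more probability, which is the behavior a VAD-informed joint network would be after.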
-
Publication No.: US20230237987A1
Publication Date: 2023-07-27
Application No.: US17580846
Filing Date: 2022-01-21
Inventors: Takashi Fukuda, Tohru Nagano
CPC Classification: G10L15/02, G10L15/063, G06F7/24, G10L2015/025
Abstract: A computer-implemented method for preparing training data for a speech recognition model is provided. The method includes obtaining a plurality of sentences from a corpus, dividing each phoneme in each sentence into three hidden states, calculating for each sentence a score based on the variation in duration of the three hidden states of each of its phonemes, and sorting the sentences by the calculated scores.
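A minimal sketch of the scoring-and-sorting step, with two assumptions the abstract leaves open: "variation" is taken to be the variance of the three state durations, and sentences are ranked with the lowest (most even) variation first. Both choices are illustrative.

```python
from statistics import pvariance

def sentence_score(phoneme_durations):
    """phoneme_durations: per phoneme, the frame counts of its three
    hidden states, e.g. [(3, 5, 2), (4, 4, 4)].  The score is the
    mean variance of the state durations: low values mean the three
    states are evenly occupied."""
    variances = [pvariance(states) for states in phoneme_durations]
    return sum(variances) / len(variances)

def sort_sentences(sentences):
    """Sort (sentence, durations) pairs by ascending score."""
    return sorted(sentences, key=lambda s: sentence_score(s[1]))

corpus = [
    ("uneven", [(10, 1, 1), (1, 12, 1)]),
    ("even", [(4, 4, 5), (3, 4, 3)]),
]
ranked = sort_sentences(corpus)
```

Sorting by such a score lets the pipeline pick, say, the sentences with the most stable state alignments as training data.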
-
Publication No.: US20230153601A1
Publication Date: 2023-05-18
Application No.: US17526350
Filing Date: 2021-11-15
Inventors: Takashi Fukuda, Samuel Thomas
CPC Classification: G06N3/08, G06N3/0454, G10L15/00
Abstract: A computer-implemented method for training a neural transducer for speech recognition is provided. The method includes initializing the neural transducer, which has a prediction network, an encoder network, and a joint network. The method further includes expanding the prediction network into a plurality of prediction-net branches, each of which is a prediction network for a respective specific sub-task. The method also includes training, by a hardware processor, the entire neural transducer using training data sets for all of the specific sub-tasks, and obtaining a trained neural transducer by fusing the prediction-net branches.
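The expand-then-fuse flow can be sketched with the prediction network reduced to a flat dict of parameter vectors. The abstract does not fix the fusion operator; parameter averaging is assumed here purely for illustration.

```python
import copy

def expand(prediction_net, n_subtasks):
    """Clone the prediction network into one branch per sub-task."""
    return [copy.deepcopy(prediction_net) for _ in range(n_subtasks)]

def fuse(branches):
    """Fuse trained branches by parameter averaging (one plausible
    reading of 'fusing'; the patent does not specify the operator)."""
    n = len(branches)
    fused = {}
    for name in branches[0]:
        fused[name] = [sum(b[name][i] for b in branches) / n
                       for i in range(len(branches[0][name]))]
    return fused

# A prediction network reduced to named parameter vectors.
base = {"embed": [0.0, 0.0], "out": [0.0]}
branches = expand(base, 3)
# Pretend each branch was trained on its own sub-task:
branches[0]["embed"] = [3.0, 0.0]
branches[1]["embed"] = [0.0, 3.0]
branches[2]["embed"] = [3.0, 3.0]
fused = fuse(branches)
```

The fused network has a single prediction branch again, so inference cost matches the original transducer.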
-
Publication No.: US11416741B2
Publication Date: 2022-08-16
Application No.: US16003790
Filing Date: 2018-06-08
Abstract: A technique for constructing a model supporting a plurality of domains is disclosed. A plurality of teacher models is prepared, each specialized for a different one of the domains, and a plurality of training data collections is obtained, each collected for a different one of the domains. A plurality of soft label sets is generated by inputting each training data item in the training data collections into the corresponding teacher model. A student model is then trained using the soft label sets.
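The soft-label generation step can be sketched as follows. The fixed lambda "teachers", the domain names, and the distillation temperature are all stand-ins; the only structural point taken from the abstract is that each domain's data goes through that domain's own teacher.

```python
import math

def softmax(logits, t=1.0):
    m = max(logits)
    e = [math.exp((x - m) / t) for x in logits]
    s = sum(e)
    return [x / s for x in e]

def make_soft_labels(teachers, data_by_domain, temperature=2.0):
    """Run each domain's data only through that domain's teacher and
    collect (input, soft_label) pairs for training a single student."""
    pairs = []
    for domain, samples in data_by_domain.items():
        teacher = teachers[domain]  # domain-specialized teacher
        for x in samples:
            pairs.append((x, softmax(teacher(x), temperature)))
    return pairs

# Toy teachers: fixed linear scorers standing in for trained models.
teachers = {
    "near_field": lambda x: [2.0 * x[0], x[1]],
    "far_field": lambda x: [x[0], 2.0 * x[1]],
}
data = {"near_field": [[1.0, 0.2]], "far_field": [[0.3, 1.0]]}
soft = make_soft_labels(teachers, data)
```

The student then sees one pooled collection of soft labels, so a single model absorbs all of the per-domain expertise.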
-
Publication No.: US11410029B2
Publication Date: 2022-08-09
Application No.: US15860097
Filing Date: 2018-01-02
IPC Classification: G06N3/08
Abstract: A technique for generating soft labels for training is disclosed. A teacher model having a teacher-side class set is prepared, and a collection of class pairs for respective data units is obtained. Each class pair includes the classes labelled to a corresponding data unit from the teacher-side class set and from a student-side class set different from the teacher-side class set. A training input is fed into the teacher model to obtain a set of outputs for the teacher-side class set. From these outputs, a set of soft labels for the student-side class set is calculated using at least an output obtained for a class within a subset of the teacher-side class set that has relevance to a member of the student-side class set, based at least in part on observations in the collection of class pairs.
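One way to read the calculation is: the class-pair observations define, for each student-side class, the subset of teacher-side classes relevant to it, and the teacher's posteriors are aggregated over that subset. The sketch below assumes summation plus renormalization as the aggregation rule, and uses hypothetical state names.

```python
from collections import defaultdict

def relevance_map(class_pairs):
    """From observed (teacher_class, student_class) pairs, collect
    the teacher-side classes relevant to each student-side class."""
    rel = defaultdict(set)
    for t_cls, s_cls in class_pairs:
        rel[s_cls].add(t_cls)
    return rel

def student_soft_labels(teacher_output, rel, student_classes):
    """Aggregate teacher posteriors over each student class's
    relevant subset, then renormalize (an assumed rule)."""
    raw = [sum(teacher_output[t] for t in rel[s])
           for s in student_classes]
    total = sum(raw)
    return [r / total for r in raw]

pairs = [("AA_1", "a"), ("AA_2", "a"), ("IY_1", "i")]
teacher_out = {"AA_1": 0.5, "AA_2": 0.3, "IY_1": 0.2}
labels = student_soft_labels(teacher_out, relevance_map(pairs),
                             ["a", "i"])
```

This lets a student with a different (e.g. coarser or differently named) class inventory still learn from the teacher's full posterior.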
-
Publication No.: US11227579B2
Publication Date: 2022-01-18
Application No.: US16535829
Filing Date: 2019-08-08
Inventors: Toru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata
IPC Classification: G10L13/033, G10L15/18, G06F40/205, G06F40/284
Abstract: A technique for data augmentation of speech data is disclosed. Original speech data including a sequence of feature frames is obtained. A partially prolonged copy of the original speech data is generated by inserting one or more new frames into the sequence of feature frames. The partially prolonged copy is output as augmented speech data for training an acoustic model.
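The frame-insertion step is simple to sketch. The abstract only requires inserting new frames; generating them by linear interpolation between the two neighboring frames is an assumed choice here.

```python
def prolong(frames, insert_after, n_copies=1):
    """Return a partially prolonged copy of a feature-frame sequence
    by inserting interpolated frames after index `insert_after`."""
    left, right = frames[insert_after], frames[insert_after + 1]
    mid = [(a + b) / 2.0 for a, b in zip(left, right)]
    return (frames[:insert_after + 1]
            + [list(mid) for _ in range(n_copies)]
            + frames[insert_after + 1:])

original = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
augmented = prolong(original, insert_after=0, n_copies=2)
```

The copy is only partially prolonged: frames outside the insertion point are untouched, so the augmented utterance mimics locally slower speech.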
-
Publication No.: US11037583B2
Publication Date: 2021-06-15
Application No.: US16116042
Filing Date: 2018-08-29
Inventors: Masayuki Suzuki, Takashi Fukuda, Toru Nagano
Abstract: A technique for detecting a music segment in an audio signal is disclosed. A time window is set for each section of the audio signal, and a maximum and a statistic of the signal within the window are calculated. A density index, which measures the statistic relative to the maximum, is computed for the section. The section is estimated to be a music segment based, at least in part, on a condition with respect to the density index.
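A minimal sketch of the density index, assuming the statistic is the mean absolute amplitude and the condition is a simple threshold (the abstract fixes neither). The intuition: music tends to fill a window densely with sustained energy, so its mean sits close to its peak, while bursty speech leaves a lower mean-to-peak ratio.

```python
def density_index(window):
    """Statistic (here: mean absolute amplitude) relative to the
    window maximum."""
    peak = max(abs(x) for x in window)
    if peak == 0.0:
        return 0.0
    mean = sum(abs(x) for x in window) / len(window)
    return mean / peak

def is_music(window, threshold=0.5):
    """Estimate a section as music when the density index is high."""
    return density_index(window) >= threshold

music_like = [0.8, -0.7, 0.9, -0.8, 0.85, -0.9]   # dense, sustained
speech_like = [0.9, 0.05, 0.02, 0.8, 0.03, 0.01]  # bursty with gaps
```

Because the index is a ratio, it is insensitive to overall volume, which is what makes it usable across recordings with different gain.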
-
Publication No.: US11003983B2
Publication Date: 2021-05-11
Application No.: US16670201
Filing Date: 2019-10-31
Inventor: Takashi Fukuda
IPC Classification: G06N3/04, G06N3/063, G10L21/0232, G10L15/20, G10L21/0208, G06N3/08, G10L15/16
Abstract: A computer-implemented method for training a front-end neural network ("front-end NN") and a back-end neural network ("back-end NN") is provided. The method includes combining the back-end NN with the front-end NN through a joint layer to generate a combined neural network, and training the combined network for speech recognition with a set of utterances as training data. The joint layer has a plurality of frames, each frame having a plurality of bins. During training, one or more specific units in each frame are dropped, each selected randomly or based on the bin number assigned to it within its frame, with the dropped units corresponding to one or more common frequency bands.
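The band-level dropping can be sketched as a mask applied at the joint layer. The assumption made here is that "common frequency bands" means the same bin indices are zeroed in every frame of the utterance; the function name and the fixed seed are illustrative.

```python
import random

def drop_frequency_bands(frames, n_bins, n_drop, rng=None):
    """Zero out the same `n_drop` randomly chosen bins in every
    frame, so the dropped units line up with common frequency bands
    across the whole utterance."""
    rng = rng or random.Random(0)
    dropped = set(rng.sample(range(n_bins), n_drop))
    masked = [[0.0 if b in dropped else v
               for b, v in enumerate(frame)]
              for frame in frames]
    return masked, dropped

frames = [[1.0] * 8 for _ in range(3)]
masked, dropped_bins = drop_frequency_bands(frames, n_bins=8,
                                            n_drop=2)
```

Dropping whole bands rather than independent units forces the back-end NN to cope with missing spectral regions, much as it would with band-limited or noisy input.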
-
Publication No.: US10832129B2
Publication Date: 2020-11-10
Application No.: US15288515
Filing Date: 2016-10-07
Abstract: A method for transferring the acoustic knowledge of a trained acoustic model (AM) to a neural network (NN) includes reading into memory the NN, the AM (trained with target-domain data), and a set of training data that includes phoneme data and was obtained from a domain different from the target domain. Training data from the set is input into the AM; one or more posterior probabilities are calculated for the context-dependent states corresponding to phonemes in the phoneme class of the phoneme to which each frame of the training data belongs; and a posterior probability vector is generated from these probabilities as a soft label for the NN. The training data is then input into the NN, and the NN is updated using the soft label.
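The soft-label construction can be sketched as a class-restricted posterior: states outside the frame's phoneme class are zeroed and the remainder is renormalized. The renormalization step, the state names, and the dict-based interface are assumptions for the sketch.

```python
import math

def class_restricted_soft_label(am_logits, state_to_phone,
                                phone_class):
    """Keep only context-dependent states whose phoneme falls in the
    frame's phoneme class, renormalize their posteriors, and zero
    the rest, yielding the soft-label vector for the NN."""
    exps = {s: math.exp(l) for s, l in am_logits.items()}
    total = sum(exps.values())
    post = {s: e / total for s, e in exps.items()}
    in_class = sum(p for s, p in post.items()
                   if state_to_phone[s] in phone_class)
    return {s: (post[s] / in_class
                if state_to_phone[s] in phone_class else 0.0)
            for s in post}

logits = {"AA_s1": 1.0, "AA_s2": 0.5, "IY_s1": 0.2}
s2p = {"AA_s1": "AA", "AA_s2": "AA", "IY_s1": "IY"}
label = class_restricted_soft_label(logits, s2p, {"AA"})
```

Restricting the posterior to the phoneme class keeps the soft label informative about within-class state ambiguity while discarding cross-class confusion from the out-of-domain data.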