Customization of recurrent neural network transducers for speech recognition

    Publication No.: US11908458B2

    Publication Date: 2024-02-20

    Application No.: US17136439

    Application Date: 2020-12-29

    Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The method includes synthesizing first-domain audio data from first-domain text data and feeding the synthesized first-domain audio data into a trained encoder of the RNN-T having an initial condition, wherein the encoder is updated using the synthesized first-domain audio data and the first-domain text data. The method further includes synthesizing second-domain audio data from second-domain text data and feeding the synthesized second-domain audio data into the updated encoder of the RNN-T, wherein a prediction network of the RNN-T is updated using the synthesized second-domain audio data and the second-domain text data. The method further includes restoring the updated encoder to the initial condition.
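The control flow of this abstract can be sketched in a few lines of Python. Everything below — the `RNNT` container, `synthesize_audio`, and `update` — is an illustrative placeholder of my own, not the patented implementation; only the sequence of steps (update the encoder on first-domain synthetic audio, update the prediction network on second-domain synthetic audio through the updated encoder, then restore the encoder) follows the abstract.

```python
import copy

class RNNT:
    """Toy stand-in for an RNN transducer with two trainable parts."""
    def __init__(self):
        self.encoder = {"weights": [0.0]}      # acoustic encoder
        self.prediction = {"weights": [0.0]}   # label prediction network

def synthesize_audio(text):
    """Stand-in for a text-to-speech front end producing audio features."""
    return [float(len(tok)) for tok in text.split()]

def update(params, audio, text):
    """Stand-in for one fine-tuning pass; mutates params in place."""
    params["weights"] = [w + sum(audio) * 1e-3 for w in params["weights"]]

def customize(model, domain1_text, domain2_text):
    # Save the encoder's initial condition so it can be restored later.
    initial_encoder = copy.deepcopy(model.encoder)

    # Stage 1: synthesize first-domain audio and update the encoder.
    audio1 = synthesize_audio(domain1_text)
    update(model.encoder, audio1, domain1_text)

    # Stage 2: feed second-domain synthetic audio through the updated
    # encoder and update the prediction network.
    audio2 = synthesize_audio(domain2_text)
    update(model.prediction, audio2, domain2_text)

    # Restore the encoder to its initial condition, so only the
    # prediction network retains the second-domain adaptation.
    model.encoder = initial_encoder
    return model
```

The point of the sketch is that the encoder update is deliberately transient: it exists only so the prediction network is adapted against an in-domain encoder, and is rolled back at the end.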

    ACCURACY OF STREAMING RNN TRANSDUCER

    Publication No.: US20220093083A1

    Publication Date: 2022-03-24

    Application No.: US17031345

    Application Date: 2020-09-24

    Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model, which has a bidirectional encoder, to output the same symbols from its output probability lattice as a trained first end-to-end neural speech recognition model having a unidirectional encoder outputs from its own output probability lattice. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training it as a student, with the trained second model as the teacher, in a knowledge distillation method.
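The teacher–student step in this abstract can be illustrated with a generic soft-label distillation loss. This is a standard knowledge-distillation sketch under my own assumptions (temperature-scaled softmax, KL divergence between teacher and student output distributions), not the patent's specific lattice-level formulation; all function names are mine.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution over output symbols."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Train the unidirectional student to match the bidirectional
    teacher's softened output distribution at one lattice position."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)
```

The loss is zero exactly when the student reproduces the teacher's distribution, which is the sense in which the student learns to "output the same symbols" as the teacher.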

    MULTIPLICATIVE INTEGRATION IN NEURAL NETWORK TRANSDUCER MODELS FOR END-TO-END SPEECH RECOGNITION

    Publication No.: US20220059082A1

    Publication Date: 2022-02-24

    Application No.: US16999405

    Application Date: 2020-08-21

    Abstract: Using an encoder neural network model, an encoder vector is computed, comprising a vector representation of a current portion of input data in an input sequence. Using a prediction neural network model, a prediction vector is computed from a previous prediction vector and the previous output symbol corresponding to a previous portion of input data in the input sequence. Using a joint neural network model, a joint vector corresponding to the encoder vector and the prediction vector is computed, the joint vector multiplicatively combining each element of the encoder vector with the corresponding element of the prediction vector. Using a softmax function, the joint vector is converted to a probability distribution comprising the probability that a current output symbol corresponds to the current portion of input data in the input sequence.
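The joint step described here is easy to show directly: rather than the common additive combination of the encoder and prediction vectors, the joint vector is their element-wise (Hadamard) product, which is then normalized by a softmax. A minimal sketch, with function names of my own choosing:

```python
import math

def multiplicative_joint(enc_vec, pred_vec):
    """Element-wise product of encoder and prediction vectors."""
    return [e * p for e, p in zip(enc_vec, pred_vec)]

def softmax(vec):
    """Convert the joint vector to a probability distribution over symbols."""
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

# One decoding step: combine the two vectors, then normalize.
enc = [0.5, -1.0, 2.0]    # encoder vector for the current input portion
pred = [1.5, 0.2, -0.3]   # prediction vector from previous outputs
probs = softmax(multiplicative_joint(enc, pred))
```

A design note on the multiplicative form: the product lets either network veto a symbol (a near-zero element in one vector suppresses the corresponding element of the joint vector regardless of the other network's score), which an additive combination cannot do.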

    SOFT-FORGETTING FOR CONNECTIONIST TEMPORAL CLASSIFICATION BASED AUTOMATIC SPEECH RECOGNITION

    Publication No.: US20210065680A1

    Publication Date: 2021-03-04

    Application No.: US16551915

    Application Date: 2019-08-27

    IPC Classes: G10L15/06 G10L15/16 G10L15/05

    Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches, wherein each training batch comprises one or more blocks of information. Responsive to completion of the training of the first model, the one or more computer processors initiate training of a second model utilizing the same training batches. The one or more computer processors jitter a random block size for each block of information in each training batch for the second model, and unroll the second model over one or more non-overlapping contiguous jittered blocks of information. Responsive to the unrolling of the second model, the one or more computer processors reduce overfitting of the second model by applying twin regularization.
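Two mechanisms in this abstract lend themselves to a short sketch: splitting an utterance into non-overlapping contiguous blocks of randomly jittered size, and a twin-regularization penalty tying the second model's hidden states to the first model's. Both functions below are illustrative simplifications under my own assumptions (uniform jitter, squared-error penalty), not the patent's exact formulation.

```python
import random

def jittered_blocks(frames, base_size, jitter, rng):
    """Split a frame sequence into non-overlapping contiguous blocks
    whose sizes are jittered uniformly around base_size."""
    blocks, i = [], 0
    while i < len(frames):
        size = max(1, base_size + rng.randint(-jitter, jitter))
        blocks.append(frames[i:i + size])
        i += size
    return blocks

def twin_regularization(hidden_first, hidden_second, weight=0.01):
    """Squared-error penalty between the first (fully unrolled) model's
    hidden states and the second (block-unrolled) model's, discouraging
    the second model from overfitting to its truncated context."""
    return weight * sum(
        (a - b) ** 2 for a, b in zip(hidden_first, hidden_second)
    )
```

Because each block starts where the previous one ended, the jittered blocks always partition the utterance exactly; the randomness only moves the block boundaries between epochs.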