Patent search: ap:("International Business Machines Corporation" OR "The Ohio State University") AND inv:"Brian E. D. Kingsbury" (page 1)

1.

Granted patent
End-to-end integration of dialog history for spoken language understanding (status: active)

Publication No.: US12119008B2

Publication date: 2024-10-15

Application No.: US17655441

Filing date: 2022-03-18

Applicants: International Business Machines Corporation, The Ohio State University

Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Jatin Ganhotra, Brian E. D. Kingsbury, Eric Fosler-Lussier

IPC classification: G10L19/00, G06F40/126, G06N3/045, G10L15/00

CPC classification: G10L19/00, G06F40/126, G06N3/045, G10L15/00

Abstract: Systems, computer-implemented methods, and computer program products to facilitate end to end integration of dialogue history for spoken language understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a conversation component that encodes speech-based content of an utterance and text-based content of the utterance into a uniform representation.

2.

Published application
END-TO-END INTEGRATION OF DIALOG HISTORY FOR SPOKEN LANGUAGE UNDERSTANDING (status: pending, published)

Publication No.: US20230298596A1

Publication date: 2023-09-21

Application No.: US17655441

Filing date: 2022-03-18

Applicants: International Business Machines Corporation, The Ohio State University

Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Jatin Ganhotra, Brian E. D. Kingsbury, Eric Fosler-Lussier

IPC classification: G10L19/00, G10L15/00, G06F40/126, G06N3/04

CPC classification: G10L19/00, G10L15/00, G06F40/126, G06N3/0454

Abstract: Systems, computer-implemented methods, and computer program products to facilitate end to end integration of dialogue history for spoken language understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a conversation component that encodes speech-based content of an utterance and text-based content of the utterance into a uniform representation.

3.

Granted patent
Customization of recurrent neural network transducers for speech recognition (status: active)

Publication No.: US11908458B2

Publication date: 2024-02-20

Application No.: US17136439

Filing date: 2020-12-29

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury

IPC classification: G10L15/16, G06N3/08, G10L13/02, G10L25/30

CPC classification: G10L15/16, G06N3/08, G10L13/02, G10L25/30

Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network is updated using the synthesized second domain audio data and the second domain text data. The computer implemented method further includes restoring the updated encoder to the initial condition.

4.

Patent application
TRANSLITERATION BASED DATA AUGMENTATION FOR TRAINING MULTILINGUAL ASR ACOUSTIC MODELS IN LOW RESOURCE SETTINGS (status: active)

Publication No.: US20220122585A1

Publication date: 2022-04-21

Application No.: US17073337

Filing date: 2020-10-17

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Samuel Thomas, Kartik Audhkhasi, Brian E. D. Kingsbury

IPC classification: G10L15/06, G10L15/16

Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting includes training a multilingual network on a set of training languages with an original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data output to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding one or more selected portions of the output transliterated data back to the original transcribed training data to update training data. The training of a new multilingual acoustic model through the multilingual network is performed using the updated training data.
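
The filter-and-augment steps of this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not name the filtering metric, so the `score` field, the `min_score` threshold, and every identifier below are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Utterance:
    audio_id: str
    transcript: str
    score: float = 1.0  # hypothetical filtering metric; higher means more trustworthy


def augment_training_data(
    original: List[Utterance],
    transliterated_pool: List[Utterance],
    min_score: float,
) -> List[Utterance]:
    """Keep transliterated items that pass the (assumed) metric and add them
    back to the original transcribed data."""
    selected = [u for u in transliterated_pool if u.score >= min_score]
    return list(original) + selected


# Toy data: one original utterance and two transliterated candidates.
original = [Utterance("utt1", "hello world")]
pool = [
    Utterance("utt1", "helo wurld", score=0.9),
    Utterance("utt1", "xx yy", score=0.2),
]
updated = augment_training_data(original, pool, min_score=0.5)
```

The updated list would then serve as training data for the new multilingual acoustic model; the training step itself is outside the scope of this sketch.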

5.

Granted patent
Systems and methods for accelerating hessian-free optimization for deep neural networks by implicit preconditioning and sampling (status: active)

Publication No.: US10056075B2

Publication date: 2018-08-21

Application No.: US15373775

Filing date: 2016-12-09

Applicant: International Business Machines Corporation

Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath

IPC classification: G10L15/06, G10L15/16, G10L15/02

CPC classification: G10L15/063, G06N3/084, G10L15/02, G10L15/16

Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
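
To illustrate why a "flexible" Krylov solver is paired with a preconditioner that changes between iterations, here is a minimal sketch of flexible preconditioned conjugate gradient on a small symmetric positive definite system. It is not the patented method: the abstract's quasi-Newton preconditioner is replaced by a hypothetical Jacobi preconditioner whose scaling varies per iteration, and all names are assumptions chosen for the example.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


def flexible_pcg(
    A: Matrix,
    b: Vector,
    precondition: Callable[[Vector, int], Vector],
    tol: float = 1e-10,
    max_iter: int = 50,
) -> Vector:
    """Solve A x = b for symmetric positive definite A.

    The preconditioner may differ at every iteration, so beta uses the
    Polak-Ribiere form z_new . (r_new - r) / (r . z), which keeps each new
    search direction conjugate to the previous one."""
    x = [0.0] * len(b)
    r = list(b)  # residual for the zero initial guess
    z = precondition(r, 0)
    p = list(z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        rz = dot(r, z)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        z_new = precondition(r_new, k + 1)
        beta = dot(z_new, [rn - ro for rn, ro in zip(r_new, r)]) / rz
        p = [zn + beta * pi for zn, pi in zip(z_new, p)]
        r, z = r_new, z_new
    return x


def varying_jacobi(A: Matrix) -> Callable[[Vector, int], Vector]:
    """Stand-in preconditioner: diagonal scaling that changes with the iteration."""
    diag = [A[i][i] for i in range(len(A))]

    def apply(r: Vector, k: int) -> Vector:
        scale = 1.0 + 0.5 * (k % 2)
        return [scale * ri / d for ri, d in zip(r, diag)]

    return apply


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = flexible_pcg(A, b, varying_jacobi(A))  # exact solution is [1/11, 7/11]
```

With a fixed preconditioner the simpler Fletcher-Reeves form of beta would suffice; the Polak-Ribiere form is the standard adjustment when the preconditioner is allowed to change.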

6.

Granted patent
Data augmentation method based on stochastic feature mapping for automatic speech recognition (status: active)

Publication No.: US09721559B2

Publication date: 2017-08-01

Application No.: US14689730

Filing date: 2015-04-17

Applicant: International Business Machines Corporation

Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury

IPC classification: G10L15/16, G10L15/06, G10L15/02

CPC classification: G10L15/063, G10L15/02, G10L15/16, G10L21/0272

Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.

7.

Patent application
DATA AUGMENTATION METHOD BASED ON STOCHASTIC FEATURE MAPPING FOR AUTOMATIC SPEECH RECOGNITION (status: active)

Publication No.: US20170040016A1

Publication date: 2017-02-09

Application No.: US14689730

Filing date: 2015-04-17

Applicant: International Business Machines Corporation

Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury

IPC classification: G10L15/06, G10L15/02, G10L15/16

CPC classification: G10L15/063, G10L15/02, G10L15/16, G10L21/0272

Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.

8.

Granted patent
Method and system for efficient spoken term detection using confusion networks (status: active)

Publication No.: US09196243B2

Publication date: 2015-11-24

Application No.: US14230790

Filing date: 2014-03-31

Applicant: International Business Machines Corporation

Inventors: Brian E. D. Kingsbury, Hong-Kwang Kuo, Lidia Mangu, Hagen Soltau

IPC classification: G10L15/05, G10L15/08

CPC classification: G10L15/083, G10L13/08, G10L15/02, G10L2015/025, G10L2015/085

Abstract: Systems and methods for spoken term detection are provided. A method for spoken term detection, comprises receiving phone level out-of-vocabulary (OOV) keyword queries, converting the phone level OOV keyword queries to words, generating a confusion network (CN) based keyword searching (KWS) index, and using the CN based KWS index for both in-vocabulary (IV) keyword queries and the OOV keyword queries.
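
The idea of a confusion-network-based keyword index can be sketched as a simple inverted index. The sketch assumes a confusion network is a list of bins, each holding alternative words with posterior probabilities; it covers single-word queries only and omits the conversion of phone-level OOV queries to words. The data layout and all function names are hypothetical, not the patent's.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One confusion network = ordered bins; each bin = alternative (word, posterior) pairs.
ConfusionNetwork = List[List[Tuple[str, float]]]
Hit = Tuple[str, int, float]  # (utterance id, bin position, posterior)


def build_index(networks: Dict[str, ConfusionNetwork]) -> Dict[str, List[Hit]]:
    """Map every word to the places where it appears in any confusion network."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for utt_id, bins in networks.items():
        for position, alternatives in enumerate(bins):
            for word, posterior in alternatives:
                index[word].append((utt_id, position, posterior))
    return index


def search(index: Dict[str, List[Hit]], keyword: str, min_posterior: float = 0.0) -> List[Hit]:
    """Return hits for a single-word query, most confident first."""
    hits = [h for h in index.get(keyword, []) if h[2] >= min_posterior]
    return sorted(hits, key=lambda h: h[2], reverse=True)


networks = {
    "utt1": [[("call", 0.9), ("tall", 0.1)], [("home", 0.6), ("hum", 0.4)]],
    "utt2": [[("home", 0.8), ("hone", 0.2)]],
}
index = build_index(networks)
hits = search(index, "home")
```

Because the same index answers every query, an OOV query that has already been converted to words could reuse `search` unchanged, which is the property the abstract emphasizes.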

9.

Granted patent
Input encoding for classifier generalization (status: active)

Publication No.: US11914678B2

Publication date: 2024-02-27

Application No.: US17030156

Filing date: 2020-09-23

Applicant: International Business Machines Corporation

Inventors: Hazar Yueksel, Kush Raj Varshney, Brian E. D. Kingsbury

IPC classification: G06F18/241, G06N3/08, H03M7/30

CPC classification: G06F18/241, G06N3/08, H03M7/3062, H03M7/3066, H03M7/6011

Abstract: Techniques for classifier generalization in a supervised learning process using input encoding are provided. In one aspect, a method for classification generalization includes: encoding original input features from at least one input sample $\vec{x}_S$ with a uniquely decodable code using an encoder $E(\cdot)$ to produce encoded input features $E(\vec{x}_S)$, wherein the at least one input sample $\vec{x}_S$ comprises uncoded input features; feeding the uncoded input features and the encoded input features $E(\vec{x}_S)$ to a base model to build an encoded model; and learning a classification function $\tilde{C}_E(\cdot)$ using the encoded model, wherein the classification function $\tilde{C}_E(\cdot)$ learned using the encoded model is more general than that learned using the uncoded input features alone.
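
As a rough illustration of the encoding step, the sketch below encodes integer features with a fixed-length binary code and appends the code bits to the uncoded features. A fixed-length code with distinct codewords is uniquely decodable, but the abstract does not say which code the patent uses, so this choice, the `width` parameter, and the function names are all assumptions.

```python
from typing import List


def fixed_length_binary(value: int, width: int) -> List[int]:
    """Encode a non-negative integer as `width` bits, most significant bit first."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"value {value} does not fit in {width} bits")
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def encode_sample(features: List[int], width: int = 8) -> List[int]:
    """Return the uncoded features followed by their encoded bits, so a base
    model receives both views of the same sample."""
    encoded: List[int] = []
    for value in features:
        encoded.extend(fixed_length_binary(value, width))
    return list(features) + encoded


model_input = encode_sample([5, 3], width=4)
```

Here `model_input` holds the two original features followed by eight code bits; training a classifier on such vectors is left out of the sketch.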

10.

Granted patent
Integrating text inputs for training and adapting neural network transducer ASR models (status: active)

Publication No.: US11908454B2

Publication date: 2024-02-20

Application No.: US17539752

Filing date: 2021-12-01

Applicant: International Business Machines Corporation

Inventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata

IPC classification: G10L15/06, G06N3/08, G10L21/10

CPC classification: G10L15/063, G06N3/08, G10L21/10

Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computer device receives speech data, and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data, and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
