Patent search: ap:("International Business Machines Corporation") AND inv:"Samuel Thomas" (page 1)

1.

Granted patent
Integrating text inputs for training and adapting neural network transducer ASR models (Active)

Publication No.: US11908454B2

Publication date: 2024-02-20

Application No.: US17539752

Filing date: 2021-12-01

Applicant: International Business Machines Corporation

Inventors: Samuel Thomas; Hong-Kwang Kuo; Brian E. D. Kingsbury; George Andrei Saon; Gakuto Kurata

IPC classification: G10L15/06, G06N3/08, G10L21/10

CPC classification: G10L15/063, G06N3/08, G10L21/10

Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
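The abstract does not define "textogram". As a rough illustration only, the sketch below assumes one plausible reading by analogy with a spectrogram: a frame-by-symbol matrix in which each character becomes a one-hot row repeated for a fixed number of frames. The function name `textogram`, the `alphabet` argument, and the `frames_per_symbol` parameter are hypothetical and not taken from the patent.

```python
from typing import List


def textogram(text: str, alphabet: str, frames_per_symbol: int = 2) -> List[List[float]]:
    """Build a frame-by-symbol matrix of one-hot rows (assumed representation)."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows: List[List[float]] = []
    for ch in text:
        if ch not in index:
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        row = [0.0] * len(alphabet)
        row[index[ch]] = 1.0
        # Repeat the row so text has a time axis comparable to speech frames.
        rows.extend(row[:] for _ in range(frames_per_symbol))
    return rows


if __name__ == "__main__":
    for frame in textogram("ab", alphabet="ab "):
        print(frame)
```

Under this assumption, text-only data takes the same two-dimensional shape as a spectrogram, which is what would allow a single model to consume both kinds of input.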

2.

Patent application
TRAINING END-TO-END SPOKEN LANGUAGE UNDERSTANDING SYSTEMS WITH UNORDERED ENTITIES (Active)

Publication No.: US20230081306A1

Publication date: 2023-03-16

Application No.: US17458772

Filing date: 2021-08-27

Applicant: International Business Machines Corporation

Inventors: Hong-Kwang Kuo; Zoltan Tueske; Samuel Thomas; Brian E. D. Kingsbury; George Andrei Saon

IPC classification: G10L15/22, G10L15/16, G06N3/08

Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into the spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
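The perturbation step in this abstract can be sketched as follows. This is a minimal illustration, assuming each semantic entity is a `(label, value)` tuple and each training example is a `(speech_id, entities)` pair. Those representations, and the names `perturb_entity_order` and `augment`, are hypothetical. The alignment-based reordering is not shown.

```python
import random
from typing import List, Tuple

Entity = Tuple[str, str]  # (label, value): an assumed representation


def perturb_entity_order(entities: List[Entity], n_variants: int,
                         seed: int = 0) -> List[List[Entity]]:
    """Return up to n_variants distinct random reorderings of the entities."""
    rng = random.Random(seed)
    seen = {tuple(entities)}
    variants: List[List[Entity]] = []
    # Cap the attempts so short lists with few permutations cannot loop forever.
    for _ in range(n_variants * 20):
        if len(variants) >= n_variants:
            break
        candidate = entities[:]
        rng.shuffle(candidate)
        key = tuple(candidate)
        if key not in seen:
            seen.add(key)
            variants.append(candidate)
    return variants


def augment(pairs: List[Tuple[str, List[Entity]]], n_variants: int = 2,
            seed: int = 0) -> List[Tuple[str, List[Entity]]]:
    """Keep the original pairs and append perturbed copies of each one."""
    out = list(pairs)
    for speech_id, entities in pairs:
        for variant in perturb_entity_order(entities, n_variants, seed):
            out.append((speech_id, variant))
    return out


if __name__ == "__main__":
    data = [("utt1", [("city", "Boston"), ("date", "Monday"), ("time", "9am")])]
    for speech_id, entities in augment(data):
        print(speech_id, entities)
```

Each perturbed copy keeps the same speech and the same set of entities; only the entity order changes.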

3.

Granted patent
Transliteration based data augmentation for training multilingual ASR acoustic models in low resource settings (Active)

Publication No.: US11568858B2

Publication date: 2023-01-31

Application No.: US17073337

Filing date: 2020-10-17

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Samuel Thomas; Kartik Audhkhasi; Brian E. D. Kingsbury

IPC classification: G10L15/06, G10L15/16

Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low resource setting includes training a multilingual network on a set of training languages with original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data output to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding one or more selected portions of the output transliterated data back to the original transcribed training data to update training data. The training of a new multilingual acoustic model through the multilingual network is performed using the updated training data.
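The filter-then-augment step can be sketched as below. The abstract does not say which filtering metric is used, so the sketch takes the metric as a caller-supplied function and, purely for illustration, uses a hypothetical per-utterance `score` field. The `Utterance` type and both function names are likewise assumptions, and the network training and transliteration passes are not shown.

```python
from typing import Callable, List, NamedTuple


class Utterance(NamedTuple):
    audio_id: str
    text: str
    score: float = 1.0  # hypothetical confidence attached by the transliteration pass


def select_transliterated(pool: List[Utterance],
                          metric: Callable[[Utterance], float],
                          threshold: float) -> List[Utterance]:
    """Keep only the transliterated utterances whose metric meets the threshold."""
    return [u for u in pool if metric(u) >= threshold]


def augment_training_data(original: List[Utterance],
                          pool: List[Utterance],
                          metric: Callable[[Utterance], float],
                          threshold: float) -> List[Utterance]:
    """Return the original data plus the selected transliterated portion."""
    return list(original) + select_transliterated(pool, metric, threshold)


if __name__ == "__main__":
    original = [Utterance("u1", "hello")]
    pool = [Utterance("u2", "namaste", 0.9), Utterance("u3", "xx", 0.2)]
    updated = augment_training_data(original, pool, lambda u: u.score, 0.5)
    print([u.audio_id for u in updated])
```

Passing the metric in as a function keeps the sketch neutral about how the selection is actually scored.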

4.

Patent application
MULTI-MODAL LUNG CAPACITY MEASUREMENT FOR RESPIRATORY ILLNESS PREDICTION (Active)

Publication No.: US20220110542A1

Publication date: 2022-04-14

Application No.: US17065936

Filing date: 2020-10-08

Applicant: International Business Machines Corporation

Inventors: Samuel Thomas; Nalini K. Ratha; Jonathan Hudson Connell, II

IPC classification: A61B5/091, A61B5/00, G06N3/08

Abstract: Determining lung capacity of a user includes capturing an audio waveform of the user performing an utterance presented to the user. A video of the user performing the utterance can be captured. The captured audio waveform and the video are analyzed for compliance. Based on the audio waveform, an indicator of respiratory function is determined. The indicator is compared with a reference indicator to determine health of the user. A machine learning model such as a neural network can be trained to predict the indicator of the respiratory function based on input features comprising audio spectral and temporal characteristics of utterances. Determining the indicator of respiratory function can include running the trained machine learning model.

5.

Patent application
End-to-End Spoken Language Understanding Without Full Transcripts (Active)

Publication No.: US20220084508A1

Publication date: 2022-03-17

Application No.: US17021956

Filing date: 2020-09-15

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Hong-Kwang Jeff Kuo; Zoltan Tueske; Samuel Thomas; Yinghui Huang; Brian E. D. Kingsbury; Kartik Audhkhasi

IPC classification: G10L15/16, G10L15/18, G10L15/02, G06N3/08, H04L29/06, H04L29/08

Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels are extracted from the corresponding semantic entities and/or overall intent. A spoken language understanding (SLU) model is trained based upon the one or more entity labels and corresponding values, and one or more intent labels of the corresponding speech recordings without a need for a transcript of the corresponding speech recording.

6.

Patent application
TRAINING OF STUDENT NEURAL NETWORK WITH SWITCHED TEACHER NEURAL NETWORKS (Under examination, published)

Publication No.: US20200034702A1

Publication date: 2020-01-30

Application No.: US16047287

Filing date: 2018-07-27

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Takashi Fukuda; Masayuki Suzuki; Osamu Ichikawa; Gakuto Kurata; Samuel Thomas; Bhuvana Ramabhadran

IPC classification: G06N3/08, G06N3/04

Abstract: A student neural network may be trained by a computer-implemented method, including: selecting a teacher neural network among a plurality of teacher neural networks, inputting input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and training a student neural network with at least the input data and the soft label output from the selected teacher neural network.
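The select-teacher, get-soft-label, train-student loop can be illustrated with a small sketch. The abstract does not state how a teacher is selected, so uniform random selection per input is assumed here. Teachers are modeled as plain functions returning logits, and the student objective is assumed to be cross-entropy against the soft label. All names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Teacher = Callable[[Sequence[float]], List[float]]  # maps input features to logits


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_from_switched_teacher(x: Sequence[float],
                                     teachers: Sequence[Teacher],
                                     rng: random.Random) -> List[float]:
    """Pick one teacher (assumed: uniformly at random) and return its soft label."""
    teacher = rng.choice(teachers)
    return softmax(teacher(x))


def distillation_loss(student_logits: Sequence[float],
                      soft_label: Sequence[float]) -> float:
    """Cross-entropy between the teacher's soft label and the student's output."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(soft_label, probs))


if __name__ == "__main__":
    teachers = [lambda x: [2.0, 0.0], lambda x: [0.0, 2.0]]
    rng = random.Random(0)
    label = soft_label_from_switched_teacher([0.5], teachers, rng)
    print(label, distillation_loss([0.1, -0.1], label))
```

In a real training loop the loss would be backpropagated through the student only, with the teachers held fixed.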

7.

Patent application
DENOISING A SIGNAL (Under examination, published)

Publication No.: US20190237090A1

Publication date: 2019-08-01

Application No.: US16379667

Filing date: 2019-04-09

Applicant: International Business Machines Corporation

Inventors: Dimitrios B. Dimitriadis; Samuel Thomas; Colin C. Vaz

IPC classification: G10L21/0208

CPC classification: G10L21/0208

Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time varying projection utilizing the clean dictionary and the noisy dictionary; denoising a second noisy signal utilizing the time varying projection; and expanding the clean dictionary and the noisy dictionary by updating them to include new clean spectro-temporal building blocks and new noisy spectro-temporal building blocks created utilizing additional clean and noisy signals.

8.

Patent application
SOFT LABEL GENERATION FOR KNOWLEDGE DISTILLATION (Under examination, published)

Publication No.: US20190205748A1

Publication date: 2019-07-04

Application No.: US15860097

Filing date: 2018-01-02

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Takashi Fukuda; Samuel Thomas; Bhuvana Ramabhadran

IPC classification: G06N3/08

CPC classification: G06N3/08

Abstract: A technique for generating soft labels for training is disclosed. In the method, a teacher model having a teacher side class set is prepared. A collection of class pairs for respective data units is obtained. Each class pair includes classes labelled to a corresponding data unit from among the teacher side class set and from among a student side class set that is different from the teacher side class set. A training input is fed into the teacher model to obtain a set of outputs for the teacher side class set. A set of soft labels for the student side class set is calculated from the set of the outputs by using, for each member of the student side class set, at least an output obtained for a class within a subset of the teacher side class set having relevance to the member of the student side class set, based at least in part on observations in the collection of the class pairs.
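One way to read the mapping from teacher-side outputs to student-side soft labels is sketched below. It assumes that a teacher class is "relevant" to a student class if the two were ever observed together in a class pair, and that the relevant teacher outputs are summed and then renormalized. Both choices are illustrative assumptions, since the abstract only says the relevant subset is based on observations in the class pairs. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def relevant_teacher_classes(class_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each student class to the teacher classes observed alongside it."""
    relevant: Dict[str, Set[str]] = defaultdict(set)
    for teacher_cls, student_cls in class_pairs:
        relevant[student_cls].add(teacher_cls)
    return dict(relevant)


def student_soft_labels(teacher_outputs: Dict[str, float],
                        relevant: Dict[str, Set[str]],
                        student_classes: List[str]) -> Dict[str, float]:
    """Sum teacher outputs over each relevant subset, then renormalize (assumed)."""
    raw = {
        s: sum(teacher_outputs.get(t, 0.0) for t in relevant.get(s, ()))
        for s in student_classes
    }
    total = sum(raw.values())
    if total == 0.0:
        # No relevant mass at all: fall back to a uniform distribution.
        return {s: 1.0 / len(student_classes) for s in student_classes}
    return {s: v / total for s, v in raw.items()}


if __name__ == "__main__":
    pairs = [("a1", "A"), ("a2", "A"), ("b1", "B")]
    outputs = {"a1": 0.5, "a2": 0.25, "b1": 0.25}
    print(student_soft_labels(outputs, relevant_teacher_classes(pairs), ["A", "B"]))
```

Teacher classes that never co-occur with a student class contribute nothing to that class's soft label, which is the point of restricting the calculation to a relevant subset.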

9.

Granted patent
Combining installed audio-visual sensors with ad-hoc mobile audio-visual sensors for smart meeting rooms (Active)

Publication No.: US10230922B2

Publication date: 2019-03-12

Application No.: US15722704

Filing date: 2017-10-02

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Stanley Chen; Kenneth W. Church; Vaibhava Goel; Lidia L. Mangu; Etienne Marcheret; Bhuvana Ramabhadran; Laurence P. Sansone; Abhinav Sethy; Samuel Thomas

IPC classification: H04N7/14, H04N7/15, H04W12/06, G10L25/60, H04L29/06

Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices includes: forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
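The retain-and-combine step reduces to a threshold filter, sketched below. How quality is measured is not stated in the abstract, so the sketch takes a caller-supplied `quality_of` function. The `Stream` type, its `quality` field, and the function name `combine_streams` are hypothetical, and link setup and stream transport are not shown.

```python
from typing import Callable, List, NamedTuple


class Stream(NamedTuple):
    source: str     # e.g. "room-mic" or "phone-1" (illustrative labels)
    kind: str       # "audio" or "video"
    quality: float  # hypothetical quality score in [0, 1]


def combine_streams(fixed: List[Stream],
                    mobile: List[Stream],
                    quality_of: Callable[[Stream], float],
                    threshold: float) -> List[Stream]:
    """Retain mobile streams whose quality is above the threshold, then merge."""
    retained = [s for s in mobile if quality_of(s) > threshold]
    return list(fixed) + retained


if __name__ == "__main__":
    fixed = [Stream("room-mic", "audio", 0.8)]
    mobile = [Stream("phone-1", "audio", 0.9), Stream("phone-2", "video", 0.3)]
    for s in combine_streams(fixed, mobile, lambda s: s.quality, 0.5):
        print(s)
```

The comparison is strict because the abstract retains streams with quality "above" the threshold. Fixed-sensor streams are always kept.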

10.

Patent application
DENOISING A SIGNAL (Under examination, published)

Publication No.: US20180047409A1

Publication date: 2018-02-15

Application No.: US15793884

Filing date: 2017-10-25

Applicant: International Business Machines Corporation

Inventors: Dimitrios B. Dimitriadis; Samuel Thomas; Colin C. Vaz

IPC classification: G10L21/0224, G10L25/24, G10L21/0388

CPC classification: G10L21/0208

Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time varying projection utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal utilizing the time varying projection.
