-
Publication No.: US11900922B2
Publication Date: 2024-02-13
Application No.: US17093673
Filing Date: 2020-11-10
IPC Classes: G10L15/16, G10L15/08, G06F40/295, G06N3/04, G06F18/214
CPC Classes: G10L15/16, G06F18/2148, G06N3/04, G06F40/295, G10L2015/088
Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products, and computer systems. For example, embodiments of the present invention can access one or more intents and associated entities from a limited amount of speech-to-text training data in a single language. Embodiments of the present invention can use the accessed intents and associated entities to locate speech-to-text training data in one or more other languages different from the single language. Embodiments of the present invention can then train a neural network based on the limited amount of speech-to-text training data in the single language and the located speech-to-text training data in the one or more other languages.
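The data-location step above can be sketched as a label-matching search: collect the intent/entity labels present in the small single-language seed set, then pull examples carrying the same labels from pools in other languages. All data, field names, and functions here are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch of the cross-lingual data-location step. The record
# layout ("text", "intent", "entities", "lang") is an assumption.

def extract_label_set(examples):
    """Collect the (intent, entity) pairs seen in the seed training data."""
    labels = set()
    for ex in examples:
        for entity in ex["entities"]:
            labels.add((ex["intent"], entity))
    return labels

def locate_matching_data(label_set, multilingual_pool):
    """Return pool examples (any language) sharing an intent/entity pair."""
    return [ex for ex in multilingual_pool
            if any((ex["intent"], e) in label_set for e in ex["entities"])]

seed = [{"text": "book a flight to Paris", "intent": "book_flight",
         "entities": ["destination"]}]
pool = [
    {"text": "reserva un vuelo a Madrid", "lang": "es",
     "intent": "book_flight", "entities": ["destination"]},
    {"text": "quel temps fait-il", "lang": "fr",
     "intent": "get_weather", "entities": ["location"]},
]

labels = extract_label_set(seed)
located = locate_matching_data(labels, pool)
combined_training_set = seed + located  # the network trains on both
```

The combined set then feeds a single training run, so the scarce single-language data is augmented rather than replaced.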
-
Publication No.: US11610108B2
Publication Date: 2023-03-21
Application No.: US16047287
Filing Date: 2018-07-27
Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
Abstract: A student neural network may be trained by a computer-implemented method including: selecting a teacher neural network from among a plurality of teacher neural networks, inputting input data to the selected teacher neural network to obtain a soft-label output generated by the selected teacher neural network, and training the student neural network with at least the input data and the soft-label output from the selected teacher neural network.
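The select-then-distill loop can be illustrated with toy teachers. The selection criterion used here (lowest output entropy, i.e. the most confident teacher on this input) is an assumption for demonstration only; the abstract covers teacher selection generally, not this specific rule, and the "update step" below is a stand-in for a real gradient step toward the soft label.

```python
import math

def entropy(dist):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def select_teacher(teachers, x):
    """Assumed rule: pick the teacher whose soft label is most confident."""
    return min(teachers, key=lambda t: entropy(t(x)))

# Two toy "teacher networks": each maps an input to a class distribution.
teacher_a = lambda x: [0.9, 0.1]  # confident
teacher_b = lambda x: [0.5, 0.5]  # uncertain

x = 0.0
teacher = select_teacher([teacher_a, teacher_b], x)
soft_label = teacher(x)  # soft label from the selected teacher

# One student update: nudge the student's output toward the soft label
# (a stand-in for a cross-entropy gradient step on a real network).
student_out = [0.6, 0.4]
lr = 0.5
student_out = [s + lr * (t - s) for s, t in zip(student_out, soft_label)]
```

Training on soft labels rather than hard ones lets the student absorb the selected teacher's full output distribution, which is the usual motivation for distillation.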
-
Publication No.: US20220319494A1
Publication Date: 2022-10-06
Application No.: US17218618
Filing Date: 2021-03-31
Abstract: An approach to training an end-to-end spoken language understanding model may be provided. A pre-trained general automatic speech recognition model may be adapted into a domain-specific spoken language understanding model. The pre-trained general automatic speech recognition model may be a recurrent neural network transducer model. The adaptation may use transcription data annotated with spoken language understanding labels. Audio data may also be provided in addition to the verbatim transcripts annotated with spoken language understanding labels. The spoken language understanding labels may be entity- and/or intent-based, with values associated with each label.
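The adaptation data described above can be pictured as audio paired with a verbatim transcript plus intent/entity labels carrying values. The record layout and the tag-sequence flattening below are hypothetical illustrations, not the patent's actual format.

```python
# Assumed shape of one adaptation example: paired audio, verbatim
# transcript, and SLU labels (intent plus entities with values).
adaptation_example = {
    "audio": "call_0001.wav",
    "transcript": "i want to fly to boston on tuesday",
    "labels": {
        "intent": "book_flight",
        "entities": [
            {"name": "destination", "value": "boston"},
            {"name": "travel_date", "value": "tuesday"},
        ],
    },
}

def slu_target_sequence(example):
    """Flatten the annotations into a target token sequence that an
    RNN-T-style model could be fine-tuned to emit with the transcript."""
    tags = ["<intent:%s>" % example["labels"]["intent"]]
    tags += ["<%s=%s>" % (e["name"], e["value"])
             for e in example["labels"]["entities"]]
    return example["transcript"].split() + tags

target = slu_target_sequence(adaptation_example)
```

Emitting semantic tags inline with the transcript tokens is one common way to turn a transducer-based ASR model into an end-to-end SLU model.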
-
Publication No.: US20220148581A1
Publication Date: 2022-05-12
Application No.: US17093673
Filing Date: 2020-11-10
Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products, and computer systems. For example, embodiments of the present invention can access one or more intents and associated entities from a limited amount of speech-to-text training data in a single language. Embodiments of the present invention can use the accessed intents and associated entities to locate speech-to-text training data in one or more other languages different from the single language. Embodiments of the present invention can then train a neural network based on the limited amount of speech-to-text training data in the single language and the located speech-to-text training data in the one or more other languages.
-
55.
Publication No.: US11250872B2
Publication Date: 2022-02-15
Application No.: US16714719
Filing Date: 2019-12-14
Inventors: Samuel Thomas, Yinghui Huang, Masayuki Suzuki, Zoltan Tueske, Laurence P. Sansone, Michael A. Picheny
Abstract: A method, apparatus, and computer program product are provided for customizing an automatic closed captioning system. In some embodiments, at a data use (DU) location, an automatic closed captioning system that includes a base model is provided; search criteria are defined for requests to one or more data collection (DC) locations; a search request based on the search criteria is sent to the one or more DC locations; relevant closed caption data are received from the one or more DC locations responsive to the search request; the received relevant closed caption data are processed by computing a confidence score for each of a plurality of data sub-sets of the received data and selecting one or more of the sub-sets based on the confidence scores; and the automatic closed captioning system is customized by using the selected sub-sets to train the base model.
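The score-and-select step can be sketched as filtering caption sub-sets by a confidence threshold. The scoring heuristic and threshold below are illustrative assumptions; a real system would use model-based confidence rather than a word-count proxy.

```python
# Hedged sketch of selecting closed-caption data sub-sets by confidence.

def confidence_score(subset):
    """Toy proxy: fraction of caption lines with more than one word.
    (A stand-in for a real model-based confidence measure.)"""
    good = sum(1 for line in subset if len(line.split()) > 1)
    return good / len(subset) if subset else 0.0

def select_subsets(subsets, threshold=0.5):
    """Keep only sub-sets whose score clears the (assumed) threshold."""
    return [s for s in subsets if confidence_score(s) >= threshold]

received = [
    ["hello and welcome back", "today we discuss training"],  # clean
    ["uh", "", "ok"],                                         # noisy
]
selected = select_subsets(received)
# The selected sub-sets would then be used to train the base model.
```

Filtering before fine-tuning keeps low-quality captions from the DC locations out of the customization data.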
-
56.
Publication No.: US20210312906A1
Publication Date: 2021-10-07
Application No.: US16841787
Filing Date: 2020-04-07
Abstract: An illustrative embodiment includes a method for training an end-to-end (E2E) spoken language understanding (SLU) system. The method includes receiving a training corpus comprising a set of text classified using one or more sets of semantic labels but unpaired with speech, and using the set of unpaired text to train the E2E SLU system to classify speech using at least one of the one or more sets of semantic labels. The method may include training a text-to-intent model using the set of unpaired text and training a speech-to-intent model using the text-to-intent model. Alternatively or additionally, the method may include using a text-to-speech (TTS) system to generate synthetic speech from the unpaired text and training the E2E SLU system using the synthetic speech.
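The synthetic-speech path can be sketched in a few lines: labeled but unpaired text is run through a TTS stand-in to manufacture (speech, label) pairs for E2E SLU training. `fake_tts` is a placeholder, not a real text-to-speech system, and the data is invented for illustration.

```python
# Unpaired text classified with semantic (intent) labels but no audio.
unpaired_text = [
    ("turn on the lights", "lights_on"),
    ("what's the weather", "get_weather"),
]

def fake_tts(text):
    """Placeholder TTS: returns a dummy 'waveform' as a token list.
    A real system would synthesize actual audio here."""
    return ["<frame:%s>" % w for w in text.split()]

# Manufacture paired training data: synthetic speech + existing label.
synthetic_pairs = [(fake_tts(t), intent) for t, intent in unpaired_text]
# An E2E SLU model would now train on these speech-like inputs and labels.
```

This is the key trick of the abstract: text-only annotated corpora, which are far cheaper than transcribed audio, become usable for speech-input training.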
-
Publication No.: US10546575B2
Publication Date: 2020-01-28
Application No.: US15379038
Filing Date: 2016-12-14
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data that includes speech by multiple speakers and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of the means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and the cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to divide it into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can then be performed on the segments.
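A minimal sketch of the clustering step: frame-level feature vectors are grouped into k clusters and each frame receives the identifier of its nearest centroid. Plain Euclidean distance and a deterministic farthest-point initialization stand in for the Mahalanobis distance measures the abstract mentions; the RNN trained on (feature, cluster-id) pairs is not shown.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(frames, k=2, iters=10):
    """Toy k-means: deterministic farthest-point init, then Lloyd updates."""
    centroids = [frames[0]]
    while len(centroids) < k:
        centroids.append(max(frames,
                             key=lambda f: min(dist2(f, c) for c in centroids)))
    for _ in range(iters):
        # Assign each frame the id of its nearest centroid.
        assign = [min(range(k), key=lambda c: dist2(f, centroids[c]))
                  for f in frames]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [f for f, a in zip(frames, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign

# Two well-separated groups of toy 2-D "audio feature" frames.
frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
cluster_ids = kmeans(frames, k=2)
```

The resulting cluster identifiers serve as the training targets for the RNN, which then segments new audio by predicting a cluster id per frame.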
-
58.
Publication No.: US20190205431A1
Publication Date: 2019-07-04
Application No.: US15856505
Filing Date: 2017-12-28
Inventors: Anne E. Gattiker, Sujatha Kashyap, Minh Ngoc Binh Nguyen, Samuel Thomas, Kaipeng Li, Thomas Hubregtsen
IPC Classes: G06F17/30
CPC Classes: G06F16/583, G06F16/2425, G06F16/24578, G06F16/9535
Abstract: Examples of techniques for constructing, evaluating, and improving a search string for retrieving images are disclosed. In one example implementation according to aspects of the present disclosure, a computer-implemented method includes receiving, by a processing device, a plurality of images as search results returned based at least in part on a search string for an item in the form of a tuple including an item class, an action, and an actor. The method further includes determining, by the processing device, whether the search string is effective at indicating a common use of the item based on image similarity. The method further includes, based at least in part on determining that the search string is ineffective at indicating the item's use, generating, by the processing device, an alternative search string.
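The evaluate-and-refine loop can be sketched as: build a query from the (item class, action, actor) tuple, check whether the returned images look alike, and rewrite the query when they do not. The similarity measure (most-common tag fraction), the threshold, and the rewrite rule (generalizing the actor) are all illustrative assumptions, not the patent's actual method.

```python
from collections import Counter

def build_search_string(item_class, action, actor):
    """Assemble a query from the (item class, action, actor) tuple."""
    return "%s %s %s" % (actor, action, item_class)

def image_similarity(images):
    """Toy stand-in: fraction of images sharing the most common tag.
    A real system would compare visual features."""
    tags = [img["tag"] for img in images]
    return Counter(tags).most_common(1)[0][1] / len(tags)

def evaluate_and_refine(tuple_query, images, threshold=0.6):
    """Keep the query if results are similar enough; otherwise rewrite it
    (here: swap in a generic actor as the assumed alternative)."""
    query = build_search_string(*tuple_query)
    if image_similarity(images) >= threshold:
        return query  # effective: results indicate a common item use
    return build_search_string(tuple_query[0], tuple_query[1], "person")

results = [{"tag": "cutting"}, {"tag": "cutting"}, {"tag": "cooking"}]
query = evaluate_and_refine(("knife", "cutting", "chef"), results)
```

High similarity among results is taken as evidence the string captures one common use of the item; scattered results trigger the alternative string.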
-
59.
Publication No.: US10249292B2
Publication Date: 2019-04-02
Application No.: US15379010
Filing Date: 2016-12-14
Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points that divide the audio data into segments. The speaker diarization includes assigning each segment a label selected from a group of labels using the LSTM RNN, the group of labels including labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the three. Speech recognition can be performed on the segments that correspond to either the first speaker or the second speaker.
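The change-point logic can be shown with a frame-level classifier standing in for the LSTM RNN: each frame gets one of the labels {speaker1, speaker2, silence}, and a change point is any frame where the label switches, which divides the audio into labeled segments. The threshold rule below is a toy stand-in, not the patent's model.

```python
def classify_frame(energy):
    """Toy stand-in for the LSTM RNN: low energy -> silence,
    otherwise one of two speakers by an (assumed) energy threshold."""
    if energy < 0.1:
        return "silence"
    return "speaker1" if energy < 0.5 else "speaker2"

def segment(frame_energies):
    """Cut the label sequence at change points into (start, end, label)."""
    labels = [classify_frame(e) for e in frame_energies]
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:  # change point
            segments.append((start, i, labels[start]))
            start = i
    return segments

energies = [0.3, 0.3, 0.05, 0.8, 0.8, 0.8]
segments = segment(energies)
# -> [(0, 2, 'speaker1'), (2, 3, 'silence'), (3, 6, 'speaker2')]
```

Downstream, only the speaker-labeled segments (not the silence) would be passed to speech recognition.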
-
60.
Publication No.: US20180166067A1
Publication Date: 2018-06-14
Application No.: US15379038
Filing Date: 2016-12-14
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data that includes speech by multiple speakers and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of the means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and the cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to divide it into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can then be performed on the segments.
-