Patent search: ap:("International Business Machines Corporation") AND inv:"Gakuto Kurata", page 6

51.

Patent application
SPEECH RETRIEVAL METHOD, SPEECH RETRIEVAL APPARATUS, AND PROGRAM FOR SPEECH RETRIEVAL APPARATUS (status: active)

Publication No.: US20150302848A1

Publication date: 2015-10-22

Application No.: US14692105

Filing date: 2015-04-21

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gakuto Kurata, Tohru Nagano, Masafumi Nishimura

IPC classification: G10L15/02

CPC classification: G10L15/02, G10L15/04, G10L15/08, G10L15/187, G10L25/51, G10L2015/025, G10L2015/027, G10L2015/088

Abstract: A method for speech retrieval includes acquiring a keyword designated by a character string, and a phoneme string or a syllable string, detecting one or more coinciding segments by comparing a character string that is a recognition result of word speech recognition with words as recognition units performed for speech data to be retrieved and the character string of the keyword, calculating an evaluation value of each of the one or more segments by using the phoneme string or the syllable string of the keyword to evaluate a phoneme string or a syllable string that is recognized in each of the detected one or more segments and that is a recognition result of phoneme speech recognition with phonemes or syllables as recognition units performed for the speech data, and outputting a segment in which the calculated evaluation value exceeds a predetermined threshold.
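The two-stage check described in this abstract (match the keyword's character string against word-level recognition output, then score each matching segment against the keyword's phoneme string) can be sketched roughly as follows. This is an illustration only, not the patented method: the abstract does not say how the evaluation value is computed, so the normalized edit-distance score, the `word_hits` tuple layout, the function names, and the default threshold are all assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def retrieve_segments(keyword_text, keyword_phonemes, word_hits, threshold=0.7):
    """Return (start, end, score) for segments that pass both checks.

    word_hits is a hypothetical list of (start, end, word, phonemes) tuples:
    the word comes from word-level recognition, the phonemes from a separate
    phoneme-level recognition pass over the same time span.
    """
    results = []
    for start, end, word, phonemes in word_hits:
        if word != keyword_text:          # stage 1: character-string match
            continue
        dist = edit_distance(keyword_phonemes, phonemes)
        longest = max(len(keyword_phonemes), len(phonemes), 1)
        score = 1.0 - dist / longest      # stage 2: assumed evaluation value
        if score > threshold:
            results.append((start, end, score))
    return results
```

A segment whose word matches but whose recognized phonemes diverge from the keyword's pronunciation is dropped, which is the filtering effect the abstract describes.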

52.

Granted patent
Information processing device, large vocabulary continuous speech recognition method and program including hypothesis ranking (status: active)

Publication No.: US09165553B2

Publication date: 2015-10-20

Application No.: US13744963

Filing date: 2013-01-18

Applicant: International Business Machines Corporation

Inventors: Gakuto Kurata, Masayuki Suzuki, Masafumi Nishimura

IPC classification: G10L15/04, G10L15/10, G10L15/02

CPC classification: G10L15/04, G10L15/10, G10L2015/025

Abstract: System and method for performing speech recognition using acoustic invariant structure for large vocabulary continuous speech. An information processing device receives sound as input and performs speech recognition. The information processing device includes: a speech recognition processing unit for outputting a speech recognition score, a structure score calculation unit for calculation of a structure score that is a score that, with respect for each hypothesis concerning all phoneme pairs comprising the hypothesis, is found by applying phoneme pair-by-pair weighting to phoneme pair inter-distribution distance likelihood and then performing summation, and a ranking unit for ranking the multiple hypotheses based on a sum value of speech recognition score and structure score.

53.

Patent application
CORRECTING N-GRAM PROBABILITIES BY PAGE VIEW INFORMATION (status: active)

Publication No.: US20150051899A1

Publication date: 2015-02-19

Application No.: US13965492

Filing date: 2013-08-13

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Nathan M. Bodenstab, Nobuyasu Itoh, Gakuto Kurata, Masafumi Nishimura, Paul J. Vozila

IPC classification: G06F17/27

CPC classification: G06F17/27

Abstract: Methods and a system for calculating N-gram probabilities in a language model. A method includes counting N-grams in each page of a plurality of pages or in each document of a plurality of documents to obtain respective N-gram counts therefor. The method further includes applying weights to the respective N-gram counts based on at least one of view counts and rankings to obtain weighted respective N-gram counts. The view counts and the rankings are determined with respect to the plurality of pages or the plurality of documents. The method also includes merging the weighted respective N-gram counts to obtain merged weighted respective N-gram counts for the plurality of pages or the plurality of documents. The method additionally includes calculating a respective probability for each of the N-grams based on the merged weighted respective N-gram counts.
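The count, weight, merge, and normalize steps in this abstract can be illustrated with a small sketch. It rests on assumptions the abstract does not state: each page's weight is taken to be its share of total views, tokens are split on whitespace, and probabilities are plain maximum-likelihood estimates with no smoothing. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weighted_ngram_probs(pages, n=2):
    """pages: list of (text, view_count) pairs.

    Returns P(last word | preceding n-1 words), estimated from
    view-weighted counts merged across all pages.
    """
    total_views = sum(views for _, views in pages)
    merged = Counter()
    for text, views in pages:
        counts = Counter(ngrams(text.split(), n))   # per-page counts
        weight = views / total_views                # assumed weighting scheme
        for gram, count in counts.items():
            merged[gram] += weight * count          # merge weighted counts
    # Normalize by the weighted total for each history (first n-1 words).
    history_totals = Counter()
    for gram, count in merged.items():
        history_totals[gram[:-1]] += count
    return {gram: count / history_totals[gram[:-1]]
            for gram, count in merged.items()}
```

With this weighting, an N-gram that appears on a heavily viewed page contributes more to the estimate than the same N-gram on a rarely viewed one.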

54.

Granted patent
Customization of recurrent neural network transducers for speech recognition (status: active)

Publication No.: US11908458B2

Publication date: 2024-02-20

Application No.: US17136439

Filing date: 2020-12-29

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury

IPC classification: G10L15/16, G06N3/08, G10L13/02, G10L25/30

CPC classification: G10L15/16, G06N3/08, G10L13/02, G10L25/30

Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network is updated using the synthesized second domain audio data and the second domain text data. The computer implemented method further includes restoring the updated encoder to the initial condition.

55.

Published application
VOICE ACTIVITY DETECTION INTEGRATION TO IMPROVE AUTOMATIC SPEECH DETECTION (status: pending, published)

Publication No.: US20240038221A1

Publication date: 2024-02-01

Application No.: US17815798

Filing date: 2022-07-28

Applicant: International Business Machines Corporation

Inventors: Sashi Novitasari, Takashi Fukuda, Gakuto Kurata

IPC classification: G10L15/16, G10L25/78, G10L15/22, G10L15/06, G10L15/20

CPC classification: G10L15/16, G10L25/78, G10L15/22, G10L15/063, G10L15/20

Abstract: Systems, computer-implemented methods, and computer program products to facilitate multi-task training a recurrent neural network transducer (RNN-T) using automatic speech recognition (ASR) information are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can include an RNN-T that can receive ASR information. The computer executable components can include a voice activity detection (VAD) model that trains the RNN-T using the ASR information, where the RNN-T can further comprise an encoder and a joint network. One or more outputs of the encoder can be integrated with the joint network and one or more outputs of the VAD model.

56.

Published application
KNOWLEDGE TRANSFER BETWEEN RECURRENT NEURAL NETWORKS (status: pending, published)

Publication No.: US20230196107A1

Publication date: 2023-06-22

Application No.: US18168794

Filing date: 2023-02-14

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gakuto Kurata, Kartik Audhkhasi

IPC classification: G06N3/08, G06N3/044, G06N3/045

CPC classification: G06N3/08, G06N3/044, G06N3/045

Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence; and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.

57.

Granted patent
Adaptation of model for recognition processing (status: active)

Publication No.: US11443169B2

Publication date: 2022-09-13

Application No.: US15048318

Filing date: 2016-02-19

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gakuto Kurata

IPC classification: G06N3/04

Abstract: A computer implemented method for adapting a model for recognition processing to a target-domain is disclosed. The method includes preparing a first distribution in relation to a part of the model, in which the first distribution is derived from data of a training-domain for the model. The method also includes obtaining a second distribution in relation to the part of the model by using data of the target-domain. The method further includes tuning one or more parameters of the part of the model so that difference between the first and the second distributions becomes small.

58.

Patent application
MULTI-STEP LINEAR INTERPOLATION OF LANGUAGE MODELS (status: active)

Publication No.: US20220254335A1

Publication date: 2022-08-11

Application No.: US17168982

Filing date: 2021-02-05

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Nobuyasu Itoh, Masayuki Suzuki, Gakuto Kurata

IPC classification: G10L15/187, G10L15/197, G10L15/06, G10L15/16, G10L15/26, G06K9/62

Abstract: A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
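For readers unfamiliar with linear interpolation of language models, the sketch below shows the standard EM update for interpolation weights on held-out data, followed by one possible way of folding per-set hyper weights into the final per-model weights. It is not taken from the patent: the first metric is assumed to be held-out likelihood, the hyper weights are passed in as given rather than estimated from an application-specific metric, and the renormalization within each set is an assumption. All function names are hypothetical.

```python
def em_interpolation_weights(probs, iterations=50):
    """Estimate mixture weights that maximize held-out likelihood.

    probs[t][i] is the probability model i assigns to held-out token t
    (all values assumed to be strictly positive).
    """
    k = len(probs[0])
    weights = [1.0 / k] * k
    for _ in range(iterations):
        resp_sums = [0.0] * k
        for row in probs:
            mix = sum(w * p for w, p in zip(weights, row))
            for i in range(k):
                # E-step: posterior responsibility of model i for this token.
                resp_sums[i] += weights[i] * row[i] / mix
        # M-step: new weight is the average responsibility.
        weights = [s / len(probs) for s in resp_sums]
    return weights


def combine_with_hyper_weights(weights, sets, hyper):
    """Rescale EM weights so that set s receives total mass hyper[s].

    sets is a list of lists of model indices; hyper must sum to 1.
    """
    final = [0.0] * len(weights)
    for members, h in zip(sets, hyper):
        set_mass = sum(weights[i] for i in members)
        for i in members:
            final[i] = h * weights[i] / set_mass
    return final
```

For example, `combine_with_hyper_weights([0.5, 0.3, 0.2], [[0, 1], [2]], [0.6, 0.4])` keeps the 5:3 ratio inside the first set while giving that set 60% of the total mass.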

59.

Patent application
LEARNING UNPAIRED MULTIMODAL FEATURE MATCHING FOR SEMI-SUPERVISED LEARNING (status: active)

Publication No.: US20220172080A1

Publication date: 2022-06-02

Application No.: US17109550

Filing date: 2020-12-02

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Subhajit Chaudhury, Daiki Kimura, Gakuto Kurata, Ryuki Tachibana

IPC classification: G06N5/04, G06N20/00

Abstract: A computer-implemented method is provided for learning multimodal feature matching. The method includes training an image encoder to obtain encoded images. The method further includes training a common classifier on the encoded images by using labeled images. The method also includes training a text encoder while keeping the common classifier in a fixed configuration by using learned text embeddings and corresponding labels for the learned text embeddings. The text encoder is further trained to match a distance of predicted text embeddings which is encoded by the text encoder to a fitted Gaussian distribution on the encoded images.

60.

Granted patent
Aligning spike timing of models for machine learning (status: active)

Publication No.: US11302309B2

Publication date: 2022-04-12

Application No.: US16570022

Filing date: 2019-09-13

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gakuto Kurata, Kartik Audhkhasi

IPC classification: G10L15/06, G10L15/26, G10L15/34, G06N3/04, G06N3/08, G10L15/16

Abstract: A technique for aligning spike timing of models is disclosed. A first model having a first architecture trained with a set of training samples is generated. Each training sample includes an input sequence of observations and an output sequence of symbols having different length from the input sequence. Then, one or more second models are trained with the trained first model by minimizing a guide loss jointly with a normal loss for each second model and a sequence recognition task is performed using the one or more second models. The guide loss evaluates dissimilarity in spike timing between the trained first model and each second model being trained.
