Abstract:
A computer system comprises an input configured to receive voice input from a user, the voice input having speech intervals separated by non-speech intervals; an ASR system configured to identify individual words in the voice input during speech intervals thereof, and store the identified words in memory; a response generation module configured to generate, based on the words stored in the memory, an audio response for outputting to the user; and a response delivery module configured to begin outputting the audio response to the user during a non-speech interval of the voice input, wherein the outputting of the audio response is terminated before it has completed in response to a subsequent speech interval of the voice input commencing whilst the audio response is still being outputted.
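A minimal sketch of the barge-in behaviour described above, assuming a voice-activity detector that posts `speech_start`/`speech_end` events onto a queue and a simple player object; all class and event names are illustrative, not taken from the abstract:

```python
# Sketch of barge-in response delivery: playback of a generated audio
# response starts during a non-speech interval and is cut off as soon
# as the user starts speaking again. The event queue and AudioPlayer
# interface are assumptions made for illustration.
import queue

class AudioPlayer:
    """Placeholder playback device; a real system would stream to a speaker."""
    def __init__(self):
        self.playing = False
    def start(self, audio_response):
        self.playing = True
        print("playing:", audio_response)
    def stop(self):
        if self.playing:
            self.playing = False
            print("playback interrupted")

def deliver_response(vad_events: "queue.Queue[str]", audio_response: str) -> None:
    player = AudioPlayer()
    while True:
        event = vad_events.get()
        if event == "speech_end" and not player.playing:
            # A non-speech interval has begun: start outputting the response.
            player.start(audio_response)
        elif event == "speech_start" and player.playing:
            # The user resumed speaking before playback completed: terminate it.
            player.stop()
            return
        elif event == "done":
            return

events = queue.Queue()
for e in ["speech_end", "speech_start"]:
    events.put(e)
deliver_response(events, "Here is your answer.")
```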
Abstract:
There is provided a first apparatus including a communication unit configured to transmit information permitting a second apparatus to modify stored voice recognition information based on a relationship between the first apparatus and the second apparatus.
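One possible reading of this arrangement, sketched below with invented names: the first apparatus shares data that lets the second apparatus update its stored voice recognition information only when the two devices stand in an approved relationship (here, a shared owner, which is purely an assumed example of such a relationship):

```python
# Hypothetical sketch: voice recognition information is shared between
# two apparatuses only if their relationship (same owner id in this
# example) permits it. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Apparatus:
    owner_id: str
    voice_recognition_info: dict = field(default_factory=dict)

    def related_to(self, other: "Apparatus") -> bool:
        # Example relationship check: both devices belong to the same user.
        return self.owner_id == other.owner_id

    def share_voice_info(self, other: "Apparatus") -> bool:
        """Transmit information permitting `other` to modify its stored data."""
        if not self.related_to(other):
            return False
        other.voice_recognition_info.update(self.voice_recognition_info)
        return True

phone = Apparatus("user-1", {"acoustic_profile": [0.2, 0.7]})
speaker = Apparatus("user-1")
assert phone.share_voice_info(speaker)
```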
Abstract:
Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data.
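A rough sketch of that correction loop, with the transcription, user-correction, and model-adaptation steps represented as injected callables (all of them assumptions, since the abstract does not specify interfaces):

```python
# Sketch of the correction cycle: transcribe audio with a voice model,
# make the text available, collect corrected text from users, and fold
# the corrections back into the model.
from typing import Callable, List, Tuple

def correction_cycle(
    audio_items: List[bytes],
    transcribe: Callable[[bytes], str],                     # voice model -> text data
    collect_corrections: Callable[[str], str],              # users return edited text
    adapt_model: Callable[[List[Tuple[str, str]]], None],   # update the voice model
) -> None:
    pairs = []
    for audio in audio_items:
        hypothesis = transcribe(audio)                # text data made available to users
        corrected = collect_corrections(hypothesis)   # corrected text received back
        if corrected != hypothesis:
            pairs.append((hypothesis, corrected))
    if pairs:
        adapt_model(pairs)                            # modify the voice model
```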
Abstract:
The present invention relates to a device provided with a voice processing system for processing digital voice signals, a first memory system that delivers to the voice processing system the information needed to process the digital voice signals, and a second memory system in which the result of the digital voice signal processing can be stored. The present invention also relates to a device wherein the voice processing system is designed so that, for an input digital voice signal, if a corresponding digital voice signal is found in the first memory system, at least one control signal for operating at least one apparatus system is produced.
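An illustrative sketch of that lookup-and-control flow, using a simple distance match against reference signals; the distance metric, threshold, and data structures are assumptions, not details from the abstract:

```python
# Sketch: if the input digital voice signal matches one registered in the
# first memory system, a control signal for operating an apparatus system
# is produced; the processing result is stored in a second memory.
from typing import Dict, List, Optional

def euclidean(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def process_voice_signal(
    input_signal: List[float],
    first_memory: Dict[str, List[float]],   # command name -> reference signal
    second_memory: List[str],               # processing results are stored here
    threshold: float = 0.5,
) -> Optional[str]:
    name, reference = min(first_memory.items(),
                          key=lambda kv: euclidean(input_signal, kv[1]))
    if euclidean(input_signal, reference) <= threshold:
        control_signal = f"CTRL:{name}"     # control signal for an apparatus system
        second_memory.append(control_signal)
        return control_signal
    second_memory.append("no-match")
    return None
```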
Abstract:
A telephone-based interactive speech recognition system is retrained using variable weighting and incremental retraining. Variable weighting involves changing the relative influence of particular measurement data to be reflected in a statistical model. Statistical model data is determined based upon an initial set of measurement data determined from an initial set of speech utterances. When new statistical model data is to be generated to reflect new measurement data determined from new speech utterances, a weighting factor is applied to the new measurement data to generate weighted new measurement data. The new statistical model data is then determined based upon the initial set of measurement data and the weighted new measurement data. Incremental retraining involves generating new statistical model data using prior statistical model data to reduce the amount of prior measurement data that must be maintained and processed. When prior statistical model data needs to be updated to reflect characteristics and attributes of new speech utterances, statistical model data is generated for the new speech utterances. Then the prior statistical model data and the statistical model data for the new measurement data are processed to generate the new statistical model data.
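The weighted, incremental update can be illustrated on a single statistic. The sketch below retrains a mean from sufficient statistics (count and sum) rather than from the prior raw measurements, and scales new measurements by a weighting factor before combining them; the single-statistic Gaussian view is a deliberate simplification of the abstract's statistical model:

```python
# Sketch of variable weighting and incremental retraining for one statistic.
# Prior measurement data is summarised by (weighted count, weighted sum), so
# it need not be maintained and reprocessed; new measurements are scaled by
# a weighting factor before being folded in.
from typing import List, Tuple

def retrain_mean(
    prior_stats: Tuple[float, float],   # (weighted count, weighted sum) of prior data
    new_measurements: List[float],
    weight: float,                      # weighting factor applied to the new data
) -> Tuple[Tuple[float, float], float]:
    prior_count, prior_sum = prior_stats
    new_count = prior_count + weight * len(new_measurements)
    new_sum = prior_sum + weight * sum(new_measurements)
    return (new_count, new_sum), new_sum / new_count

stats = (100.0, 250.0)                  # statistics summarising prior utterances
stats, mean = retrain_mean(stats, [2.4, 2.6, 2.5], weight=2.0)  # emphasise new data
```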
Abstract:
A method for initiating an operation using voice is provided. The method includes extracting one or more voice features based on first audio data detected in a use stage; determining a similarity between the first audio data and a preset first voice model according to the one or more voice features, wherein the first voice model is associated with second audio data of a user, and the second audio data is associated with one or more preselected voice contents; and executing an operation corresponding to the first voice model based on the similarity.
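A small sketch of the use-stage decision, assuming the voice model and the extracted features are both represented as vectors and compared by cosine similarity; the metric, threshold, and function names are illustrative choices rather than the method's specifics:

```python
# Sketch of voice-triggered operation: score the features of the detected
# first audio data against a preset voice model (built from the user's
# second audio data) and execute the bound operation if similar enough.
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def maybe_execute(
    first_audio_features: List[float],
    voice_model: List[float],              # derived from preselected voice contents
    operation: Callable[[], None],         # operation corresponding to the voice model
    threshold: float = 0.8,
) -> bool:
    if cosine(first_audio_features, voice_model) >= threshold:
        operation()
        return True
    return False

maybe_execute([0.1, 0.9, 0.3], [0.12, 0.88, 0.28], lambda: print("unlocking"))
```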
Abstract:
A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
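The filtering step can be sketched as follows: drop any typed item whose feature vector lies within a similarity threshold of some spoken-corpus vector, and keep the rest as the unspeakable corpus that, together with the spoken corpus, supplies the two classes for the classifier. Cosine similarity and the toy vectors are assumptions made for illustration:

```python
# Sketch of building the "unspeakable" corpus by similarity-threshold
# filtering of the typed corpus against the spoken corpus.
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def build_unspeakable(
    typed_vectors: List[List[float]],
    spoken_vectors: List[List[float]],
    threshold: float = 0.9,
) -> List[List[float]]:
    # Keep only typed items that are NOT within the threshold of any spoken vector.
    return [
        t for t in typed_vectors
        if all(cosine(t, s) < threshold for s in spoken_vectors)
    ]

spoken = [[0.9, 0.1], [0.8, 0.2]]
typed = [[0.88, 0.12], [0.1, 0.95]]
unspeakable = build_unspeakable(typed, spoken)   # -> [[0.1, 0.95]]
# `spoken` (positive) and `unspeakable` (negative) would then label the two
# classes used to train the discriminative data-selection classifier.
```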