Abstract:
Systems and methods for locating the end of a keyword in voice sensing are provided. An example method includes receiving an acoustic signal that includes a keyword portion immediately followed by a query portion. The acoustic signal represents at least one captured sound. The method further includes determining the end of the keyword portion. The method further includes separating, using the end of the keyword portion, the query portion from the keyword portion of the acoustic signal. The method further includes providing the query portion, absent any part of the keyword portion, to an automatic speech recognition (ASR) system.
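The separation step can be sketched as a slice at the detected keyword endpoint. This is a minimal illustration, not the patented method; the function name, sample rate, and 0.75 s endpoint are assumptions for the example:

```python
import numpy as np

def separate_query(signal: np.ndarray, sample_rate: int, keyword_end_s: float) -> np.ndarray:
    """Drop everything up to the detected end of the keyword so that
    only the query portion is forwarded to the ASR system."""
    end_idx = int(keyword_end_s * sample_rate)
    return signal[end_idx:]

# 2 s of captured audio at 16 kHz; keyword assumed to end at 0.75 s
signal = np.zeros(2 * 16000, dtype=np.float32)
query = separate_query(signal, 16000, 0.75)  # 20000 samples remain
```

In a real system the endpoint would come from a keyword-spotting model rather than being supplied as a constant.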
Abstract:
A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the recurrent neural network (RNN) system may be used to convert speech to a target sequence of phonemes in real time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.
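The streaming behavior can be illustrated with a toy single-layer RNN that emits one phoneme per input frame instead of waiting for the whole utterance; the weights, frame shapes, and phoneme vocabulary below are made up for the example:

```python
import numpy as np

def stream_phonemes(frames, Wx, Wh, Wo, phonemes):
    """Toy single-layer RNN: update the hidden state per frame and emit
    the best-scoring phoneme immediately, enabling real-time output."""
    h = np.zeros(Wh.shape[0])
    emitted = []
    for x in frames:
        h = np.tanh(Wx @ x + Wh @ h)   # recurrent state update
        emitted.append(phonemes[int(np.argmax(Wo @ h))])
    return emitted
```

A production recognizer would use trained weights and a proper decoder, but the per-frame loop is what makes incremental transcription possible.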
Abstract:
A computer-implemented technique is described herein for detecting actionable items in speech. In one manner of operation, the technique entails: receiving utterance information that expresses at least one utterance made by one participant of a conversation to at least one other participant of the conversation; converting the utterance information into recognized speech information; using a machine-trained model to recognize at least one actionable item associated with the recognized speech information; and performing at least one computer-implemented action associated with the actionable item(s). The machine-trained model may correspond to a deep-structured convolutional neural network. In some implementations, the technique produces the machine-trained model using a source environment corpus that is not optimally suited for a target environment in which the model is intended to be applied. The technique further provides various adaptation techniques for adapting a source-environment model so that it better suits the target environment.
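The pipeline (recognize speech, classify it, run the mapped action) can be sketched as follows. The keyword-matching classifier is only a stand-in for the machine-trained model described above, and all names here are illustrative:

```python
def detect_actionable_item(recognized_text, classify, actions):
    """Apply a classifier (a stand-in for the machine-trained model) to
    recognized speech and perform the computer-implemented action that
    the predicted label maps to, if any."""
    label = classify(recognized_text)
    action = actions.get(label)
    return action(recognized_text) if action else None

# stand-in classifier; a real system would use the trained model
def classify(text):
    return "schedule_meeting" if "meeting" in text else "no_action"

actions = {"schedule_meeting": lambda t: ("calendar_event", t)}
result = detect_actionable_item("let's have a meeting on Friday", classify, actions)
# result == ("calendar_event", "let's have a meeting on Friday")
```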
Abstract:
A mechanism for compiling a generative description of an inference task into a neural network. First, an arbitrary generative probabilistic model from the exponential family is specified (or received). The model characterizes a conditional probability distribution for measurement data given a set of latent variables. A factor graph is generated for the generative probabilistic model. Each factor node of the factor graph is expanded into a corresponding sequence of arithmetic operations, based on a specified inference task and a kind of message passing algorithm. The factor graph and the sequences of arithmetic operations specify the structure of a neural network for performance of the inference task. A learning algorithm is executed to determine values of parameters of the neural network. The neural network is then ready for performing inference on operational measurements.
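For a chain-structured factor graph under sum-product message passing, the expansion of each factor node amounts to one matrix product, and composing these operations gives a feed-forward structure. A minimal sketch, where the chain topology and the final normalization are assumptions for the example:

```python
import numpy as np

def compile_chain(potentials):
    """Compile a chain factor graph into a sequence of arithmetic ops:
    each pairwise factor becomes one matrix product (sum-product
    marginalization); composing the ops yields a feed-forward network."""
    ops = [lambda m, P=P: P.T @ m for P in potentials]  # one "layer" per factor

    def network(message):
        for op in ops:
            message = op(message)
        return message / message.sum()  # marginal at the last variable

    return network
```

In the described mechanism, the factor potentials would be parameters learned by the subsequent learning algorithm rather than fixed matrices.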
Abstract:
Systems and methods for contactless speech recognition using lip-reading are provided. In various aspects, a speech recognition unit (112) is configured to receive, via a receiver (108), a Doppler-broadened reflected electromagnetic signal that has been modulated and reflected by the lip and facial movements of a speaking subject (104) and to output recognized speech based on an analysis of the received reflected signal. In one embodiment, the functionality of speech recognition unit (112) is implemented via a preprocessing unit (202), a Neural Network ("NNet") unit (204), and a Hidden Markov Model ("HMM") unit (206).
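A sketch of what the preprocessing stage (202) might compute, assuming the Doppler-broadened energy lies in a narrow band around a known carrier; the sample rate, carrier frequency, and bandwidth are illustrative:

```python
import numpy as np

def doppler_band_features(reflected, sample_rate, carrier_hz, band_hz=100.0):
    """Keep only the spectrum in a band around the carrier, where lip
    and facial motion appears as Doppler broadening; such a feature
    vector could feed the NNet (204) and HMM (206) stages."""
    spectrum = np.abs(np.fft.rfft(reflected))
    freqs = np.fft.rfftfreq(len(reflected), 1.0 / sample_rate)
    in_band = (freqs > carrier_hz - band_hz) & (freqs < carrier_hz + band_hz)
    return spectrum[in_band]
```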
Abstract:
In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multi-channel audio signals may include spoken utterances from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.
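One beamformer channel can be sketched with delay-and-sum steering; the circular shift via np.roll and the integer-sample delays are simplifications, since a real beamformer would use fractional delays and proper boundary handling:

```python
import numpy as np

def delay_and_sum(mic_signals, delays):
    """Steer the array toward one direction: delay each microphone's
    signal so arrivals from that direction align, then average. One
    such output per steering direction forms the multi-channel input
    on which ASR runs in parallel."""
    out = np.zeros(len(mic_signals[0]))
    for sig, d in zip(mic_signals, delays):
        out += np.roll(sig, d)  # toy circular shift in whole samples
    return out / len(mic_signals)
```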