RECURRENT NEURAL NETWORK LEARNING METHOD, COMPUTER PROGRAM FOR SAME, AND VOICE RECOGNITION DEVICE
    61.
    Invention Publication - Pending (Published)

    Publication No.: EP3296930A1

    Publication Date: 2018-03-21

    Application No.: EP16792676.5

    Filing Date: 2016-05-10

    Inventor: KANDA, Naoyuki

    Abstract: [Object]
    An object is to provide a training method that improves the training of a recurrent neural network (RNN) on time-sequential data.
    [Solution]
    The training method includes a step 220 of initializing the RNN, and a training step 226 of training the RNN by designating a certain vector as a start position and optimizing various parameters so as to minimize an error function. The training step 226 includes: an updating step 250 of updating the RNN parameters through truncated BPTT, using N consecutive vectors (N≥3) starting from the designated vector and using the reference value of the tail vector as the correct label; and a first repetition step 240 of repeating the training step, each time newly designating a vector at a position that satisfies a prescribed relation with the tail of the N vectors used in the updating step, until an end condition is satisfied. The vector at the position satisfying the prescribed relation lies at least two vectors after the designated vector.
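    For orientation only, the following is a minimal PyTorch sketch of the kind of training loop the abstract describes: each update uses N consecutive feature vectors, treats the tail vector's reference label as the target, and then advances the start position by a stride of at least two. The module choices, dimensions, stride value, and end condition are assumptions for illustration, not details taken from the patent.

    # Hypothetical sketch of windowed truncated-BPTT-style updates (assumed setup).
    import torch
    import torch.nn as nn

    N = 5        # window length (N >= 3)
    STRIDE = 3   # next start position lies at least two vectors after the current one
    FEAT_DIM, HIDDEN, NUM_LABELS = 40, 128, 10

    rnn = nn.RNN(FEAT_DIM, HIDDEN, batch_first=True)
    classifier = nn.Linear(HIDDEN, NUM_LABELS)
    optimizer = torch.optim.SGD(
        list(rnn.parameters()) + list(classifier.parameters()), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    def train_epoch(features, labels):
        """features: (T, FEAT_DIM) float tensor; labels: (T,) long tensor of reference labels."""
        start = 0
        while start + N <= features.size(0):                  # end condition: sequence exhausted
            window = features[start:start + N].unsqueeze(0)   # N consecutive vectors
            target = labels[start + N - 1].unsqueeze(0)       # reference value of the tail vector
            optimizer.zero_grad()
            out, _ = rnn(window)                              # gradient flow truncated to this window
            logits = classifier(out[:, -1, :])                # prediction for the tail vector
            loss_fn(logits, target).backward()
            optimizer.step()
            start += STRIDE                                   # jump at least two vectors ahead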


    AUTOMATIC ACCENT DETECTION
    62.
    Invention Publication

    Publication No.: EP3286756A1

    Publication Date: 2018-02-28

    Application No.: EP15784191

    Filing Date: 2015-09-30

    Applicant: APPLE INC

    IPC Classification: G10L15/06 G10L25/51

    Abstract: Systems and processes for automatic accent detection are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a user input, determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, and determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models. The method further includes determining whether the first similarity is greater than the second similarity. In accordance with a determination that the first similarity is greater than the second similarity, the first acoustic model may be selected; and in accordance with a determination that the first similarity is not greater than the second similarity, the second acoustic model may be selected.
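    As a concrete illustration of the comparison step, here is a small Python sketch that scores a representation of the user input against two acoustic-model representations and keeps the more similar one. The cosine-similarity measure and the vector representations are assumptions; the abstract does not prescribe them.

    # Hypothetical sketch: pick the acoustic model whose representation is more
    # similar to a representation of the user input (similarity measure assumed).
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def select_acoustic_model(user_repr, model_a_repr, model_b_repr):
        first = cosine_similarity(user_repr, model_a_repr)
        second = cosine_similarity(user_repr, model_b_repr)
        # If the first similarity is greater, select the first acoustic model;
        # otherwise select the second.
        return "model_a" if first > second else "model_b"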

    OBFUSCATING TRAINING DATA
    63.
    Invention Publication - Pending (Published)

    Publication No.: EP3262634A1

    Publication Date: 2018-01-03

    Application No.: EP15710122.1

    Filing Date: 2015-02-26

    Applicant: Longsand Limited

    IPC Classification: G10L15/06

    Abstract: Examples disclosed herein involve obfuscating training data. An example method includes computing a sequence of acoustic features from audio data of training data, the training data comprising the audio data and a corresponding text transcript; mapping the acoustic features to acoustic model states to generate annotated feature vectors, the annotated feature vectors comprising the acoustic features and corresponding context from the text transcript; and providing a randomized sequence of the annotated feature vectors as obfuscated training data to an audio analysis system.
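    A minimal Python sketch of the final randomization step follows; the earlier feature-extraction and state-alignment stages are assumed to have already produced (features, context) pairs, and the shuffling shown here is only one way a randomized sequence could be realized.

    # Hypothetical sketch: obfuscate training data by shuffling the order of
    # annotated feature vectors so the original utterance order is lost, while
    # each (acoustic features, state/context label) pair remains usable.
    import random

    def obfuscate(annotated_vectors, seed=None):
        """annotated_vectors: list of (acoustic_features, state_context) pairs."""
        shuffled = list(annotated_vectors)
        random.Random(seed).shuffle(shuffled)    # randomized sequence
        return shuffled                          # passed to the audio analysis system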

    MIXED SPEECH SIGNALS RECOGNITION
    65.
    Invention Grant - In Force

    Publication No.: EP3123466B1

    Publication Date: 2017-11-15

    Application No.: EP15714120.1

    Filing Date: 2015-03-19

    IPC Classification: G10L15/16 G10L15/06 G10L15/20

    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
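    The decoding step is the subtle part; the Python sketch below only illustrates how per-frame scores from the two networks could be combined into joint state-pair scores that a decoder would then search, and it omits the switching-point probability entirely. All names and shapes are assumptions for illustration.

    # Hypothetical sketch: combine per-frame scores from a "higher level" model
    # and a "lower level" model into joint state-pair scores (simplified; the
    # switching-point term from the abstract is not modeled here).
    def joint_frame_scores(frames, high_model, low_model):
        """frames: iterable of feature vectors; each model maps a frame to a
        1-D numpy array of per-state log-likelihoods."""
        combined = []
        for frame in frames:
            ll_high = high_model(frame)   # scores for the louder speaker's states
            ll_low = low_model(frame)     # scores for the quieter speaker's states
            combined.append(ll_high[:, None] + ll_low[None, :])  # joint log-likelihoods
        return combined   # a decoder would search over these joint scores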

    TRAINING METHOD AND APPARATUS FOR LANGUAGE MODEL, AND DEVICE
    67.
    Invention Publication - Pending (Published)

    Publication No.: EP3179473A4

    Publication Date: 2017-07-12

    Application No.: EP16762948

    Filing Date: 2016-06-06

    Inventor: YAN ZHIYONG

    IPC Classification: G10L15/183 G10L15/06

    Abstract: The present disclosure provides a language model training method, apparatus, and device. The method includes: obtaining a universal language model in an offline training mode, and clipping the universal language model to obtain a clipped language model; obtaining a log language model from logs within a preset time period in an online training mode; fusing the clipped language model with the log language model to obtain a first fusion language model used for first-pass decoding; and fusing the universal language model with the log language model to obtain a second fusion language model used for second-pass decoding. The method addresses the problem that a language model obtained offline, as in the prior art, covers new corpora poorly, which lowers the recognition rate.
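    To make the fusion step concrete, here is a small Python sketch that linearly interpolates two n-gram models; the dictionary representation, the interpolation weight, and the way the two passes reuse the helper are assumptions rather than the patent's own formulation.

    # Hypothetical sketch: fuse two language models by linear interpolation.
    def fuse(lm_a, lm_b, weight=0.7):
        """lm_a, lm_b: dicts mapping an n-gram (tuple of words) to its probability."""
        fused = {}
        for ngram in set(lm_a) | set(lm_b):
            fused[ngram] = weight * lm_a.get(ngram, 0.0) + (1.0 - weight) * lm_b.get(ngram, 0.0)
        return fused

    # first_pass_lm  = fuse(clipped_universal_lm, log_lm)   # used for first-pass decoding
    # second_pass_lm = fuse(universal_lm, log_lm)           # used for second-pass decoding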


    AUTHENTICATION METHOD, TERMINAL AND COMPUTER STORAGE MEDIUM BASED ON VOICEPRINT CHARACTERISTIC
    68.
    Invention Publication - Pending (Published)

    Publication No.: EP3185162A1

    Publication Date: 2017-06-28

    Application No.: EP15833692.5

    Filing Date: 2015-04-28

    Applicant: ZTE Corporation

    Inventor: LIU, Xueqin

    IPC Classification: G06F21/32 G10L15/02 G10L15/06

    CPC Classification: G06F21/32 G10L17/04

    Abstract: A secure authentication method based on a voiceprint characteristic, the method comprising: upon receiving a voice acquisition instruction, acquiring, by a terminal, voice data to be verified that is recorded by a user; extracting a voiceprint characteristic from the voice data to obtain voiceprint characteristic information; and authenticating the identity of the current user according to the currently extracted voiceprint characteristic information and pre-stored voiceprint characteristic information. A corresponding terminal and computer storage medium are also disclosed.
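    A minimal Python sketch of the final comparison step is given below; the voiceprint extractor, the cosine-similarity measure, and the acceptance threshold are all assumptions, since the abstract does not fix them.

    # Hypothetical sketch: accept or reject the user by comparing a freshly
    # extracted voiceprint embedding with the pre-stored enrollment embedding.
    import numpy as np

    THRESHOLD = 0.75   # assumed acceptance threshold

    def authenticate(new_embedding, enrolled_embedding, threshold=THRESHOLD):
        """Both arguments are 1-D numpy arrays produced by a voiceprint extractor."""
        sim = float(np.dot(new_embedding, enrolled_embedding) /
                    (np.linalg.norm(new_embedding) * np.linalg.norm(enrolled_embedding)))
        return sim >= threshold   # True: identity confirmed; False: rejected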


    SESSION CONTEXT MODELING FOR CONVERSATIONAL UNDERSTANDING SYSTEMS
    69.
    Invention Publication - In Force

    Publication No.: EP3158559A1

    Publication Date: 2017-04-26

    Application No.: EP15736702.0

    Filing Date: 2015-06-17

    Abstract: Systems and methods are provided for improving language models for speech recognition by adapting knowledge sources utilized by the language models to session contexts. A knowledge source, such as a knowledge graph, is used to capture and model dynamic session context based on user interaction information from usage history, such as session logs, that is mapped to the knowledge source. From sequences of user interactions, higher level intent sequences may be determined and used to form models that anticipate similar intents but with different arguments including arguments that do not necessarily appear in the usage history. In this way, the session context models may be used to determine likely next interactions or “turns” from a user, given a previous turn or turns. Language models corresponding to the likely next turns are then interpolated and provided to improve recognition accuracy of the next turn received from the user.
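    The Python sketch below illustrates one way the interpolation could look: given the previous turn's intent, likely next intents are read from a session-context model and the language models tied to those intents are mixed with weights proportional to their probabilities. The data structures and the simple weighted sum are assumptions for illustration.

    # Hypothetical sketch: interpolate language models for the likely next turns.
    def interpolate_for_next_turn(prev_intent, next_intent_probs, intent_lms):
        """next_intent_probs: dict mapping prev_intent -> {next_intent: probability};
        intent_lms: dict mapping intent -> {ngram: probability}."""
        weights = next_intent_probs.get(prev_intent, {})
        mixed = {}
        for intent, w in weights.items():
            for ngram, p in intent_lms[intent].items():
                mixed[ngram] = mixed.get(ngram, 0.0) + w * p
        return mixed   # used to score the user's next turn during recognition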


    KNOWLEDGE SOURCE PERSONALIZATION TO IMPROVE LANGUAGE MODELS
    70.
    Invention Publication - Pending (Published)

    Publication No.: EP3143522A1

    Publication Date: 2017-03-22

    Application No.: EP15728256.7

    Filing Date: 2015-05-15

    IPC Classification: G06F17/30 G10L15/06

    Abstract: Systems and methods are provided for improving language models for speech recognition by personalizing knowledge sources utilized by the language models to specific users or user-population characteristics. A knowledge source, such as a knowledge graph, is personalized for a particular user by mapping entities or user actions from usage history for the user, such as query logs, to the knowledge source. The personalized knowledge source may be used to build a personal language model by training a language model with queries corresponding to entities or entity pairs that appear in usage history. In some embodiments, a personalized knowledge source for a specific user can be extended based on personalized knowledge sources of similar users.
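    As a rough illustration of the training step, the Python sketch below keeps only the queries from a user's log that mention entities mapped to that user, and estimates a unigram model from them; the substring-based entity matching and the count-based model are simplifications, not the patent's method.

    # Hypothetical sketch: build a personal language model from the queries in a
    # user's usage history that mention entities from the personalized knowledge source.
    from collections import Counter

    def personal_language_model(query_log, user_entities):
        """query_log: list of query strings; user_entities: set of entity names."""
        personal_queries = [q for q in query_log
                            if any(e.lower() in q.lower() for e in user_entities)]
        counts = Counter(word for q in personal_queries for word in q.split())
        total = sum(counts.values()) or 1
        return {word: c / total for word, c in counts.items()}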
