Voice processing device, voice processing method, and non-transitory recording medium that stores program

    Publication No.: US10037759B2

    Publication date: 2018-07-31

    Application No.: US14251201

    Filing date: 2014-04-11

    Inventor: Hiroyasu Ide

    IPC classification: G10L17/14

    CPC classification: G10L17/14

    Abstract: A voice processing device includes: an acquirer which acquires feature quantities of vowel sections included in voice data; a classifier which classifies, among the acquired feature quantities, feature quantities corresponding to a plurality of same vowels into a plurality of clusters for respective vowels with unsupervised classification; and a determiner which determines a combination of clusters corresponding to the same speaker from clusters classified for the plurality of vowels.
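The acquire, classify, and determine steps in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the one-dimensional features, the small two-means routine, and the rule that clusters drawn from the same utterances belong to the same speaker are all hypothetical choices made for the example.

```python
from collections import defaultdict


def two_means(values, iters=20):
    """Split 1-D values into two clusters; returns a 0/1 label per value."""
    lo, hi = min(values), max(values)  # deterministic initial centroids
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if group0:
            lo = sum(group0) / len(group0)
        if group1:
            hi = sum(group1) / len(group1)
    return labels


def cluster_by_vowel(sections):
    """sections: list of (utterance_id, vowel, feature).

    Returns {vowel: {cluster_label: set of utterance_ids}}.
    """
    by_vowel = defaultdict(list)
    for utt, vowel, feat in sections:
        by_vowel[vowel].append((utt, feat))
    clusters = {}
    for vowel, items in by_vowel.items():
        labels = two_means([feat for _, feat in items])
        members = defaultdict(set)
        for (utt, _), label in zip(items, labels):
            members[label].add(utt)
        clusters[vowel] = dict(members)
    return clusters


def match_clusters(clusters, vowel_a, vowel_b):
    """Pair each vowel_a cluster with the vowel_b cluster sharing most utterances."""
    pairs = {}
    for label_a, utts_a in clusters[vowel_a].items():
        best = max(clusters[vowel_b].items(), key=lambda kv: len(utts_a & kv[1]))
        pairs[label_a] = best[0]
    return pairs
```

Cluster labels are arbitrary within each vowel, so the pairing step is what links, say, cluster 0 of one vowel to cluster 1 of another as the same hypothetical speaker.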

Content-aware speaker recognition
    4.
    Granted patent
    Status: In force

    Publication No.: US09336781B2

    Publication date: 2016-05-10

    Application No.: US14264916

    Filing date: 2014-04-29

    Applicant: SRI International

    CPC classification: G10L17/14

    Abstract: A content-aware speaker recognition system includes technologies to, among other things, analyze phonetic content of a speech sample, incorporate phonetic content of the speech sample into a speaker model, and use the phonetically-aware speaker model for speaker recognition.

Fast, language-independent method for user authentication by voice
    5.
    Granted patent
    Status: In force

    Publication No.: US09218809B2

    Publication date: 2015-12-22

    Application No.: US14151605

    Filing date: 2014-01-09

    Applicant: Apple Inc.

    IPC classification: G10L15/07 G10L17/04

    Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
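The threshold comparison at the end of this abstract can be illustrated with a minimal sketch. It assumes the decomposition into speaker-specific units has already happened and that the "distribution values" are a per-dimension mean and standard deviation. Both are simplifications for illustration, and the names `enroll` and `authenticate` are hypothetical.

```python
from statistics import mean, pstdev


def enroll(vectors):
    """Per-dimension (mean, std) over the enrollment feature vectors."""
    dims = list(zip(*vectors))
    return [(mean(d), pstdev(d)) for d in dims]


def authenticate(profile, vector, threshold=2.0, floor=1e-6):
    """Accept when every dimension lies within `threshold` standard deviations."""
    for (mu, sigma), x in zip(profile, vector):
        # `floor` avoids a zero-width acceptance band when sigma is 0.
        if abs(x - mu) > threshold * max(sigma, floor):
            return False
    return True
```

A stricter or looser `threshold` trades false accepts against false rejects; the value 2.0 is an arbitrary default for the example.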

System and method for identification of a speaker by phonograms of spontaneous oral speech and by using formant equalization using one vowel phoneme type
    6.
    Granted patent
    Status: In force

    Publication No.: US09047866B2

    Publication date: 2015-06-02

    Application No.: US13429260

    Filing date: 2012-03-23

    Abstract: A system and method for identification of a speaker by phonograms of oral speech is disclosed. Similarity between a first phonogram of the speaker and a second, or sample, phonogram is evaluated by matching formant frequencies in referential utterances of a speech signal, where the utterances for comparison are selected from the first phonogram and the second phonogram. Referential utterances of speech signals are selected from the first phonogram and the second phonogram, where the referential utterances include formant paths of at least three formant frequencies; wherein the first two formants are within typical variability limits for one vowel phoneme type. The selected referential utterances including at least two identical formant frequencies are compared therebetween. Similarity of the compared referential utterances from matching other formant frequencies is evaluated, where similarity of the phonograms is determined from evaluation of similarity of all the compared referential utterances.
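The select-then-compare logic in this abstract can be sketched as below. The sketch assumes each referential utterance is reduced to a tuple of average formant frequencies in Hz (F1, F2, F3, ...), treats "identical" as agreement within a relative tolerance, and scores similarity as a simple fraction. None of these choices comes from the patent; they are illustrative assumptions.

```python
def formants_match(pairs, tol):
    """True when every (a, b) pair agrees within relative tolerance `tol`."""
    return all(abs(a - b) <= tol * max(a, b) for a, b in pairs)


def comparable(formants_a, formants_b, tol=0.05):
    """Two utterances are comparable when F1 and F2 agree (same vowel type)."""
    return formants_match(zip(formants_a[:2], formants_b[:2]), tol)


def phonogram_similarity(utterances_a, utterances_b, tol=0.05):
    """Fraction of comparable utterance pairs whose remaining formants also match.

    Returns None when no pair of utterances is comparable.
    """
    matched = total = 0
    for fa in utterances_a:
        for fb in utterances_b:
            if not comparable(fa, fb, tol):
                continue
            total += 1
            if formants_match(zip(fa[2:], fb[2:]), tol):
                matched += 1
    return matched / total if total else None
```

Returning `None` rather than 0 keeps "no comparable utterances" distinct from "comparable utterances that disagree".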

    System and Method for Adapting Automatic Speech Recognition Pronunciation by Acoustic Model Restructuring

    Publication No.: US20140358540A1

    Publication date: 2014-12-04

    Application No.: US14459696

    Filing date: 2014-08-14

    IPC classification: G10L15/07 G10L15/06

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
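The "weighted sum of acoustic models" idea in this abstract can be illustrated with a small sketch. It assumes each native phoneme model is a single one-dimensional Gaussian given as (mean, variance) and that the weights come from normalized counts of plausible phonemes in the lattice. Real acoustic models are far richer, and the phoneme names and function names here are hypothetical.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean `mu` and variance `var`."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_weights(lattice_counts):
    """Normalize counts of plausible phonemes into weights that sum to 1."""
    total = sum(lattice_counts.values())
    return {ph: count / total for ph, count in lattice_counts.items()}


def custom_likelihood(x, weights, native_models):
    """Likelihood of x under the weighted sum of native phoneme models."""
    return sum(w * gaussian_pdf(x, *native_models[ph]) for ph, w in weights.items())
```

The pronouncing dictionary entry stays the same in this picture; only the density attached to the dictionary phoneme changes, from one native model to a blend of several.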

SPEAKER VERIFICATION AND IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK-BASED SUB-PHONETIC UNIT DISCRIMINATION
    8.
    Patent application
    Status: In force

    Publication No.: US20140195236A1

    Publication date: 2014-07-10

    Application No.: US13738868

    Filing date: 2013-01-10

    IPC classification: G10L17/14

    Abstract: In one embodiment, a computer system stores speech data for a plurality of speakers, where the speech data includes a plurality of feature vectors and, for each feature vector, an associated sub-phonetic class. The computer system then builds, based on the speech data, an artificial neural network (ANN) for modeling speech of a target speaker in the plurality of speakers, where the ANN is configured to discriminate between instances of sub-phonetic classes uttered by the target speaker and instances of sub-phonetic classes uttered by other speakers in the plurality of speakers.

APPARATUS, SYSTEM AND METHOD FOR CALCULATING PASSPHRASE VARIABILITY
    9.
    Patent application
    Status: Under examination (published)

    Publication No.: US20140188468A1

    Publication date: 2014-07-03

    Application No.: US13729127

    Filing date: 2012-12-28

    IPC classification: G10L17/00

    CPC classification: G10L17/04 G10L17/14 G10L17/24

    Abstract: An apparatus, system and method for calculating passphrase variability are disclosed. The passphrase variability value can then be used for generating phonetically rich passwords in text-dependent speaker recognition systems, or for estimating the variability of the input passphrase in text-independent system during the enrolling process in a speech recognition security system.

Method and system for using conversational biometrics and speaker identification/verification to filter voice streams
    10.
    Granted patent
    Status: In force

    Publication No.: US08537978B2

    Publication date: 2013-09-17

    Application No.: US12246056

    Filing date: 2008-10-06

    IPC classification: H04M1/64

    Abstract: A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive an audio stream of a communication between a plurality of participants. Additionally, the programming instructions are operable to filter the audio stream of the communication into separate audio streams, one for each of the plurality of participants, wherein each of the separate audio streams contains portions of the communication attributable to a respective participant of the plurality of participants. Furthermore, the programming instructions are operable to output the separate audio streams to a storage system.
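The filtering step in this abstract, one output stream per participant, can be sketched as a simple grouping operation. The sketch assumes an upstream speaker identification stage has already tagged each audio segment with a speaker label. That stage, the segment format, and the function name are hypothetical and not taken from the patent.

```python
from collections import defaultdict


def split_by_participant(segments):
    """segments: iterable of (speaker_id, samples) in conversation order.

    Returns {speaker_id: samples}, where each participant's portions are
    concatenated in the order they occurred.
    """
    streams = defaultdict(list)
    for speaker_id, samples in segments:
        streams[speaker_id].extend(samples)
    return dict(streams)
```

Each value in the returned mapping could then be written to storage as its own stream, matching the final step the abstract describes.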
