SPEAKER AUTHENTICATION
    1.
    Invention application
    Status: Under examination (published)

    Publication number: WO2007098039A1

    Publication date: 2007-08-30

    Application number: PCT/US2007/004137

    Filing date: 2007-02-13

    CPC classification number: G10L17/20 G10L17/08

    Abstract: Speaker authentication is performed by determining a similarity score for a test utterance and a stored training utterance. Computing the similarity score involves determining the sum of a group of functions, where each function includes the product of a posterior probability of a mixture component and a difference between an adapted mean and a background mean. The adapted mean is formed based on the background mean and the test utterance. The speech content provided by the speaker for authentication can be text-independent (i.e., any content they want to say) or text-dependent (i.e., a particular phrase used for training).
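
The scoring idea in this abstract can be illustrated with a minimal Python sketch. It assumes a one-dimensional background Gaussian mixture, relevance-MAP adaptation of the means, and a variance-normalised product of the test and training mean shifts. None of these choices, nor the names `adapt_means`, `similarity_score`, and `relevance`, come from the patent, whose exact formula is not given in the abstract.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return (soft counts, adapted means) for a 1-D background mixture.

    Assumption: relevance-MAP adaptation, a common choice that the
    abstract does not specify.
    """
    k = len(means)
    counts = [0.0] * k
    sums = [0.0] * k
    for x in frames:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue
        for i in range(k):
            post = likes[i] / total  # posterior of mixture component i
            counts[i] += post
            sums[i] += post * x
    adapted = []
    for i in range(k):
        alpha = counts[i] / (counts[i] + relevance)
        data_mean = sums[i] / counts[i] if counts[i] > 0.0 else means[i]
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[i])
    return counts, adapted


def similarity_score(test_frames, train_frames, weights, means, variances):
    """Sum over components of posterior mass times mean shifts (hypothetical form)."""
    if not test_frames:
        return 0.0
    test_counts, test_adapted = adapt_means(test_frames, weights, means, variances)
    _, train_adapted = adapt_means(train_frames, weights, means, variances)
    score = 0.0
    for i in range(len(means)):
        d_test = test_adapted[i] - means[i]    # adapted mean minus background mean
        d_train = train_adapted[i] - means[i]
        score += (test_counts[i] / len(test_frames)) * d_test * d_train / variances[i]
    return score
```

With this form, a test utterance whose adapted means move away from the background means in the same direction as the training utterance gets a positive score, and one that moves in the opposite direction gets a negative score.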

    2.
    Invention application
    "SPEAKER RECOGNITION SYSTEMS"
    Status: Under examination (published)

    Publication number: WO2002103680A2

    Publication date: 2002-12-27

    Application number: PCT/GB2002/002726

    Filing date: 2002-06-13

    CPC classification number: G10L17/02 G10L17/12 G10L17/20

    Abstract: Speaker recognition (identification and/or verification) methods and systems, in which speech models for enrolled speakers consist of sets of feature vectors representing the smoothed frequency spectrum of each of a plurality of frames and a clustering algorithm is applied to the feature vectors of the frames to obtain a reduced data set representing the original speech sample, and wherein the adjacent frames are overlapped by at least 80 %. Speech models of this type model the static components of the speech sample and exhibit temporal independence. An identifier strategy is employed in which modelling and classification processes are selected to give a false rejection rate substantially equal to zero. Each enrolled speaker is associated with a cohort of a predetermined number of other enrolled speakers and a test sample is always matched with either the claimed identity or one of its associated cohort. This makes the overall error rate of the system dependent only on the false acceptance rate, which is determined by the cohort size. The false error rate is further reduced by use of multiple parallel modelling and/or classification processes. Speech models are normalised prior to classification using a normalisation model derived from either the test speech sample or one of the enrolled speaker samples (most preferably from the claimed identity enrolment sample).
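
The overlap requirement in this abstract (adjacent frames overlapping by at least 80 %) can be sketched as follows. The function name `frame_signal` and its parameters are illustrative assumptions, and the sketch covers only the framing step, not the spectral smoothing or clustering.

```python
import math


def frame_signal(samples, frame_len, min_overlap=0.8):
    """Split samples into frames whose neighbours overlap by at least min_overlap."""
    # Round the overlap up so the "at least" condition always holds.
    overlap_samples = math.ceil(min_overlap * frame_len)
    hop = max(1, frame_len - overlap_samples)
    last_start = len(samples) - frame_len
    return [list(samples[s:s + frame_len]) for s in range(0, last_start + 1, hop)]
```

For a frame length of 10 samples and the default 80 % overlap, the hop is 2 samples, so each frame shares 8 samples with the next.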

    CHANNEL ESTIMATION SYSTEM AND METHOD FOR USE IN AUTOMATIC SPEAKER VERIFICATION SYSTEMS
    3.
    Invention application
    Status: Under examination (published)

    Publication number: WO99059136A1

    Publication date: 1999-11-18

    Application number: PCT/US1999/010038

    Filing date: 1999-05-07

    CPC classification number: G10L17/20 G10L15/063 G10L17/04 G10L21/02

    Abstract: The voice print system of the present invention concerns an automatic speaker verification (ASV) system that is subword-based and text-dependent with no constraints on the choice of vocabulary words or language. One component of the preferred ASV system is a channel estimation and normalization component that is able to remove the characteristics of the test channel component (150) and/or enrollment channel component (90) to increase accuracy. The preferred methods and systems of the present invention termed Curve-Fitting (62, 64, 66) and Clean Speech (82, 86, 88, 90, 92), separately, together, and in combination with Pole filtering (42, 44, 46), significantly improve the existing methods of channel estimation and normalization. Unlike Cepstral Mean Subtraction, both Curve-Fitting (62, 64, 66) and Clean Speech (42, 44, 46) methods and systems extract only the channel related information from the cepstral mean and not any speech information.
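
For context, the baseline this abstract contrasts itself with, Cepstral Mean Subtraction, can be sketched in a few lines. This is the standard baseline only, not the Curve-Fitting or Clean Speech methods of the patent, and the function name `cepstral_mean_subtraction` is an illustrative assumption.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames from every frame.

    frames is a list of equal-length cepstral vectors (lists of floats).
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[j] for frame in frames) / n for j in range(dim)]
    return [[frame[j] - mean[j] for j in range(dim)] for frame in frames]
```

Because the whole mean is removed, a constant channel offset disappears, but so does any speech information carried by the mean, which is the limitation the abstract points to.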

    SPEAKER VERIFICATION SYSTEM
    4.
    Invention application
    Status: Under examination (published)

    Publication number: WO1996041334A1

    Publication date: 1996-12-19

    Application number: PCT/US1996/009260

    Filing date: 1996-06-06

    Abstract: The present invention relates to a pattern recognition system (Fig. 1) which uses data fusion to combine data from a plurality of extracted features (60, 61, 62) and a plurality of classifiers (70, 71, 72). Speaker patterns can be accurately verified with the combination of discriminant based and distortion based classifiers. A novel approach using a training set of a "leave one out" data can be used for training the system with a reduced data set (Figs. 7A, 7B, 7C). Extracted features can be improved with a pole filtered method for reducing channel effects (Fig. 11B) and an affine transformation for improving the correlation between training and testing data (Fig. 14).

    DYNAMIC THRESHOLD FOR SPEAKER VERIFICATION
    5.
    Invention application
    Status: Under examination (published)

    Publication number: WO2015199813A1

    Publication date: 2015-12-30

    Application number: PCT/US2015/028859

    Filing date: 2015-05-01

    Applicant: GOOGLE INC.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a dynamic threshold for speaker verification are disclosed. In one aspect, a method includes the actions of receiving, for each of multiple utterances of a hotword, a data set including at least a speaker verification confidence score, and environmental context data. The actions further include selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context. The actions further include selecting a particular data set from among the subset of data sets based on one or more selection criteria. The actions further include selecting, as a speaker verification threshold for the particular environmental context, the speaker verification confidence score. The actions further include providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context.
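
The selection steps in this abstract can be sketched as below. The data-set layout (dictionaries with `score` and `context` keys), the function name `threshold_for_context`, and the selection criterion (the score below which a chosen fraction of past utterances would be rejected) are all assumptions made for illustration. The abstract says only that "one or more selection criteria" are used.

```python
def threshold_for_context(data_sets, context, reject_fraction=0.1):
    """Pick a speaker verification threshold for one environmental context.

    Returns the confidence score of the selected data set, or None when
    no data set matches the context.
    """
    # Step 1: keep only the data sets recorded in this context.
    subset = sorted(
        (d for d in data_sets if d["context"] == context),
        key=lambda d: d["score"],
    )
    if not subset:
        return None
    # Step 2: select one data set by the (hypothetical) percentile criterion.
    index = min(int(reject_fraction * len(subset)), len(subset) - 1)
    # Step 3: its confidence score becomes the threshold for this context.
    return subset[index]["score"]
```

A caller would store the returned value per context and compare later confidence scores from the same context against it.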

    METHOD AND APPARATUS FOR ADJUSTING VOICE RECOGNITION PROCESSING BASED ON NOISE CHARACTERISTICS
    6.
    Invention application
    Status: Under examination (published)

    Publication number: WO2015017303A1

    Publication date: 2015-02-05

    Application number: PCT/US2014/048354

    Filing date: 2014-07-28

    CPC classification number: G10L15/20 G10L15/065 G10L17/20 G10L21/0208 G10L25/48

    Abstract: A method and apparatus for adjusting a trigger parameter related to voice recognition processing includes receiving into the device an acoustic signal comprising a speech signal, which is provided to a voice recognition module, and comprising noise. The method further includes determining a noise profile for the acoustic signal, wherein the noise profile identifies a noise level for the noise and identifies a noise type for the noise based on a frequency spectrum for the noise, and adjusting the voice recognition module based on the noise profile by adjusting a trigger parameter related to voice recognition processing.

    A METHOD FOR SPEECH WATERMARKING IN SPEAKER VERIFICATION
    7.
    Invention application
    Status: Under examination (published)

    Publication number: WO2015012680A2

    Publication date: 2015-01-29

    Application number: PCT/MY2014/000138

    Filing date: 2014-05-29

    CPC classification number: G10L19/018 G10L17/20 G10L25/78

    Abstract: The present invention relates to a method for speech watermarking in speaker verification, comprising the steps of: embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of: selecting frames having least speaker-specific information from the speech signal to carry watermark data; detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.

    SPEAKER VERIFICATION
    8.
    Invention application
    Status: Under examination (published)

    Publication number: WO2010049695A1

    Publication date: 2010-05-06

    Application number: PCT/GB2009/002579

    Filing date: 2009-10-29

    CPC classification number: G10L17/12 G10L17/20

    Abstract: A speaker verification method is proposed that first builds a general model of user utterances using a set of general training speech data. The user also trains the system by providing a training utterance, such as a passphrase or other spoken utterance. Then in a test phase, the user provides a test utterance which includes some background noise as well as a test voice sample. The background noise is used to bring the condition of the training data closer to that of the test voice sample by modifying the training data and a reduced set of the general data, before creating adapted training and general models. Match scores are generated based on the comparison between the adapted models and the test voice sample, with a final match score calculated based on the difference between the match scores. This final match score gives a measure of the degree of matching between the test voice sample and the training utterance and is based on the degree of matching between the speech characteristics from extracted feature vectors that make up the respective speech signals, and is not a direct comparison of the raw signals themselves. Thus, the method can be used to verify a speaker without necessarily requiring the speaker to provide an identical test phrase to the phrase provided in the training sample.

    METHOD AND SYSTEM FOR ESTABLISHING HANDSET-DEPENDENT NORMALIZING MODELS FOR SPEAKER RECOGNITION
    9.
    Invention application
    Status: Under examination (published)

    Publication number: WO98038632A1

    Publication date: 1998-09-03

    Application number: PCT/US1998/003750

    Filing date: 1998-02-24

    CPC classification number: G10L17/20 G10L15/20 G10L17/00

    Abstract: A method and apparatus is provided for establishing a normalizing model suitable for use with a speaker model to normalize the speaker model, the speaker model for modelling voice characteristics of a specific individual, the speaker model and the normalizing model for use in recognizing identity of a speaker. A normalizer module (231) within a scoring module (215) uses the normalizing score (229) to normalize the speaker score (225) thereby obtaining a normalized speaker score (217). Based on the normalized speaker score (217), a decision module (219) makes a decision (221) of whether to believe that the test speaker (203), whose utterance was the source of the speech data (213), is the reference speaker (403).
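
The abstract does not say how the normalizer module combines the two scores. One common choice in speaker verification, assumed here purely for illustration, is to treat both scores as log-likelihoods and subtract them. The names `normalized_score`, `accept`, and `threshold` are hypothetical.

```python
def normalized_score(speaker_log_likelihood, normalizing_log_likelihood):
    """Log-likelihood ratio: speaker score minus normalizing score (assumed form)."""
    return speaker_log_likelihood - normalizing_log_likelihood


def accept(speaker_log_likelihood, normalizing_log_likelihood, threshold=0.0):
    """Decide whether to believe the test speaker is the reference speaker."""
    score = normalized_score(speaker_log_likelihood, normalizing_log_likelihood)
    return score > threshold
```

Under this assumption, a handset-dependent normalizing model would supply `normalizing_log_likelihood`, so that a handset mismatch lowers both terms and largely cancels in the difference.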

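    VOICEPRINT RECOGNITION METHOD, APPARATUS, ELECTRONIC DEVICE, AND MEDIUM

    Publication number: WO2018107810A1

    Publication date: 2018-06-21

    Application number: PCT/CN2017/099707

    Filing date: 2017-08-30

    Inventor: 王健宗 郭卉 肖京

    Abstract: Provided are a voiceprint recognition method, apparatus, electronic device, and medium applicable to the technical field of identity authentication. The method includes: preprocessing input speech to obtain the valid speech in it; extracting MFCC acoustic features of the speech and outputting first and second feature matrices that contain the MFCC dimension and the number of speech frames; constructing a long short-term recurrent neural network model and taking the first feature matrix as its input; training feature extraction matrices using the training parameters of the neural network model and the speaker features of the speech, each feature extraction matrix corresponding to one speaker model; and selecting the speaker model that matches the second feature matrix, the speaker corresponding to the matched speaker model being output as the voiceprint recognition result. More suitable acoustic features can be mined from the training speech, so that the distinguishing characteristics of speakers can be identified more accurately, a more robust speaker model can be learned, and better voiceprint recognition results can be obtained.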
