Speaker verification and speaker identification based on a priori knowledge
    1.
    Granted invention patent
    Status: in force

    Publication No.: US06697778B1

    Publication date: 2004-02-24

    Application No.: US09610495

    Filing date: 2000-07-05

    IPC classification: G10L15/06

    CPC classification: G10L17/02

    Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.

    Speaker verification and speaker identification based on eigenvoices
    2.
    Granted invention patent
    Status: lapsed

    Publication No.: US6141644A

    Publication date: 2000-10-31

    Application No.: US148911

    Filing date: 1998-09-04

    CPC classification: G10L17/02

    Abstract: Speech models are constructed and trained upon the speech of known client speakers (and also impostor speakers, in the case of speaker verification). Parameters from these models are concatenated to define supervectors, and a linear transformation upon these supervectors results in a dimensionality reduction yielding a low-dimensional space called eigenspace. The training speakers are then represented as points or distributions in eigenspace. Thereafter, new speech data from the test speaker is placed into eigenspace through a similar linear transformation, and the proximity in eigenspace of the test speaker to the training speakers serves to authenticate or identify the test speaker.
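
The pipeline this abstract describes (supervectors, a dimensionality-reducing linear transformation, then proximity in eigenspace) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the patented method: the supervectors are random toy vectors rather than concatenated speech-model parameters, the linear transformation is plain PCA computed with NumPy's SVD, proximity is Euclidean distance, and the names `build_eigenspace`, `project`, `identify`, and `verify` are hypothetical.

```python
import numpy as np

def build_eigenspace(supervectors, n_dims):
    """PCA over training supervectors (one row per training speaker)."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_dims]

def project(supervector, mean, basis):
    """Place a supervector into eigenspace with the same linear transformation."""
    return basis @ (supervector - mean)

def identify(test_point, client_points):
    """Speaker identification: return the closest client in eigenspace."""
    return min(client_points,
               key=lambda name: np.linalg.norm(client_points[name] - test_point))

def verify(test_point, claimed_point, threshold):
    """Speaker verification: accept if the claimed client is close enough."""
    return float(np.linalg.norm(claimed_point - test_point)) <= threshold

# Toy data (assumption): 20 training "speakers" with 12-dimensional supervectors.
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 12))
mean, basis = build_eigenspace(train, n_dims=3)

clients = {
    "alice": project(train[0], mean, basis),
    "bob": project(train[1], mean, basis),
}
# A test utterance that is a tiny perturbation of alice's supervector.
test_point = project(train[0] + 1e-6, mean, basis)
print(identify(test_point, clients))
print(verify(test_point, clients["alice"], threshold=0.1))
```

The abstract also allows speakers to be represented as distributions rather than points; the sketch covers only the point representation.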

    Personalized agent for portable devices and cellular phone
    3.
    Granted invention patent
    Status: in force

    Publication No.: US06895257B2

    Publication date: 2005-05-17

    Application No.: US10077904

    Filing date: 2002-02-18

    Abstract: Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message, which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending it as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize e-mail and voice mail using speech commands.

    Eigenvoice re-estimation technique of acoustic models for speech recognition, speaker identification and speaker verification
    4.
    Granted invention patent
    Status: in force

    Publication No.: US06895376B2

    Publication date: 2005-05-17

    Application No.: US09849174

    Filing date: 2001-05-04

    IPC classification: G10L15/06 G10L17/00

    CPC classification: G10L15/07 G10L17/02

    Abstract: A reduced-dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. Re-estimation processes are performed to more strongly separate speaker-dependent and speaker-independent components of the speech model. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation.

    Maximum likelihood method for finding an adapted speaker model in eigenvoice space
    5.
    Granted invention patent
    Status: lapsed

    Publication No.: US06263309B1

    Publication date: 2001-07-17

    Application No.: US09070054

    Filing date: 1998-04-30

    IPC classification: G10L15/08

    CPC classification: G10L15/07

    Abstract: A set of speaker-dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.
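
As a rough illustration of constraining a new speaker's supervector to the eigenvoice space, the sketch below fits eigenvoice coefficients from partially observed adaptation data. It swaps in a deliberately simplified estimator: under the assumption of independent Gaussian noise with known per-dimension variances, maximum-likelihood estimation reduces to weighted least squares. The estimator in the patent works on speech-model likelihoods and is not reproduced here; `ml_coefficients`, `adapted_supervector`, and the toy basis are hypothetical.

```python
import numpy as np

def ml_coefficients(observed, mean, basis, variances, mask):
    """Weighted least-squares fit of eigenvoice weights.

    Only dimensions where mask is True were seen in the adaptation data.
    With Gaussian noise of known variance this is the ML estimate.
    """
    a = basis[:, mask].T                 # (n_observed, n_eigenvoices)
    b = (observed - mean)[mask]
    w = 1.0 / np.sqrt(variances[mask])   # whitening weights
    coeffs, *_ = np.linalg.lstsq(a * w[:, None], b * w, rcond=None)
    return coeffs

def adapted_supervector(coeffs, mean, basis):
    """Rebuild a full parameter set, including dimensions never observed."""
    return mean + coeffs @ basis

# Toy eigenvoice space (assumption): 2 eigenvoices over 6 model parameters.
mean = np.zeros(6)
basis = np.array([[1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]])
true_coeffs = np.array([1.5, -0.5])
full = adapted_supervector(true_coeffs, mean, basis)

# The new speaker's data covers only the first three parameters.
mask = np.array([True, True, True, False, False, False])
observed = np.where(mask, full, 0.0)
coeffs = ml_coefficients(observed, mean, basis, np.ones(6), mask)
print(coeffs)
print(adapted_supervector(coeffs, mean, basis))
```

Because the estimate lives in the low-dimensional eigenvoice space, the unobserved parameters are filled in from the basis rather than left at their defaults.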

    Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques
    6.
    Granted invention patent
    Status: in force

    Publication No.: US06343267B1

    Publication date: 2002-01-29

    Application No.: US09148753

    Filing date: 1998-09-04

    IPC classification: G10L19/08

    CPC classification: G06K9/6247 G10L15/07

    Abstract: A set of speaker-dependent models or adapted models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Dimensionality reduction is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. The adapted model may then be further adapted via MAP, MLLR, MLED or the like. The eigenvoice technique may be applied to MLLR transformation matrices or the like; Bayesian estimation performed in eigenspace uses prior knowledge about speaker space density to refine the estimate of the location of a new speaker in eigenspace.

    Voice personalization of speech synthesizer
    7.
    Granted invention patent
    Status: in force

    Publication No.: US06970820B2

    Publication date: 2005-11-29

    Application No.: US09792928

    Filing date: 2001-02-26

    CPC classification: G10L13/04 G10L2021/0135

    Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker-dependent parameters, such as context-independent parameters, and speaker-independent parameters, such as context-dependent parameters. The speaker-dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker-dependent parameters are combined with the speaker-independent parameters to provide a set of personalized synthesis parameters. To adapt the parameters with a small amount of enrollment data, an eigenspace is constructed and used to constrain the position of the new speaker so that context-independent parameters not provided by the new speaker may be estimated.

    Adaptation system and method for E-commerce and V-commerce applications
    8.
    Granted invention patent
    Status: in force

    Publication No.: US06341264B1

    Publication date: 2002-01-22

    Application No.: US09258113

    Filing date: 1999-02-25

    IPC classification: G10L15/28

    Abstract: Electronic commerce (E-commerce) and Voice commerce (V-commerce) proceed by having the user speak into the system. The user's speech is converted by a speech recognizer into a form required by the transaction processor that effects the electronic commerce operation. A dimensionality reduction processor converts the user's input speech into a reduced-dimensionality set of values termed eigenvoice parameters. These parameters are compared with a set of previously stored eigenvoice parameters representing a speaker population (the eigenspace representing speaker space), and the comparison is used by the speech model adaptation system to rapidly adapt the speech recognizer to the user's speech characteristics. The user's eigenvoice parameters are also stored for subsequent use by the speaker verification and speaker identification modules.

    Method for goal-oriented speech translation in hand-held devices using meaning extraction and dialogue
    9.
    Granted invention patent
    Status: in force

    Publication No.: US06233561B1

    Publication date: 2001-05-15

    Application No.: US09290628

    Filing date: 1999-04-12

    IPC classification: G10L15/22

    CPC classification: G10L15/1822 G10L15/1815

    Abstract: A computer-implemented method and apparatus are provided for processing a spoken request from a user. A speech recognizer converts the spoken request into a digital format. A frame data structure associates semantic components of the digitized spoken request with predetermined slots. The slots are indicative of data which are used to achieve a predetermined goal. A speech understanding module, which is connected to the speech recognizer and to the frame data structure, determines semantic components of the spoken request. The slots are populated based upon the determined semantic components. A dialog manager, which is connected to the speech understanding module, may determine at least one slot which is unpopulated based upon the determined semantic components, and in a preferred embodiment may provide confirmation of the populated slots. A computer-generated request is formulated in order for the user to provide data related to the unpopulated slot. The method and apparatus are well-suited (but not limited) to use in a hand-held speech translation device.
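
The frame-and-slot mechanism in this abstract can be sketched as a small data structure plus a dialog step. This is a hypothetical illustration only: the speech recognizer and speech understanding module are stubbed out (semantic components arrive as a ready-made dictionary), and the names `Frame`, `fill`, `unpopulated`, `next_prompt`, and the hotel-booking slots are assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Frame:
    """Associates semantic components with the slots needed to reach a goal."""
    goal: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def fill(self, semantic_components: Dict[str, str]) -> None:
        # Populate only slots the frame knows about; ignore everything else.
        for name, value in semantic_components.items():
            if name in self.slots:
                self.slots[name] = value

    def unpopulated(self) -> List[str]:
        return [name for name, value in self.slots.items() if value is None]

def next_prompt(frame: Frame) -> str:
    """Dialog-manager step: ask for a missing slot, or confirm the filled ones."""
    missing = frame.unpopulated()
    if missing:
        return f"Please provide: {missing[0]}"
    filled = ", ".join(f"{k}={v}" for k, v in frame.slots.items())
    return f"Confirm {frame.goal}: {filled}?"

# Hypothetical goal and slots for a hotel booking request.
frame = Frame("book_room", {"date": None, "nights": None, "room_type": None})

# Stand-in for the output of the speech understanding module.
frame.fill({"date": "Friday", "nights": "2", "weather": "sunny"})
first = next_prompt(frame)
print(first)

frame.fill({"room_type": "double"})
second = next_prompt(frame)
print(second)
```

The first prompt asks for the one slot still empty; once every slot is populated, the same function switches to confirmation.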

    Method for generating spelling-to-pronunciation decision tree
    10.
    Granted invention patent
    Status: lapsed

    Publication No.: US06230131B1

    Publication date: 2001-05-08

    Application No.: US09069308

    Filing date: 1998-04-29

    IPC classification: G10L13/08

    CPC classification: G10L13/08

    Abstract: Decision trees are used to store a series of yes-no questions that can be used to convert spelled-word letter sequences into pronunciations. Letter-only trees, having internal nodes populated with questions about letters in the input sequence, generate one or more pronunciations based on probability data stored in the leaf nodes of the tree. The pronunciations may then be improved by processing them using mixed trees, which are populated with questions about letters in the sequence and also questions about phonemes associated with those letters. The mixed tree screens out pronunciations that would not occur in natural speech, thereby greatly improving the results of the letter-to-pronunciation transformation.
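
A letter-only tree of the kind described here can be sketched as yes-no questions about neighbouring letters, with phoneme probabilities stored at the leaves. The sketch is an assumption-laden toy: the trees are hand-written rather than learned, only the single most probable phoneme per letter is kept (the abstract allows several candidate pronunciations), the mixed-tree rescoring stage is omitted, and `Leaf`, `Question`, `classify`, and `pronounce` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Leaf:
    probs: Dict[str, float]        # phoneme -> probability ("-" means silent)

@dataclass
class Question:
    offset: int                    # which neighbour to inspect (-1 = previous letter)
    letter: str                    # "is the letter at that position equal to this?"
    yes: Union["Question", Leaf]
    no: Union["Question", Leaf]

def context_letter(word: str, i: int, offset: int) -> str:
    j = i + offset
    return word[j] if 0 <= j < len(word) else "#"   # "#" marks a word boundary

def classify(tree: Union[Question, Leaf], word: str, i: int) -> Dict[str, float]:
    """Walk the yes-no questions until a leaf with phoneme probabilities is reached."""
    node = tree
    while isinstance(node, Question):
        matches = context_letter(word, i, node.offset) == node.letter
        node = node.yes if matches else node.no
    return node.probs

def pronounce(trees: Dict[str, Union[Question, Leaf]], word: str) -> List[str]:
    phones = []
    for i, ch in enumerate(word):
        probs = classify(trees[ch], word, i)
        best = max(probs, key=probs.get)
        if best != "-":
            phones.append(best)
    return phones

# Hand-written toy trees, one per letter (assumption; real trees are trained).
trees = {
    "p": Question(1, "h", Leaf({"f": 0.9, "p": 0.1}), Leaf({"p": 1.0})),
    "h": Question(-1, "p", Leaf({"-": 0.95, "h": 0.05}), Leaf({"h": 1.0})),
    "o": Leaf({"ow": 0.6, "aa": 0.4}),
    "n": Leaf({"n": 1.0}),
    "e": Question(1, "#", Leaf({"-": 0.9, "iy": 0.1}), Leaf({"eh": 1.0})),
}
result = pronounce(trees, "phone")
print(result)
```

Keeping the full probability tables at the leaves, rather than a single answer, is what would let a later stage rescore alternative pronunciations.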
