Supervised adaptation using corrective N-best decoding
    1.
    Granted patent
    Status: Expired

    Publication number: US06272462B1

    Publication date: 2001-08-07

    Application number: US09257893

    Filing date: 1999-02-25

    IPC classification: G10L15/06

    CPC classification: G10L15/075 G10L2015/0635

    Abstract: Supervised adaptation speech is supplied to the recognizer and the recognizer generates the N-best transcriptions of the adaptation speech. These transcriptions include the one transcription known to be correct, based on a priori knowledge of the adaptation speech, and the remaining transcriptions known to be incorrect. The system applies weights to each transcription: a positive weight to the correct transcription and negative weights to the incorrect transcriptions. These weights have the effect of moving the incorrect transcriptions away from the correct one, rendering the recognition system more discriminative for the new speaker's speaking characteristics. Weights applied to the incorrect solutions are based on the respective likelihood scores generated by the recognizer. The sum of all weights (positive and negative) is a positive number, which ensures that the system will converge.
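
The weighting described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the weight formula, so everything specific here is an assumption made for illustration: the function name `corrective_weights`, the scale factor `rho`, and the choice of a softmax over the incorrect hypotheses' log-likelihoods. The sketch only reproduces the stated properties: one positive weight, negative weights driven by likelihood scores, and a positive total.

```python
import math


def corrective_weights(incorrect_log_likelihoods, rho=0.5):
    """Return (weight_for_correct, weights_for_incorrect).

    Hypothetical scheme, not the patent's formula: the correct
    transcription gets weight +1, and a total negative mass of -rho is
    spread over the incorrect transcriptions in proportion to their
    likelihoods. Because rho < 1, the weights sum to 1 - rho > 0.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1) so the weights sum to a positive number")
    if not incorrect_log_likelihoods:
        return 1.0, []
    # Softmax over log-likelihoods, shifted by the maximum for numerical stability.
    top = max(incorrect_log_likelihoods)
    exps = [math.exp(score - top) for score in incorrect_log_likelihoods]
    total = sum(exps)
    return 1.0, [-rho * e / total for e in exps]


w_correct, w_incorrect = corrective_weights([-10.0, -12.0, -15.0])
```

Under this assumed scheme, the incorrect hypothesis with the highest likelihood (the one most easily confused with the correct transcription) receives the most negative weight.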

    Unsupervised speech model adaptation using reliable information among N-best strings
    2.
    Granted patent
    Status: Expired

    Publication number: US06205426B1

    Publication date: 2001-03-20

    Application number: US09237170

    Filing date: 1999-01-25

    IPC classification: G10L15/14

    CPC classification: G10L15/065

    Abstract: The system performs unsupervised speech model adaptation using the recognizer to generate the N-best solutions for an input utterance. Each of these N-best solutions is tested by a reliable information extraction process. Reliable information is extracted by a weighting technique based on likelihood scores generated by the recognizer, or by a non-linear thresholding function. The system may be used in a single pass implementation or iteratively in a multi-pass implementation.
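
The two extraction options named in this abstract (likelihood-based weighting and non-linear thresholding) can be sketched as follows. The abstract gives no formulas, so the exponential weighting, the parameter names `eta` and `cutoff`, and both function names are hypothetical choices for illustration only.

```python
import math


def likelihood_weights(log_likelihoods, eta=1.0):
    """Weight each N-best hypothesis by its likelihood relative to the best one.

    Assumed form: exp(eta * (L_n - L_best)), so the top hypothesis gets
    weight 1 and less likely hypotheses decay toward 0.
    """
    best = max(log_likelihoods)
    return [math.exp(eta * (ll - best)) for ll in log_likelihoods]


def threshold_weights(log_likelihoods, eta=1.0, cutoff=0.5):
    """Non-linear alternative: keep a hypothesis fully (1.0) or drop it (0.0)."""
    return [1.0 if w >= cutoff else 0.0
            for w in likelihood_weights(log_likelihoods, eta)]


scores = [-100.0, -100.5, -104.0]
soft = likelihood_weights(scores)
hard = threshold_weights(scores)
```

In a multi-pass setting, one would presumably recompute the N-best list with the adapted models and repeat the weighting; that loop is omitted here because the abstract does not describe it in detail.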

    Focused language models for improved speech input of structured documents
    3.
    Granted patent
    Status: Active

    Publication number: US06901364B2

    Publication date: 2005-05-31

    Application number: US09951093

    Filing date: 2001-09-13

    CPC classification: G10L15/1815 G10L15/30

    Abstract: An e-mail message process is provided for use with a personal digital assistant. It allows the use of input speech messaging, which is converted to text using a focused language model. The focused language model is downloaded over a cellular phone connection from an Internet server, which provides the model based upon a topic for the intended e-mail message. The text generated from the input speech can be summarized by the e-mail message processor and edited by the user. The generated e-mail message can then be transmitted, again via cellular connection, to an Internet e-mail server for delivery to a recipient.

    Eigenvoice re-estimation technique of acoustic models for speech recognition, speaker identification and speaker verification
    4.
    Granted patent
    Status: Active

    Publication number: US06895376B2

    Publication date: 2005-05-17

    Application number: US09849174

    Filing date: 2001-05-04

    IPC classification: G10L15/06 G10L17/00

    CPC classification: G10L15/07 G10L17/02

    Abstract: A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. Re-estimation processes are performed to more strongly separate speaker-dependent and speaker-independent components of the speech model. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation.

    Maximum likelihood method for finding an adapted speaker model in eigenvoice space
    5.
    Granted patent
    Status: Expired

    Publication number: US06263309B1

    Publication date: 2001-07-17

    Application number: US09070054

    Filing date: 1998-04-30

    IPC classification: G10L15/08

    CPC classification: G10L15/07

    Abstract: A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.
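
The pipeline in this abstract (supervectors, principal component analysis, then constraining a new speaker to the eigenvoice space) can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the patented method: the maximum likelihood estimation is replaced by an ordinary least-squares fit on the observed supervector entries, which coincides with a maximum likelihood estimate only under an assumed Gaussian model with identity covariance. The function names and the `observed` mask are hypothetical.

```python
import numpy as np


def build_eigenvoice_space(supervectors, k):
    """PCA on training supervectors (one row per speaker).

    Returns the mean supervector and the top-k eigenvoices (as rows).
    """
    X = np.asarray(supervectors, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def adapt_speaker(partial_supervector, observed, mean, eigenvoices):
    """Fit eigenvoice coefficients from the observed entries only.

    Assumption: least squares stands in for the maximum likelihood
    estimation mentioned in the abstract.
    """
    obs = np.asarray(observed, dtype=bool)
    target = np.asarray(partial_supervector, dtype=float)[obs] - mean[obs]
    coeffs, *_ = np.linalg.lstsq(eigenvoices[:, obs].T, target, rcond=None)
    # Reconstruct the full supervector, including unobserved entries.
    return coeffs, mean + coeffs @ eigenvoices


rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 6))                      # synthetic 2-D speaker space
train = rng.normal(size=(10, 2)) @ basis + 3.0       # 10 synthetic training speakers
mean, eigenvoices = build_eigenvoice_space(train, k=2)
```

The point of the constraint is that a new speaker with little adaptation data only needs `k` coefficients estimated, rather than every model parameter in the supervector.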

    Media production system using time alignment to scripts
    7.
    Patent application
    Status: Under examination (published)

    Publication number: US20050228663A1

    Publication date: 2005-10-13

    Application number: US10814960

    Filing date: 2004-03-31

    IPC classification: G10L15/26

    CPC classification: G10L15/26

    Abstract: A media production system includes a textual alignment module aligning multiple speech recordings to textual lines of a script based on speech recognition results. A navigation module responds to user navigation selections respective of the textual lines of the script by communicating to the user corresponding, line-specific portions of the multiple speech recordings. An editing module responds to user associations of multiple speech recordings with textual lines by accumulating line-specific portions of the multiple speech recordings in a combination recording based on at least one of relationships of textual lines in the script to the combination recording, and temporal alignments between the multiple speech recordings and the combination recording.
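
The textual alignment step in this abstract can be illustrated with a small Python sketch. The abstract does not say how alignment is computed, so the word-level similarity from the standard library's `difflib.SequenceMatcher`, the function name `align_to_script`, and the `(text, start, end)` segment format are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def align_to_script(script_lines, recognized_segments):
    """Map each recognized segment to its best-matching script line.

    recognized_segments: list of (text, start_sec, end_sec) tuples, a
    hypothetical stand-in for speech recognition results.
    Returns {line_index: [(start_sec, end_sec, similarity), ...]}.
    """
    alignment = {}
    for text, start, end in recognized_segments:
        words = text.lower().split()
        # Compare word sequences so small recognition errors still match.
        scores = [SequenceMatcher(None, line.lower().split(), words).ratio()
                  for line in script_lines]
        best = max(range(len(script_lines)), key=scores.__getitem__)
        alignment.setdefault(best, []).append((start, end, scores[best]))
    return alignment


script = ["To be or not to be", "That is the question"]
segments = [("to be or not to bee", 0.0, 2.1), ("that is a question", 2.1, 4.0)]
alignment = align_to_script(script, segments)
```

With several recordings of the same script, each line index would collect one time span per take, which is the kind of line-specific lookup the navigation module in the abstract relies on.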

    Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing
    8.
    Patent application
    Status: Active

    Publication number: US20050075881A1

    Publication date: 2005-04-07

    Application number: US10677174

    Filing date: 2003-10-02

    IPC classification: G10L15/26 G10L21/00

    CPC classification: G06F17/30796 G10L15/26

    Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.

    Speaker verification and speaker identification based on a priori knowledge
    9.
    Granted patent
    Status: Active

    Publication number: US06697778B1

    Publication date: 2004-02-24

    Application number: US09610495

    Filing date: 2000-07-05

    IPC classification: G10L15/06

    CPC classification: G10L17/02

    Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.

    Dimensionality reduction for speaker normalization and speaker and environment adaptation using eigenvoice techniques
    10.
    Granted patent
    Status: Active

    Publication number: US06343267B1

    Publication date: 2002-01-29

    Application number: US09148753

    Filing date: 1998-09-04

    IPC classification: G10L19/08

    CPC classification: G06K9/6247 G10L15/07

    Abstract: A set of speaker dependent models or adapted models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Dimensionality reduction is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. The adapted model may then be further adapted via MAP, MLLR, MLED or the like. The eigenvoice technique may be applied to MLLR transformation matrices or the like; Bayesian estimation performed in eigenspace uses prior knowledge about speaker space density to refine the estimate about the location of a new speaker in eigenspace.
