MODEL TRAINING FOR AUTOMATIC SPEECH RECOGNITION FROM IMPERFECT TRANSCRIPTION DATA
    2.
    Invention Application (In Force)

    Publication No.: US20100318355A1

    Publication Date: 2010-12-16

    Application No.: US12482142

    Filing Date: 2009-06-10

    CPC classification number: G10L15/063 G10L15/065

    Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
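
    The abstract above describes an alignment-and-selection step. As a minimal sketch, and only as an assumption about how such a step could look (the names AlignedWord, select_reliable_segments and the threshold q are illustrative, not the patent's), the Python below keeps only runs of at least q contiguous words on which the original and decoded transcriptions agree:

        from dataclasses import dataclass

        @dataclass
        class AlignedWord:
            original: str   # word from the imperfect original transcription
            decoded: str    # word produced by the incremental decoder
            start: float    # start time of the word in seconds
            end: float      # end time of the word in seconds

        def select_reliable_segments(alignment, q):
            """Return (start, end) spans covering at least q contiguous matching words."""
            segments, run = [], []
            for word in alignment:
                if word.original == word.decoded:
                    run.append(word)
                    continue
                if len(run) >= q:
                    segments.append((run[0].start, run[-1].end))
                run = []
            if len(run) >= q:
                segments.append((run[0].start, run[-1].end))
            return segments

    Only audio falling inside the returned spans would then be used to retrain the incremental acoustic model before decoding the next of the N parts.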

    Discriminative training for speaker and speech verification
    3.
    Invention Grant (In Force)

    Publication No.: US07454339B2

    Publication Date: 2008-11-18

    Application No.: US11312981

    Filing Date: 2005-12-20

    CPC classification number: G10L17/04 G10L15/063

    Abstract: A method for discriminatively training acoustic models is provided for automated speaker verification (SV) and speech (or utterance) verification (UV) systems. The method includes: defining a likelihood ratio for a given speech segment, whose speaker identity (for SV system) or linguist identity (for UV system) is known, using a corresponding acoustic model, and an alternative acoustic model which represents all other speakers (in SV) or all other linguist identities (in UV); determining an average likelihood ratio score for the likelihood ratio scores over a set of training utterances (referred to as true data set) whose speaker identities (for SV) or linguist identities (for UV) are the same; determining an average likelihood ratio score for the likelihood ratio scores over a competing set of training utterances which excludes the speech data in the true data set (referred to as competing data set); and optimizing a difference between the average likelihood ratio score over the true data set and the average likelihood ratio score over the competing data set, thereby improving the acoustic model.
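
    Written out as a hedged LaTeX sketch (the symbols X, \lambda, \bar{\lambda}, T and C are my notation, not the patent's), the criterion in the abstract amounts to:

        % likelihood ratio score of a speech segment X against the target model \lambda
        % and the alternative (all other speakers / all other identities) model \bar{\lambda}
        r(X) = \log \frac{p(X \mid \lambda)}{p(X \mid \bar{\lambda})}

        % average scores over the true data set T and the competing data set C
        \bar{r}_T = \frac{1}{|T|} \sum_{X \in T} r(X), \qquad
        \bar{r}_C = \frac{1}{|C|} \sum_{X \in C} r(X)

        % discriminative training maximizes the separation between the two averages
        \max_{\lambda,\, \bar{\lambda}} \left( \bar{r}_T - \bar{r}_C \right)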

    TECHNIQUES FOR ENHANCED AUTOMATIC SPEECH RECOGNITION
    4.
    Invention Application (In Force)

    Publication No.: US20100228548A1

    Publication Date: 2010-09-09

    Application No.: US12400528

    Filing Date: 2009-03-09

    CPC classification number: G10L15/065

    Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using a same set of acoustic training data, and apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.
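
    The abstract leaves the form of the error correction function open; a minimal numpy sketch, under the assumption that it is realized as a least-squares affine map from unsupervised to supervised adaptation parameters (the function names fit_correction and apply_correction are illustrative, not the patent's), could look like:

        import numpy as np

        def fit_correction(unsupervised_train, supervised_train):
            """Fit an affine map W so unsupervised parameter vectors land near supervised ones."""
            X = np.asarray(unsupervised_train, dtype=float)      # one parameter vector per row
            Y = np.asarray(supervised_train, dtype=float)
            X1 = np.hstack([X, np.ones((X.shape[0], 1))])        # append a bias column
            W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
            return W

        def apply_correction(W, unsupervised_test):
            """Correct an unsupervised testing set of parameters before speaker adaptation."""
            X = np.asarray(unsupervised_test, dtype=float)
            X1 = np.hstack([X, np.ones((X.shape[0], 1))])
            return X1 @ W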

    Model training for automatic speech recognition from imperfect transcription data
    5.
    Invention Grant (In Force)

    Publication No.: US09280969B2

    Publication Date: 2016-03-08

    Application No.: US12482142

    Filing Date: 2009-06-10

    CPC classification number: G10L15/063 G10L15/065

    Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.

    Enhanced automatic speech recognition using mapping between unsupervised and supervised speech model parameters trained on same acoustic training data
    6.
    Invention Grant (In Force)

    Publication No.: US08306819B2

    Publication Date: 2012-11-06

    Application No.: US12400528

    Filing Date: 2009-03-09

    CPC classification number: G10L15/065

    Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using a same set of acoustic training data, and apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.

    USER INTERACTION FOR CONTENT BASED STORAGE AND RETRIEVAL
    7.
    Invention Application (Pending, Published)

    Publication No.: US20090064008A1

    Publication Date: 2009-03-05

    Application No.: US11848781

    Filing Date: 2007-08-31

    CPC classification number: G06F3/04883 G06F16/54

    Abstract: A graphic user interface system for use with a content based retrieval system includes an active display having display areas. For example, the display areas include a main area providing an overview of database contents by displaying representative samples of the database contents. The display areas also include one or more query areas into which one or more of the representative samples can be moved from the main area by a user employing gesture based interaction. A query formulation module employs the one or more representative samples moved into the query area to provide feedback to the content based retrieval system.
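
    As an illustrative sketch only (QueryFormulationModule, on_sample_moved_to_query_area and query_by_examples are hypothetical names, not the patent's interface), the interaction flow described above could be wired up as:

        class QueryFormulationModule:
            """Collects samples dragged into a query area and relays them as feedback."""

            def __init__(self, retrieval_system):
                self.retrieval_system = retrieval_system
                self.query_samples = []

            def on_sample_moved_to_query_area(self, sample):
                # Gesture handler: a representative sample was moved out of the main area.
                self.query_samples.append(sample)
                # Use the accumulated examples as feedback for the next content-based query.
                return self.retrieval_system.query_by_examples(self.query_samples)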

    Discriminative training for speaker and speech verification
    8.
    Invention Application (In Force)

    Publication No.: US20070143109A1

    Publication Date: 2007-06-21

    Application No.: US11312981

    Filing Date: 2005-12-20

    CPC classification number: G10L17/04 G10L15/063

    Abstract: A method for discriminatively training acoustic models is provided for automated speaker verification (SV) and speech (or utterance) verification (UV) systems. The method includes: defining a likelihood ratio for a given speech segment, whose speaker identity (for SV system) or linguist identity (for UV system) is known, using a corresponding acoustic model, and an alternative acoustic model which represents all other speakers (in SV) or all other linguist identities (in UV); determining an average likelihood ratio score for the likelihood ratio scores over a set of training utterances (referred to as true data set) whose speaker identities (for SV) or linguist identities (for UV) are the same; determining an average likelihood ratio score for the likelihood ratio scores over a competing set of training utterances which excludes the speech data in the true data set (referred to as competing data set); and optimizing a difference between the average likelihood ratio score over the true data set and the average likelihood ratio score over the competing data set, thereby improving the acoustic model.

    Discriminative training of HMM models using maximum margin estimation for speech recognition
    9.
    Invention Application (Pending, Published)

    Publication No.: US20070083373A1

    Publication Date: 2007-04-12

    Application No.: US11247854

    Filing Date: 2005-10-11

    CPC classification number: G10L15/144

    Abstract: An improved discriminative training method is provided for hidden Markov models. The method includes: defining a measure of separation margin for the data; identifying a subset of training utterances having utterances misrecognized by the models; defining a training criterion for the models based on maximizing the separation margin; formulating the training criterion as a constrained minimax optimization problem; and solving the constrained minimax optimization problem over the subset of training utterances, thereby discriminatively training the models.
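
    A hedged LaTeX sketch of the criterion (the notation d_i, \Lambda and S is mine, not the patent's): for each training utterance X_i with reference transcription W_i, a separation margin can be measured as the discriminant difference

        d_i(\Lambda) = \log p(X_i, W_i \mid \Lambda) \;-\; \max_{W \neq W_i} \log p(X_i, W \mid \Lambda)

    and, restricting attention to the identified subset S of misrecognized training utterances, the training criterion becomes the constrained minimax problem

        \Lambda^{*} = \arg\max_{\Lambda} \; \min_{i \in S} \; d_i(\Lambda)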
