Method for building a natural language understanding model for a spoken dialog system
    1.
    Granted patent (in force)

    Publication No.: US07620550B1

    Publication date: 2009-11-17

    Application No.: US11866685

    Filing date: 2007-10-03

    IPC classification: G10L15/18

    Abstract: A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service.
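
    The batch-by-batch build loop in the abstract can be pictured roughly as follows. This is a minimal sketch, assuming a scikit-learn text classifier as a stand-in for the NLU model; the call-types, rule patterns, and batch layout are illustrative assumptions, not the patented implementation.

```python
# Sketch of the incremental NLU build loop: seed from hand-crafted rules, then fold
# each new batch of labeled data into a constantly growing training set.
import re
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hand-crafted rules, one per call-type defined in the labeling guide (illustrative).
RULES = {
    "Billing": re.compile(r"\b(bill|charge|payment)\b", re.I),
    "Repair":  re.compile(r"\b(broken|outage|not working)\b", re.I),
}

def rule_label(utterance):
    """Label a sample utterance with the first matching hand-crafted rule."""
    for call_type, pattern in RULES.items():
        if pattern.search(utterance):
            return call_type
    return "Other"

def train(texts, labels):
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return model

def build_nlu_models(sample_utterances, labeled_batches):
    # Early models: sample utterances labeled via the hand-crafted rules.
    train_x = list(sample_utterances)
    train_y = [rule_label(u) for u in sample_utterances]
    model = train(train_x, train_y)

    # Series of models: each new batch of labeled data is first used as test data,
    # then added to the training set, so the training data grows constantly.
    for batch_x, batch_y in labeled_batches:
        print("accuracy on new batch:", model.score(batch_x, batch_y))
        train_x += list(batch_x)
        train_y += list(batch_y)
        model = train(train_x, train_y)

    # Final model trained on all labeled data, used in the deployed dialog service.
    return model
```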

    METHOD FOR BUILDING A NATURAL LANGUAGE UNDERSTANDING MODEL FOR A SPOKEN DIALOG SYSTEM
    2.
    Patent application (in force)

    Publication No.: US20100042404A1

    Publication date: 2010-02-18

    Application No.: US12582062

    Filing date: 2009-10-20

    IPC classification: G10L15/18 G06F17/27

    Abstract: A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service.

    Method for building a natural language understanding model for a spoken dialog system
    3.
    Granted patent (in force)

    Publication No.: US07933766B2

    Publication date: 2011-04-26

    Application No.: US12582062

    Filing date: 2009-10-20

    IPC classification: G06F17/27

    Abstract: A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service.

    Method for building a natural language understanding model for a spoken dialog system
    4.
    Granted patent (in force)

    Publication No.: US07295981B1

    Publication date: 2007-11-13

    Application No.: US10755014

    Filing date: 2004-01-09

    IPC classification: G10L15/18

    Abstract: A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service.

    System and method of providing an automated data-collection in spoken dialog systems
    5.
    Granted patent (in force)

    Publication No.: US08185399B2

    Publication date: 2012-05-22

    Application No.: US11029798

    Filing date: 2005-01-05

    IPC classification: G10L21/00 G10L19/00 G06F17/27

    Abstract: The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.
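
    The accept / re-prompt / transfer decision described in the abstract might look roughly like the sketch below; the threshold values, the Action names, and the single-confidence interface are assumptions for illustration, not the patented implementation.

```python
# Sketch of the "automated hidden human" routing decision: keep confident utterances
# as training data, re-prompt on unclear ones, and hand off likely task-specific calls.
from enum import Enum

class Action(Enum):
    COLLECT = "log the utterance as training data"
    REPROMPT = "ask the caller to rephrase"
    TRANSFER = "hand the call to a human agent"

def route_utterance(classification_confidence, accept_threshold=0.8, reject_threshold=0.3):
    """Decide what to do with an utterance classified by the SLU module (thresholds assumed)."""
    if classification_confidence >= accept_threshold:
        return Action.COLLECT    # understood well enough to keep for training
    if classification_confidence >= reject_threshold:
        return Action.REPROMPT   # not classifiable to the acceptance threshold: re-prompt
    return Action.TRANSFER       # below the rejection threshold: likely task-specific, transfer
```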

    Active labeling for spoken language understanding
    6.
    Granted patent (in force)

    Publication No.: US07949525B2

    Publication date: 2011-05-24

    Application No.: US12485103

    Filing date: 2009-06-16

    IPC classification: G10L15/00 G10L15/06 G10L15/20

    CPC classification: G10L15/1822

    Abstract: A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
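
    A minimal sketch of the sorting-and-recheck step from the abstract, assuming a classifier with the scikit-learn predict/predict_proba interface and an illustrative top_k parameter; the suspicion heuristic is one plausible reading of the comparison between confidence and label, not the patented criterion.

```python
# Compare a previously trained classifier's prediction and confidence with each
# existing label and queue the most suspicious candidate utterances for rechecking.
def select_for_recheck(classifier, utterances, labels, top_k=20):
    predictions = classifier.predict(utterances)
    confidences = classifier.predict_proba(utterances).max(axis=1)

    scored = []
    for utt, label, pred, conf in zip(utterances, labels, predictions, confidences):
        # A confident disagreement with the human label is the strongest hint of a
        # labeling error; a low-confidence agreement is also worth a second look.
        suspicion = conf if pred != label else 1.0 - conf
        scored.append((suspicion, utt, label, pred))

    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]    # hand these back to the labelers for rechecking
```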

    Unsupervised and active learning in automatic speech recognition for call classification
    7.
    Granted patent (in force)

    Publication No.: US08818808B2

    Publication date: 2014-08-26

    Application No.: US11063910

    Filing date: 2005-02-23

    IPC classification: G10L15/06

    Abstract: Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model.
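
    One round of the combined unsupervised/active-learning loop might be sketched as below, assuming placeholder recognize() and train_language_model() callables rather than any real ASR API; confidence-based selection is an illustrative stand-in for the "intelligently selected" step in the abstract.

```python
# One round of the loop: auto-transcribe the unlabeled audio, train on everything,
# then pick the utterances the recognizer was least confident about for manual work.
def active_learning_round(manual_transcripts, untranscribed_audio,
                          recognize, train_language_model, budget=50):
    # recognize(audio) is assumed to return a (text, confidence) pair.
    auto = [(audio,) + tuple(recognize(audio)) for audio in untranscribed_audio]

    # Train on the manual transcripts plus all automatic transcripts.
    model = train_language_model(manual_transcripts + [text for _, text, _ in auto])

    # Select the least-confident utterances; their manual transcription is expected
    # to improve the model the most.
    auto.sort(key=lambda item: item[2])
    selected_for_manual_transcription = [audio for audio, _, _ in auto[:budget]]
    return model, selected_for_manual_transcription
```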

    Active learning process for spoken dialog systems
    8.
    Granted patent (in force)

    Publication No.: US07292976B1

    Publication date: 2007-11-06

    Application No.: US10447888

    Filing date: 2003-05-29

    IPC classification: G06F17/27 G10L15/00

    Abstract: A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
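
    The coupling of the two active-learning loops could be pictured as a shared selection step, as in the sketch below; the equal weighting and the asr_confidence/slu_confidence callables are assumptions for illustration, not the patented procedure.

```python
# Rank utterances by a combined ASR + SLU confidence so that a single human pass
# (transcription plus annotation) updates both models at once.
def select_shared_batch(utterances, asr_confidence, slu_confidence, budget=100):
    def combined_confidence(utt):
        # Low combined confidence means both the recognizer and the understanding
        # model are unsure, so labeling this utterance benefits ASR and SLU together.
        return 0.5 * asr_confidence(utt) + 0.5 * slu_confidence(utt)

    # Least-confident utterances first: transcribe and annotate these in one pass.
    return sorted(utterances, key=combined_confidence)[:budget]
```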
