Records disambiguation in a multimodal application operating on a multimodal device
    1.
    Granted invention patent
    Legal status: In force

    Publication (announcement) No.: US09349367B2

    Publication (announcement) date: 2016-05-24

    Application No.: US12109167

    Filing date: 2008-04-24

    Abstract: Methods, apparatus, and products are disclosed for record disambiguation in a multimodal application operating on a multimodal device, the multimodal device supporting multiple modes of interaction including at least a voice mode and a visual mode, that include: prompting, by the multimodal application, a user to identify a particular record among a plurality of records; receiving, by the multimodal application in response to the prompt, a voice utterance from the user; determining, by the multimodal application, that the voice utterance ambiguously identifies more than one of the plurality of records; generating, by the multimodal application, a user interaction to disambiguate the records ambiguously identified by the voice utterance in dependence upon record attributes of the records ambiguously identified by the voice utterance; and selecting, by the multimodal application for further processing, one of the records ambiguously identified by the voice utterance in dependence upon the user interaction.
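
The steps in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Record` fields, the substring matching rule, and all function names are assumptions made for the example, and plain strings stand in for recognized voice utterances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record attributes chosen for illustration only.
    name: str
    city: str
    phone: str


def match_records(utterance, records):
    """Return every record whose name contains the recognized utterance."""
    text = utterance.strip().lower()
    return [r for r in records if text in r.name.lower()]


def distinguishing_attribute(candidates, attributes=("city", "phone")):
    """Pick the first attribute whose values differ across all candidates."""
    for attr in attributes:
        values = [getattr(r, attr) for r in candidates]
        if len(set(values)) == len(values):
            return attr
    return None


def build_prompt(candidates, attr):
    """Generate the disambiguating interaction from the record attributes."""
    options = ", ".join(getattr(r, attr) for r in candidates)
    return f"Several records match. Which {attr}: {options}?"


def select_record(candidates, attr, answer):
    """Select the record whose attribute value matches the user's answer."""
    wanted = answer.strip().lower()
    for r in candidates:
        if getattr(r, attr).lower() == wanted:
            return r
    return None


records = [
    Record("John Smith", "Austin", "555-0100"),
    Record("John Smith", "Boston", "555-0101"),
    Record("Mary Jones", "Austin", "555-0102"),
]
candidates = match_records("john smith", records)  # ambiguous: two matches
attr = distinguishing_attribute(candidates)
prompt = build_prompt(candidates, attr)
chosen = select_record(candidates, attr, "Boston")
```

Choosing the first attribute with pairwise distinct values keeps the follow-up question short; a real application could just as well present the candidates visually and let the user tap one.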

    Configuring a speech engine for a multimodal application based on location
    2.
    Granted invention patent
    Legal status: In force

    Publication (announcement) No.: US08938392B2

    Publication (announcement) date: 2015-01-20

    Application No.: US11679297

    Filing date: 2007-02-27

    IPC classification: G10L21/00 G10L25/00 G10L15/24

    CPC classification: G10L15/24

    Abstract: Methods, apparatus, and products are disclosed for configuring a speech engine for a multimodal application based on location. The multimodal application operates on a multimodal device supporting multiple modes of user interaction with the multimodal application. The multimodal application is operatively coupled to a speech engine. Configuring a speech engine for a multimodal application based on location includes: receiving a location change notification in a location change monitor from a device location manager, the location change notification specifying a current location of the multimodal device; identifying, by the location change monitor, location-based configuration parameters for the speech engine in dependence upon the current location of the multimodal device, the location-based configuration parameters specifying a configuration for the speech engine at the current location; and updating, by the location change monitor, a current configuration for the speech engine according to the identified location-based configuration parameters.
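
One way to picture the notify, identify, and update sequence is the sketch below. The class names mirror the roles named in the abstract, but the callback wiring, the `LOCATION_PARAMS` table, and the parameter names (`noise_profile`, `confidence_threshold`) are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechEngine:
    # Hypothetical configuration keys; a real engine defines its own.
    config: dict = field(
        default_factory=lambda: {"language": "en-US", "noise_profile": "quiet"}
    )


# Assumed lookup table from location to location-based parameters.
LOCATION_PARAMS = {
    "office": {"noise_profile": "quiet"},
    "airport": {"noise_profile": "crowd", "confidence_threshold": 0.7},
}


class LocationChangeMonitor:
    def __init__(self, engine, params_by_location):
        self.engine = engine
        self.params_by_location = params_by_location

    def on_location_change(self, current_location):
        """Identify parameters for the location and update the engine."""
        params = self.params_by_location.get(current_location, {})
        self.engine.config.update(params)
        return params


class DeviceLocationManager:
    def __init__(self):
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def set_location(self, location):
        """Send a location change notification to every subscriber."""
        for callback in self._listeners:
            callback(location)


engine = SpeechEngine()
monitor = LocationChangeMonitor(engine, LOCATION_PARAMS)
manager = DeviceLocationManager()
manager.subscribe(monitor.on_location_change)
manager.set_location("airport")
```

Unknown locations fall back to an empty parameter set here, so the engine keeps its current configuration; that fallback is a design choice of the sketch, not something the abstract specifies.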

    Pausing a VoiceXML dialog of a multimodal application
    3.
    Granted invention patent
    Legal status: In force

    Publication (announcement) No.: US08713542B2

    Publication (announcement) date: 2014-04-29

    Application No.: US11679236

    Filing date: 2007-02-27

    Abstract: Pausing a VoiceXML dialog of a multimodal application, including generating, by the multimodal application, a pause event; responsive to the pause event, temporarily pausing the dialog by the VoiceXML interpreter; generating, by the multimodal application, a resume event; and responsive to the resume event, resuming the dialog. In embodiments, the multimodal application operates on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application is operatively coupled to a VoiceXML interpreter, and the VoiceXML interpreter is interpreting the VoiceXML dialog to be paused.
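
The pause and resume behavior can be illustrated with a toy interpreter. This is only a sketch: `DialogInterpreter`, its event names, and the list of prompts are assumptions for the example and do not correspond to any actual VoiceXML interpreter API.

```python
class DialogInterpreter:
    """Toy stand-in for an interpreter stepping through dialog prompts."""

    def __init__(self, prompts):
        self.prompts = list(prompts)
        self.position = 0
        self.paused = False

    def handle_event(self, event):
        """React to pause and resume events raised by the application."""
        if event == "pause":
            self.paused = True
        elif event == "resume":
            self.paused = False

    def step(self):
        """Return the next prompt, or None while paused or when finished."""
        if self.paused or self.position >= len(self.prompts):
            return None
        prompt = self.prompts[self.position]
        self.position += 1
        return prompt


dialog = DialogInterpreter(["Departure city?", "Arrival city?"])
first = dialog.step()
dialog.handle_event("pause")      # application generates a pause event
while_paused = dialog.step()      # nothing is played while paused
dialog.handle_event("resume")     # application generates a resume event
second = dialog.step()            # dialog continues where it stopped
```

Because the position is kept while paused, resuming continues from the interrupted point instead of restarting the dialog.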

    Systems and methods for inputting graphical data into a graphical input field
    5.
    Granted invention patent
    Legal status: Lapsed

    Publication (announcement) No.: US08296149B2

    Publication (announcement) date: 2012-10-23

    Application No.: US12363580

    Filing date: 2009-01-30

    IPC classification: G10L15/22

    CPC classification: G10L2015/228

    Abstract: A system (20) for inputting graphical data into a graphical input field includes a graphical input device (22) for inputting the graphical data into the graphical input field, and a processor-executable voice-form module (28) responsive to an initial presentation of graphical data to the graphical input device. The voice-form module (28) causes a determination of whether the inputting of the graphical data into the graphical input field is complete. A method for inputting graphical data into a graphical input field includes initiating an input of graphical data via a graphical input device into the graphical input field, and actuating a voice-form module in response to initiating the input of graphical data into the graphical input field.
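
A rough sketch of the actuation logic follows. The abstract does not say how completion is determined, so the spoken confirmation used here is an assumption, as are the class names and the `ask_user` callable that stands in for a voice prompt and its recognized reply.

```python
class VoiceFormModule:
    def __init__(self, ask_user):
        # ask_user is a hypothetical callable: it takes a prompt string
        # and returns the user's recognized spoken reply as text.
        self.ask_user = ask_user
        self.active = False

    def on_input_started(self):
        """Actuate the module when graphical input is first presented."""
        self.active = True

    def is_input_complete(self):
        """Ask the user whether input is finished (assumed strategy)."""
        if not self.active:
            return False
        reply = self.ask_user("Are you finished?")
        return reply.strip().lower() in {"yes", "done"}


class GraphicalInputField:
    def __init__(self, module):
        self.strokes = []
        self.module = module

    def add_stroke(self, stroke):
        """Record a stroke; the first stroke actuates the voice-form module."""
        if not self.strokes:
            self.module.on_input_started()
        self.strokes.append(stroke)


replies = iter(["no", "yes"])  # canned replies standing in for speech
module = VoiceFormModule(lambda prompt: next(replies))
field = GraphicalInputField(module)

before = module.is_input_complete()      # module not yet actuated
field.add_stroke("stroke-1")
in_progress = module.is_input_complete() # user answers "no"
field.add_stroke("stroke-2")
finished = module.is_input_complete()    # user answers "yes"
```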

    DYNAMICALLY GENERATING A VOCAL HELP PROMPT IN A MULTIMODAL APPLICATION
    7.
    Invention patent application
    Legal status: Under examination (published)

    Publication (announcement) No.: US20120065982A1

    Publication (announcement) date: 2012-03-15

    Application No.: US13303380

    Filing date: 2011-11-23

    IPC classification: G10L21/00

    Abstract: Dynamically generating a vocal help prompt in a multimodal application includes detecting a help-triggering event for an input element of a VoiceXML dialog, where the detecting is implemented with a multimodal application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application is operatively coupled to a VoiceXML interpreter, and the multimodal application has no static help text. Dynamically generating a vocal help prompt in a multimodal application according to embodiments of the present invention typically also includes retrieving, by the VoiceXML interpreter from a source of help text, help text for an element of a speech recognition grammar; forming, by the VoiceXML interpreter, the help text into a vocal help prompt; and presenting, by the multimodal application, the vocal help prompt to a user through a computer user interface.
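
The retrieve, form, and present steps might look like the sketch below. The grammar is reduced to a plain list of phrases and the help source to a dictionary; both, along with the function names and the prompt wording, are assumptions for illustration and not the VoiceXML or grammar formats themselves.

```python
# Hypothetical grammar for an input element named "drink".
GRAMMAR = ["coffee", "tea", "milk"]

# Assumed source of help text keyed by grammar element.
HELP_SOURCE = {"coffee": "coffee", "tea": "tea", "milk": "milk"}


def retrieve_help_text(grammar, help_source):
    """Collect help text for each grammar element that has an entry."""
    return [help_source[element] for element in grammar if element in help_source]


def form_help_prompt(field_name, help_texts):
    """Form the retrieved help text into a single spoken prompt."""
    if not help_texts:
        return f"No help is available for {field_name}."
    if len(help_texts) == 1:
        choices = help_texts[0]
    else:
        choices = ", ".join(help_texts[:-1]) + " or " + help_texts[-1]
    return f"For {field_name}, you can say {choices}."


def on_help_event(field_name, grammar, help_source, speak):
    """Handle a help-triggering event by building and presenting a prompt."""
    prompt = form_help_prompt(field_name, retrieve_help_text(grammar, help_source))
    speak(prompt)
    return prompt


spoken = []  # stands in for text-to-speech output
prompt = on_help_event("drink", GRAMMAR, HELP_SOURCE, spoken.append)
```

Because the prompt is derived from the grammar at the moment help is requested, changing the grammar changes the help prompt without any static help text to maintain.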

    TESTING A GRAMMAR USED IN SPEECH RECOGNITION FOR RELIABILITY IN A PLURALITY OF OPERATING ENVIRONMENTS HAVING DIFFERENT BACKGROUND NOISE
    8.
    Invention patent application
    Legal status: In force

    Publication (announcement) No.: US20120053934A1

    Publication (announcement) date: 2012-03-01

    Application No.: US13289233

    Filing date: 2011-11-04

    IPC classification: G10L15/20

    CPC classification: G10L15/01

    Abstract: Methods, systems, and products are disclosed for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise, including: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
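
The mix, recognize, and evaluate loop can be sketched as below. Everything concrete here is an assumption: audio is modeled as short lists of floats, `toy_recognize` is a deliberately simple stand-in for a real speech recognition engine, and reliability is taken to be the fraction of correctly recognized utterances per environment.

```python
def mix(speech, noise):
    """Add noise to speech sample by sample, looping the noise if shorter."""
    return [s + noise[i % len(noise)] for i, s in enumerate(speech)]


def toy_recognize(samples, grammar):
    """Hypothetical recognizer: decodes 'yes'/'no' from the mean sample value."""
    mean = sum(samples) / len(samples)
    if abs(mean) < 0.5:
        return None  # signal too weak or too distorted to decide
    phrase = "yes" if mean > 0 else "no"
    return phrase if phrase in grammar else None


def evaluate_grammar(grammar, utterances, noises, recognize):
    """Return, per environment, the fraction of utterances recognized correctly."""
    reliability = {}
    for environment, noise in noises.items():
        hits = 0
        for expected, speech in utterances:
            if recognize(mix(speech, noise), grammar) == expected:
                hits += 1
        reliability[environment] = hits / len(utterances)
    return reliability


grammar = {"yes", "no"}
utterances = [("yes", [1.0] * 8), ("no", [-1.0] * 8)]  # synthetic test speech
noises = {
    "office": [0.1, -0.1],   # low, zero-mean background noise
    "factory": [0.9, 0.8],   # strong noise that biases the signal
}
reliability = evaluate_grammar(grammar, utterances, noises, toy_recognize)
```

Passing the recognizer in as a parameter keeps the harness independent of any particular engine, so the same loop could drive a real recognizer with recorded audio.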

    SYNCHRONIZING VISUAL AND SPEECH EVENTS IN A MULTIMODAL APPLICATION
    9.
    Invention patent application
    Legal status: In force

    Publication (announcement) No.: US20120022875A1

    Publication (announcement) date: 2012-01-26

    Application No.: US13249717

    Filing date: 2011-09-30

    IPC classification: G10L21/00

    CPC classification: G10L15/1815 G10L2021/105

    Abstract: Exemplary methods, systems, and products are disclosed for synchronizing visual and speech events in a multimodal application, including receiving speech from a user; determining a semantic interpretation of the speech; calling a global application update handler; identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation; and executing the additional function. Typical embodiments may include updating a visual element after executing the additional function. Typical embodiments may include updating a voice form after executing the additional function. Typical embodiments also may include updating a state table after updating the voice form. Typical embodiments also may include restarting the voice form after executing the additional function.
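
The dispatch through a global update handler can be pictured with the following sketch. The keyword-based `interpret` function, the `ADDITIONAL_FUNCTIONS` table, and the dictionaries standing in for the visual form and state table are all hypothetical and chosen only to make the flow concrete.

```python
visual_form = {"destination": ""}        # stands in for a visual element
state_table = {"destination": "empty"}   # stands in for the state table


def interpret(speech):
    """Toy semantic interpretation: the word after 'to' is the destination."""
    words = speech.lower().split()
    if "to" in words and words.index("to") + 1 < len(words):
        return {"field": "destination", "value": words[words.index("to") + 1]}
    return {"field": None, "value": None}


def fill_destination(interpretation):
    """Additional processing function: update the visual element and state."""
    visual_form["destination"] = interpretation["value"]
    state_table["destination"] = "filled"


# Assumed mapping from interpretation field to additional function.
ADDITIONAL_FUNCTIONS = {"destination": fill_destination}


def global_update_handler(interpretation):
    """Identify and execute the additional function for an interpretation."""
    function = ADDITIONAL_FUNCTIONS.get(interpretation["field"])
    if function is None:
        return False
    function(interpretation)
    return True


handled = global_update_handler(interpret("fly to boston"))
ignored = global_update_handler(interpret("hello"))
```

Routing every interpretation through one handler gives a single place where spoken input is reflected in the visual form, which is what keeps the two modes in step in this sketch.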

    Testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise
    10.
    Granted invention patent
    Legal status: In force

    Publication (announcement) No.: US08082148B2

    Publication (announcement) date: 2011-12-20

    Application No.: US12109204

    Filing date: 2008-04-24

    IPC classification: G10L15/20

    CPC classification: G10L15/01

    Abstract: Methods, systems, and products are disclosed for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise, including: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
