Performing a safety analysis for user-defined voice commands to ensure that the voice commands do not cause speech recognition ambiguities
    1.
    Granted invention patent
    Legal status: in force

    Publication No.: US08234120B2

    Publication date: 2012-07-31

    Application No.: US11460075

    Filing date: 2006-07-26

    IPC classification: G10L21/00 G10L15/22 H04H20/47

    CPC classification: G10L15/075

    Abstract: The present invention discloses a solution for assuring that user-defined voice commands are unambiguous. The solution can include a step of identifying a user attempt to enter a user-defined voice command into a voice-enabled system. A safety analysis can be performed on the user-defined voice command to determine a likelihood that the user-defined voice command will be confused with preexisting voice commands recognized by the voice-enabled system. When a high likelihood of confusion is determined by the safety analysis, a notification can be presented that the user-defined voice command is subject to confusion. A user can then define a different voice command or can choose to continue to use the potentially confusing command, possibly subject to a system-imposed confusion-mitigating condition or action.

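The safety analysis described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the likelihood of confusion is computed, so the sketch below assumes a plain text-similarity score from Python's `difflib` as a stand-in for a real acoustic or phonetic measure. The function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Assumed cutoff above which two commands count as likely to be confused.
CONFUSION_THRESHOLD = 0.8


def confusion_score(a: str, b: str) -> float:
    """Text similarity in [0, 1]; a stand-in for an acoustic confusability measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def safety_analysis(new_command, existing_commands, threshold=CONFUSION_THRESHOLD):
    """Return (existing command, score) pairs the new command may be confused with."""
    scored = [(cmd, confusion_score(new_command, cmd)) for cmd in existing_commands]
    risky = [(cmd, score) for cmd, score in scored if score >= threshold]
    # Most confusable commands first, so a notification can lead with the worst case.
    return sorted(risky, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    conflicts = safety_analysis("call tom", ["call tim", "play music"])
    if conflicts:
        print("Warning: this command may be confused with:", conflicts)
```

An empty result means no notification is needed. A non-empty result is the point at which the user could be asked to pick a different phrase or to accept the risk.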

    Reducing recording time when constructing a concatenative TTS voice using a reduced script and pre-recorded speech assets
    3.
    Granted invention patent
    Legal status: in force

    Publication No.: US08019605B2

    Publication date: 2011-09-13

    Application No.: US11748256

    Filing date: 2007-05-14

    IPC classification: G10L13/08 G10L13/06

    CPC classification: G10L13/04

    Abstract: The present invention discloses a system and a method for creating a reduced script, which is read by a voice talent to create a concatenative text-to-speech (TTS) voice. The method can automatically process pre-recorded audio to derive speech assets for a concatenative TTS voice. The pre-recorded audio can include sets of recorded phrases used by a speech user interface (SUI). A set of unfulfilled speech assets needed for full phonetic coverage of the concatenative TTS voice can be determined. A reduced script can be constructed that includes a set of phrases, which when read by a voice talent result in a reduced corpus. When the reduced corpus is automatically processed, a reduced set of speech assets results. The reduced set includes each of the unfulfilled speech assets. When this reduced corpus is combined with existing speech assets, the result will be a voice with a complete set of speech assets.

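One way to picture the construction of a reduced script is as a coverage problem: pick phrases until every unfulfilled speech asset is covered. The sketch below uses a greedy set-cover heuristic, which is an assumption for illustration and is not stated in the abstract. The function name, the phrase-to-units mapping, and the unit labels are all hypothetical.

```python
def build_reduced_script(candidate_phrases, needed_units, existing_units):
    """Greedily pick phrases that cover speech units missing from existing assets.

    candidate_phrases: dict mapping a phrase to the set of units it would yield.
    Returns (script, still_missing); still_missing is empty when coverage is complete.
    """
    remaining = set(needed_units) - set(existing_units)  # the unfulfilled assets
    script = []
    while remaining:
        # Choose the phrase that supplies the most still-missing units.
        best = max(candidate_phrases, key=lambda p: len(candidate_phrases[p] & remaining))
        gained = candidate_phrases[best] & remaining
        if not gained:
            break  # no candidate covers anything that is still missing
        script.append(best)
        remaining -= gained
    return script, remaining


if __name__ == "__main__":
    candidates = {"p1": {"b", "c"}, "p2": {"c"}, "p3": {"d"}}
    print(build_reduced_script(candidates, {"a", "b", "c", "d"}, {"a"}))
```

Greedy selection does not guarantee the shortest possible script, but it keeps the script short in practice and reports any units that no candidate phrase can supply.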

    USER POSITIONABLE AUDIO ANCHORS FOR DIRECTIONAL AUDIO PLAYBACK FROM VOICE-ENABLED INTERFACES
    5.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20080262847A1

    Publication date: 2008-10-23

    Application No.: US11737437

    Filing date: 2007-04-19

    IPC classification: G10L21/00 G06F3/041

    CPC classification: G11B27/105 G10L15/26

    Abstract: The present invention discloses a concept and a use of audio anchors within voice-enabled interfaces. Audio anchors can be user-configurable points from which audio playback occurs. In the invention, a user can identify an interface position at which an audio anchor is to be established. The computing device can determine an anchor direction setting, with values that include forward playback and backward playback. Interface items can then be audibly enumerated from the audio anchor in a direction indicated by the anchor direction setting. For example, if a set of interface items are alphabetically ordered items and if an audio anchor is set at a first item beginning with a letter "G" and an anchor direction is set to indicate backward playback, then the interface items beginning with letters "A-F" can be audibly played in reverse alphabetical order. Additionally, a rate of audio playback can be user adjustable.

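The "G" example in this abstract can be sketched as a small enumeration routine. The function name and the sample items are hypothetical. Following the example, backward playback starts just before the anchor item; whether forward playback includes the anchor item itself is not stated, so the sketch assumes that it does.

```python
def enumerate_from_anchor(items, anchor_index, direction="forward"):
    """Return the items to be read aloud, in playback order, from an audio anchor."""
    if direction == "forward":
        # Assumption: forward playback begins with the anchor item itself.
        return list(items[anchor_index:])
    if direction == "backward":
        # Matches the abstract's example: anchor at "G" plays A-F in reverse order.
        return list(reversed(items[:anchor_index]))
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    items = ["Apple", "Banana", "Fig", "Grape", "Kiwi"]
    # Anchor at the first item beginning with the letter "G".
    anchor = next(i for i, name in enumerate(items) if name.startswith("G"))
    print(enumerate_from_anchor(items, anchor, "backward"))
```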

    Printing to a text-to-speech output device
    7.
    Granted invention patent
    Legal status: in force

    Publication No.: US08170877B2

    Publication date: 2012-05-01

    Application No.: US11156958

    Filing date: 2005-06-20

    IPC classification: G10L13/08

    CPC classification: G10L13/00

    Abstract: A method for producing speech output can include the step of selecting a TTS output device from a plurality of available output devices. The selected output device can be associated with outputting content of an application responsive to a print command. According to the method, the print command can be detected, which results in the content of the application being conveyed to the selected TTS output device. The TTS output device can be associated with at least one text-to-speech engine. Upon content conveyance to the TTS output device, at least a portion of the content can be automatically converted using the text-to-speech engine. The speech-converted content can be outputted.


    Methods and system for creating and editing an XML-based speech synthesis document
    8.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US08265936B2

    Publication date: 2012-09-11

    Application No.: US12132412

    Filing date: 2008-06-03

    IPC classification: G10L21/00

    CPC classification: G10L13/08 G10L15/26

    Abstract: A method for creating and editing an XML-based speech synthesis document for input to a text-to-speech engine is provided. The method includes recording voice utterances of a user reading a pre-selected text and parsing the recorded voice utterances into individual words and periods of silence. The method also includes recording a synthesized speech output generated by a text-to-speech engine, the synthesized speech output being an audible rendering of the pre-selected text, and parsing the synthesized speech output into individual words and periods of silence. The method further includes annotating the XML-based speech synthesis document based upon a comparison of the recorded voice utterances and the recorded synthesized speech output.

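The comparison step in this abstract can be sketched roughly as follows, assuming both recordings have already been parsed into (word, following silence in milliseconds) pairs. The abstract does not say which annotations are produced, so the sketch assumes that a pause the user held noticeably longer than the engine did becomes an SSML `<break>` element. The function name and the 100 ms tolerance are hypothetical, and a complete SSML document would also carry version and namespace attributes on `<speak>`.

```python
from xml.sax.saxutils import escape


def annotate_ssml(user_segments, tts_segments, min_diff_ms=100):
    """Build a simplified SSML string from aligned (word, pause_ms) sequences.

    A <break> is inserted after a word when the user's pause exceeds the
    synthesized pause by at least min_diff_ms.
    """
    parts = []
    for (word, user_pause), (_, tts_pause) in zip(user_segments, tts_segments):
        parts.append(escape(word))  # escape &, <, > so the output stays well-formed
        if user_pause - tts_pause >= min_diff_ms:
            parts.append(f'<break time="{user_pause}ms"/>')
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    user = [("Hello", 400), ("world", 0)]
    tts = [("Hello", 50), ("world", 0)]
    print(annotate_ssml(user, tts))
```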

    METHODS AND SYSTEM FOR CREATING AND EDITING AN XML-BASED SPEECH SYNTHESIS DOCUMENT
    9.
    Invention patent application
    Legal status: lapsed

    Publication No.: US20090299733A1

    Publication date: 2009-12-03

    Application No.: US12132412

    Filing date: 2008-06-03

    IPC classification: G10L15/04 G10L15/00

    CPC classification: G10L13/08 G10L15/26

    Abstract: A method for creating and editing an XML-based speech synthesis document for input to a text-to-speech engine is provided. The method includes recording voice utterances of a user reading a pre-selected text and parsing the recorded voice utterances into individual words and periods of silence. The method also includes recording a synthesized speech output generated by a text-to-speech engine, the synthesized speech output being an audible rendering of the pre-selected text, and parsing the synthesized speech output into individual words and periods of silence. The method further includes annotating the XML-based speech synthesis document based upon a comparison of the recorded voice utterances and the recorded synthesized speech output.


    Improving speech capabilities of a multimodal application
    10.
    Granted invention patent
    Legal status: in force

    Publication No.: US08380513B2

    Publication date: 2013-02-19

    Application No.: US12468166

    Filing date: 2009-05-19

    IPC classification: G10L11/00

    Abstract: Improving speech capabilities of a multimodal application, including: receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.

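The branching described in this abstract can be sketched as a small dispatch routine. Everything named here is hypothetical: the `SpeechEngine` class, the dictionary layout of the speech artifact, and its keys are illustrative assumptions and do not correspond to any real browser or speech-engine API.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechEngine:
    """Hypothetical stand-in for the speech engine available to the browser."""
    grammar: list = field(default_factory=list)   # active grammar rules
    lexicon: dict = field(default_factory=dict)   # word -> pronunciation


def apply_speech_artifact(engine: SpeechEngine, artifact: dict) -> None:
    """Add whichever rules the artifact carries to the engine's grammar or lexicon."""
    grammar_rule = artifact.get("grammar_rule")
    if grammar_rule is not None:
        engine.grammar.append(grammar_rule)
    pronunciation = artifact.get("pronunciation_rule")
    if pronunciation is not None:
        word, phonemes = pronunciation
        engine.lexicon[word] = phonemes


if __name__ == "__main__":
    engine = SpeechEngine()
    # Artifact as it might be read from a media file's metadata container (assumed shape).
    apply_speech_artifact(engine, {"grammar_rule": "play <song_title>"})
    apply_speech_artifact(engine, {"pronunciation_rule": ("Sade", "sh aa d ey")})
    print(engine)
```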