System and method for unified normalization in text-to-speech and automatic speech recognition

    Publication No.: US10199034B2

    Publication date: 2019-02-05

    Application No.: US14461930

    Filing date: 2014-08-18

    Abstract: A system, method, and computer-readable storage devices use a single set of normalization protocols and a single language lexicon (or dictionary) for both text-to-speech (TTS) and automatic speech recognition (ASR). The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition.
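
The shared-normalization idea in this abstract can be sketched roughly as follows. This is an illustration only, not the patented implementation: the rule table, the tiny lexicon, the ARPAbet-style phoneme symbols, and the function names `normalize`, `to_phonemes`, and `to_training_text` are all assumptions made for the example.

```python
import re

# Hypothetical shared resources: one normalization table and one
# pronunciation dictionary, used by both the TTS and the ASR paths.
NORMALIZATION_RULES = {"dr.": "doctor", "st.": "street", "2": "two"}
LEXICON = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "smith": ["S", "M", "IH", "TH"],
    "has": ["HH", "AE", "Z"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}


def normalize(text):
    """Lowercase, expand known tokens, and strip remaining punctuation."""
    tokens = []
    for raw in text.lower().split():
        token = NORMALIZATION_RULES.get(raw, raw)
        token = re.sub(r"[^a-z']", "", token)
        if token:
            tokens.append(token)
    return tokens


def to_phonemes(text):
    """TTS path: look up each normalized word in the shared lexicon.

    Raises KeyError for words missing from this toy lexicon.
    """
    phonemes = []
    for word in normalize(text):
        phonemes.extend(LEXICON[word])
    return phonemes


def to_training_text(text):
    """ASR path: the same normalizer yields the training transcript."""
    return " ".join(normalize(text))
```

Because both paths call the same `normalize` function and read the same `LEXICON`, a token such as "Dr." is expanded identically whether it is being synthesized or used as ASR training text.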

    5. System and method for data-driven socially customized models for language generation
    Granted invention patent (status: in force)

    Publication No.: US09412358B2

    Publication date: 2016-08-09

    Application No.: US14275938

    Filing date: 2014-05-13

    Abstract: Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user's social group. Systems configured according to this disclosure can then use the resulting personalized text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user.

    6. System and method for synthetic voice generation and modification
    Granted invention patent (status: in force)

    Publication No.: US08965767B2

    Publication date: 2015-02-24

    Application No.: US14282035

    Filing date: 2014-05-20

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
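
A minimal sketch of the policy-driven selection described in this abstract, assuming a toy representation in which each voice unit is just a (voice, category, label) record. The `Unit` type, the `policy` mapping, and `select_units` are hypothetical names for illustration and are not taken from the patent.

```python
from collections import namedtuple

# A voice unit: which voice it came from, its phonetic category, its label.
Unit = namedtuple("Unit", ["voice", "category", "label"])

voice_a = [Unit("A", "vowel", "aa"), Unit("A", "fricative", "s")]
voice_b = [Unit("B", "vowel", "aa"), Unit("B", "fricative", "s")]

# Combine the two unit databases as they are; neither voice is parameterized.
combined = voice_a + voice_b

# Hypothetical policy: phonetic category -> voice to draw units from.
policy = {"vowel": "A", "fricative": "B"}


def select_units(targets, database, policy):
    """For each (category, label) target, pick a unit from the policy's voice."""
    selected = []
    for category, label in targets:
        wanted_voice = policy[category]
        match = next(
            u for u in database
            if u.voice == wanted_voice
            and u.category == category
            and u.label == label
        )
        selected.append(match)
    return selected


targets = [("fricative", "s"), ("vowel", "aa")]
selected = select_units(targets, combined, policy)
```

With this policy the fricative comes from voice B and the vowel from voice A, so the synthetic voice is a blend of the two source voices at the unit level.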

    7. SYSTEM AND METHOD FOR AUDIBLY PRESENTING SELECTED TEXT
    Invention application (status: in force)

    Publication No.: US20130304474A1

    Publication date: 2013-11-14

    Application No.: US13943242

    Filing date: 2013-07-16

    IPC classification: G10L13/08

    Abstract: Disclosed herein are methods for presenting speech from selected text on a computing device. The method includes presenting text on a touch-sensitive display and keeping the text size within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user touch has been received, the computing device identifies and interprets the portion of text that is to be selected, and subsequently presents the text audibly to the user.

    System and method for prosodically modified unit selection databases

    Publication No.: US11049491B2

    Publication date: 2021-06-29

    Application No.: US16828070

    Filing date: 2020-03-24

    Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
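
The select, compare, modify, and store loop in this abstract can be sketched as below. It is a deliberately simplified illustration: prosody is reduced to one assumed mean pitch value per unit, and "modifying" a unit only changes that number, whereas a real system would alter the audio signal itself. `SpeechUnit`, `match_prosody`, and `store_new_units` are hypothetical names, not the patent's.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpeechUnit:
    label: str
    f0_hz: int  # simplified prosody: one mean pitch value per unit


database = [
    SpeechUnit("hh", 110),
    SpeechUnit("eh", 120),
    SpeechUnit("l", 115),
    SpeechUnit("ow", 105),
]


def select_units(labels, db):
    """Pick the first unit in the database for each requested label."""
    return [next(u for u in db if u.label == lab) for lab in labels]


def match_prosody(units, desired_curve):
    """Return copies of the units whose pitch follows the desired curve."""
    return [replace(u, f0_hz=target) for u, target in zip(units, desired_curve)]


def store_new_units(units, db):
    """Add modified units that the database does not already contain."""
    for u in units:
        if u not in db:
            db.append(u)


selected = select_units(["hh", "eh", "l", "ow"], database)
actual_curve = [u.f0_hz for u in selected]   # actual prosodic curve
desired_curve = [120, 130, 115, 95]          # assumed target curve
modified = match_prosody(selected, desired_curve)
store_new_units(modified, database)
```

Three of the four modified units differ from what the database already held, so the database grows from four to seven units; a later request for the same prosody could then be served without further modification.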