MULTILINGUAL PROSODY GENERATION
    61.
    Patent application

    Publication No.: US20160071512A1

    Publication date: 2016-03-10

    Application No.: US14942300

    Filing date: 2015-11-16

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
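The input step described in this abstract can be pictured with a minimal sketch. It only illustrates how linguistic features and a language indicator might be combined into one input vector; the language list, the one-hot encoding, the feature values, and the single-layer stand-in network with placeholder weights are all assumptions of this example and are not taken from the patent.

```python
import math

LANGUAGES = ["en", "es", "fr"]  # hypothetical language inventory


def encode_input(linguistic_features, language):
    """Concatenate linguistic features with a one-hot language indicator."""
    one_hot = [1.0 if lang == language else 0.0 for lang in LANGUAGES]
    return list(linguistic_features) + one_hot


def predict_prosody(inputs, weights, biases):
    """Stand-in for the trained network: a single tanh layer.

    Reading each output as a prosody value (for example pitch or duration)
    is an assumption of this sketch.
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


features = [0.2, 1.0, 0.0, 0.5]              # made-up linguistic feature values
x = encode_input(features, "es")             # 4 features + 3 language slots
weights = [[0.1] * len(x), [-0.1] * len(x)]  # placeholder, untrained weights
biases = [0.0, 0.0]
prosody = predict_prosody(x, weights, biases)
```

Because the language indicator is part of the input, one set of weights can in principle serve several languages, which is the property the abstract emphasizes.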

    System and method for distributed voice models across cloud and device for embedded text-to-speech
    62.
    Granted patent
    Legal status: in force

    Publication No.: US09218804B2

    Publication date: 2015-12-22

    Application No.: US14025344

    Filing date: 2013-09-12

    IPC classification: G10L13/07

    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache.
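The caching behaviour in this abstract can be sketched roughly as follows. The class name `UnitCache`, the `fetch_from_server` callable, and the `fake_server` stand-in are hypothetical names invented for illustration; the patent does not specify an API.

```python
class UnitCache:
    """Local cache of text-to-speech units with a protected core set."""

    def __init__(self, core_units, fetch_from_server):
        self.core = dict(core_units)    # core set, never pruned
        self.extra = {}                 # prunable, context-specific units
        self.fetch = fetch_from_server  # hypothetical callable: ids -> {id: unit}

    def prepare(self, needed_ids):
        """Request units needed for the context that are not cached yet."""
        missing = [u for u in needed_ids
                   if u not in self.core and u not in self.extra]
        if missing:
            self.extra.update(self.fetch(missing))
        return missing

    def get(self, unit_id):
        """Look a unit up in the core set first, then in the extra units."""
        return self.core[unit_id] if unit_id in self.core else self.extra[unit_id]

    def prune(self, keep_ids=()):
        """Drop non-core units that the current context no longer needs."""
        self.extra = {u: d for u, d in self.extra.items() if u in keep_ids}


def fake_server(ids):
    # Stand-in for a network request; returns placeholder unit data.
    return {u: f"audio:{u}" for u in ids}


cache = UnitCache({"a": "audio:a", "b": "audio:b"}, fake_server)
requested = cache.prepare(["a", "weather-1", "weather-2"])
speech = [cache.get(u) for u in ["a", "weather-1"]]
cache.prune()  # e.g. after synthesis; the core units stay in place
```

Keeping the core set in a separate dictionary makes the "cannot be pruned" guarantee structural rather than something each pruning policy has to remember.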

    Multilingual prosody generation
    63.
    Granted patent
    Legal status: in force

    Publication No.: US09195656B2

    Publication date: 2015-11-24

    Application No.: US14143627

    Filing date: 2013-12-30

    Applicant: Google Inc.

    IPC classification: G10L13/08 G06F17/28 G10L13/10

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.

    SYSTEM AND METHOD FOR DISTRIBUTED VOICE MODELS ACROSS CLOUD AND DEVICE FOR EMBEDDED TEXT-TO-SPEECH
    64.
    Patent application
    Legal status: in force

    Publication No.: US20150073805A1

    Publication date: 2015-03-12

    Application No.: US14025344

    Filing date: 2013-09-12

    IPC classification: G10L13/07

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache.

    SPEECH SYNTHESIS FROM ACOUSTIC UNITS WITH DEFAULT VALUES OF CONCATENATION COST
    65.
    Patent application
    Legal status: in force

    Publication No.: US20140330567A1

    Publication date: 2014-11-06

    Application No.: US14335302

    Filing date: 2014-07-18

    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. When a pair of acoustic units in the database does not have an associated concatenation cost, the system assigns a default concatenation cost. The system then synthesizes speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.
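A small sketch of the default-cost idea follows. The default value, the class name `ConcatCosts`, and the use of a simple occurrence count to decide which pairs are "likely to occur" are assumptions made for illustration; the abstract does not say how likelihood is judged.

```python
DEFAULT_COST = 1.0  # assumed default; the abstract does not give a value


class ConcatCosts:
    def __init__(self, known_costs):
        self.known = dict(known_costs)  # (left_unit, right_unit) -> cost
        self.observed = {}              # pair -> times seen during synthesis

    def cost(self, left, right):
        """Return the stored cost, or the default when the pair has none."""
        return self.known.get((left, right), DEFAULT_COST)

    def record_sequence(self, units):
        """Note every sequential pair produced by a synthesis run."""
        for left, right in zip(units, units[1:]):
            pair = (left, right)
            self.observed[pair] = self.observed.get(pair, 0) + 1

    def frequent_pairs(self, min_count):
        """Pairs seen often enough to be worth storing a real cost for."""
        return sorted(p for p, n in self.observed.items() if n >= min_count)


costs = ConcatCosts({("u1", "u2"): 0.3})
costs.record_sequence(["u1", "u2", "u3"])
costs.record_sequence(["u2", "u3", "u4"])
```

With a very large unit database the full pair table would be enormous, so falling back to a default and storing only the pairs that actually recur keeps the table small.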

    System and method for unit selection text-to-speech using a modified Viterbi approach
    66.
    Granted patent
    Legal status: in force

    Publication No.: US08731931B2

    Publication date: 2014-05-20

    Application No.: US12818835

    Filing date: 2010-06-18

    IPC classification: G10L13/00 G10L13/06

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
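The search described here can be sketched as a dynamic program in which each unit only considers a pitch-limited sublist of the next list. The pitch window, the use of pitch difference as the join cost, and the tuple layout of a unit are assumptions of this example, not details from the patent.

```python
from bisect import bisect_left, bisect_right

PITCH_WINDOW = 10.0  # assumed: units within this pitch distance may be joined


def best_path(ordered_lists):
    """ordered_lists: one list per position, each sorted by pitch.

    Each unit is a (pitch, target_cost, name) tuple.
    Returns (total_cost, names) for the cheapest path, or None.
    """
    # best[i] holds the cheapest path ending at unit i of the current list.
    best = [(u[1], [u[2]]) for u in ordered_lists[0]]
    for prev_units, units in zip(ordered_lists, ordered_lists[1:]):
        pitches = [u[0] for u in units]
        new_best = [None] * len(units)
        for prev, state in zip(prev_units, best):
            if state is None:  # no path reaches this unit
                continue
            cost, names = state
            # Sublist of next units suitable for concatenation with `prev`.
            # Because the list is pitch-ordered it is one contiguous slice.
            lo = bisect_left(pitches, prev[0] - PITCH_WINDOW)
            hi = bisect_right(pitches, prev[0] + PITCH_WINDOW)
            for j in range(lo, hi):
                pitch, target_cost, name = units[j]
                total = cost + abs(pitch - prev[0]) + target_cost
                if new_best[j] is None or total < new_best[j][0]:
                    new_best[j] = (total, names + [name])
        best = new_best
    reachable = [s for s in best if s is not None]
    return min(reachable, key=lambda s: s[0]) if reachable else None


lists = [
    [(100.0, 1.0, "a1"), (120.0, 0.5, "a2")],
    [(95.0, 0.2, "b1"), (118.0, 0.4, "b2"), (140.0, 0.1, "b3")],
]
total_cost, path = best_path(lists)
```

Ordering each list by pitch is what lets the sublist be found with two binary searches instead of a scan over every candidate pair.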

    Speech synthesizing device, computer program product, and method
    67.
    Granted patent
    Legal status: in force

    Publication No.: US08626510B2

    Publication date: 2014-01-07

    Application No.: US12559844

    Filing date: 2009-09-15

    Applicant: Nobuaki Mizutani

    Inventor: Nobuaki Mizutani

    IPC classification: G10L13/00

    CPC classification: G10L13/08 G10L13/07

    Abstract: An acquiring unit acquires pattern sentences, which are similar to one another and include fixed segments and non-fixed segments, and substitution words that are substituted for the non-fixed segments. A sentence generating unit generates target sentences by replacing the non-fixed segments with the substitution words for each of the pattern sentences. A first synthetic-sound generating unit generates a first synthetic sound, a synthetic sound of the fixed segment, and a second synthetic-sound generating unit generates a second synthetic sound, a synthetic sound of the substitution word, for each of the target sentences. A calculating unit calculates a discontinuity value of a boundary between the first synthetic sound and the second synthetic sound for each of the target sentences and a selecting unit selects the target sentence having the smallest discontinuity value. A connecting unit connects the first synthetic sound and the second synthetic sound of the target sentence selected.
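The selection step can be illustrated with a short sketch. The discontinuity measure used here (the jump between the last sample of the fixed segment and the first sample of the substitution word), the sample values, and the example sentences are assumptions for illustration; the abstract does not define how the discontinuity value is computed.

```python
def discontinuity(fixed_sound, word_sound):
    """Assumed measure: absolute jump across the boundary between the
    fixed-segment sound and the substitution-word sound."""
    return abs(fixed_sound[-1] - word_sound[0])


def select_target(candidates):
    """candidates: list of (sentence, fixed_sound, word_sound) tuples.

    Returns the sentence whose boundary has the smallest discontinuity,
    together with its two sounds joined end to end.
    """
    sentence, fixed, word = min(
        candidates, key=lambda c: discontinuity(c[1], c[2])
    )
    return sentence, fixed + word


word = [0.4, 0.1]  # made-up samples for the substitution word "Kyoto"
candidates = [
    ("The next stop is Kyoto", [0.0, 0.9], word),
    ("Next, we stop at Kyoto", [0.0, 0.5], word),
]
chosen, audio = select_target(candidates)
```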

    Tabulating triphone sequences by 5-phoneme contexts for speech synthesis
    68.
    Granted patent
    Legal status: in force

    Publication No.: US08566099B2

    Publication date: 2013-10-22

    Application No.: US13550074

    Filing date: 2012-07-16

    IPC classification: G10L13/00 G10L13/06

    CPC classification: G10L13/07 G10L2015/022

    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes identifying a set of triphone sequences, tabulating the set of triphone sequences using a plurality of contexts, where each context specific triphone sequence of the plurality of context specific triphone sequences has a top N triphone units made of the triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination. Input texts having one of the contexts are received, and one of the context specific triphone sequences is selected based on the context. Input text is then synthesized using the context specific triphone sequence.
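One way to picture the tabulation is a lookup table keyed by 5-phoneme context that keeps only the N units with the lowest target cost. The table layout, the value of N, and the unit identifiers below are assumptions of this sketch rather than details from the patent.

```python
from heapq import nsmallest

TOP_N = 2  # assumed table width


def tabulate(candidates):
    """candidates: {context: [(target_cost, unit_id), ...]}.

    A context is a 5-phoneme tuple whose middle three phonemes form the
    triphone. Keeps the TOP_N lowest-cost units for each context.
    """
    return {ctx: [unit for _, unit in nsmallest(TOP_N, units)]
            for ctx, units in candidates.items()}


def lookup(table, phonemes, i):
    """Units tabulated for the triphone centred at position i, if any."""
    return table.get(tuple(phonemes[i - 2:i + 3]), [])


table = tabulate({
    ("h", "e", "l", "o", "w"): [(0.7, "u3"), (0.2, "u1"), (0.5, "u2")],
})
units = lookup(table, ["h", "e", "l", "o", "w"], 2)
```

Doing the cost ranking ahead of time turns the per-request work into a dictionary lookup, which is where the response-time gain mentioned in the abstract would come from.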

    SPEECH SYNTHESIS APPARATUS AND METHOD
    69.
    Patent application
    Legal status: in force

    Publication No.: US20130226584A1

    Publication date: 2013-08-29

    Application No.: US13860319

    Filing date: 2013-04-10

    IPC classification: G10L13/04

    Abstract: A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of an input phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by concatenating the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer.
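The single-access copy can be sketched as a read planner that merges neighbouring waveform spans while they fit in the buffer. Treating "continuous region" as exact byte adjacency, and the `(offset, length)` representation, are simplifying assumptions of this example.

```python
def plan_reads(units, buffer_size):
    """units: list of (offset, length) waveform spans, in synthesis order.

    Merges runs of spans that are adjacent in waveform memory into a single
    access, as long as the merged span still fits in the buffer.
    Returns a list of (offset, length) reads.
    """
    reads = []
    for offset, length in units:
        if reads:
            start, size = reads[-1]
            contiguous = start + size == offset
            if contiguous and size + length <= buffer_size:
                reads[-1] = (start, size + length)  # extend the previous read
                continue
        reads.append((offset, length))
    return reads


# Three waveforms; the first two sit back to back in memory.
reads = plan_reads([(0, 300), (300, 200), (900, 400)], buffer_size=512)
```

Here the first two waveforms are fetched with one access of 500 units, while the third needs its own access because it lies elsewhere in memory.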

    Speech samples library for text-to-speech and methods and apparatus for generating and using same
    70.
    Granted patent
    Legal status: in force

    Publication No.: US08340967B2

    Publication date: 2012-12-25

    Application No.: US12532170

    Filing date: 2008-03-19

    IPC classification: G10L13/00

    CPC classification: G10L13/08 G10L13/06 G10L13/07

    Abstract: A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises: providing a recording of a first speaker pronouncing a first phoneme in a phonemic context. The pronunciation is characterized by some musical parameters. A second reader, who may be the same as the first reader, is then recorded pronouncing a second phoneme (different from the first phoneme) with the musical parameters that characterizes pronunciation of the first phoneme by the first speaker. The recordings made by the second reader are used for compiling a speech samples library.
