Text-to-speech process capable of interspersing recorded words and phrases

    Publication No.: US20180114523A1

    Publication date: 2018-04-26

    Application No.: US15792861

    Filing date: 2017-10-25

    Applicant: Cepstral, LLC

    IPC classification: G10L13/08

    CPC classification: G10L13/08 G10L13/086

    Abstract: Creating and deploying a voice from text-to-speech, with such voice being a new language derived from the original phoneset of a known language, and thus being audio of the new language outputted using a single TTS synthesizer. An end product message is determined in an original language n to be outputted as audio n by a text-to-speech engine, wherein the original language n includes an existing phoneset n including one or more phonemes n. Words and phrases of a new language n+1 are recorded, thereby forming audio file n+1. This new audio file is labeled into unique units, thereby defining one or more phonemes n+1. The new phonemes of the new language are added to the phoneset, thereby forming new phoneset n+1, as a result outputting the end product message as an audio n+1 language different from the original language n.
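
The phoneset-extension step in this abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the `Phoneset` class, the unit labels, and the file paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Phoneset:
    """Phoneme inventory: maps a unit label to the audio file that realizes it."""
    units: dict[str, str] = field(default_factory=dict)

    def extended_with(self, new_units: dict[str, str]) -> "Phoneset":
        # Phoneset n+1 = phoneset n plus the units labeled from the new recordings.
        clash = self.units.keys() & new_units.keys()
        if clash:
            raise ValueError(f"labels already in phoneset: {sorted(clash)}")
        return Phoneset({**self.units, **new_units})


def synthesize(message: list[str], phoneset: Phoneset) -> list[str]:
    """Return the ordered audio-unit references for a message given as unit labels."""
    return [phoneset.units[label] for label in message]


# Hypothetical data: two phonemes of language n and one labeled recorded phrase.
phoneset_n = Phoneset({"HH": "en/hh.wav", "AH": "en/ah.wav"})
labeled_recording = {"GATE_B7": "rec/gate_b7.wav"}

phoneset_n1 = phoneset_n.extended_with(labeled_recording)
plan = synthesize(["HH", "AH", "GATE_B7"], phoneset_n1)
```

Because the recorded unit lives in the same inventory as the ordinary phonemes, a single lookup path serves both, which mirrors the abstract's point about using a single TTS synthesizer.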

    Multilingual prosody generation
    53.
    Granted patent

    Publication No.: US09905220B2

    Publication date: 2018-02-27

    Application No.: US14942300

    Filing date: 2015-11-16

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
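
As a rough illustration of how linguistic features and a language indicator can share one network input, the Python sketch below concatenates a feature vector with a one-hot language code. Everything in it is an assumption made for illustration: the language list, the feature values, the weights, and the single linear layer that stands in for a trained network.

```python
LANGUAGES = ["en", "es", "fr"]  # hypothetical language inventory


def one_hot(language: str) -> list[float]:
    """Encode the language of the text as a one-hot vector."""
    return [1.0 if language == code else 0.0 for code in LANGUAGES]


def build_input(linguistic_features: list[float], language: str) -> list[float]:
    # Linguistic features and the language indicator travel in one input vector.
    return linguistic_features + one_hot(language)


def prosody_model(x: list[float], weights: list[list[float]]) -> list[float]:
    # Stand-in for the trained network: one linear layer, one weight row per output.
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


features = [0.5, 1.0]  # hypothetical stress and phrase-position features
x = build_input(features, "es")

# Hypothetical weights for two outputs (say, F0 in Hz and duration in ms).
weights = [
    [20.0, 10.0, 100.0, 110.0, 120.0],
    [40.0, 30.0, 50.0, 60.0, 70.0],
]
f0_hz, duration_ms = prosody_model(x, weights)
```

Changing only the language argument changes the output for the same linguistic features, which is the behavior a single multilingual model relies on.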

    SYSTEM AND METHOD FOR INTELLIGENT LANGUAGE SWITCHING IN AUTOMATED TEXT-TO-SPEECH SYSTEMS

    Publication No.: US20170236509A1

    Publication date: 2017-08-17

    Application No.: US15583068

    Filing date: 2017-05-01

    IPC classification: G10L13/08 G10L13/047

    Abstract: Systems, methods, and computer-readable storage media for providing for intelligent switching of languages and/or pronunciations in a text-to-speech system. As the system receives text, the text is analyzed to identify portions which should have speech constructed using a pronunciation distinct from the remaining portions of the text. The text-to-speech system uses multiple pronunciation dictionaries to generate and produce speech corresponding to the text, where the identified portions of the text are in a different language or have a different accent from the remainder of the text. Having generated speech corresponding to the text in multiple languages, accents, or dialects, the system combines the portions, then communicates the speech to the text recipient.
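
One simple way to picture the use of multiple pronunciation dictionaries is a per-word lookup that falls back to another dictionary when the default one has no entry. The sketch below is a toy under stated assumptions: the dictionaries, the phoneme strings, and the fallback rule are hypothetical, and the abstract does not say how portions are actually identified.

```python
# Hypothetical pronunciation dictionaries: word -> phoneme string.
DICTIONARIES = {
    "en": {"i": "AY", "ordered": "AO R D ER D", "a": "AH"},
    "fr": {"croissant": "K R W AA S AA N"},
}
DEFAULT_LANGUAGE = "en"


def pick_dictionary(word: str) -> str:
    """Prefer the default dictionary; otherwise use the first one that knows the word."""
    if word in DICTIONARIES[DEFAULT_LANGUAGE]:
        return DEFAULT_LANGUAGE
    for language, entries in DICTIONARIES.items():
        if word in entries:
            return language
    return DEFAULT_LANGUAGE


def pronounce(text: str) -> list[tuple[str, str]]:
    """Return (language, phonemes) per word, combined in the original word order."""
    spoken = []
    for word in text.lower().split():
        language = pick_dictionary(word)
        phonemes = DICTIONARIES[language].get(word, "<unk>")
        spoken.append((language, phonemes))
    return spoken


result = pronounce("I ordered a croissant")
```

Here the last word is the only one routed to the second dictionary, and the combined result keeps the original word order.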

    TEXT-TO-SPEECH METHOD AND MULTI-LINGUAL SPEECH SYNTHESIZER USING THE METHOD
    56.
    Patent application
    Legal status: in force

    Publication No.: US20170047060A1

    Publication date: 2017-02-16

    Application No.: US14956405

    Filing date: 2015-12-02

    Abstract: A text-to-speech method and a multi-lingual speech synthesizer using the method are disclosed. The multi-lingual speech synthesizer and the method, executed by a processor, process a multi-lingual text message in a mixture of a first language and a second language into a multi-lingual voice message. The multi-lingual speech synthesizer comprises a storage device configured to store a first language model database and a second language model database, a broadcasting device configured to broadcast the multi-lingual voice message, and a processor, connected to the storage device and the broadcasting device, configured to execute the method disclosed herein.
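
Before a mixed-language message can be routed to two language model databases, it has to be split into per-language segments. The Python sketch below shows one naive way to do that by Unicode range. It assumes, purely for illustration, that the first language is Chinese and the second is English; the abstract does not name the languages or describe the segmentation rule.

```python
from itertools import groupby


def language_of(ch: str) -> str:
    # Assumption: CJK Unified Ideographs are Chinese; everything else
    # (Latin letters, spaces, punctuation) is treated as English.
    return "zh" if "\u4e00" <= ch <= "\u9fff" else "en"


def split_by_language(text: str) -> list[tuple[str, str]]:
    """Split text into consecutive (language, segment) runs, dropping empty runs."""
    segments = []
    for language, chars in groupby(text, key=language_of):
        piece = "".join(chars).strip()
        if piece:
            segments.append((language, piece))
    return segments


segments = split_by_language("我想听 Hello World 这首歌")
```

Each segment could then be synthesized against the database for its language and the audio joined in order.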

    QUANTITATIVE F0 CONTOUR GENERATING DEVICE AND METHOD, AND MODEL LEARNING DEVICE AND METHOD FOR F0 CONTOUR GENERATION
    57.
    Patent application
    Legal status: under examination (published)

    Publication No.: US20160189705A1

    Publication date: 2016-06-30

    Application No.: US14911189

    Filing date: 2014-08-13

    Abstract: [Object] To provide an F0 contour synthesizing device based on a statistical model that clarifies the correspondence between linguistic information and the F0 contour while maintaining accuracy. [Solution] An HMM learning device includes: a parameter estimating unit that represents an F0 contour 133, fitted to a continuous F0 contour 132, as a sum of phrase components and accent components and estimates the target points of these components; and an HMM learning means that trains an HMM 139 using the fitted F0 contour as training data. The continuous F0 contour may be decomposed into accent components 134, phrase components 136 and micro-prosody components 138, and separate HMMs 140, 142 and 144 may be trained. Using the results of text analysis, accent components, phrase components and micro-prosody components are separately synthesized from HMMs 140, 142 and 144, and the results are combined to obtain an F0 contour.
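
The final combination step, in which separately synthesized components are added to give one F0 contour, can be sketched in a few lines of Python. The sample values are hypothetical, and treating the components as plain values in Hz is an assumption: the abstract does not state the domain in which the sum is taken.

```python
def combine_f0(phrase: list[float], accent: list[float], micro: list[float]) -> list[float]:
    """Add phrase, accent and micro-prosody components frame by frame."""
    if not (len(phrase) == len(accent) == len(micro)):
        raise ValueError("all components must cover the same number of frames")
    return [p + a + m for p, a, m in zip(phrase, accent, micro)]


# Hypothetical per-frame components, assumed here to be in Hz.
phrase_component = [120.0, 118.0, 115.0, 110.0]  # slow declination over the phrase
accent_component = [0.0, 20.0, 15.0, 0.0]        # local rise on the accented part
micro_prosody = [0.0, -2.0, 1.0, 0.0]            # small segmental perturbations

f0_contour = combine_f0(phrase_component, accent_component, micro_prosody)
```

Keeping the components separate until this last step is what lets each one be tied back to a distinct piece of linguistic information.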

    Methods and systems for automated generation of nativized multi-lingual lexicons
    58.
    Granted patent
    Legal status: in force

    Publication No.: US09263028B2

    Publication date: 2016-02-16

    Application No.: US14283586

    Filing date: 2014-05-21

    Applicant: Google Inc.

    Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
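
A nativized pronunciation can be pictured as a table lookup from first-language phonemes to the closest second-language phonemes. The Python sketch below uses a hand-written, hypothetical mapping from a few Spanish-like phoneme labels to ARPAbet-style English labels. The patent describes automated generation, so a fixed table like this one is only a stand-in.

```python
# Hypothetical mapping: first-language phoneme -> second-language phoneme(s).
NATIVIZE = {
    "x": ["HH"],
    "a": ["AA"],
    "l": ["L"],
    "p": ["P"],
    "e": ["EY"],
    "ny": ["N", "Y"],  # one source phoneme may need two target phonemes
    "o": ["OW"],
}


def nativize(source_phonemes: list[str]) -> list[str]:
    """Rewrite a first-language phoneme sequence using second-language phonemes."""
    target = []
    for phoneme in source_phonemes:
        target.extend(NATIVIZE[phoneme])
    return target


# A Spanish-like phoneme sequence, written with the hypothetical labels above.
nativized = nativize(["x", "a", "l", "a", "p", "e", "ny", "o"])
```

The output is a phonemic representation of the first-language word that a second-language voice could speak with its own inventory.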

    Method, device and system for providing language service
    59.
    Granted patent
    Legal status: in force

    Publication No.: US09128930B2

    Publication date: 2015-09-08

    Application No.: US14563939

    Filing date: 2014-12-08

    Abstract: A method, device and system for providing a language service are disclosed. In some embodiments, the method is performed at a computer system having one or more processors and memory for storing programs to be executed by the one or more processors. The method includes receiving a first message from a client device. The method includes determining if the first message is in a first language or a second language different than the first language. The method includes translating the first message into a second message in the second language if the first message is in the first language. The method includes, alternatively, generating a third message in the second language if the first message is in the second language, where the third message includes a conversational response to the first message. The method further includes returning one of the second message and the third message to the client device.
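
The branching described in this abstract (translate when the message is in the first language, otherwise reply conversationally) reduces to a small dispatch function. The Python sketch below is a toy under stated assumptions: the language detector, the translation table, and the reply function are hypothetical placeholders for real services, and Chinese and English are assumed as the two languages.

```python
from typing import Callable


def detect_language(message: str) -> str:
    # Toy detector (assumption): any CJK ideograph marks the first language.
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in message) else "en"


def handle(
    message: str,
    translate: Callable[[str], str],
    respond: Callable[[str], str],
    first_language: str = "zh",
) -> tuple[str, str]:
    """Return ('translation', text) or ('response', text) for the client."""
    if detect_language(message) == first_language:
        return "translation", translate(message)
    return "response", respond(message)


def toy_translate(message: str) -> str:
    # Hypothetical one-entry translation table standing in for a translator.
    return {"你好": "Hello"}.get(message, "<untranslated>")


def toy_respond(message: str) -> str:
    # Hypothetical conversational reply in the second language.
    return f"You said: {message}"


translated = handle("你好", toy_translate, toy_respond)
replied = handle("Hi", toy_translate, toy_respond)
```

Exactly one of the two results is returned per incoming message, matching the "one of the second message and the third message" wording.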

    METHOD AND SYSTEM FOR VOICE SYNTHESIS
    60.
    Patent application
    Legal status: under examination (published)

    Publication No.: US20150149181A1

    Publication date: 2015-05-28

    Application No.: US14411952

    Filing date: 2013-07-02

    Inventor: Vincent Delahaye

    IPC classification: G10L13/08 G10L13/06

    Abstract: Method and system for generating audio signals (9) representative of a text (3) to be converted. The method includes the steps of: providing a database (1) of acoustic units; identifying a list of pre-calculated expressions (10) and recording, for each pre-calculated expression, an acoustic frame (7) corresponding to its pronunciation; decomposing, by means of correlation calculations, each recorded acoustic frame into a sequenced table (5) including a series of acoustic unit references modulated by amplitude (α(i)A) and temporal (α(i)T) form factors; identifying the pre-calculated expressions in the text and decomposing the rest (12) into phonemes; inserting the corresponding sequenced table in place of each pre-calculated expression; and preparing a concatenation of acoustic units (19) according to the text to be converted.
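
The text-side steps (find pre-calculated expressions, substitute their sequenced tables, and phonemize the rest) can be sketched in Python as follows. The expression list, the unit references, the form-factor values, and the phoneme lexicon are all hypothetical, and the correlation-based decomposition that produces each table is outside the scope of this sketch.

```python
import re

# Hypothetical sequenced tables: expression -> [(unit_ref, amplitude, temporal)].
PRECALCULATED = {
    "turn left": [("u17", 1.0, 1.0), ("u42", 0.8, 1.2)],
}
# Hypothetical phoneme lexicon for the remaining words.
PHONEMES = {"then": ["DH", "EH", "N"], "stop": ["S", "T", "AA", "P"]}


def phonemize(fragment: str) -> list[tuple[str, list]]:
    """Decompose ordinary text into one flat phoneme list (empty text gives nothing)."""
    phonemes = [p for word in fragment.split() for p in PHONEMES[word]]
    return [("phonemes", phonemes)] if phonemes else []


def build_plan(text: str) -> list[tuple[str, list]]:
    """Replace each known expression by its table and phonemize everything else."""
    # Longest expressions first so that longer matches win over their prefixes.
    alternatives = sorted(PRECALCULATED, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(e) for e in alternatives) + r")\b"
    plan, position = [], 0
    for match in re.finditer(pattern, text):
        plan += phonemize(text[position:match.start()])
        plan.append(("table", PRECALCULATED[match.group()]))
        position = match.end()
    plan += phonemize(text[position:])
    return plan


plan = build_plan("turn left then stop")
```

The resulting plan interleaves table entries and phoneme runs in text order, ready for a later stage to turn both into a concatenation of acoustic units.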
