Method and System for Building Text-to-Speech Voice from Diverse Recordings
    1.
    Type: Patent application
    Legal status: In force

    Publication No.: US20160140951A1

    Publication date: 2016-05-19

    Application No.: US14540088

    Filing date: 2014-11-13

    Applicant: Google Inc.

    CPC classification number: G10L13/02 G10L13/06 G10L25/03

    Abstract: A method and system are disclosed for building a speech database for a text-to-speech (TTS) synthesis system from multiple speakers recorded under diverse conditions. For a plurality of utterances of a reference speaker, a set of reference-speaker vectors may be extracted, and for each of a plurality of utterances of a colloquial speaker, a respective set of colloquial-speaker vectors may be extracted. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each colloquial-speaker vector to a reference-speaker vector. The colloquial-speaker vector may be replaced with the matched reference-speaker vector. The matching-and-replacing can be carried out separately for each set of colloquial-speaker vectors. A conditioned set of speaker vectors can then be constructed by aggregating all the replaced speaker vectors. The conditioned set of speaker vectors can be used to train the TTS system.
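
The match-and-replace step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the per-dimension affine transform (`scale`, `offset`), the Euclidean distance, and all function names are assumptions chosen for illustration.

```python
import math


def apply_transform(vec, scale, offset):
    """Hypothetical stand-in for the speaker-compensating transform:
    a per-dimension affine map applied to a colloquial-speaker vector."""
    return [s * v + o for v, s, o in zip(vec, scale, offset)]


def match_and_replace(colloquial_vecs, reference_vecs, scale, offset):
    """Replace each colloquial-speaker vector with its nearest
    reference-speaker vector (Euclidean distance is an assumption)."""
    replaced = []
    for vec in colloquial_vecs:
        mapped = apply_transform(vec, scale, offset)
        nearest = min(reference_vecs, key=lambda ref: math.dist(mapped, ref))
        replaced.append(nearest)
    return replaced


def build_conditioned_set(colloquial_sets, reference_vecs, transforms):
    """Run match-and-replace separately for each set of colloquial-speaker
    vectors, then aggregate all replaced vectors into one conditioned set."""
    conditioned = []
    for vecs, (scale, offset) in zip(colloquial_sets, transforms):
        conditioned.extend(match_and_replace(vecs, reference_vecs, scale, offset))
    return conditioned
```

Each set of colloquial-speaker vectors gets its own transform here, which mirrors the abstract's statement that matching and replacing are carried out separately per set.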

    Methods and Systems for Voice Conversion
    2.
    Type: Patent application
    Legal status: In force

    Publication No.: US20160005403A1

    Publication date: 2016-01-07

    Application No.: US14631464

    Filing date: 2015-02-25

    Applicant: Google Inc.

    CPC classification number: G10L15/07 G10L17/06 G10L25/75 G10L2021/0135

    Abstract: A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.

    Methods and systems for voice conversion

    Publication No.: US09613620B2

    Publication date: 2017-04-04

    Application No.: US14631464

    Filing date: 2015-02-25

    Applicant: Google Inc.

    CPC classification number: G10L15/07 G10L17/06 G10L25/75 G10L2021/0135

    Abstract: A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.

    Melody recognition systems
    5.
    Type: Granted patent
    Legal status: In force

    Publication No.: US09569532B1

    Publication date: 2017-02-14

    Application No.: US14300600

    Filing date: 2014-06-10

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.
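
The subset-selection step (keeping candidate videos whose monophonic melody lines resemble each other) might look roughly like the sketch below. The melody representation (a list of pitch values), the mean-absolute-difference distance, the median-based rule, and the threshold are all assumptions; the abstract does not specify them.

```python
from statistics import median


def melody_distance(a, b):
    """Mean absolute pitch difference over the overlapping frames
    (an assumed similarity measure, not taken from the patent)."""
    n = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a, b)) / n


def select_consistent(melodies, max_median_distance):
    """Keep candidates whose median distance to the other candidates'
    melody lines is at most the threshold."""
    keep = []
    for name, line in melodies.items():
        others = [m for other, m in melodies.items() if other != name]
        if not others:
            continue  # a single candidate has nothing to agree with
        dists = [melody_distance(line, m) for m in others]
        if median(dists) <= max_median_distance:
            keep.append(name)
    return keep
```

Using the median rather than the mean keeps one badly extracted melody line from dragging down the scores of the candidates that do agree.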

    Devices and methods for weighting of local costs for unit selection text-to-speech synthesis
    6.
    Type: Granted patent
    Legal status: In force

    Publication No.: US09460705B2

    Publication date: 2016-10-04

    Application No.: US14087260

    Filing date: 2013-11-22

    Applicant: Google Inc.

    CPC classification number: G10L13/07

    Abstract: A device may determine a representation of text that includes a first linguistic term associated with a first set of speech sounds and a second linguistic term associated with a second set of speech sounds. The device may determine a plurality of joins between the first set and the second set. A given join may be indicative of concatenating a first speech sound from the first set with a second speech sound from the second set. A given local cost of the given join may correspond to a weighted sum of individual costs. A given individual cost may be weighted based on a variability of the given individual cost in the plurality of joins. The device may provide a sequence of speech sounds indicative of a pronunciation of the text based on a minimization of a sum of local costs of adjacent speech sounds in the sequence.
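
The variability-based weighting can be sketched for the two-term case, where the sequence contains a single join. The abstract only says weights are "based on a variability"; the inverse-standard-deviation rule, the zero weight for constant costs, and the `individual_costs` callback are assumptions made here for illustration.

```python
from statistics import pstdev


def variability_weights(cost_vectors):
    """One weight per individual cost. Assumption: weight is the inverse of
    the cost's standard deviation across all joins; a cost that never varies
    gets weight 0 because it cannot discriminate between joins."""
    weights = []
    for k in range(len(cost_vectors[0])):
        spread = pstdev([costs[k] for costs in cost_vectors])
        weights.append(1.0 / spread if spread > 0 else 0.0)
    return weights


def local_cost(costs, weights):
    """Local cost of one join: weighted sum of its individual costs."""
    return sum(w * c for w, c in zip(weights, costs))


def best_join(first_set, second_set, individual_costs):
    """Enumerate all joins between the two sets and return the join with the
    lowest local cost, plus the weights used. `individual_costs(a, b)` is a
    hypothetical callback returning the list of individual costs."""
    joins = [(a, b) for a in first_set for b in second_set]
    cost_vectors = [individual_costs(a, b) for a, b in joins]
    weights = variability_weights(cost_vectors)
    totals = [local_cost(c, weights) for c in cost_vectors]
    best = min(range(len(joins)), key=totals.__getitem__)
    return joins[best], weights
```

For longer texts, the same local cost would feed a dynamic-programming search that minimizes the sum over all adjacent pairs in the sequence.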

    Methods and systems for automated generation of nativized multi-lingual lexicons
    7.
    Type: Granted patent
    Legal status: In force

    Publication No.: US09263028B2

    Publication date: 2016-02-16

    Application No.: US14283586

    Filing date: 2014-05-21

    Applicant: Google Inc.

    Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
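
A toy version of the phoneme association described in this abstract is shown below. The mapping table is purely illustrative (a rough French-to-English example, not linguistically authoritative), and the pass-through rule for unmapped phonemes is an assumption.

```python
# Hypothetical table: each first-language phoneme maps to one or more
# second-language phonemes that approximate it.
EXAMPLE_MAP = {
    "ʁ": ("ɹ",),
    "y": ("u",),
    "ɲ": ("n", "j"),
}


def nativize(phonemes, phoneme_map):
    """Build a phonemic representation using only second-language phonemes.
    Phonemes missing from the map are assumed to exist in both languages
    and are passed through unchanged."""
    result = []
    for p in phonemes:
        result.extend(phoneme_map.get(p, (p,)))
    return result
```

The one-to-many entry for "ɲ" reflects the abstract's wording that content may be associated with "one or more phonemes" from the second language.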

    Method and system for cross-lingual voice conversion
    8.
    Type: Granted patent
    Legal status: In force

    Publication No.: US09177549B2

    Publication date: 2015-11-03

    Application No.: US14069492

    Filing date: 2013-11-01

    Applicant: Google Inc.

    Abstract: A method and system are disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to an HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.
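
The state-matching step could be sketched as follows, under strong simplifying assumptions: each HMM state is reduced to a single diagonal Gaussian `(mean, variance)`, the speaker-compensating transform is a plain mean shift, and the match criterion is Kullback-Leibler divergence. None of these choices is stated in the abstract.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL divergence KL(P || Q) between two diagonal Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def match_states(output_states, auxiliary_states, shift):
    """For each output-HMM state, return the index of the closest
    auxiliary-HMM state after shifting the auxiliary means by `shift`
    (a hypothetical stand-in for the speaker-compensating transform)."""
    shifted = [
        ([m + s for m, s in zip(mean, shift)], var)
        for mean, var in auxiliary_states
    ]
    matches = []
    for mean_o, var_o in output_states:
        best = min(
            range(len(shifted)),
            key=lambda j: kl_diag_gauss(mean_o, var_o, shifted[j][0], shifted[j][1]),
        )
        matches.append(best)
    return matches
```

The returned indices say which auxiliary state would replace each state of the cross-lingual HMM.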

    Devices and Methods for Weighting of Local Costs for Unit Selection Text-to-Speech Synthesis
    9.
    Type: Patent application
    Legal status: In force

    Publication No.: US20150134339A1

    Publication date: 2015-05-14

    Application No.: US14087260

    Filing date: 2013-11-22

    Applicant: Google Inc.

    CPC classification number: G10L13/07

    Abstract: A device may determine a representation of text that includes a first linguistic term associated with a first set of speech sounds and a second linguistic term associated with a second set of speech sounds. The device may determine a plurality of joins between the first set and the second set. A given join may be indicative of concatenating a first speech sound from the first set with a second speech sound from the second set. A given local cost of the given join may correspond to a weighted sum of individual costs. A given individual cost may be weighted based on a variability of the given individual cost in the plurality of joins. The device may provide a sequence of speech sounds indicative of a pronunciation of the text based on a minimization of a sum of local costs of adjacent speech sounds in the sequence.

    Devices and methods for speech unit reduction in text-to-speech synthesis systems
    10.
    Type: Granted patent
    Legal status: In force

    Publication No.: US08751236B1

    Publication date: 2014-06-10

    Application No.: US14061118

    Filing date: 2013-10-23

    Applicant: Google Inc.

    CPC classification number: G10L13/06

    Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the first speech sound and the second speech sound are concatenated. The first speech sound may be included in the plurality of speech sounds and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features. The device may provide a representative speech sound of a given cluster as the first speech sound when the first speech sound and the second speech sound are concatenated.
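
The cluster-then-represent idea can be illustrated with a small sketch. The abstract does not name a clustering algorithm; the greedy "leader" clustering, the distance threshold, and the medoid as representative are swapped-in choices for illustration only.

```python
import math


def cluster_by_features(features, threshold):
    """Greedy leader clustering over concatenation-feature vectors: a sound
    joins the first cluster whose leader (first member) lies within
    `threshold`, otherwise it starts a new cluster. Returns lists of indices."""
    clusters = []
    for i, feat in enumerate(features):
        for members in clusters:
            if math.dist(feat, features[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def representative(members, features):
    """Medoid of a cluster: the member with the smallest total distance
    to all members of the same cluster."""
    return min(
        members,
        key=lambda i: sum(math.dist(features[i], features[j]) for j in members),
    )
```

Keeping only each cluster's representative is what reduces the number of stored speech units: sounds that behave alike at the join are treated as interchangeable.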
