-
Publication No.: US20150127350A1
Publication date: 2015-05-07
Application No.: US14069510
Filing date: 2013-11-01
Applicant: Google Inc.
Inventor: Ioannis Agiomyrgiannakis
CPC classification number: G10L13/02, G10L13/0335, G10L15/07, G10L15/144, G10L15/26, G10L21/003
Abstract: A method and system are disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) based speech modeling for synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extracted from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of the target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.
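The matching-and-replacement step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the distance measure or the form of the speaker-compensating transform, so the squared Euclidean distance, the additive `shift` transform, and the names `match_states` and `convert_hmm` are all assumptions made for illustration. Each HMM state is reduced here to a single mean vector.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_states(source_means, target_vectors, transform=lambda v: v):
    """For each source state mean, return the index of the closest
    target-speaker vector, measured after applying `transform`
    (a hypothetical stand-in for the speaker-compensating transform)."""
    matches = []
    for mean in source_means:
        warped = transform(mean)
        best = min(range(len(target_vectors)),
                   key=lambda j: squared_distance(warped, target_vectors[j]))
        matches.append(best)
    return matches


def convert_hmm(source_means, target_vectors, transform=lambda v: v):
    """Start from a copy of the source states, then replace each state
    with its matched target-speaker vector."""
    converted = [list(m) for m in source_means]  # initially the source HMM
    for i, j in enumerate(match_states(source_means, target_vectors, transform)):
        converted[i] = list(target_vectors[j])
    return converted


# Toy data (assumed): two source states, three target-speaker vectors.
source_means = [[0.0, 0.0], [1.0, 1.0]]
target_vectors = [[2.1, 2.0], [3.0, 3.1], [10.0, 10.0]]


def shift(v):
    """Assumed transform: a constant offset between the two speakers."""
    return [x + 2.0 for x in v]


converted = convert_hmm(source_means, target_vectors, shift)
```

In this toy example the offset moves each source state close to a distinct target vector, so the two states map to different target vectors. Without the transform, both states would match the same nearest vector, which is the kind of mismatch the compensation is meant to avoid.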
-
Publication No.: US09008490B1
Publication date: 2015-04-14
Application No.: US13776017
Filing date: 2013-02-25
Applicant: Google Inc.
Inventor: Matthew Sharifi, Dominik Roblek, Vera Dron, Ioannis Agiomyrgiannakis
CPC classification number: G06F17/3082, G06F17/30758, G06F17/30787, G10H1/0008, G10H1/368, G10H2240/075, G10H2240/135, G10H2240/141
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting, from among a collection of videos, a set of candidate videos that (i) are identified as being associated with a particular song, and (ii) are classified as a cappella video recordings; extracting, from each of the candidate videos of the set, a monophonic melody line from an audio channel of the candidate video; selecting, from among the set of candidate videos, a subset of the candidate videos based on a similarity of the monophonic melody line of the candidate videos of the subset with each other; and providing, to a recognizer that recognizes songs from sounds produced by a human voice, (i) an identifier of the particular song, and (ii) one or more of the monophonic melody lines of the candidate videos of the subset.
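The subset-selection step in the abstract (keeping candidate videos whose melody lines are similar to each other) can be sketched in Python as follows. The abstract does not define the similarity measure or the selection rule, so everything here is an assumption for illustration: melody lines are represented as lists of MIDI-style pitch numbers, similarity is the fraction of matching positions, and a video is kept when its average similarity to the other candidates reaches a hypothetical `threshold`.

```python
def melody_similarity(a, b):
    """Fraction of positions with equal pitch, over the shorter length.
    An assumed measure; the abstract does not specify one."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / n


def select_consistent_subset(melodies, threshold=0.5):
    """Return the ids of videos whose melody line agrees, on average,
    with the melody lines of the other candidates."""
    selected = []
    for video_id, melody in melodies.items():
        others = [m for k, m in melodies.items() if k != video_id]
        if not others:
            continue  # a single candidate cannot be compared with anything
        average = sum(melody_similarity(melody, o) for o in others) / len(others)
        if average >= threshold:
            selected.append(video_id)
    return selected


# Toy data (assumed): three renditions of one tune and one outlier.
melodies = {
    "v1": [60, 62, 64, 65],
    "v2": [60, 62, 64, 67],
    "v3": [60, 62, 64, 65],
    "v4": [50, 51, 52, 53],
}

subset = select_consistent_subset(melodies)
```

Here the outlier `v4` shares no pitches with the other candidates and is dropped, while the three mutually consistent melody lines are kept and could be passed, together with the song identifier, to the recognizer.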
-