Vocal source extraction by maximum phase detection
    1.
    Granted patent
    Legal status: active

    Publication No.: US09105272B2

    Publication date: 2015-08-11

    Application No.: US13487275

    Filing date: 2012-06-04

    IPC classification: G10L25/75 G10L25/03 G10L25/45

    CPC classification: G10L25/75 G10L25/03 G10L25/45

    Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
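
    The abstract does not say how roots are obtained or corrected. As a rough illustration only, the sketch below assumes the roots are the zeros of the z-transform of one pitch cycle and labels those outside the unit circle as maximum-phase candidates. The function name `split_roots_by_unit_circle` is hypothetical, and the patent's correction of misclassified roots is not reproduced here.

```python
import numpy as np

def split_roots_by_unit_circle(cycle):
    """Split the zeros of X(z) = sum_k cycle[k] * z**(-k) by magnitude.

    Assumption (not stated in the abstract): zeros with |z| > 1 are treated
    as maximum-phase candidates, the rest as minimum-phase candidates.
    """
    cycle = np.asarray(cycle, dtype=float)
    # np.roots expects coefficients from the highest power down, which
    # matches cycle[0], cycle[1], ... after multiplying X(z) by z**(N-1).
    roots = np.roots(cycle)
    magnitudes = np.abs(roots)
    outside = roots[magnitudes > 1.0]   # maximum-phase candidates
    inside = roots[magnitudes <= 1.0]   # minimum-phase candidates
    return outside, inside

# Toy "pitch cycle": z**2 - 2.5*z + 1 = (z - 2)(z - 0.5)
outside, inside = split_roots_by_unit_circle([1.0, -2.5, 1.0])
```

    For real speech, roots lying very close to the unit circle are the ones most likely to be labeled wrongly by such a naive threshold, which is presumably why the abstract mentions a correction step.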

    Method and system for text-to-speech synthesis with personalized voice
    2.
    Granted patent
    Legal status: active

    Publication No.: US08886537B2

    Publication date: 2014-11-11

    Application No.: US11688264

    Filing date: 2007-03-20

    Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).

    VOCAL SOURCE EXTRACTION BY MAXIMUM PHASE DETECTION
    3.
    Patent application
    Legal status: active

    Publication No.: US20130325455A1

    Publication date: 2013-12-05

    Application No.: US13487275

    Filing date: 2012-06-04

    IPC classification: G10L11/04

    CPC classification: G10L25/75 G10L25/03 G10L25/45

    Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.

    VOICE TRANSFORMATION WITH ENCODED INFORMATION
    4.
    Patent application
    Legal status: active

    Publication No.: US20120239387A1

    Publication date: 2012-09-20

    Application No.: US13049924

    Filing date: 2011-03-17

    IPC classification: G10L19/02

    CPC classification: G10L21/003 G10L19/018

    Abstract: Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.

    Voice transformation with encoded information
    5.
    Granted patent
    Legal status: active

    Publication No.: US08930182B2

    Publication date: 2015-01-06

    Application No.: US13049924

    Filing date: 2011-03-17

    CPC classification: G10L21/003 G10L19/018

    Abstract: Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.

    METHOD AND SYSTEM FOR TEXT-TO-SPEECH SYNTHESIS WITH PERSONALIZED VOICE
    8.
    Patent application
    Legal status: active

    Publication No.: US20080235024A1

    Publication date: 2008-09-25

    Application No.: US11688264

    Filing date: 2007-03-20

    IPC classification: G10L13/00

    Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).

    Method for encoding and decoding spectral phase data for speech signals

    Publication No.: US07127389B2

    Publication date: 2006-10-24

    Application No.: US10243580

    Filing date: 2002-09-13

    Applicant: Dan Chazan, Zvi Kons

    Inventor: Dan Chazan, Zvi Kons

    IPC classification: G10L11/04

    CPC classification: G10L25/90

    Abstract: A speech decoder and a segment aligner are provided in the present invention. The speech decoder may include a spectrum reconstructor operative to reconstruct the spectrum of a speech segment from the amplitude envelope of the spectrum of said speech segment and pitch information, a phase combiner operative to reconstruct the complex spectrum of the speech segment from the reconstructed spectrum, phase information describing the speech segment, and pitch information describing the speech segment. The speech decoder may further include a delay operative to store a complex spectrum of a previous speech segment; and a segment aligner operative to determine the relative offset between the complex spectrum of the speech segment and the complex spectrum of the previous speech segment, align the position of the first pitch excitation of the current speech segment to the last pitch excitation of the previous speech segment; and to apply a time shift and a complex Hilbert filter to said complex spectra, wherein the segment aligner is operative to cross-correlate the complex spectra as $C(\tau) = \sum_{n=0}^{N} F_n \overline{G}_m e^{-2\pi i n \tau}$, $m = \lfloor n p_G / p_F + 0.5 \rfloor$, where $F_n$ and $G_m$ are the computed complex magnitudes of the pitch harmonics n and m of the current and previous spectra respectively, and $p_F$ and $p_G$ are their corresponding pitch periods.
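
    The cross-correlation in the abstract can be evaluated numerically along the following lines. This is a minimal sketch, not the patented implementation: the function name `harmonic_cross_correlation`, the convention that tau is a fraction of one pitch period, and the choice to skip harmonics whose mapped index m falls outside G are all assumptions made for illustration.

```python
import numpy as np

def harmonic_cross_correlation(F, G, p_F, p_G, taus):
    """Evaluate C(tau) = sum_n F[n] * conj(G[m]) * exp(-2j*pi*n*tau),
    with m = floor(n * p_G / p_F + 0.5), for each tau in `taus`."""
    F = np.asarray(F, dtype=complex)
    G = np.asarray(G, dtype=complex)
    n = np.arange(len(F))
    # Map each harmonic of the current segment to the nearest harmonic
    # of the previous segment.
    m = np.floor(n * p_G / p_F + 0.5).astype(int)
    valid = m < len(G)          # assumption: drop harmonics with no match
    n, m = n[valid], m[valid]
    taus = np.atleast_1d(np.asarray(taus, dtype=float))
    phase = np.exp(-2j * np.pi * np.outer(taus, n))   # shape (len(taus), len(n))
    return phase @ (F[n] * np.conj(G[m]))

# Toy check: G is F delayed by a quarter of a pitch period, same pitch.
F = np.array([1.0, 0.8 + 0.2j, 0.5 - 0.3j, 0.2 + 0.1j])
G = F * np.exp(-2j * np.pi * np.arange(4) * 0.25)
taus = np.linspace(0.0, 1.0, 100, endpoint=False)
C = harmonic_cross_correlation(F, G, p_F=80.0, p_G=80.0, taus=taus)
best_tau = taus[np.argmax(np.abs(C))]
```

    In this toy case the magnitude of C peaks at the offset used to build G, which is the kind of relative offset the segment aligner is described as determining.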