Pitch marking in speech processing

    Publication number: US09685170B2

    Publication date: 2017-06-20

    Application number: US14918601

    Filing date: 2015-10-21

    Inventor: Slava Shechtman

    CPC classification number: G10L21/01 G10L21/013 G10L25/06 G10L25/90

    Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal value between the lower limit temporal value and the upper limit temporal value.
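
The correction step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the helper names (`best_lag`, `refine_marks`), the window size, the ±20% lag search range, and the rule for deriving the limits are all assumptions made for illustration. Here the lower and upper limits for a mark are taken as the positions predicted by normalized cross-correlation from its previous and next neighbors, and the mark is moved only if it falls outside that interval.

```python
import math


def best_lag(signal, anchor, direction, lo_lag, hi_lag, half_win):
    """Lag in [lo_lag, hi_lag] that maximizes the normalized cross-correlation
    between the window centered at `anchor` and the window centered at
    `anchor + direction * lag`. Returns None if no window fits in the signal."""

    def window(center):
        lo, hi = center - half_win, center + half_win + 1
        if lo < 0 or hi > len(signal):
            return None
        return signal[lo:hi]

    ref = window(anchor)
    if ref is None:
        return None
    ref_energy = sum(r * r for r in ref)
    best, best_score = None, -math.inf
    for lag in range(lo_lag, hi_lag + 1):
        cand = window(anchor + direction * lag)
        if cand is None:
            continue
        den = math.sqrt(ref_energy * sum(c * c for c in cand))
        score = sum(r * c for r, c in zip(ref, cand)) / den if den > 0 else 0.0
        if score > best_score:
            best, best_score = lag, score
    return best


def refine_marks(signal, marks, period, half_win=10, tol=0.2):
    """Clamp each interior pitch mark (sample index) into the interval spanned
    by the positions predicted from its two neighbors (assumed limit rule)."""
    lo_lag = max(1, round(period * (1 - tol)))
    hi_lag = round(period * (1 + tol))
    refined = list(marks)
    for i in range(1, len(marks) - 1):
        fwd = best_lag(signal, marks[i - 1], +1, lo_lag, hi_lag, half_win)
        bwd = best_lag(signal, marks[i + 1], -1, lo_lag, hi_lag, half_win)
        if fwd is None or bwd is None:
            continue  # not enough signal around a neighbor; keep the mark
        from_prev = marks[i - 1] + fwd   # position predicted by previous mark
        from_next = marks[i + 1] - bwd   # position predicted by next mark
        lower, upper = min(from_prev, from_next), max(from_prev, from_next)
        refined[i] = min(max(marks[i], lower), upper)
    return refined


# Synthetic pulse train with a 50-sample pitch period; the middle mark is off.
signal = [1.0 if n % 50 == 0 else 0.0 for n in range(300)]
print(refine_marks(signal, [100, 143, 200], period=50))
```

On this synthetic pulse train both neighbors predict sample 150, so the misplaced middle mark moves from 143 to 150. Real speech would need a pitch-dependent window and search range taken from the computed pitch sequence.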

    PITCH MARKING IN SPEECH PROCESSING

    Publication number: US20170117001A1

    Publication date: 2017-04-27

    Application number: US14918601

    Filing date: 2015-10-21

    Inventor: Slava Shechtman

    CPC classification number: G10L21/01 G10L21/013 G10L25/06 G10L25/90

    Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal value between the lower limit temporal value and the upper limit temporal value.

    System and method for generating expressive prosody for speech synthesis

    Publication number: US20190172443A1

    Publication date: 2019-06-06

    Application number: US15832793

    Filing date: 2017-12-06

    CPC classification number: G10L13/10 G10L13/027 G10L13/033 G10L13/047 G10L25/30

    Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an input text.

    System and method for generating expressive prosody for speech synthesis

    Publication number: US10418025B2

    Publication date: 2019-09-17

    Application number: US15832793

    Filing date: 2017-12-06

    Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an input text.

    WIDEBAND SPEECH PARAMETERIZATION FOR HIGH QUALITY SYNTHESIS, TRANSFORMATION AND QUANTIZATION

    Document type: Invention application (legal status: in force)

    Publication number: US20150095035A1

    Publication date: 2015-04-02

    Application number: US14040765

    Filing date: 2013-09-30

    Inventor: Slava Shechtman

    CPC classification number: G10L19/038 G10L19/02 G10L19/093

    Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.
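
The first stage described in this abstract (fit a harmonic model to a frame, reconstruct the estimate, subtract it to obtain the residual) can be sketched as follows. This is an illustrative sketch, not the patented method: the function names are hypothetical, the pitch period is assumed known in samples, and the frame is assumed to span a whole number of pitch periods so that simple projections onto each harmonic coincide with a least-squares fit. The second harmonic analysis and the analysis-by-synthesis coding of the residual are not shown.

```python
import math


def harmonic_fit(frame, period, n_harmonics):
    """Cosine/sine amplitudes (a_k, b_k) of harmonics 1..n_harmonics.
    Assumes len(frame) is a whole number of pitch periods (in samples)."""
    n = len(frame)
    coeffs = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k / period
        a = 2 / n * sum(x * math.cos(w * i) for i, x in enumerate(frame))
        b = 2 / n * sum(x * math.sin(w * i) for i, x in enumerate(frame))
        coeffs.append((a, b))
    return coeffs


def harmonic_synth(coeffs, period, n):
    """Reconstruct the estimated frame signal from the harmonic parameters."""
    out = []
    for i in range(n):
        s = 0.0
        for k, (a, b) in enumerate(coeffs, start=1):
            w = 2 * math.pi * k / period
            s += a * math.cos(w * i) + b * math.sin(w * i)
        out.append(s)
    return out


def harmonic_residual(frame, period, n_harmonics):
    """Return the model parameters and the frame minus its harmonic estimate."""
    coeffs = harmonic_fit(frame, period, n_harmonics)
    estimate = harmonic_synth(coeffs, period, len(frame))
    return coeffs, [x - e for x, e in zip(frame, estimate)]


# Synthetic frame: 4 periods of 40 samples with harmonics 1, 2 and 7.
period, n = 40, 160
frame = [
    0.8 * math.cos(2 * math.pi * i / period)
    + 0.3 * math.sin(2 * math.pi * 2 * i / period)
    + 0.05 * math.cos(2 * math.pi * 7 * i / period)
    for i in range(n)
]
coeffs, residual = harmonic_residual(frame, period, n_harmonics=2)
```

With two harmonics modeled, the fit recovers the amplitudes 0.8 and 0.3, and the residual retains only the small seventh-harmonic component. A further analysis pass could be run on `residual` in the same way.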

    Wideband speech parameterization for high quality synthesis, transformation and quantization

    Document type: Granted invention patent (legal status: in force)

    Publication number: US09224402B2

    Publication date: 2015-12-29

    Application number: US14040765

    Filing date: 2013-09-30

    Inventor: Slava Shechtman

    CPC classification number: G10L19/038 G10L19/02 G10L19/093

    Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.

