LEARNING SINGING FROM SPEECH
    Patent application

    Publication number: US20220343904A1

    Publication date: 2022-10-27

    Application number: US17861716

    Filing date: 2022-07-11

    Abstract: A method, computer program, and computer system are provided for converting a singing voice of a first person associated with a first speaker to a singing voice of a second person using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
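
    "Recursively generated" means each mel-spectrogram frame is produced from the frames generated before it. The sketch below shows only that loop structure, under stated assumptions: the aligned phoneme context is given as one vector per target acoustic frame, the second person's speaking voice has already been reduced to a fixed speaker-embedding vector, and `toy_decoder_step` is a hypothetical fixed blend standing in for a learned decoder. None of these names come from the patent.

```python
from typing import List

Vector = List[float]


def toy_decoder_step(prev_frame: Vector, phoneme_ctx: Vector, speaker_emb: Vector) -> Vector:
    """Hypothetical stand-in for a learned decoder: a fixed weighted blend of
    the previous frame, the current phoneme context, and the speaker embedding."""
    return [0.5 * p + 0.3 * c + 0.2 * s
            for p, c, s in zip(prev_frame, phoneme_ctx, speaker_emb)]


def generate_mel_autoregressive(aligned_ctx: List[Vector], speaker_emb: Vector) -> List[Vector]:
    """Emit one mel frame per aligned target frame; every step is conditioned on
    the frame produced by the previous step (the recursive part)."""
    n_mels = len(speaker_emb)
    prev = [0.0] * n_mels  # all-zero "go" frame before the first step
    frames: List[Vector] = []
    for ctx in aligned_ctx:
        if len(ctx) != n_mels:
            raise ValueError("context and speaker embedding must have equal length")
        prev = toy_decoder_step(prev, ctx, speaker_emb)
        frames.append(prev)
    return frames
```

    For example, `generate_mel_autoregressive([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])` yields three frames, and swapping in a different speaker embedding changes every frame while the phoneme timing stays the same, which is the property a singing-to-singing conversion relies on.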

    Learnable speed control for speech synthesis

    Publication number: US11302301B2

    Publication date: 2022-04-12

    Application number: US16807801

    Filing date: 2020-03-03

    Inventors: Chengzhu Yu, Dong Yu

    Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
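
    The abstract does not say how speed enters the alignment step. One common way to picture it, assumed here and not taken from the patent, is that each phoneme has a frame count (for example from a duration predictor), speed is changed by scaling those counts, and each phoneme's encoded context is then repeated once per target acoustic frame. `scale_durations` and `align_to_frames` are hypothetical names, and the learnable part of the speed control is not modeled.

```python
from typing import List, Sequence


def scale_durations(durations: Sequence[int], speed: float) -> List[int]:
    """Divide each phoneme's frame count by the speed factor (speed > 1 is
    faster), keeping at least one frame per phoneme."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Python's round() rounds halves to the nearest even integer.
    return [max(1, round(d / speed)) for d in durations]


def align_to_frames(phoneme_ctx: Sequence[List[float]],
                    durations: Sequence[int]) -> List[List[float]]:
    """Repeat each phoneme's encoded context once per target acoustic frame."""
    if len(phoneme_ctx) != len(durations):
        raise ValueError("one duration per phoneme is required")
    frames: List[List[float]] = []
    for ctx, d in zip(phoneme_ctx, durations):
        # Copy per frame so later in-place edits do not alias across frames.
        frames.extend(list(ctx) for _ in range(d))
    return frames
```

    With durations `[4, 6, 2]`, `scale_durations(..., 2.0)` gives `[2, 3, 1]` (twice as fast) and `scale_durations(..., 0.5)` gives `[8, 12, 4]` (half speed); the aligned frames would then feed a frame-by-frame mel generator.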

    Multi-band synchronized neural vocoder

    Publication number: US11295751B2

    Publication date: 2022-04-05

    Application number: US16576943

    Filing date: 2019-09-20

    IPC classification: G10L19/00, G10L19/16, G06N3/02

    Abstract: An apparatus and a method include receiving an input audio signal to be processed by a multi-band synchronized neural vocoder. The input audio signal is separated into a plurality of frequency bands. A plurality of audio signals corresponding to the plurality of frequency bands is obtained. Each of the audio signals is downsampled and processed by the multi-band synchronized neural vocoder. An audio output signal is generated.
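
    The band-splitting and downsampling steps can be illustrated with a deliberately simple filter bank. The abstract does not name the filters, so this sketch swaps in a two-channel Haar filter bank applied recursively; practical systems typically use sharper filters (for example pseudo-QMF banks). The neural vocoder itself is not modeled, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """Split x into a low band and a high band, each downsampled by 2."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    return low, high


def haar_merge(low: List[float], high: List[float]) -> List[float]:
    """Invert haar_split exactly (perfect reconstruction)."""
    out: List[float] = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) * INV_SQRT2)
        out.append((lo - hi) * INV_SQRT2)
    return out


def split_into_bands(x: List[float], levels: int) -> List[List[float]]:
    """Split every band `levels` times: 2**levels subbands, each downsampled by
    2**levels. Band order follows the split tree, not strictly ascending frequency."""
    bands = [list(x)]
    for _ in range(levels):
        next_bands: List[List[float]] = []
        for band in bands:
            lo, hi = haar_split(band)
            next_bands.extend([lo, hi])
        bands = next_bands
    return bands


def merge_bands(bands: List[List[float]]) -> List[float]:
    """Undo split_into_bands by merging sibling pairs level by level."""
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]
```

    Splitting an 8-sample signal with `levels=2` gives four 2-sample subbands, so each per-band model runs at a quarter of the original rate; `merge_bands` recovers the input, and for a constant signal all energy lands in the first (lowest) band.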

    Unsupervised automatic speech recognition

    Publication number: US11138966B2

    Publication date: 2021-10-05

    Application number: US16269951

    Filing date: 2019-02-07

    Abstract: A method for generating an automatic speech recognition (ASR) model using unsupervised learning includes obtaining, by a device, text information. The method includes determining, by the device, a set of phoneme sequences associated with the text information. The method includes obtaining, by the device, speech waveform data. The method includes determining, by the device, a set of phoneme boundaries associated with the speech waveform data. The method includes generating, by the device, the ASR model using an output distribution matching (ODM) technique based on determining the set of phoneme sequences associated with the text information and based on determining the set of phoneme boundaries associated with the speech waveform data.
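
    The abstract names output distribution matching without giving a formula. The general idea is that, with no paired transcripts, the model is trained so that n-gram statistics of its predicted phoneme sequences match those of phoneme sequences derived from unpaired text. The sketch below is a simplified version under stated assumptions: bigrams only, speech already cut into segments at the phoneme boundaries, one posterior distribution per segment, consecutive segments treated as independent, and a cross-entropy cost. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Bigram = Tuple[str, str]


def text_bigram_prior(phoneme_seqs: Sequence[Sequence[str]]) -> Dict[Bigram, float]:
    """Relative bigram frequencies in phoneme sequences obtained from text."""
    counts: Counter = Counter()
    for seq in phoneme_seqs:
        counts.update(zip(seq, seq[1:]))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}


def predicted_bigram_dist(posteriors: Sequence[Dict[str, float]]) -> Dict[Bigram, float]:
    """Expected bigram frequencies from per-segment phoneme posteriors,
    assuming consecutive segments are independent."""
    pairs = len(posteriors) - 1
    if pairs < 1:
        raise ValueError("need at least two segments")
    dist: Counter = Counter()
    for p, q in zip(posteriors, posteriors[1:]):
        for a, pa in p.items():
            for b, qb in q.items():
                dist[(a, b)] += pa * qb / pairs
    return dict(dist)


def odm_cost(prior: Dict[Bigram, float], predicted: Dict[Bigram, float],
             eps: float = 1e-12) -> float:
    """Cross-entropy of the predicted bigram distribution under the text prior;
    it is smallest when the predicted statistics match the text statistics."""
    return -sum(p * math.log(predicted.get(bg, 0.0) + eps) for bg, p in prior.items())
```

    Posteriors whose bigram statistics match the text prior give a cost equal to the prior's entropy, while posteriors that produce bigrams unseen in the text are heavily penalized; in training, this cost would be minimized with respect to the model that produces the posteriors.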

    UNSUPERVISED AUTOMATIC SPEECH RECOGNITION
    Patent application

    Publication number: US20200258497A1

    Publication date: 2020-08-13

    Application number: US16269951

    Filing date: 2019-02-07

    Abstract: A method for generating an automatic speech recognition (ASR) model using unsupervised learning includes obtaining, by a device, text information. The method includes determining, by the device, a set of phoneme sequences associated with the text information. The method includes obtaining, by the device, speech waveform data. The method includes determining, by the device, a set of phoneme boundaries associated with the speech waveform data. The method includes generating, by the device, the ASR model using an output distribution matching (ODM) technique based on determining the set of phoneme sequences associated with the text information and based on determining the set of phoneme boundaries associated with the speech waveform data.