PHRASE-BASED END-TO-END TEXT-TO-SPEECH (TTS) SYNTHESIS

    Publication Number: US20230169953A1

    Publication Date: 2023-06-01

    Application Number: US17919982

    Application Date: 2021-03-19

    CPC classification number: G10L13/08 G10L13/04

    Abstract: The present disclosure provides methods and apparatuses for phrase-based end-to-end text-to-speech (TTS) synthesis.
    A text may be obtained. A target phrase in the text may be identified. A phrase context of the target phrase may be determined. An acoustic feature corresponding to the target phrase may be generated based at least on the target phrase and the phrase context. A speech waveform corresponding to the target phrase may be generated based on the acoustic feature.
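The four steps in the abstract form a simple pipeline. The sketch below is a hypothetical illustration of that flow; every helper is a placeholder stub (the phrase split, context extraction, feature, and waveform logic are all assumptions, not the patented implementation).

```python
def identify_target_phrase(text: str) -> str:
    # Stub: treat the first comma-delimited chunk as the target phrase.
    return text.split(",")[0].strip()

def determine_phrase_context(text: str, phrase: str) -> str:
    # Stub: the remainder of the text serves as the phrase context.
    return text.replace(phrase, "", 1).strip(" ,")

def generate_acoustic_feature(phrase: str, context: str) -> list:
    # Stub: a real system would run an acoustic model conditioned on both
    # the phrase and its context (e.g. producing a mel-spectrogram).
    return [len(phrase), len(context)]

def generate_waveform(acoustic_feature: list) -> bytes:
    # Stub: a real system would run a vocoder over the acoustic feature.
    return bytes(acoustic_feature)

def synthesize(text: str) -> bytes:
    phrase = identify_target_phrase(text)
    context = determine_phrase_context(text, phrase)
    feature = generate_acoustic_feature(phrase, context)
    return generate_waveform(feature)

print(synthesize("Hello there, how are you today?"))
```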

    INTENT RECOGNITION AND EMOTIONAL TEXT-TO-SPEECH LEARNING

    Publication Number: US20220122580A1

    Publication Date: 2022-04-21

    Application Number: US17561895

    Application Date: 2021-12-24

    Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.
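The intent-recognition flow in the abstract combines a text channel with an acoustic channel. A minimal sketch of that combination follows; the input format (word, pitch pairs) and the rule-based intent model are invented stand-ins, not the described models.

```python
def transcribe(speech: list) -> str:
    # Stub ASR: join the spoken words carried in the input.
    return " ".join(word for word, _pitch in speech)

def annotate_acoustics(speech: list) -> list:
    # Stub: emit one acoustic annotation (here, a pitch value) per word.
    return [pitch for _word, pitch in speech]

def intent_model(text: str, annotations: list) -> str:
    # Stub intent rule: a question word or rising pitch -> "question".
    rising = bool(annotations) and annotations[-1] > annotations[0]
    if text.startswith(("what", "where", "when")) or rising:
        return "question"
    return "statement"

def recognize_intent(speech: list) -> str:
    # Apply the intent model to both the text results and the
    # acoustic feature annotations, as the abstract describes.
    return intent_model(transcribe(speech), annotate_acoustics(speech))

speech = [("where", 120), ("is", 125), ("it", 140)]
print(recognize_intent(speech))
```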

    PROVIDING EMOTION MANAGEMENT ASSISTANCE

    Publication Number: US20220059122A1

    Publication Date: 2022-02-24

    Application Number: US17432476

    Application Date: 2020-02-03

    Inventor: Chi Xiu Jian LUAN

    Abstract: A method for providing emotion management assistance is provided. Sound streams may be received. A speech conversation between a user and at least one conversation object may be detected from the sound streams. The identity of the conversation object may be identified at least according to speech of the conversation object in the speech conversation. An emotion state of at least one speech segment of the user in the speech conversation may be determined. An emotion record corresponding to the speech conversation may be generated, wherein the emotion record at least includes the identity of the conversation object, at least a portion of the content of the speech conversation, and the emotion state of the at least one speech segment of the user.
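The emotion record described above bundles three things: the conversation object's identity, conversation content, and the user's emotion states. A hypothetical sketch of assembling such a record is below; the (speaker, text, emotion) segment format and helper names are assumptions, not the patented processing.

```python
def identify_conversation_object(segments: list) -> str:
    # Stub speaker identification: pick the first non-user speaker label.
    for speaker, _text, _emotion in segments:
        if speaker != "user":
            return speaker
    return "unknown"

def build_emotion_record(segments: list) -> dict:
    # segments: (speaker, text, emotion) tuples from the detected
    # conversation; emotion is only tracked for the user's segments.
    return {
        "object_identity": identify_conversation_object(segments),
        "content": " / ".join(text for _s, text, _e in segments),
        "user_emotions": [e for s, _t, e in segments if s == "user"],
    }

conversation = [
    ("user", "I missed the deadline", "anxious"),
    ("colleague_A", "We can extend it", None),
    ("user", "That helps, thanks", "relieved"),
]
record = build_emotion_record(conversation)
print(record["object_identity"], record["user_emotions"])
```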

    AUTOMATIC DUBBING METHOD AND APPARATUS
    Invention Application

    Publication Number: US20200058289A1

    Publication Date: 2020-02-20

    Application Number: US16342416

    Application Date: 2016-11-21

    Abstract: An automatic dubbing method is disclosed. The method comprises: extracting speeches of a voice from an audio portion of a media content (504); obtaining a voice print model for the extracted speeches of the voice (506); processing the extracted speeches by utilizing the voice print model to generate replacement speeches (508); and replacing the extracted speeches of the voice with the generated replacement speeches in the audio portion of the media content (510).
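The four numbered steps (504)-(510) can be sketched as a pipeline over tagged audio segments. All helpers below are placeholder stubs (the "voice print" and re-voicing logic are invented stand-ins, not the patented processing).

```python
def extract_speeches(audio: list) -> list:
    # (504) Stub: segments tagged "speech" are the voice's speeches.
    return [seg for kind, seg in audio if kind == "speech"]

def obtain_voice_print_model(speeches: list) -> dict:
    # (506) Stub voice print: mean segment length stands in for real
    # speaker characteristics.
    return {"mean_len": sum(len(s) for s in speeches) / len(speeches)}

def generate_replacements(speeches: list, model: dict) -> list:
    # (508) Stub: "re-voice" each speech, here by uppercasing it.
    return [s.upper() for s in speeches]

def replace_in_audio(audio: list, replacements: list) -> list:
    # (510) Splice replacement speeches back in, keeping music intact.
    out, it = [], iter(replacements)
    for kind, seg in audio:
        out.append((kind, next(it) if kind == "speech" else seg))
    return out

audio = [("music", "intro"), ("speech", "hello"), ("speech", "bye")]
speeches = extract_speeches(audio)
model = obtain_voice_print_model(speeches)
dubbed = replace_in_audio(audio, generate_replacements(speeches, model))
print(dubbed)
```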

    AUTOMATIC SONG GENERATION
    Invention Application

    Publication Number: US20200035209A1

    Publication Date: 2020-01-30

    Application Number: US16500995

    Application Date: 2018-04-18

    Abstract: In accordance with implementations of the subject matter described herein, there is provided a solution for supporting a machine to automatically generate a song. In this solution, an input from a user is used to determine a creation intention of the user with respect to a song to be generated. Lyrics of the song are generated based on the creation intention. Then, a template for the song is generated based at least in part on the lyrics. The template indicates a melody matching with the lyrics. In this way, it is feasible to automatically create the melody and lyrics which not only conform to the creation intention of the user but also match with each other.
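The described flow is: user input, then creation intention, then lyrics, then a melody template matched to the lyrics. A minimal sketch follows; the keyword-based intention detection, canned lyric lines, and notes-per-word matching rule are all hypothetical stand-ins.

```python
def determine_creation_intention(user_input: str) -> str:
    # Stub: keyword-spot the intended mood of the song.
    return "happy" if "birthday" in user_input else "calm"

def generate_lyrics(intention: str) -> list:
    # Stub: fixed lyric lines per intention.
    lines = {
        "happy": ["Sing along today", "The sun is out to play"],
        "calm": ["Slow river flowing", "Soft evening glowing"],
    }
    return lines[intention]

def generate_template(lyrics: list) -> dict:
    # Stub "melody matching the lyrics": one note per word per line,
    # so the melody's shape is derived from the lyrics themselves.
    return {"notes_per_line": [len(line.split()) for line in lyrics]}

user_input = "a birthday song for my friend"
intention = determine_creation_intention(user_input)
lyrics = generate_lyrics(intention)
template = generate_template(lyrics)
print(intention, template)
```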

    AUTOMATIC DUBBING METHOD AND APPARATUS

    Publication Number: US20230076258A1

    Publication Date: 2023-03-09

    Application Number: US17985016

    Application Date: 2022-11-10

    Abstract: A method and system for automatic dubbing are disclosed, comprising: responsive to receiving a selection of media content for playback on a user device by a user of the user device, processing extracted speeches of a first voice from the media content to generate replacement speeches using a set of phonemes of a second voice of the user of the user device, and replacing the extracted speeches of the first voice with the generated replacement speeches in the audio portion of the media content for playback on the user device.
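What distinguishes this variant is generating replacement speech from a set of phonemes of the user's ("second") voice. The sketch below illustrates that idea with an invented phoneme inventory and a hard-coded alignment; both are hypothetical stand-ins for the described processing.

```python
# Assumed recordings of the user's voice, one tiny clip per phoneme.
USER_PHONEME_CLIPS = {"HH": b"\x01", "EH": b"\x02", "L": b"\x03", "OW": b"\x04"}

def speech_to_phonemes(speech: str) -> list:
    # Stub: a real system would run phonetic alignment on the first
    # voice's speech; here "hello" maps to a fixed phoneme sequence.
    return {"hello": ["HH", "EH", "L", "OW"]}[speech]

def generate_replacement(speech: str) -> bytes:
    # Concatenate the user's phoneme clips in the original phoneme
    # order, producing the speech in the second voice.
    return b"".join(USER_PHONEME_CLIPS[p] for p in speech_to_phonemes(speech))

print(generate_replacement("hello"))
```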

    INTENT RECOGNITION AND EMOTIONAL TEXT-TO-SPEECH LEARNING

    Publication Number: US20210225357A1

    Publication Date: 2021-07-22

    Application Number: US16309399

    Application Date: 2017-06-07

    Abstract: An example intent-recognition system comprises a processor and memory storing instructions. The instructions cause the processor to receive speech input comprising spoken words. The instructions cause the processor to generate text results based on the speech input and generate acoustic feature annotations based on the speech input. The instructions also cause the processor to apply an intent model to the text results and the acoustic feature annotations to recognize an intent based on the speech input. An example system for adapting an emotional text-to-speech model comprises a processor and memory. The memory stores instructions that cause the processor to receive training examples comprising speech input and receive labelling data comprising emotion information associated with the speech input. The instructions also cause the processor to extract audio signal vectors from the training examples and generate an emotion-adapted voice font model based on the audio signal vectors and the labelling data.

A HIGHLY EMPATHETIC TTS PROCESSING

    Publication Number: US20210082396A1

    Publication Date: 2021-03-18

    Application Number: US17050153

    Application Date: 2019-05-13

    Abstract: The present disclosure provides a technical solution for highly empathetic TTS processing, which not only takes semantic and linguistic features into consideration, but also assigns a sentence ID to each sentence in a training text to distinguish the sentences from one another. Such sentence IDs may be introduced as training features into the training of a machine learning model, enabling the model to learn a rule for how the acoustic codes of sentences change with sentence context. Performing TTS processing with the trained model may output speech that varies naturally in rhythm and tone, making the TTS more empathetic. A highly empathetic audio book may be generated using the TTS processing provided herein, and an online system for generating highly empathetic audio books may be established with this TTS processing as a core technology.
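The key idea above is attaching a distinct sentence ID to each training sentence alongside its other features. A minimal sketch of such feature construction is below; the particular features (word count, trailing punctuation) are simple stand-ins for the semantic and linguistic features the abstract mentions, not the patented training recipe.

```python
def assign_sentence_ids(training_text: list) -> list:
    # Each sentence in the training text gets a distinct ID.
    return list(enumerate(training_text))

def build_training_features(training_text: list) -> list:
    # Pair every sentence with its ID plus simple stand-in
    # linguistic features (word count and trailing punctuation).
    features = []
    for sid, sentence in assign_sentence_ids(training_text):
        features.append({
            "sentence_id": sid,
            "num_words": len(sentence.split()),
            "is_question": sentence.endswith("?"),
        })
    return features

text = ["It was a quiet night.", "Who goes there?", "Nobody answered."]
print(build_training_features(text))
```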
