METHOD FOR SYNTHESIZED SPEECH GENERATION USING EMOTION INFORMATION CORRECTION AND APPARATUS

    Publication (Announcement) Number: US20210074261A1

    Publication (Announcement) Date: 2021-03-11

    Application Number: US16928815

    Application Date: 2020-07-14

    Abstract: A method includes generating first synthesized speech by using text and a first emotion information vector configured for the text, extracting a second emotion information vector from the first synthesized speech, determining whether correction of the second emotion information vector is needed by comparing a loss value, calculated from the first and second emotion information vectors, with a preconfigured threshold, re-performing speech synthesis by using a third emotion information vector generated by correcting the second emotion information vector, and outputting the resulting synthesized speech, thereby configuring the emotion information of speech more effectively. The speech synthesis apparatus may be associated with artificial intelligence modules, drones (unmanned aerial vehicles, UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G services, and the like.
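
    The abstract describes a closed-loop correction: synthesize, measure the emotion actually present in the output, and re-synthesize if the measured vector strays too far from the target. Below is a minimal Python sketch of that loop. It is illustrative only; the patent names no concrete API, so every function (synthesize, extract_emotion), the five-dimensional emotion vector, the Euclidean loss, and the interpolation-based correction are assumptions.

        import math

        EMOTION_DIMS = 5          # e.g. joy, sadness, anger, fear, neutral (assumed)
        LOSS_THRESHOLD = 0.1      # the abstract's "preconfigured threshold" (value assumed)

        def synthesize(text, emotion_vector):
            # Stand-in for the TTS engine; a real engine would return audio.
            return {"text": text, "emotion": list(emotion_vector)}

        def extract_emotion(speech):
            # Stand-in recognizer: a real model would infer this from audio;
            # here the target is perturbed so the correction branch runs.
            return [e * 0.8 for e in speech["emotion"]]

        def loss(v1, v2):
            # Euclidean distance between target and extracted emotion vectors (assumed form).
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

        def synthesize_with_correction(text, first_vector):
            speech = synthesize(text, first_vector)
            second_vector = extract_emotion(speech)
            if loss(first_vector, second_vector) > LOSS_THRESHOLD:
                # Correct the extracted vector toward the target (third vector)
                # and re-perform synthesis, as the abstract describes.
                third_vector = [s + 0.5 * (f - s) for f, s in zip(first_vector, second_vector)]
                speech = synthesize(text, third_vector)
            return speech

        print(synthesize_with_correction("Hello there.", [0.7, 0.1, 0.0, 0.0, 0.2]))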

    GATHERING USER'S SPEECH SAMPLES

    Publication (Announcement) Number: US20210134301A1

    Publication (Announcement) Date: 2021-05-06

    Application Number: US17028527

    Application Date: 2020-09-22

    Abstract: Disclosed is a method of gathering a user's speech samples. According to an embodiment of the disclosure, the method gathers a speaker's speech data obtained while the speaker talks on a mobile terminal, together with text data generated from the speech data, to build training data for generating a speech synthesis model. The method may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.
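
    The collection flow pairs call audio with machine-generated transcripts to form (audio, text) training samples. The sketch below assumes an ASR step and a simple in-memory dataset; transcribe() and the record layout are hypothetical, not the patent's interfaces.

        import json

        def transcribe(audio_bytes):
            # Stand-in for an ASR module converting call audio to text.
            return "placeholder transcript"

        def gather_training_pair(audio_bytes, dataset):
            # Pair the captured speech with its generated transcript,
            # forming one training sample for a speech synthesis model.
            text = transcribe(audio_bytes)
            dataset.append({"audio_len_bytes": len(audio_bytes), "text": text})
            return dataset

        dataset = []
        gather_training_pair(b"\x00" * 16000, dataset)   # one second of 8-bit silence at 16 kHz (assumed)
        print(json.dumps(dataset))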

    SPEECH SYNTHESIS METHOD BASED ON EMOTION INFORMATION AND APPARATUS THEREFOR

    Publication (Announcement) Number: US20200035216A1

    Publication (Announcement) Date: 2020-01-30

    Application Number: US16593404

    Application Date: 2019-10-04

    Abstract: A speech synthesis method and apparatus based on emotion information are disclosed. A method for performing, by a speech synthesis apparatus, speech synthesis based on emotion information according to an embodiment of the present disclosure includes: receiving data; generating emotion information on the basis of the data; generating metadata corresponding to the emotion information; and transmitting the metadata to a speech synthesis engine, wherein the metadata is described in the form of a markup language, and the markup language includes the Speech Synthesis Markup Language (SSML). According to the present disclosure, an intelligent computing device constituting the speech synthesis apparatus may be related to artificial intelligence modules, drones (unmanned aerial vehicles, UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G services, and the like.
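
    The abstract specifies SSML as the carrier for the emotion metadata but not the exact markup. The sketch below shows one plausible encoding: standard SSML defines no emotion element, so the <emotion> tag and its attributes here are a vendor-style extension assumed for illustration.

        from xml.sax.saxutils import escape

        def to_ssml(text, emotion, intensity):
            # Wrap the synthesis target text in emotion metadata. The
            # <emotion> element is an assumed extension, not standard SSML.
            return (
                '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis">'
                f'<emotion name="{escape(emotion)}" intensity="{intensity:.2f}">'
                f'{escape(text)}</emotion></speak>'
            )

        # The resulting markup would be transmitted to the speech synthesis engine.
        print(to_ssml("See you tomorrow!", "joy", 0.8))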

    SPEECH SYNTHESIS DEVICE AND SPEECH SYNTHESIS METHOD

    Publication (Announcement) Number: US20230148275A1

    Publication (Announcement) Date: 2023-05-11

    Application Number: US17959050

    Application Date: 2022-10-03

    CPC classification number: G10L13/047 G10L25/30

    Abstract: Provided is a speech synthesis device capable of outputting a synthetic voice with various speech styles. The speech synthesis device includes a speaker and a processor configured to acquire voice feature information from a text and a user input, generate a synthetic voice by inputting the text and the voice feature information into a decoder supervised-trained to minimize the difference between feature information of a training text and feature information of a training voice, and output the generated synthetic voice through the speaker.
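
    The decoder's training objective is to minimize the gap between predicted and reference voice features. The toy sketch below assumes a linear decoder and a mean-squared-error loss with one gradient-descent step; the patent specifies neither, so both are illustrative stand-ins.

        def decode(text_feats, voice_feats, w):
            # Toy decoder: weighted sum of text and voice features (assumed form).
            return [w[0] * t + w[1] * v for t, v in zip(text_feats, voice_feats)]

        def mse(pred, target):
            return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

        text_feats  = [0.2, 0.5, 0.1]   # feature information of a training text (dummy values)
        voice_feats = [0.9, 0.3, 0.7]   # voice feature information (dummy values)
        target      = [0.6, 0.4, 0.4]   # feature information of the training voice (dummy values)
        w, lr = [0.5, 0.5], 0.1

        # One supervised update minimizing the feature difference.
        pred = decode(text_feats, voice_feats, w)
        grad0 = sum(2 * (p - t) * tf for p, t, tf in zip(pred, target, text_feats)) / len(pred)
        grad1 = sum(2 * (p - t) * vf for p, t, vf in zip(pred, target, voice_feats)) / len(pred)
        w = [w[0] - lr * grad0, w[1] - lr * grad1]
        print(mse(decode(text_feats, voice_feats, w), target))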

    SPEECH SYNTHESIS METHOD AND APPARATUS BASED ON EMOTION INFORMATION

    Publication (Announcement) Number: US20200035215A1

    Publication (Announcement) Date: 2020-01-30

    Application Number: US16593161

    Application Date: 2019-10-04

    Abstract: A speech synthesis method and apparatus based on emotion information are disclosed. The speech synthesis method extracts speech synthesis target text from received data and determines whether the received data includes situation explanation information. When it does, first metadata corresponding to first emotion information is generated on the basis of the situation explanation information. When the received data does not include situation explanation information, second metadata corresponding to second emotion information, generated on the basis of semantic analysis and context analysis, is generated. One of the first metadata and the second metadata is added to the speech synthesis target text to synthesize speech corresponding to the received data. The speech synthesis apparatus of this disclosure may be associated with artificial intelligence modules, drones (unmanned aerial vehicles, UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G services, and the like.
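
    The core of this abstract is a two-way branch: prefer explicit situation-explanation metadata, fall back to metadata inferred by semantic and context analysis. A minimal sketch follows; the record fields and analyze_semantics_and_context() are hypothetical stand-ins for the described components.

        def analyze_semantics_and_context(text):
            # Stand-in for the semantic/context analyzer producing second emotion information.
            return {"emotion": "neutral", "source": "analysis"}

        def choose_metadata(received):
            text = received["text"]                      # speech synthesis target text
            if received.get("situation_explanation"):    # first emotion information path
                meta = {"emotion": received["situation_explanation"], "source": "situation"}
            else:                                        # second emotion information path
                meta = analyze_semantics_and_context(text)
            return {"text": text, "metadata": meta}

        print(choose_metadata({"text": "The train is delayed.", "situation_explanation": "apologetic"}))
        print(choose_metadata({"text": "The train is delayed."}))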
