TEXT-TO-SPEECH (TTS) METHOD AND DEVICE ENABLING MULTIPLE SPEAKERS TO BE SET

    Publication No.: US20220351714A1

    Publication Date: 2022-11-03

    Application No.: US16485776

    Filing Date: 2019-06-07

    Abstract: Disclosed is a text-to-speech (TTS) method that enables multiple speakers to be set. The invention sets speaker information for each of the multiple characters in a script composed for utterance by those characters, and uses metadata containing the speaker information corresponding to the characters during speech synthesis, thereby realizing an audiobook in which the multiple speakers produce the speech output. In addition, the speaker information for the multiple characters may be set through artificial intelligence (AI) processing, allowing multi-speaker speech synthesis to be performed by a TTS device that includes an AI module.
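
A minimal sketch of the per-character speaker metadata described above, assuming a simple list-of-lines script format; all field names (`speaker_id`, `pitch`, and so on) are illustrative and not taken from the patent:

```python
# Hypothetical sketch: attach per-character speaker information to a script
# so a multi-speaker TTS engine can voice each line with a different speaker.
# Field names below are illustrative, not the patent's actual schema.

def build_tts_metadata(script_lines, speaker_map):
    """Pair each (character, text) line with that character's speaker info."""
    return [
        {"text": text, "speaker": speaker_map[character]}
        for character, text in script_lines
    ]

script = [
    ("narrator", "Once upon a time, a fox lived in the woods."),
    ("fox", "Hello there!"),
]
speakers = {
    "narrator": {"speaker_id": 0, "gender": "female", "pitch": "low"},
    "fox": {"speaker_id": 1, "gender": "male", "pitch": "high"},
}

metadata = build_tts_metadata(script, speakers)
```

A TTS engine consuming such metadata could select a different voice model per entry, which is what lets a single audiobook render each character distinctly.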

    GATHERING USER'S SPEECH SAMPLES
    Invention Application

    Publication No.: US20210134301A1

    Publication Date: 2021-05-06

    Application No.: US17028527

    Filing Date: 2020-09-22

    Abstract: Disclosed is a method of gathering a user's speech samples. According to an embodiment of the disclosure, the method gathers a speaker's speech data obtained while talking on a mobile terminal, together with text data generated from that speech data, and collects them as training data for generating a speech synthesis model. The method of gathering learning samples may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.
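
The pairing step can be sketched as follows; `recognize` is a hypothetical stand-in for the terminal's speech-to-text step, not an API named in the patent:

```python
# Hedged sketch: collect (speech, transcript) training pairs from speech
# captured during calls. recognize() is a stub standing in for a real
# speech-to-text model.

def recognize(speech_chunk):
    # A real system would run speech recognition on the audio here.
    return f"transcript of {speech_chunk}"

def gather_samples(speech_chunks):
    """Pair each captured speech chunk with text generated from it."""
    return [(chunk, recognize(chunk)) for chunk in speech_chunks]

samples = gather_samples(["utterance_01.wav", "utterance_02.wav"])
```

Each resulting pair is one training example for a speech synthesis (or recognition) model, which is the "learning sample" the abstract refers to.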

    SPEECH SYNTHESIS METHOD BASED ON EMOTION INFORMATION AND APPARATUS THEREFOR

    Publication No.: US20200035216A1

    Publication Date: 2020-01-30

    Application No.: US16593404

    Filing Date: 2019-10-04

    Abstract: A speech synthesis method and apparatus based on emotion information are disclosed. A method for performing, by a speech synthesis apparatus, speech synthesis based on emotion information according to an embodiment of the present disclosure includes: receiving data; generating emotion information on the basis of the data; generating metadata corresponding to the emotion information; and transmitting the metadata to a speech synthesis engine, wherein the metadata is described in the form of a markup language, and the markup language includes the Speech Synthesis Markup Language (SSML). According to the present disclosure, an intelligent computing device constituting the speech synthesis apparatus may be associated with an artificial intelligence module, a drone (unmanned aerial vehicle, UAV), a robot, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G services, and the like.
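
Since the abstract names SSML as the markup carrying the emotion metadata, a toy emitter might look like this; the `express-as` element and `style` attribute are modeled on common vendor extensions to SSML and are assumptions, not the patent's exact schema:

```python
# Sketch: wrap text in SSML-style markup that carries emotion metadata.
# <express-as style="..."> follows vendor SSML extensions; the exact element
# the patent uses is not specified here.
from xml.sax.saxutils import escape, quoteattr

def to_ssml(text, emotion):
    """Return an SSML fragment tagging `text` with an emotion label."""
    return (
        f"<speak><express-as style={quoteattr(emotion)}>"
        f"{escape(text)}</express-as></speak>"
    )

ssml = to_ssml("I passed the exam!", "joy")
```

A speech synthesis engine receiving this metadata could then adjust prosody (pitch, rate, energy) to match the labeled emotion.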

    METHOD AND DEVICE FOR FOCUSING SOUND SOURCE

    Publication No.: US20210096810A1

    Publication Date: 2021-04-01

    Application No.: US16703768

    Filing Date: 2019-12-04

    Abstract: Disclosed are a sound source focusing method and device in which, in a 5G communication environment, the device amplifies and outputs the sound source signal of a user's object of interest, extracted from the acoustic signal included in video content, by executing a loaded artificial intelligence (AI) algorithm and/or machine learning algorithm. The sound source focusing method includes playing video content comprising a video signal that includes at least one moving object and an acoustic signal in which the sound sources output by the objects are mixed, determining the user's object of interest from the video signal, acquiring unique sound source information about that object, extracting from the acoustic signal the actual sound source corresponding to the unique sound source information, and outputting the extracted sound source.
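
The amplify-and-remix step at the core of the method can be illustrated with plain sample lists; the extraction itself (source separation) is assumed to have already happened, and the signal layout here is illustrative:

```python
# Toy sketch of the focusing step: boost the extracted source of the
# object of interest inside the mixed acoustic signal. A real system
# would obtain `target` via source separation; here it is given.

def focus_source(mixture, target, gain=2.0):
    """Amplify `target`'s contribution to `mixture` by `gain`."""
    # The mixture already contains the target once, so add (gain - 1) more.
    return [m + (gain - 1.0) * t for m, t in zip(mixture, target)]

mixture = [0.5, 0.2, -0.3]   # all objects' sound sources, mixed
target = [0.4, 0.1, -0.2]    # extracted source for the object of interest

focused = focus_source(mixture, target, gain=2.0)
```

With `gain=2.0` the object of interest is twice as loud relative to the rest of the scene, which is the "focusing" effect the abstract describes.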

    MOBILE TERMINAL AND METHOD FOR CONTROLLING THE SAME
    Invention Application (In Force)

    Publication No.: US20150169166A1

    Publication Date: 2015-06-18

    Application No.: US14538169

    Filing Date: 2014-11-11

    Abstract: A mobile terminal and a method for controlling the same are disclosed. The mobile terminal includes: a display configured to output a specific image in an image view mode; an extractor that extracts a specific area from the specific image based on a touch gesture on the image; and a controller that creates a thumbnail of the specific image from the extracted area and, when an image preview function is executed, outputs a thumbnail list in which the created thumbnail is displayed so as to be visually distinct from the thumbnails of other images.
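
The extractor's crop can be sketched as a slice over image rows; the row-major image layout and parameter names are illustrative, not from the patent:

```python
# Illustrative sketch: crop the touched rectangular area of an image,
# e.g. to build a thumbnail from it. The image is a row-major list of rows.

def extract_region(image, top, left, height, width):
    """Return the height x width sub-image starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
thumbnail = extract_region(image, top=1, left=1, height=2, width=2)
```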


    VOICE SYNTHESIS DEVICE
    Invention Application

    Publication No.: US20200074981A1

    Publication Date: 2020-03-05

    Application No.: US16547323

    Filing Date: 2019-08-21

    Abstract: Disclosed is a voice synthesis device. The voice synthesis device includes a database configured to store a voice and the text corresponding to that voice, and a processor configured to extract characteristic information and a tone from a first-language voice stored in the database, classify the utterer's utterance style on the basis of the extracted characteristic information, generate utterer analysis information including the utterance style and the tone, translate the text corresponding to the first-language voice into a second language, and synthesize the translated text as a second-language voice by using the utterer analysis information.

    ARTIFICIAL INTELLIGENCE APPARATUS FOR CORRECTING SYNTHESIZED SPEECH AND METHOD THEREOF

    Publication No.: US20200058290A1

    Publication Date: 2020-02-20

    Application No.: US16660947

    Filing Date: 2019-10-23

    Abstract: Disclosed herein is an artificial intelligence apparatus that includes a memory configured to store learning-target text and the human speech of a person pronouncing that text; a processor configured to generate synthesized speech in which the text is pronounced as synthesized sound, and to extract a synthesized speech feature set containing information on features pronounced in the synthesized speech and a human speech feature set containing information on features pronounced in the human speech; and a learning processor configured to train, based on the synthesized and human speech feature sets, a speech correction model that outputs a corrected speech feature set, so that when a feature set extracted from a given synthesized speech is input, that synthesized speech can be corrected toward the human pronunciation features.
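
One very simple instance of such a correction model is a learned per-feature offset; the real apparatus would train a more expressive model, so this is only a sketch of the input/output contract, with illustrative feature values:

```python
# Hedged sketch: learn a per-dimension offset from paired synthesized/human
# feature sets, then apply it to correct a new synthesized feature set.
# The patent's model would be learned end-to-end; this linear offset is
# illustrative only.

def fit_offsets(synth_feats, human_feats):
    """Mean per-dimension difference between human and synthesized features."""
    n = len(synth_feats)
    dims = len(synth_feats[0])
    return [
        sum(h[d] - s[d] for s, h in zip(synth_feats, human_feats)) / n
        for d in range(dims)
    ]

def correct(feature_set, offsets):
    """Shift a synthesized feature set toward the human pronunciation."""
    return [f + o for f, o in zip(feature_set, offsets)]

synth = [[1.0, 2.0], [3.0, 4.0]]   # features from synthesized speech
human = [[1.5, 2.0], [3.5, 4.0]]   # features from the person's speech

offsets = fit_offsets(synth, human)
corrected = correct([2.0, 3.0], offsets)
```

The corrected feature set would then be fed back to the synthesizer so its output better matches the human pronunciation features.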
