-
Publication number: US20220148562A1
Publication date: 2022-05-12
Application number: US17554547
Filing date: 2021-12-17
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Sangjun PARK , Kyoungbo Min , Kihyun Choo , Seungdo Choi
IPC: G10L13/047 , G10L13/10
Abstract: An electronic apparatus and a controlling method thereof are provided. The electronic apparatus includes a microphone; a memory configured to store a text-to-speech (TTS) model and a plurality of evaluation texts; and a processor configured to: obtain a first reference vector of a user speech spoken by a user based on the user speech being received through the microphone, generate a plurality of candidate reference vectors based on the first reference vector, obtain a plurality of synthesized sounds by inputting the plurality of candidate reference vectors and the plurality of evaluation texts to the TTS model, identify at least one synthesized sound of the plurality of synthesized sounds based on a similarity between characteristics of the plurality of synthesized sounds and the user speech, and store a second reference vector of the at least one synthesized sound in the memory as a reference vector corresponding to the user for the TTS model.
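Illustrative sketch (not taken from the patent): the candidate-generation and similarity-based selection steps in the abstract above might look roughly like the following. The Gaussian perturbation, the cosine similarity measure, and the names `make_candidates`, `select_best`, and `synth_features` are all assumptions made for illustration; the abstract does not specify how candidates are generated or how similarity is measured.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_candidates(reference, count, scale, rng):
    """Hypothetical candidate generation: add Gaussian noise to the first reference vector."""
    return [[x + rng.gauss(0.0, scale) for x in reference] for _ in range(count)]


def select_best(candidates, synth_features, user_features):
    """Pick the candidate whose synthesized sound is most similar to the user speech.

    synth_features(candidate) stands in for running the TTS model on the
    evaluation texts and extracting characteristics of the synthesized sound.
    """
    return max(candidates, key=lambda c: cosine(synth_features(c), user_features))


if __name__ == "__main__":
    rng = random.Random(0)
    first_reference = [0.2, 0.8, 0.1]
    candidates = make_candidates(first_reference, count=5, scale=0.05, rng=rng)
    # Identity "TTS" stub: the synthesized features are just the candidate itself.
    best = select_best(candidates, lambda c: c, user_features=first_reference)
    print(best)
```

In this sketch the selected candidate plays the role of the second reference vector that would be stored for the user.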
-
Publication number: US12198675B2
Publication date: 2025-01-14
Application number: US18171079
Filing date: 2023-02-17
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Hosang Sung , Kyoungbo Min , Seonho Hwang , Doohwa Hong , Eunmi Oh , Jonghoon Jeong , Kihyun Choo
Abstract: An electronic apparatus which acquires input data to be input into a TTS module for outputting a voice through the TTS module, acquires a voice signal corresponding to the input data through the TTS module, detects an error in the acquired voice signal based on the input data, corrects the input data based on the detection result, and acquires a corrected voice signal corresponding to the corrected input data through the TTS module.
-
Publication number: US11830473B2
Publication date: 2023-11-28
Application number: US17037023
Filing date: 2020-09-29
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jesus Monge Alvarez , Holly Francois , Hosang Sung , Seungdo Choi , Kihyun Choo , Sangjun Park
IPC: G10L13/027 , G10L13/047 , G10L15/06 , G10L15/187 , G10L13/06
CPC classification number: G10L13/027 , G10L13/047 , G10L13/06 , G10L15/063 , G10L15/187 , G10L2015/0635
Abstract: A system for synthesising expressive speech includes: an interface configured to receive an input text for conversion to speech; a memory; and at least one processor coupled to the memory. The processor is configured to generate, using an expressivity characterisation module, a plurality of expression vectors, wherein each expression vector is a representation of prosodic information in a reference audio style file, and synthesise expressive speech from the input text, using an expressive acoustic model comprising a deep convolutional neural network that is conditioned by at least one of the plurality of expression vectors.
-
Publication number: US09848180B2
Publication date: 2017-12-19
Application number: US14794517
Filing date: 2015-07-08
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Junghoe Kim , Eunmi Oh , Kihyun Choo , Miao Lei
CPC classification number: H04N13/161 , G10L19/008 , H04S1/007 , H04S3/008 , H04S7/308 , H04S2420/03
Abstract: Surround audio decoding for selectively generating an audio signal from a multi-channel signal. In the surround audio decoding, a down-mixed signal, e.g., as down-mixed by an encoding terminal, is selectively up-mixed to a stereo signal or a multi-channel signal, by generating spatial information for generating the stereo signal, using spatial information for up-mixing the down-mixed signal to the multi-channel signal.
-
Publication number: US12154563B2
Publication date: 2024-11-26
Application number: US17679446
Filing date: 2022-02-24
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jonghoon Jeong , Hosang Sung , Doohwa Hong , Kyoungbo Min , Eunmi Oh , Kihyun Choo
Abstract: An electronic apparatus, based on a text sentence being input, obtains prosody information of the text sentence, segments the text sentence into a plurality of sentence elements, obtains a speech in which prosody information is reflected to each of the plurality of sentence elements in parallel by inputting the plurality of sentence elements and the prosody information of the text sentence to a text to speech (TTS) module, and merges the speech for the plurality of sentence elements that are obtained in parallel to output speech for the text sentence.
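Illustrative sketch (not taken from the patent): the segment, synthesize-in-parallel, and merge flow described in the abstract above could be outlined as follows. The punctuation-based splitting rule, the `fake_tts` stand-in, and the `prosody` dictionary layout are hypothetical; a real TTS module would return audio rather than tagged strings.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def segment(sentence):
    """Split a sentence into elements after commas or semicolons (an assumed rule)."""
    parts = re.split(r"(?<=[,;])\s+", sentence.strip())
    return [p for p in parts if p]


def fake_tts(element, prosody):
    """Stand-in for a TTS module: returns one tagged 'sample' per word."""
    return [f"{prosody['style']}:{word}" for word in element.split()]


def synthesize(sentence, prosody):
    """Synthesize each sentence element in parallel, then merge in original order."""
    elements = segment(sentence)
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in input order, so the merge keeps sentence order.
        chunks = list(pool.map(lambda e: fake_tts(e, prosody), elements))
    merged = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged


if __name__ == "__main__":
    print(synthesize("Hello there, how are you", {"style": "calm"}))
```

The same sentence-level prosody is passed to every element, which mirrors the abstract's point that prosody is obtained once for the whole sentence and then reflected in each element.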
-
Publication number: US11335325B2
Publication date: 2022-05-17
Application number: US16749257
Filing date: 2020-01-22
Applicant: Samsung Electronics Co., Ltd.
Inventor: Hosang Sung , Seonho Hwang , Doohwa Hong , Eunmi Oh , Kyoungbo Min , Jonghoon Jeong , Kihyun Choo
IPC: G10L13/08 , G10L15/22 , G10L15/18 , G10L13/047 , G10L13/033 , G10L15/02 , G10L13/00
Abstract: An electronic device and a controlling method of the electronic device are provided. The electronic device acquires text to respond to a received user's speech, acquires a plurality of pieces of parameter information for determining a style of an output speech corresponding to the text based on information on a type of a plurality of text-to-speech (TTS) databases and the received user's speech, identifies a TTS database corresponding to the plurality of pieces of parameter information among the plurality of TTS databases, identifies a weight set corresponding to the plurality of pieces of parameter information among a plurality of weight sets acquired through a trained artificial intelligence model, adjusts information on the output speech stored in the TTS database based on the weight set, synthesizes the output speech based on the adjusted information on the output speech, and outputs the output speech corresponding to the text.
-
Publication number: US20210225358A1
Publication date: 2021-07-22
Application number: US17037023
Filing date: 2020-09-29
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Jesus MONGE ALVAREZ , Holly Francois , Hosang Sung , Seungdo Choi , Kihyun Choo , Sangjun Park
IPC: G10L13/027 , G10L13/047 , G10L13/06 , G10L15/06 , G10L15/187
Abstract: A system for synthesising expressive speech includes: an interface configured to receive an input text for conversion to speech; a memory; and at least one processor coupled to the memory. The processor is configured to generate, using an expressivity characterisation module, a plurality of expression vectors, wherein each expression vector is a representation of prosodic information in a reference audio style file, and synthesise expressive speech from the input text, using an expressive acoustic model comprising a deep convolutional neural network that is conditioned by at least one of the plurality of expression vectors.
-
Publication number: US09479871B2
Publication date: 2016-10-25
Application number: US14134508
Filing date: 2013-12-19
Applicant: Samsung Electronics Co., Ltd.
Inventor: Junghoe Kim , Eunmi Oh , Kihyun Choo , Miao Lei
CPC classification number: H04R5/02 , G10L19/008 , H04R5/033 , H04S1/002 , H04S3/00 , H04S3/002 , H04S3/02 , H04S2420/01 , H04S2420/07
Abstract: A method, medium, and system generating a 3-dimensional (3D) stereo signal in a decoder by using a surround data stream. According to such a method, medium, and system, a head related transfer function (HRTF) is applied in a quadrature mirror filter (QMF) domain, thereby generating a 3D stereo signal by using a surround data stream.
-
Publication number: US11848004B2
Publication date: 2023-12-19
Application number: US17850096
Filing date: 2022-06-27
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Sangjun Park , Kihyun Choo
CPC classification number: G10L13/10 , G10L13/047 , G10L13/06
Abstract: A method for controlling an electronic device includes obtaining a text, obtaining, by inputting the text into a first neural network model, acoustic feature information corresponding to the text and alignment information in which each frame of the acoustic feature information is matched with each phoneme included in the text, identifying an utterance speed of the acoustic feature information based on the alignment information, identifying a reference utterance speed for each phoneme included in the acoustic feature information based on the text and the acoustic feature information, obtaining utterance speed adjustment information based on the utterance speed of the acoustic feature information and the reference utterance speed for each phoneme, and obtaining, based on the utterance speed adjustment information, speech data corresponding to the text by inputting the acoustic feature information into a second neural network model.
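Illustrative sketch (not taken from the patent): one simple way to read an utterance speed off frame-to-phoneme alignment information, and to turn it into an adjustment factor, is shown below. The alignment encoding (one phoneme index per frame), the phonemes-per-second definition of speed, and the single global reference speed are assumptions; the abstract describes a per-phoneme reference speed and neural network models that this sketch does not attempt to reproduce.

```python
from collections import Counter


def frames_per_phoneme(alignment):
    """Count frames matched to each phoneme; alignment[i] is the phoneme index of frame i."""
    counts = Counter(alignment)
    return [counts.get(p, 0) for p in range(max(alignment) + 1)]


def utterance_speed(alignment, frame_seconds):
    """Utterance speed in phonemes per second (an assumed definition)."""
    phoneme_count = len(set(alignment))
    return phoneme_count / (len(alignment) * frame_seconds)


def speed_adjustment(current_speed, reference_speed):
    """Ratio of reference to current speed; above 1 suggests speeding up, below 1 slowing down."""
    return reference_speed / current_speed


if __name__ == "__main__":
    alignment = [0, 0, 1, 1, 1, 2]  # six frames covering three phonemes
    speed = utterance_speed(alignment, frame_seconds=0.5)
    print(frames_per_phoneme(alignment), speed, speed_adjustment(speed, 2.0))
```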
-
Publication number: US11763799B2
Publication date: 2023-09-19
Application number: US17554547
Filing date: 2021-12-17
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Sangjun Park , Kyoungbo Min , Kihyun Choo , Seungdo Choi
IPC: G10L13/08 , G10L15/14 , G10L15/06 , G10L13/047 , G10L13/10
CPC classification number: G10L13/047 , G10L13/10
Abstract: An electronic apparatus and a controlling method thereof are provided. The electronic apparatus includes a microphone; a memory configured to store a text-to-speech (TTS) model and a plurality of evaluation texts; and a processor configured to: obtain a first reference vector of a user speech spoken by a user based on the user speech being received through the microphone, generate a plurality of candidate reference vectors based on the first reference vector, obtain a plurality of synthesized sounds by inputting the plurality of candidate reference vectors and the plurality of evaluation texts to the TTS model, identify at least one synthesized sound of the plurality of synthesized sounds based on a similarity between characteristics of the plurality of synthesized sounds and the user speech, and store a second reference vector of the at least one synthesized sound in the memory as a reference vector corresponding to the user for the TTS model.