-
Publication No.: US20220230623A1
Publication Date: 2022-07-21
Application No.: US17154372
Application Date: 2021-01-21
Applicant: QUALCOMM Incorporated
Inventor: Kyungguen BYUN, Sunkuk MOON, Shuhua ZHANG, Vahid MONTAZERI, Lae-Hoon KIM, Erik VISSER
IPC: G10L13/047, G06N3/04, G10L13/033, G10L25/63, G10L19/02
Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.
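The abstract does not say how the control parameters steer the multi-encoder. The toy sketch below is one possible reading, assuming one encoder per target characteristic and treating each control parameter as a gain on that encoder's output before concatenation. `ControlParameters`, its fields, `linear_encoder`, and `multi_encode` are hypothetical names, and the fixed weight matrices stand in for trained networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class ControlParameters:
    """Hypothetical target speech characteristics, each a gain in [0, 1]."""
    emotion_intensity: float
    speaking_rate: float


def linear_encoder(weights: List[Vector]) -> Callable[[Vector], Vector]:
    """Build a toy encoder that multiplies its input by a fixed weight matrix."""
    def encode(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return encode


def multi_encode(x: Vector, params: ControlParameters,
                 encoders: Dict[str, Callable[[Vector], Vector]]) -> Vector:
    """Run every encoder on the same input, scale each output by the control
    parameter assigned to that encoder (an assumption), and concatenate."""
    gains = {"emotion": params.emotion_intensity, "rate": params.speaking_rate}
    encoded: Vector = []
    for name in sorted(encoders):  # fixed order keeps the output layout deterministic
        encoded.extend(gains[name] * v for v in encoders[name](x))
    return encoded


encoders = {
    "emotion": linear_encoder([[1.0, 0.0], [0.0, 1.0]]),
    "rate": linear_encoder([[1.0, 1.0]]),
}
print(multi_encode([2.0, 3.0], ControlParameters(0.5, 1.0), encoders))  # [1.0, 1.5, 5.0]
```

Scaling by a gain is only the simplest form of conditioning. A learned model would more likely feed the control parameters into each encoder as extra inputs.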
-
Publication No.: US20230326477A1
Publication Date: 2023-10-12
Application No.: US18334641
Application Date: 2023-06-14
Applicant: QUALCOMM Incorporated
Inventor: Kyungguen BYUN, Shuhua ZHANG, Lae-Hoon KIM, Erik VISSER, Sunkuk MOON, Vahid MONTAZERI
IPC: G10L21/0232, G10L21/038, G10L21/02
CPC classification number: G10L21/0232, G10L21/038, G10L21/02
Abstract: A device to perform speech enhancement includes one or more processors configured to process image data to detect at least one of an emotion, a speaker characteristic, or a noise type. The one or more processors are also configured to generate context data based at least in part on the at least one of the emotion, the speaker characteristic, or the noise type. The one or more processors are further configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and the context data to generate output spectral data that represents a speech enhanced version of the input signal.
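As an illustration of the "context data" step only, the sketch below assumes that image analysis has already produced discrete labels and encodes whichever of the three labels were detected as concatenated one-hot vectors. Any label that was not detected becomes all zeros. The label vocabularies and the `make_context` helper are hypothetical, and the image detector itself is not modeled.

```python
from typing import List, Optional

# Hypothetical label vocabularies; the abstract does not list any.
EMOTIONS = ["neutral", "happy", "sad", "angry"]
SPEAKER_TYPES = ["adult_male", "adult_female", "child"]
NOISE_TYPES = ["none", "street", "babble", "wind"]


def one_hot(label: Optional[str], vocab: List[str]) -> List[float]:
    """Encode a label as one-hot; an undetected (None) label becomes all zeros."""
    vec = [0.0] * len(vocab)
    if label is not None:
        vec[vocab.index(label)] = 1.0
    return vec


def make_context(emotion: Optional[str] = None,
                 speaker: Optional[str] = None,
                 noise: Optional[str] = None) -> List[float]:
    """Build a context vector from at least one detected label."""
    if emotion is None and speaker is None and noise is None:
        raise ValueError("at least one of emotion, speaker, or noise type is required")
    return (one_hot(emotion, EMOTIONS)
            + one_hot(speaker, SPEAKER_TYPES)
            + one_hot(noise, NOISE_TYPES))


print(make_context(emotion="sad", noise="wind"))
```

A learned embedding table per label type would be a natural alternative to fixed one-hot codes.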
-
Publication No.: US20220310108A1
Publication Date: 2022-09-29
Application No.: US17209621
Application Date: 2021-03-23
Applicant: QUALCOMM Incorporated
Inventor: Kyungguen BYUN, Shuhua ZHANG, Lae-Hoon KIM, Erik VISSER, Sunkuk MOON, Vahid MONTAZERI
IPC: G10L21/038
Abstract: A device to perform speech enhancement includes one or more processors configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and context data to generate output spectral data that represents a speech enhanced version of the input signal.
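The abstract names a multi-encoder transformer without describing its layout. The pure-Python toy below shows one common pattern, assumed rather than taken from the source. Spectral frames and context data each pass through their own self-attention encoder. The spectral frames then cross-attend to both encodings, and the results are summed with a residual connection. There are no learned projections, feed-forward layers, or positional encodings, and the context vectors are assumed to share the spectral frames' dimension.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out


def encode(x: Matrix) -> Matrix:
    """Toy encoder: one self-attention layer plus a residual connection."""
    att = attention(x, x, x)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, att)]


def multi_encoder_transformer(spectral: Matrix, context: Matrix) -> Matrix:
    """Encode each input separately, then let the spectral frames cross-attend
    to both encodings and add the two results to the original frames."""
    spec_enc = encode(spectral)
    ctx_enc = encode(context)
    from_spec = attention(spectral, spec_enc, spec_enc)
    from_ctx = attention(spectral, ctx_enc, ctx_enc)
    return [[s + a + b for s, a, b in zip(f, r1, r2)]
            for f, r1, r2 in zip(spectral, from_spec, from_ctx)]


out = multi_encoder_transformer([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]])
print(out)  # one output frame per input frame, same dimension
```

Keeping a separate encoder per input lets each modality be processed on its own terms before fusion. The decoder-side cross-attention is where the context can influence each spectral frame.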
-
Publication No.: US20240087597A1
Publication Date: 2024-03-14
Application No.: US17931755
Application Date: 2022-09-13
Applicant: QUALCOMM Incorporated
Inventor: Kyungguen BYUN, Sunkuk MOON, Erik VISSER
Abstract: A device includes one or more processors configured to process an input audio spectrum of input speech to detect a first characteristic associated with the input speech. The one or more processors are also configured to select, based at least in part on the first characteristic, one or more reference embeddings from among multiple reference embeddings. The one or more processors are further configured to process a representation of source speech, using the one or more reference embeddings, to generate an output audio spectrum of output speech.
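The detect, select, and condition flow in this abstract can be sketched as follows. Several assumptions are made here. The detector labels input speech "loud" or "soft" from its mean spectral magnitude using a made-up threshold. Reference embeddings are stored under those labels. Conditioning averages the selected embeddings and appends the average to every source frame. `detect_characteristic`, `select_references`, `condition_source`, and the labels are all hypothetical.

```python
from typing import Dict, List, Sequence

Embedding = List[float]


def detect_characteristic(spectrum: Sequence[float], threshold: float = 0.5) -> str:
    """Hypothetical detector: label input speech by its mean spectral magnitude."""
    mean = sum(spectrum) / len(spectrum)
    return "loud" if mean > threshold else "soft"


def select_references(characteristic: str,
                      references: Dict[str, List[Embedding]]) -> List[Embedding]:
    """Return every stored reference embedding tagged with the detected characteristic."""
    selected = references.get(characteristic, [])
    if not selected:
        raise ValueError(f"no reference embeddings for {characteristic!r}")
    return selected


def condition_source(source_frames: List[Embedding],
                     selected: List[Embedding]) -> List[Embedding]:
    """Average the selected embeddings and append the result to every source frame."""
    n = len(selected)
    mean_ref = [sum(col) / n for col in zip(*selected)]
    return [frame + mean_ref for frame in source_frames]


references = {"loud": [[1.0, 0.0], [0.0, 1.0]], "soft": [[0.2, 0.2]]}
label = detect_characteristic([0.9, 0.8])
print(label)  # loud
print(condition_source([[0.1], [0.2]], select_references(label, references)))
# [[0.1, 0.5, 0.5], [0.2, 0.5, 0.5]]
```

In a real system the conditioned frames would feed a decoder that produces the output audio spectrum. That decoder is not modeled here.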
-
公开(公告)号:US20250078810A1
公开(公告)日:2025-03-06
申请号:US18494640
申请日:2023-10-25
Applicant: QUALCOMM Incorporated
Inventor: Kyungguen BYUN , Sunkuk MOON , Erik VISSER
IPC: G10L13/10 , G10L13/027
Abstract: Systems and techniques described herein relate to a diffusion-based model for generating converted speech from a source speech based on target speech. For example, a device may extract first prosody data from input data and may generate a content embedding based on the input data. The device may extract second prosody data from target speech, generate a speaker embedding from the target speech, and generate a prosody embedding from the second prosody data. The device may generate, based on the first prosody data and the prosody embedding, converted prosody data. The device may then generate a converted spectrogram based on the converted prosody data, the speaker embedding, and the content embedding.
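The diffusion model itself is beyond a short example. The prosody-conversion step can still be illustrated with a classical stand-in technique, log-F0 mean and variance transfer. That technique is swapped in here and is not the method the abstract describes. The "prosody embedding" is assumed to be the mean and standard deviation of the target's log-F0 over voiced frames. Source voiced frames are z-normalized and mapped onto those statistics, and unvoiced frames (0 Hz) are left at 0. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def prosody_embedding(target_f0: List[float]) -> Tuple[float, float]:
    """Summarize target pitch as mean and std of log-F0 over voiced frames."""
    logs = [math.log(f) for f in target_f0 if f > 0]
    return statistics.mean(logs), statistics.pstdev(logs)


def convert_prosody(source_f0: List[float], embedding: Tuple[float, float]) -> List[float]:
    """Map voiced source frames onto the target log-F0 statistics;
    unvoiced frames (0 Hz) stay at 0."""
    tgt_mean, tgt_std = embedding
    logs = [math.log(f) for f in source_f0 if f > 0]
    src_mean, src_std = statistics.mean(logs), statistics.pstdev(logs)
    out = []
    for f in source_f0:
        if f <= 0:
            out.append(0.0)
        else:
            z = (math.log(f) - src_mean) / src_std if src_std > 0 else 0.0
            out.append(math.exp(tgt_mean + z * tgt_std))
    return out


converted = convert_prosody([100.0, 0.0, 200.0], prosody_embedding([150.0, 300.0]))
print(converted)  # approximately [150.0, 0.0, 300.0]
```

In the abstract's pipeline, the converted prosody data, the speaker embedding, and the content embedding would then condition the model that generates the converted spectrogram.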
-