Patent search ap:("QUALCOMM Incorporated") AND inv:"Sunkuk MOON" Page 1

1.

Invention application
SYNTHESIZED SPEECH GENERATION (Status: In force)

Publication (announcement) No.: US20220230623A1

Publication (announcement) date: 2022-07-21

Application No.: US17154372

Application date: 2021-01-21

Applicant: QUALCOMM Incorporated

Inventor: Kyungguen BYUN, Sunkuk MOON, Shuhua ZHANG, Vahid MONTAZERI, Lae-Hoon KIM, Erik VISSER

IPC: G10L13/047, G06N3/04, G10L13/033, G10L25/63, G10L19/02

Abstract: A device for speech generation includes one or more processors configured to receive one or more control parameters indicating target speech characteristics. The one or more processors are also configured to process, using a multi-encoder, an input representation of speech based on the one or more control parameters to generate encoded data corresponding to an audio signal that represents a version of the speech based on the target speech characteristics.

2.

Invention publication
CONTEXT-BASED SPEECH ENHANCEMENT (Status: Under examination, published)

Publication (announcement) No.: US20230326477A1

Publication (announcement) date: 2023-10-12

Application No.: US18334641

Application date: 2023-06-14

Applicant: QUALCOMM Incorporated

Inventor: Kyungguen BYUN, Shuhua ZHANG, Lae-Hoon KIM, Erik VISSER, Sunkuk MOON, Vahid MONTAZERI

IPC: G10L21/0232, G10L21/038, G10L21/02

CPC classification number: G10L21/0232, G10L21/038, G10L21/02

Abstract: A device to perform speech enhancement includes one or more processors configured to process image data to detect at least one of an emotion, a speaker characteristic, or a noise type. The one or more processors are also configured to generate context data based at least in part on the at least one of the emotion, the speaker characteristic, or the noise type. The one or more processors are further configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and the context data to generate output spectral data that represents a speech enhanced version of the input signal.

3.

Invention application
USER SPEECH PROFILE MANAGEMENT (Status: In force)

Publication (announcement) No.: US20220180859A1

Publication (announcement) date: 2022-06-09

Application No.: US17115158

Application date: 2020-12-08

Applicant: QUALCOMM Incorporated

Inventor: Soo Jin PARK, Sunkuk MOON, Lae-Hoon KIM, Erik VISSER

IPC: G10L15/07, G10L15/16, G10L15/04, G06F1/3231

Abstract: A device includes processors configured to determine, in a first power mode, whether an audio stream corresponds to speech of at least two talkers. The processors are configured to, based on determining that the audio stream corresponds to speech of at least two talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result. The processors are configured to perform a comparison of a plurality of user speech profiles to an audio feature data set of a plurality of audio feature data sets of a talker-homogenous audio segment to determine whether the audio feature data set matches any of the user speech profiles. The processors are configured to, based on determining that the audio feature data set does not match any of the plurality of user speech profiles, generate a user speech profile based on the plurality of audio feature data sets.
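Illustrative note (not part of the patent record): the matching and enrollment step in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The Euclidean distance, the fixed threshold, the centroid-based profile, the `user_N` naming, and the function names are all hypothetical choices made for illustration; the power-mode switching and segmentation steps are omitted.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def centroid(feature_sets: List[Vector]) -> Vector:
    """Element-wise mean of the feature sets (assumed profile representation)."""
    n = len(feature_sets)
    return [sum(column) / n for column in zip(*feature_sets)]


def find_match(feature_set: Vector, profiles: Dict[str, Vector],
               threshold: float) -> Optional[str]:
    """Return the id of the closest profile within the threshold, else None."""
    best_id, best_dist = None, threshold
    for profile_id, profile_vec in profiles.items():
        d = math.dist(feature_set, profile_vec)  # assumed distance measure
        if d <= best_dist:
            best_id, best_dist = profile_id, d
    return best_id


def enroll_if_unknown(segment_feature_sets: List[Vector],
                      profiles: Dict[str, Vector],
                      threshold: float) -> str:
    """Compare one feature set of a single-talker segment to the profiles.

    If nothing matches, create a new profile from all feature sets of the
    segment and return its (hypothetical) id.
    """
    probe = segment_feature_sets[0]
    match = find_match(probe, profiles, threshold)
    if match is not None:
        return match
    new_id = f"user_{len(profiles) + 1}"
    profiles[new_id] = centroid(segment_feature_sets)
    return new_id
```

In this sketch, a segment whose first feature set lies within the threshold of an existing profile returns that profile's id; otherwise a new entry is added to `profiles`.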

4.

Invention publication
SOURCE SPEECH MODIFICATION BASED ON AN INPUT SPEECH CHARACTERISTIC (Status: Under examination, published)

Publication (announcement) No.: US20240087597A1

Publication (announcement) date: 2024-03-14

Application No.: US17931755

Application date: 2022-09-13

Applicant: QUALCOMM Incorporated

Inventor: Kyungguen BYUN, Sunkuk MOON, Erik VISSER

IPC: G10L25/63, G10L25/21

CPC classification number: G10L25/63, G10L25/21

Abstract: A device includes one or more processors configured to process an input audio spectrum of input speech to detect a first characteristic associated with the input speech. The one or more processors are also configured to select, based at least in part on the first characteristic, one or more reference embeddings from among multiple reference embeddings. The one or more processors are further configured to process a representation of source speech, using the one or more reference embeddings, to generate an output audio spectrum of output speech.

5.

Invention application
CONTROLLABLE DIFFUSION-BASED SPEECH GENERATIVE MODEL (Status: In force)

Publication (announcement) No.: US20250078810A1

Publication (announcement) date: 2025-03-06

Application No.: US18494640

Application date: 2023-10-25

Applicant: QUALCOMM Incorporated

Inventor: Kyungguen BYUN, Sunkuk MOON, Erik VISSER

IPC: G10L13/10, G10L13/027

Abstract: Systems and techniques described herein relate to a diffusion-based model for generating converted speech from a source speech based on target speech. For example, a device may extract first prosody data from input data and may generate a content embedding based on the input data. The device may extract second prosody data from target speech, generate a speaker embedding from the target speech, and generate a prosody embedding from the second prosody data. The device may generate, based on the first prosody data and the prosody embedding, converted prosody data. The device may then generate a converted spectrogram based on the converted prosody data, the speaker embedding, and the content embedding.

6.

Invention application
ACTIVITY QUERY RESPONSE SYSTEM (Status: In force)

Publication (announcement) No.: US20210011887A1

Publication (announcement) date: 2021-01-14

Application No.: US16586821

Application date: 2019-09-27

Applicant: QUALCOMM Incorporated

Inventor: Erik VISSER, Rehana MAHFUZ, Ravi CHOUDHARY, Lae-Hoon KIM, Sunkuk MOON, Yinyi GUO, Fatemeh SAKI

IPC: G06F16/18, G06F17/28, G06N3/04, G06F16/61, A61B5/11

Abstract: A device for activity tracking includes a memory and one or more processors. The memory is configured to store an activity log. The one or more processors are configured to update the activity log based on activity data. The activity data is received from a second device. The one or more processors are also configured to, responsive to receiving a natural language query, generate a query response based on the activity log.

7.

Invention application
SPEAKER VERIFICATION BASED ON A SPEAKER TEMPLATE (Status: Under examination, published)

Publication (announcement) No.: US20200286491A1

Publication (announcement) date: 2020-09-10

Application No.: US16296733

Application date: 2019-03-08

Applicant: QUALCOMM Incorporated

Inventor: Sunkuk MOON, Bicheng JIANG, Erik VISSER

IPC: G10L17/22, G10L17/00, G10L17/02, G10L17/04, G10L17/06

Abstract: A device includes a processor configured to determine a feature vector based on an utterance and to determine a first embedding vector by processing the feature vector using a trained embedding network. The processor is configured to determine a first distance metric based on distances between the first embedding vector and each embedding vector of a speaker template. The processor is configured to determine, based on the first distance metric, that the utterance is verified to be from a particular user. The processor is configured to, based on a comparison of a first particular distance metric associated with the first embedding vector to a second distance metric associated with a first test embedding vector of the speaker template, generate an updated speaker template by adding the first embedding vector as a second test embedding vector and removing the first test embedding vector from test embedding vectors of the speaker template.
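Illustrative note (not part of the patent record): the verify-then-update flow in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the claimed method. The abstract does not specify the distance measure, the aggregation, or the replacement rule; here the distance metric is assumed to be the mean Euclidean distance, verification is assumed to be a threshold test, and the template entry with the largest mean distance to the other entries is assumed to be the one replaced. The trained embedding network is out of scope, so embeddings are passed in directly.

```python
import math
from typing import List, Tuple

Vector = List[float]


def mean_distance(v: Vector, others: List[Vector]) -> float:
    """Mean Euclidean distance from v to each vector in others (assumed metric)."""
    return sum(math.dist(v, o) for o in others) / len(others)


def verify_and_update(embedding: Vector, template: List[Vector],
                      threshold: float) -> Tuple[bool, List[Vector]]:
    """Return (verified, possibly updated template) for a new embedding."""
    metric = mean_distance(embedding, template)
    if metric > threshold:
        # Not verified: leave the template unchanged.
        return False, list(template)
    if len(template) < 2:
        # Too few entries to rank against each other; skip the update.
        return True, list(template)

    # Find the template entry that is, on average, farthest from the rest.
    worst_idx, worst_metric = 0, -1.0
    for i, entry in enumerate(template):
        rest = template[:i] + template[i + 1:]
        m = mean_distance(entry, rest)
        if m > worst_metric:
            worst_idx, worst_metric = i, m

    # Replace that entry only if the new embedding fits the template better.
    if metric < worst_metric:
        updated = template[:worst_idx] + template[worst_idx + 1:] + [list(embedding)]
        return True, updated
    return True, list(template)
```

With this rule the template size stays constant: each accepted update adds the new embedding and drops exactly one existing entry.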

8.

Invention publication
SHARED SPEECH PROCESSING NETWORK FOR MULTIPLE SPEECH APPLICATIONS (Status: Under examination, published)

Publication (announcement) No.: US20230300527A1

Publication (announcement) date: 2023-09-21

Application No.: US18324622

Application date: 2023-05-26

Applicant: QUALCOMM Incorporated

Inventor: Lae-Hoon KIM, Sunkuk MOON, Erik VISSER, Prajakt KULKARNI

IPC: H04R3/00, G10L21/02, H04R5/04, G06N20/00, H04L65/60, H04L65/80, G06F18/21, G06V10/82, G06V20/20

CPC classification number: H04R3/005, G10L21/02, H04R5/04, G06N20/00, H04L65/60, H04L65/80, G06F18/217, G06V10/82, G06V20/20, H04R2499/13, H04R2420/07

Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.

9.

Invention application
CONTEXT-BASED SPEECH ENHANCEMENT (Status: In force)

Publication (announcement) No.: US20220310108A1

Publication (announcement) date: 2022-09-29

Application No.: US17209621

Application date: 2021-03-23

Applicant: QUALCOMM Incorporated

Inventor: Kyungguen BYUN, Shuhua ZHANG, Lae-Hoon KIM, Erik VISSER, Sunkuk MOON, Vahid MONTAZERI

IPC: G10L21/038

Abstract: A device to perform speech enhancement includes one or more processors configured to obtain input spectral data based on an input signal. The input signal represents sound that includes speech. The one or more processors are also configured to process, using a multi-encoder transformer, the input spectral data and context data to generate output spectral data that represents a speech enhanced version of the input signal.

10.

Invention application
SHARED SPEECH PROCESSING NETWORK FOR MULTIPLE SPEECH APPLICATIONS (Status: In force)

Publication (announcement) No.: US20220165285A1

Publication (announcement) date: 2022-05-26

Application No.: US17650595

Application date: 2022-02-10

Applicant: QUALCOMM Incorporated

Inventor: Lae-Hoon KIM, Sunkuk MOON, Erik VISSER, Prajakt KULKARNI

IPC: G10L21/02, H04R5/04, H04R3/00, G06N20/00, H04L65/60, H04L65/80, G06K9/62

Abstract: A device to process speech includes a speech processing network that includes an input configured to receive audio data corresponding to audio captured by one or more microphones. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules. A first speech application module corresponds to a speaker verifier, and a second speech application module corresponds to a speech recognition network.
