-
Publication No.: US11030993B2
Publication date: 2021-06-08
Application No.: US16388753
Filing date: 2019-04-18
Applicant: SoundHound, Inc.
Inventor: Jun Huang , Kiran Garaga Lokeswarappa , Joel Gedalius , Bernard Mont-Reynaud
IPC: G10L15/00 , G10L15/02 , H04L29/08 , G10L15/06 , G06Q30/02 , G06F40/205 , G06F40/211 , G06F40/253 , G06N20/00 , G10L15/18 , G10L25/90 , G10L15/22 , G10L25/51 , G10L15/26
Abstract: A method is provided for advertisement selection. The method includes recognizing words from user speech over a large number of interactions, computing a number of unique words uttered during the interactions, classifying the user by the number of unique words uttered during the interactions, and selecting an advertisement targeted to the classified users.
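The steps in this abstract (count unique words, classify the user, pick an advertisement) can be sketched roughly as follows. This is only an illustration, not the patented method: the tokenizer, the vocabulary thresholds, the class names, and the ad identifiers are all hypothetical.

```python
import re

# Hypothetical thresholds (minimum unique words) and class labels, sorted ascending.
VOCAB_CLASSES = [(0, "small"), (500, "medium"), (2000, "large")]

# Hypothetical mapping from vocabulary class to an ad identifier.
ADS_BY_CLASS = {"small": "ad_basic", "medium": "ad_standard", "large": "ad_advanced"}


def unique_word_count(transcripts):
    """Count distinct words across all recognized transcripts of one user."""
    words = set()
    for text in transcripts:
        # Simple lowercase tokenizer; a real recognizer already outputs words.
        words.update(re.findall(r"[a-z']+", text.lower()))
    return len(words)


def classify_by_vocabulary(count):
    """Return the label of the highest threshold that the count reaches."""
    label = VOCAB_CLASSES[0][1]
    for threshold, name in VOCAB_CLASSES:
        if count >= threshold:
            label = name
    return label


def select_ad(transcripts):
    """Select the ad targeted at the user's vocabulary class."""
    return ADS_BY_CLASS[classify_by_vocabulary(unique_word_count(transcripts))]
```

For example, `select_ad(["play some music", "Play the music again"])` sees five unique words and falls into the hypothetical "small" class.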
-
Publication No.: US09564123B1
Publication date: 2017-02-07
Application No.: US14704833
Filing date: 2015-05-05
Applicant: SoundHound, Inc.
Inventor: Bernard Mont-Reynaud , Jun Huang , Kiran Garaga Lokeswarappa , Joel Gedalius
CPC classification number: H04L67/306 , G06F17/271 , G06Q30/0277 , G10L15/063 , G10L15/07 , G10L15/22 , G10L25/51
Abstract: A system and method are provided for adding user characterization information to a user profile by analyzing the user's speech. User properties such as age, gender, accent, and English proficiency may be inferred by extracting and deriving features from user speech, without the user having to configure such information manually. A feature extraction module that receives audio signals as input extracts acoustic, phonetic, textual, linguistic, and semantic features. The module may be a system component independent of any particular vertical application or may be embedded in an application that accepts voice input and performs natural language understanding. A profile generation module receives the features extracted by the feature extraction module, uses classifiers to determine user property values based on the extracted and derived features, and stores these values in a user profile. The resulting profile variables may be globally available to other applications.
-
Publication No.: US11295732B2
Publication date: 2022-04-05
Application No.: US16529730
Filing date: 2019-08-01
Applicant: SoundHound, Inc.
Inventor: Steffen Holm , Terry Kong , Kiran Garaga Lokeswarappa
IPC: G10L15/197 , G10L15/02 , G10L15/22 , G10L15/16 , G10L15/18
Abstract: To improve the accuracy of automatic speech recognition (ASR), an utterance is transcribed using a plurality of language models, for example an N-gram language model and a neural language model. The language models are trained separately. Each outputs a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While recognizing an utterance, interpolation weights are chosen or updated dynamically, in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.
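A minimal sketch of score interpolation with a context-dependent weight follows. The abstract does not say which dynamic variables drive the weights or how, so the rule used here (trust the neural model more as the partial hypothesis grows) and the parameters `max_weight` and `ramp_words` are assumptions made purely for illustration.

```python
def interpolation_weight(num_words_so_far, max_weight=0.8, ramp_words=5):
    """Hypothetical dynamic weight for the neural model.

    Assumption: the weight grows linearly with the length of the partial
    hypothesis and saturates at max_weight after ramp_words words.
    """
    return max_weight * min(num_words_so_far, ramp_words) / ramp_words


def hybrid_score(p_ngram, p_neural, weight):
    """Linearly interpolate two language-model probabilities."""
    return (1.0 - weight) * p_ngram + weight * p_neural


def score_hypothesis(words, p_ngram, p_neural):
    """Score a partial hypothesis using a weight chosen from its context."""
    weight = interpolation_weight(len(words))
    return hybrid_score(p_ngram, p_neural, weight)
```

With an empty hypothesis the weight is 0 and the hybrid score equals the N-gram score; once five or more words have been recognized the neural model contributes its maximum share.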
-
Publication No.: US10319250B2
Publication date: 2019-06-11
Application No.: US15439883
Filing date: 2017-02-22
Applicant: SoundHound, Inc.
Inventor: Kiran Garaga Lokeswarappa , Jonah Probell
Abstract: Speech synthesis chooses pronunciations of words with multiple acceptable pronunciations based on an indication of a personal, class-based, or global preference or an intended non-preferred pronunciation. A speaker's words can be parroted back on personal devices using preferred pronunciations for accent training. Degrees of pronunciation error are computed and indicated to the user in a visual transcription or audibly as word emphasis in parroted speech. Systems can use sets of phonemes extended beyond those generally recognized for a language. Speakers are classified in order to choose specific phonetic dictionaries or adapt global ones. User profiles maintain lists of which pronunciations are preferred among ones acceptable for words with multiple recognized pronunciations. Systems use multiple correlations of word preferences across users to predict use preferences of unlisted words. Speaker-preferred pronunciations are used to weight the scores of transcription hypotheses based on phoneme sequence hypotheses in speech engines.
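One way to picture the preference lookup described here is a fallback chain from personal to class-based to global preferences. The sketch below assumes that ordering and uses hypothetical dictionary structures and ARPAbet-style phoneme strings; the abstract does not specify any of these details.

```python
def choose_pronunciation(word, acceptable, personal, class_prefs, global_prefs):
    """Pick a pronunciation for a word with several acceptable variants.

    acceptable: dict mapping a word to its list of acceptable pronunciations.
    personal, class_prefs, global_prefs: dicts mapping a word to a preferred
    pronunciation (hypothetical structures; precedence order is an assumption).
    """
    options = acceptable.get(word, [])
    for prefs in (personal, class_prefs, global_prefs):
        choice = prefs.get(word)
        # Only honor a preference that is among the acceptable variants.
        if choice in options:
            return choice
    # No stored preference: fall back to the first listed variant, if any.
    return options[0] if options else None


# Illustrative data only.
ACCEPTABLE = {"tomato": ["T AH M EY T OW", "T AH M AA T OW"]}
GLOBAL_PREFS = {"tomato": "T AH M EY T OW"}
```

A user whose profile lists `"T AH M AA T OW"` for "tomato" gets that variant; a user with no entry falls through to the class or global preference.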
-
Publication No.: US12175964B2
Publication date: 2024-12-24
Application No.: US17325114
Filing date: 2021-05-19
Applicant: SoundHound, Inc.
Inventor: Kiran Garaga Lokeswarappa , Joel Gedalius , Bernard Mont-Reynaud , Jun Huang
IPC: G10L15/00 , G06F40/205 , G06F40/211 , G06F40/253 , G06N20/00 , G06Q30/0241 , G06Q30/0251 , G10L15/02 , G10L15/06 , G10L15/18 , G10L25/90 , H04L67/306 , G10L15/22 , G10L15/26 , G10L25/51
Abstract: A computer-implemented method is provided. The method includes receiving speech audio of dictation associated with a user ID, deriving acoustic features from the speech audio, storing the derived acoustic features in a user profile associated with the user ID, receiving a request for acoustic features through an application programming interface (API), the request including the user ID, and sending the derived acoustic features through the API.
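The flow in this abstract (derive features, store them under a user ID, serve them on request) might look like the in-memory sketch below. The class name `ProfileStore`, the request and response shapes, and the use of per-frame mean energy as the "acoustic feature" are all assumptions standing in for details the abstract leaves open.

```python
def derive_features(samples, frame_size=4):
    """Stand-in feature: mean energy per fixed-size frame of audio samples."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [sum(x * x for x in frame) / len(frame) for frame in frames]


class ProfileStore:
    """Hypothetical in-memory store of user profiles keyed by user ID."""

    def __init__(self):
        self._profiles = {}

    def ingest(self, user_id, samples):
        """Derive acoustic features from speech audio and store them."""
        profile = self._profiles.setdefault(user_id, {})
        profile["acoustic_features"] = derive_features(samples)

    def handle_request(self, request):
        """Answer an API-style request that carries a user ID."""
        profile = self._profiles.get(request["user_id"])
        if profile is None:
            return {"status": 404}
        return {
            "status": 200,
            "user_id": request["user_id"],
            "acoustic_features": profile["acoustic_features"],
        }
```

A caller would first run `store.ingest("u1", samples)` and later call `store.handle_request({"user_id": "u1"})` to retrieve the stored features.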
-
Publication No.: US20210035569A1
Publication date: 2021-02-04
Application No.: US16529730
Filing date: 2019-08-01
Applicant: SoundHound, Inc.
Inventor: Steffen Holm , Terry Kong , Kiran Garaga Lokeswarappa
IPC: G10L15/197 , G10L15/02 , G10L15/18 , G10L15/22 , G10L15/16
Abstract: To improve the accuracy of automatic speech recognition (ASR), an utterance is transcribed using a plurality of language models, for example an N-gram language model and a neural language model. The language models are trained separately. Each outputs a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While recognizing an utterance, interpolation weights are chosen or updated dynamically, in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.
-
Publication No.: US10311858B1
Publication date: 2019-06-04
Application No.: US15385493
Filing date: 2016-12-20
Applicant: SoundHound, Inc.
Inventor: Bernard Mont-Reynaud , Jun Huang , Kiran Garaga Lokeswarappa , Joel Gedalius
IPC: G10L15/00 , G10L15/02 , G10L15/18 , G06F17/27 , G10L15/06 , G10L25/90 , H04L29/08 , G06Q30/02 , G06N20/00
Abstract: A system and method are provided for adding user characterization information to a user profile by analyzing the user's speech. User properties such as age, gender, accent, and English proficiency may be inferred by extracting and deriving features from user speech, without the user having to configure such information manually. A feature extraction module that receives audio signals as input extracts acoustic, phonetic, textual, linguistic, and semantic features. The module may be a system component independent of any particular vertical application or may be embedded in an application that accepts voice input and performs natural language understanding. A profile generation module receives the features extracted by the feature extraction module, uses classifiers to determine user property values based on the extracted and derived features, and stores these values in a user profile. The resulting profile variables may be globally available to other applications.
-
Publication No.: US20180190269A1
Publication date: 2018-07-05
Application No.: US15439883
Filing date: 2017-02-22
Applicant: SoundHound, Inc.
Inventor: Kiran Garaga Lokeswarappa , Jonah Probell
IPC: G10L15/187 , G10L15/26 , G10L15/01 , G10L13/10 , G10L15/06
CPC classification number: G10L15/26 , G09B5/04 , G09B19/06 , G10L13/00 , G10L2015/225
Abstract: Speech synthesis chooses pronunciations of words with multiple acceptable pronunciations based on an indication of a personal, class-based, or global preference or an intended non-preferred pronunciation. A speaker's words can be parroted back on personal devices using preferred pronunciations for accent training. Degrees of pronunciation error are computed and indicated to the user in a visual transcription or audibly as word emphasis in parroted speech. Systems can use sets of phonemes extended beyond those generally recognized for a language. Speakers are classified in order to choose specific phonetic dictionaries or adapt global ones. User profiles maintain lists of which pronunciations are preferred among ones acceptable for words with multiple recognized pronunciations. Systems use multiple correlations of word preferences across users to predict use preferences of unlisted words. Speaker-preferred pronunciations are used to weight the scores of transcription hypotheses based on phoneme sequence hypotheses in speech engines.