-
Publication No.: US10896671B1
Publication date: 2021-01-19
Application No.: US16206963
Filing date: 2018-11-30
Applicant: SoundHound, Inc.
Inventor: Keyvan Mohajer, Christopher S. Wilson, Bernard Mont-Reynaud, Robert MacRae
Abstract: A command-processing server provides natural language services to applications. More specifically, the command-processing server receives natural language inputs from users for use in applications such as virtual assistants. Some user inputs create user-defined rules that consist of trigger conditions and of corresponding actions that are executed when the triggers fire. The command-processing server stores the rules received from a user in association with the specific user. The command-processing server also identifies rules that can be generalized across users and promoted into generic rules applicable to many or all users. The generic rules may or may not have an associated context constraining their application.
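The per-user storage and promotion of rules described in this abstract can be illustrated with a minimal sketch. Everything named here (`RuleStore`, `add_rule`, `actions_for`, and the count-based `promotion_threshold`) is hypothetical; the abstract does not say how generalizable rules are identified, so promoting a rule once enough distinct users have defined it is only an assumption made for illustration.

```python
from collections import defaultdict


class RuleStore:
    """Toy store of (trigger, action) rules kept per user, with promotion."""

    def __init__(self, promotion_threshold=3):
        # user_id -> set of (trigger, action) pairs defined by that user
        self.user_rules = defaultdict(set)
        # rules promoted so that they apply to every user
        self.generic_rules = set()
        # ASSUMPTION: promote once this many distinct users share a rule
        self.promotion_threshold = promotion_threshold

    def add_rule(self, user_id, trigger, action):
        rule = (trigger, action)
        self.user_rules[user_id].add(rule)
        owners = sum(1 for rules in self.user_rules.values() if rule in rules)
        if owners >= self.promotion_threshold:
            self.generic_rules.add(rule)

    def actions_for(self, user_id, event):
        # A user's own rules apply together with all generic rules.
        rules = self.user_rules.get(user_id, set()) | self.generic_rules
        return sorted(action for trigger, action in rules if trigger == event)


store = RuleStore(promotion_threshold=2)
store.add_rule("alice", "sunset", "turn on porch light")
before = store.actions_for("carol", "sunset")   # carol has no rules yet
store.add_rule("bob", "sunset", "turn on porch light")
after = store.actions_for("carol", "sunset")    # rule is now generic
```

The optional context that may constrain a generic rule is left out of this sketch.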
-
Publication No.: US10832287B2
Publication date: 2020-11-10
Application No.: US16134890
Filing date: 2018-09-18
Applicant: SoundHound, Inc.
Inventor: Aaron Master, Keyvan Mohajer
Abstract: An audio recognition system provides for delivery of promotional content to its user. A user interface device, locally or with the assistance of a network-connected server, performs recognition of audio in response to queries. Recognition can be through a method such as processing features extracted from the audio. Audio can comprise recorded music, singing or humming, instrumental music, vocal music, spoken voice, or other recognizable types of audio. Campaign managers provide promotional content for delivery in response to audio recognized in queries.
-
Publication No.: US20200210529A1
Publication date: 2020-07-02
Application No.: US16232984
Filing date: 2018-12-26
Applicant: SoundHound, Inc.
Inventor: Terry KONG
Abstract: A method of training word embeddings is provided. The method includes determining anchors, each comprising a first word in a first domain and a second word in a second domain, training word embeddings for the first and second domains, and training a transform for transforming word embedding vectors in the first domain to word embedding vectors in the second domain, wherein the training minimizes a loss function that includes an anchor loss for each anchor, such that for each anchor, the anchor loss is based on a distance between the anchor's second word's embedding vector and the transform of the anchor's first word's embedding vector, and for each anchor, the anchor loss for the respective anchor is zero when the distance between the respective anchor's second word's embedding vector and the transform of the respective anchor's first word's embedding vector is less than a specific tolerance.
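The anchor loss in this abstract can be sketched as follows. The abstract states only that the loss is based on a distance and is zero when that distance is below a tolerance; the hinge form `max(0, distance - tolerance)`, the Euclidean distance, the linear (matrix) transform, and all function names below are assumptions chosen for illustration.

```python
import math


def apply_transform(matrix, vec):
    """Apply a linear transform (list of rows) to a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def anchor_loss(transform, first_vec, second_vec, tolerance):
    """Loss for one anchor: zero inside the tolerance, growing outside it.

    first_vec  -- embedding of the anchor's word in the first domain
    second_vec -- embedding of the anchor's word in the second domain
    """
    mapped = apply_transform(transform, first_vec)
    distance = math.dist(mapped, second_vec)  # Euclidean (assumed metric)
    return max(0.0, distance - tolerance)     # hinge form (assumed)


def total_anchor_loss(transform, anchors, tolerance):
    """Sum the anchor loss over (first_vec, second_vec) pairs."""
    return sum(anchor_loss(transform, a, b, tolerance) for a, b in anchors)


identity = [[1.0, 0.0], [0.0, 1.0]]
anchors = [
    ([1.0, 0.0], [1.0, 0.05]),  # within tolerance, contributes nothing
    ([0.0, 0.0], [3.0, 4.0]),   # distance 5, outside tolerance
]
loss = total_anchor_loss(identity, anchors, tolerance=1.0)
```

In training, this term would be one part of a larger loss that also covers the embeddings of both domains; only the anchor term is shown.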
-
Publication No.: US20200183815A1
Publication date: 2020-06-11
Application No.: US16213020
Filing date: 2018-12-07
Applicant: SoundHound, Inc.
Inventor: Bernard Mont-Reynaud, Jonah Probell
Abstract: A virtual assistant platform provides a user interface for app developers to configure the enablement of domains for virtual assistants. Sets of test queries can be uploaded and statistical analyses displayed for the numbers of test queries served by each selected domain and costs for usage of each domain. Costs can vary according to complex pricing models. The user interface provides display views of tables, cost stack charts, and histograms to inform decisions that trade off costs against benefits to the virtual assistant user experience. The platform interface shows, for individual queries, responses possible from different domains. Platform providers promote certain chosen domains.
-
Publication No.: US10586079B2
Publication date: 2020-03-10
Application No.: US15406213
Filing date: 2017-01-13
Applicant: SoundHound, Inc.
Inventor: Monika Almudafar-Depeyrot, Bernard Mont-Reynaud
IPC: G10L13/033, G10L13/10, G06F40/30, G10L13/00
Abstract: Software-based systems perform parametric speech synthesis. TTS voice parameters determine the generated speech audio. Voice parameters include gender, age, dialect, donor, arousal, authoritativeness, pitch, range, speech rate, volume, flutter, roughness, breath, frequencies, bandwidths, and relative amplitudes of formants and nasal sounds. The system chooses TTS parameters based on one or more of: user profile attributes including gender, age, and dialect; situational attributes such as location, noise level, and mood; natural language semantic attributes such as domain of conversation, expression type, dimensions of affect, word emphasis and sentence structure; and analysis of target speaker voices. The system chooses TTS parameters to improve listener satisfaction or other desired listener behavior. Choices may be made by specified algorithms defined by code developers, or by machine learning algorithms trained on labeled samples of system performance.
-
Publication No.: US10347245B2
Publication date: 2019-07-09
Application No.: US15411567
Filing date: 2017-01-20
Applicant: SoundHound, Inc.
Inventor: Karl Stahl
Abstract: Voice speaker identification, utterance classification (such as by age, gender, accent, mood, and prosody), or both characterize speech utterances in a system that performs automatic speech recognition (ASR) and natural language processing (NLP). The characterization conditions NLP, either through application to interpretation hypotheses or to specific grammar rules. The characterization also conditions language models of ASR. Conditioning may comprise enablement and may comprise reweighting of hypotheses.
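Conditioning by enablement and by reweighting can be sketched in a few lines. The function name, the trait labels, the domain field, and the multiplicative weighting scheme are all hypothetical; the abstract does not specify how the characterization is applied to hypothesis scores.

```python
def condition_hypotheses(hypotheses, traits, rule_weights):
    """Rescore interpretation hypotheses using speaker/utterance traits.

    hypotheses   -- list of (text, score, domain) tuples
    traits       -- set of labels from the characterization, e.g. {"child"}
    rule_weights -- {(trait, domain): multiplier}; a multiplier of 0.0
                    disables the domain for that trait (enablement),
                    any other value reweights it (ASSUMED scheme)
    """
    rescored = []
    for text, score, domain in hypotheses:
        for trait in traits:
            score *= rule_weights.get((trait, domain), 1.0)
        if score > 0:  # drop hypotheses whose domain was disabled
            rescored.append((text, score, domain))
    return sorted(rescored, key=lambda h: h[1], reverse=True)


hypotheses = [
    ("play frozen", 0.5, "movies"),
    ("pay frozen", 0.6, "payments"),
]
weights = {("child", "movies"): 1.5, ("child", "payments"): 0.0}
ranked = condition_hypotheses(hypotheses, {"child"}, weights)
```

With no traits supplied, the hypotheses keep their original scores and are simply sorted.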
-
Publication No.: US10311858B1
Publication date: 2019-06-04
Application No.: US15385493
Filing date: 2016-12-20
Applicant: SoundHound, Inc.
Inventor: Bernard Mont-Reynaud, Jun Huang, Kiran Garaga Lokeswarappa, Joel Gedalius
IPC: G10L15/00, G10L15/02, G10L15/18, G06F17/27, G10L15/06, G10L25/90, H04L29/08, G06Q30/02, G06N20/00
Abstract: A system and method are provided for adding user characterization information to a user profile by analyzing the user's speech. User properties such as age, gender, accent, and English proficiency may be inferred by extracting and deriving features from user speech, without the user having to configure such information manually. A feature extraction module that receives audio signals as input extracts acoustic, phonetic, textual, linguistic, and semantic features. The module may be a system component independent of any particular vertical application or may be embedded in an application that accepts voice input and performs natural language understanding. A profile generation module receives the features extracted by the feature extraction module, uses classifiers to determine user property values based on the extracted and derived features, and stores these values in a user profile. The resulting profile variables may be globally available to other applications.
-
Publication No.: US20190108257A1
Publication date: 2019-04-11
Application No.: US15726394
Filing date: 2017-10-06
Applicant: SoundHound, Inc.
Inventor: Luke Lefebure, Pranav Singh
IPC: G06F17/30, G06N7/00, G06F17/27, G10L15/183
Abstract: A speech recognition and natural language understanding system performs insertion, deletion, and replacement edits of tokens at positions with low probabilities according to both a forward and a backward statistical language model (SLM) to produce rewritten token sequences. Multiple rewrites can be produced with scores depending on the probabilities of tokens according to the SLMs. The rewritten token sequences can be parsed according to natural language grammars to produce further weighted scores. Token sequences can be rewritten iteratively using a graph-based search algorithm to find the best rewrite. Mappings of input token sequences to rewritten token sequences can be stored in a cache, and searching for a best rewrite can be bypassed by using cached rewrites when present. Analysis of various initial token sequences that produce the same new rewritten token sequence can be useful to improve natural language grammars.
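The idea of editing tokens only where both a forward and a backward SLM assign low probability can be sketched with toy bigram tables. The dictionaries, the threshold, the product-of-probabilities score, and the function names are assumptions for illustration; only replacement edits are shown, and the grammar parsing, graph search, and caching steps are omitted.

```python
def low_probability_positions(tokens, forward, backward, threshold):
    """Return indices whose token is unlikely under BOTH toy bigram models.

    forward  -- {(previous_token, token): probability}
    backward -- {(next_token, token): probability}
    """
    positions = []
    for i, tok in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else "<s>"
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        p_fwd = forward.get((prev_tok, tok), 0.0)
        p_bwd = backward.get((next_tok, tok), 0.0)
        if p_fwd < threshold and p_bwd < threshold:
            positions.append(i)
    return positions


def replacement_rewrites(tokens, position, vocabulary, forward, backward):
    """Score single-token replacements at one position, best first."""
    prev_tok = tokens[position - 1] if position > 0 else "<s>"
    next_tok = tokens[position + 1] if position + 1 < len(tokens) else "</s>"
    scored = []
    for cand in vocabulary:
        # ASSUMED score: product of forward and backward probabilities
        score = (forward.get((prev_tok, cand), 0.0)
                 * backward.get((next_tok, cand), 0.0))
        if score > 0:
            rewrite = tokens[:position] + [cand] + tokens[position + 1:]
            scored.append((score, rewrite))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


forward = {("<s>", "play"): 0.5, ("play", "some"): 0.4,
           ("some", "music"): 0.6, ("play", "sum"): 0.001,
           ("sum", "music"): 0.01}
backward = {("some", "play"): 0.5, ("music", "some"): 0.5,
            ("</s>", "music"): 0.7, ("music", "sum"): 0.001,
            ("sum", "play"): 0.01}

tokens = ["play", "sum", "music"]
suspects = low_probability_positions(tokens, forward, backward, 0.05)
rewrites = replacement_rewrites(tokens, suspects[0],
                                ["some", "sum", "music"], forward, backward)
```

In this toy input, "music" is unlikely under the forward model (it follows the misrecognized "sum") but likely under the backward model, so it is not flagged; requiring both models to agree keeps the edit focused on "sum".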
-
Publication No.: US20190043493A1
Publication date: 2019-02-07
Application No.: US15670975
Filing date: 2017-08-07
Applicant: SoundHound, Inc.
Inventor: Kamyar Mohajer, Robert Macrae
CPC classification number: G10L15/22, G06F16/22, G06F16/243, G06F16/2457, G06F16/24575, G10L15/1822, G10L15/30, G10L17/005, G10L2015/223
Abstract: Systems parse natural language expressions to extract items and values of their attributes and store them in a database. Systems also parse natural language expressions to extract values of attributes of user preferences and store them in a database. Recommendation engines use the databases to make recommendations. Parsing is of speech or text and uses conversation state, discussion context, synonym recognition, and speaker profile. Database pointers represent relative attribute values. Recommendations use machine learning to crowdsource from databases of many user preferences and to overcome the cold start problem. Parsing and recommendations use current or stored values of environmental parameters. Databases store different values of the same user preference attributes for different activities. Systems add unrecognized attributes and legal values when encountered in natural language expressions.
-
Publication No.: US20190012311A1
Publication date: 2019-01-10
Application No.: US16128227
Filing date: 2018-09-11
Applicant: SoundHound, Inc.
Inventor: Pranav Singh, Keyvan Mohajer, Kamyar Mohajer, Bernard Mont-Reynaud
Abstract: A platform lets developers of applications with natural language interfaces, such as devices, configure the availability of vertical domain modules in applications. Modules can include grammars for parsing natural language expressions and interfaces to data sources. Third party developers can create modules with pricing models for their usage or access to their data. Device developers can browse or search available modules and test their performance for specific queries. The platform provides for device users to access the chosen modules as configured by device developers and for charging and payment between users, application developers, and module developers.
-