-
Publication No.: US11295730B1
Publication date: 2022-04-05
Application No.: US16529689
Filing date: 2019-08-01
Applicant: SoundHound, Inc.
Inventor: Keyvan Mohajer, Christopher Wilson, Bernard Mont-Reynaud
Abstract: A method is described that includes processing text and speech from an input utterance using local overrides of default dictionary pronunciations. Applying this method, a word-level grammar used to process the tokens specifies at least one local word phonetic variant that applies within a specific production rule and, within a local context of the specific production rule, the local word phonetic variant overrides one or more default dictionary phonetic versions of the word. This method can be applied to parsing utterances where the pronunciation of some words depends on their syntactic or semantic context.
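
A minimal sketch of the lookup order this abstract describes, in which a pronunciation declared inside a production rule takes precedence over the default dictionary within that rule only. The dictionary contents, the rule names, and the `pronunciations` function are hypothetical and are not taken from the patent.

```python
# Hypothetical default dictionary: word -> phonetic variants (ARPAbet-like strings).
DEFAULT_PRONUNCIATIONS = {
    "read": ["R IY D", "R EH D"],
    "book": ["B UH K"],
}

# Hypothetical production rules. A rule may declare local variants for some words.
RULES = {
    "PAST_READING": {"local_pronunciations": {"read": ["R EH D"]}},
    "GENERIC": {"local_pronunciations": {}},
}


def pronunciations(word, rule_name):
    """Return the phonetic variants of `word` that apply inside `rule_name`."""
    local = RULES[rule_name]["local_pronunciations"].get(word)
    # A local variant overrides the defaults, but only within this rule.
    return local if local is not None else DEFAULT_PRONUNCIATIONS[word]
```

With these assumed tables, `pronunciations("read", "PAST_READING")` yields only the past-tense variant, while `pronunciations("read", "GENERIC")` falls back to both dictionary variants.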
-
Publication No.: US20220076678A1
Publication date: 2022-03-10
Application No.: US17531371
Filing date: 2021-11-19
Applicant: SoundHound, Inc.
Inventor: Irina A. SPIRIDONOVA, Karl STAHL, Mara SELVAGGI
Abstract: A computer-implemented method is provided. The method includes receiving commands to store memos; identifying subjects related to the memos; storing, in a database, the memos, their related subjects, and associated time information; receiving a natural language request to retrieve a memo, the request having query information; identifying a subject related to the request; responsive to the request, querying the database for memos related to the subject; identifying multiple memos in response to the database query; identifying a memo, from the multiple identified memos, that has the most recent associated time information; and providing a response in dependence on the identified memo.
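
The store-and-retrieve flow can be sketched as follows, assuming that subject identification happens elsewhere and that an in-memory list stands in for the database. The `Memo` and `MemoStore` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memo:
    text: str
    subject: str
    stored_at: datetime


class MemoStore:
    """In-memory stand-in for the memo database (an assumption of this sketch)."""

    def __init__(self):
        self._memos = []

    def store(self, text, subject, stored_at):
        self._memos.append(Memo(text, subject, stored_at))

    def latest(self, subject):
        """Return the most recently stored memo for `subject`, or None."""
        matches = [m for m in self._memos if m.subject == subject]
        if not matches:
            return None
        return max(matches, key=lambda m: m.stored_at)
```

When several memos share a subject, `latest` picks the one with the newest timestamp, which is the selection step the abstract names.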
-
Publication No.: US11263198B2
Publication date: 2022-03-01
Application No.: US16561020
Filing date: 2019-09-05
Applicant: SoundHound, Inc.
Inventor: Olivia Bettaglio, Pranav Singh
IPC: G06F16/23, G06F16/2452, G06N7/00
Abstract: Systems and methods are provided for systematically finding and fixing automatic speech recognition (ASR) mistranscriptions and natural language understanding (NLU) misinterpretations and labeling data for machine learning. High similarity of non-identical consecutive queries indicates ASR mistranscriptions. Consecutive queries with close vectors in a semantic embedding space indicate NLU misinterpretations. Key phrases and barge-in also indicate errors. Only queries within a short amount of time are considered.
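
A rough sketch of the ASR heuristic only: two consecutive queries that arrive close together, differ, and are highly similar are flagged as a probable mistranscription followed by a retry. Character-level similarity from Python's `difflib` is a stand-in chosen for this example, and the thresholds `max_gap_s` and `min_similarity` are assumed values, not figures from the patent.

```python
from difflib import SequenceMatcher


def likely_mistranscription(prev_text, prev_time, next_text, next_time,
                            max_gap_s=10.0, min_similarity=0.8):
    """Flag a pair of consecutive queries as a probable ASR error and retry."""
    # Only queries within a short amount of time are considered.
    if next_time - prev_time > max_gap_s:
        return False
    # Identical queries are a plain repeat, not evidence of a correction.
    if prev_text == next_text:
        return False
    similarity = SequenceMatcher(None, prev_text, next_text).ratio()
    return similarity >= min_similarity
```

For example, "play the beetles" followed three seconds later by "play the beatles" is flagged, while the same pair a minute apart is not.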
-
Publication No.: US20210390944A1
Publication date: 2021-12-16
Application No.: US17341082
Filing date: 2021-06-07
Applicant: SoundHound, Inc.
Inventor: Andrew RICHARDS
IPC: G10L13/047, G10L13/08, G10L13/033, G06F3/16, G10L15/26, G06N3/08, G06N3/04
Abstract: A discriminator trained on labeled samples of speech can compute probabilities of voice properties. A speech synthesis generative neural network that takes in text and continuous scale values of voice properties is trained to synthesize speech audio that the discriminator will infer as matching the values of the input voice properties. Voice parameters can include speaker voice parameters, accents, and attitudes, among others. Training can be done by transfer learning from an existing neural speech synthesis model or such a model can be trained with a loss function that considers speech and parameter values. A graphical user interface can allow voice designers for products to synthesize speech with a desired voice or generate a speech synthesis engine with frozen voice parameters. A vector of parameters can be used for comparison to previously registered voices in databases such as ones for trademark registration.
-
Publication No.: US20210312920A1
Publication date: 2021-10-07
Application No.: US17301291
Filing date: 2021-03-30
Applicant: SoundHound, Inc.
Inventor: Karl Stahl
IPC: G10L15/22, G10L25/06, H04R1/08, G10L21/0316, G10L25/51
Abstract: A voice-controlled device includes a microphone to receive a set of sound waves that includes speech uttered by a user and other sound, and to output a first audio signal that includes a contribution from the speech uttered by the user and a contribution from the other sound. The device also includes a receiver to receive an electromagnetic signal and to output a second audio signal obtained from the electromagnetic signal. An audio pre-processor of the device processes the first audio signal using the second audio signal to reduce the contribution from the other sound in a processed audio signal. The voice-controlled device then provides the processed audio signal to a speech recognition module to determine a voice command issued by the user.
-
Publication No.: US20210272552A1
Publication date: 2021-09-02
Application No.: US17325114
Filing date: 2021-05-19
Applicant: SoundHound, Inc.
Inventor: Kiran Garaga LOKESWARAPPA, Joel GEDALIUS, Bernard MONT-REYNAUD, Jun HUANG
IPC: G10L15/02, H04L29/08, G10L15/06, G06Q30/02, G06F40/205, G06F40/211, G06F40/253, G06N20/00, G10L15/18, G10L25/90
Abstract: A computer-implemented method is provided. The method includes receiving speech audio of dictation associated with a user ID, deriving acoustic features from the speech audio, storing the derived acoustic features in a user profile associated with the user ID, receiving a request for acoustic features through an application programming interface (API), the request including the user ID, and sending the derived acoustic features through the API.
-
Publication No.: US20210210099A1
Publication date: 2021-07-08
Application No.: US16735677
Filing date: 2020-01-06
Applicant: SoundHound, Inc.
Inventor: Arvinderpal S. Wander, Evelyn Jiang, Matthias Eichstaedt, Timothy Calhoun
Abstract: A method and system are described for responding to multiple voice requests sent from a group of devices in response to a single spoken utterance of a user. In one embodiment, if the devices have the same group ID, a server determines whether any of the received voice requests are duplicates. In one embodiment, voice requests received within a predetermined time window are examined to determine whether they are duplicates. If so, the server deems one of the received voice requests non-duplicate and the others duplicate, and sends a substantive response for the non-duplicate voice request. In some embodiments, a no-op is sent to the devices that do not receive the substantive response.
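
One way the server-side decision could look, as a sketch under simplifying assumptions: requests are dictionaries with hypothetical `device_id`, `group_id`, and `time` fields, the earliest request in a window is the one deemed non-duplicate, and sharing a group ID within `window_s` seconds is the only duplicate test. The patent may use other criteria.

```python
def plan_responses(requests, window_s=1.0):
    """Return (device_id, response_kind) pairs in order of arrival time."""
    window_start = {}  # group_id -> time of the request that got the substantive response
    plan = []
    for req in sorted(requests, key=lambda r: r["time"]):
        start = window_start.get(req["group_id"])
        if start is not None and req["time"] - start <= window_s:
            # Same group and inside the window: treat as a duplicate.
            plan.append((req["device_id"], "no-op"))
        else:
            # First request of a new window for this group.
            window_start[req["group_id"]] = req["time"]
            plan.append((req["device_id"], "substantive"))
    return plan
```

If a kitchen speaker and a living-room speaker in the same group both hear one utterance 0.2 seconds apart, the earlier request gets the substantive response and the later one gets a no-op. A device in another group is unaffected.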
-
Publication No.: US20210174806A1
Publication date: 2021-06-10
Application No.: US16703783
Filing date: 2019-12-04
Applicant: SoundHound, Inc.
Inventor: Sudharsan Krishnaswamy, Maisy Wieman, Jonah Probell
Abstract: A neural speech-to-meaning system is trained on speech audio expressing specific intents. The system receives speech audio and produces indications of when the speech in the audio matches the intent. Intents may include variables that can have a large range of values, such as the names of places. The neural speech-to-meaning system simultaneously recognizes enumerated values of variables and general intents. Recognized variable values can serve as arguments to API requests made in response to recognized intents. Accordingly, neural speech-to-meaning supports voice virtual assistants that serve users based on API hits.
-
Publication No.: US11030993B2
Publication date: 2021-06-08
Application No.: US16388753
Filing date: 2019-04-18
Applicant: SoundHound, Inc.
Inventor: Jun Huang, Kiran Garaga Lokeswarappa, Joel Gedalius, Bernard Mont-Reynaud
IPC: G10L15/00, G10L15/02, H04L29/08, G10L15/06, G06Q30/02, G06F40/205, G06F40/211, G06F40/253, G06N20/00, G10L15/18, G10L25/90, G10L15/22, G10L25/51, G10L15/26
Abstract: A method is provided for advertisement selection. The method includes recognizing words from user speech over a large number of interactions, computing a number of unique words uttered during the interactions, classifying the user by the number of unique words uttered during the interactions, and selecting an advertisement targeted to the classified users.
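
The counting and classification steps can be illustrated with a short sketch. Whitespace tokenization, the two class names, the `threshold` value, and the `ADS` table are all assumptions made for this example.

```python
# Hypothetical mapping from user class to an advertisement identifier.
ADS = {"small_vocabulary": "ad_A", "large_vocabulary": "ad_B"}


def count_unique_words(transcripts):
    """Count distinct words across all recognized transcripts of one user."""
    words = set()
    for text in transcripts:
        words.update(text.lower().split())
    return len(words)


def classify_user(unique_words, threshold=5):
    """Bucket a user by vocabulary size (threshold is an assumed value)."""
    return "large_vocabulary" if unique_words >= threshold else "small_vocabulary"


def select_ad(transcripts, threshold=5):
    return ADS[classify_user(count_unique_words(transcripts), threshold)]
```

A user whose interactions are "play some music" and "play more music" has four unique words and, with the assumed threshold of five, falls in the small-vocabulary class.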
-
Publication No.: US11003426B1
Publication date: 2021-05-11
Application No.: US16786991
Filing date: 2020-02-10
Applicant: SoundHound, Inc.
Inventor: Christopher S. Wilson, Keyvan Mohajer
IPC: G06F8/41, G06F40/211
Abstract: A command-processing server provides natural language processing services to applications. The command-processing server stores a set of code blocks, each code block being able to interpret a set of corresponding natural language expressions. The command-processing server accepts natural language expressions and identifies the code blocks that are capable of interpreting those expressions by attempting to parse the natural language expressions using the code blocks. The command-processing server then provides a list of the identified code blocks to the developers, who can then incorporate the code blocks into their applications.
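
The "try every code block and keep the ones that parse" step might be sketched as below. Regular expressions stand in for the code blocks here, which is a simplification chosen for this example and not the patent's representation. The block names and patterns are hypothetical.

```python
import re

# Hypothetical code blocks, each reduced to a pattern it can interpret.
CODE_BLOCKS = {
    "weather": re.compile(r"what is the weather in (?P<city>\w+)"),
    "timer": re.compile(r"set a timer for (?P<minutes>\d+) minutes"),
}


def matching_blocks(expression):
    """Return the names of code blocks that can parse the whole expression."""
    text = expression.lower().strip()
    return [name for name, pattern in CODE_BLOCKS.items()
            if pattern.fullmatch(text)]
```

A developer-facing service could then return `matching_blocks("Set a timer for 10 minutes")` as the list of candidate blocks to incorporate.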