-
Publication No.: US11736769B2
Publication date: 2023-08-22
Application No.: US17228438
Filing date: 2021-04-12
Applicant: SoundHound, Inc.
Inventor: Thor S. Khov, Terry Kong
IPC: H04N21/454, H04N21/44, H04N21/466, G06V20/40, G06N3/045
CPC classification number: H04N21/4542, G06N3/045, G06V20/46, H04N21/44008, H04N21/4665
Abstract: Various approaches relate to user-defined filtering, in media playing devices, of undesirable content represented in stored and real-time content from content providers. For example, video, image, and/or audio data can be analyzed to identify and classify content included in the data using various classification models and object and text recognition approaches. Thereafter, the identification and classification can be used to control presentation of and/or access to the content and/or portions of the content. For example, based on the classification, portions of the content can be modified (e.g., replaced, removed, degraded, etc.) using one or more techniques (e.g., media replacement, media removal, media degradation, etc.) and then presented.
-
Publication No.: US20230254631A1
Publication date: 2023-08-10
Application No.: US18194885
Filing date: 2023-04-03
Applicant: SoundHound, Inc.
Inventor: Karl Stahl
CPC classification number: H04R1/1083, G10L15/22, G10L21/0316, G10L25/06, G10L25/51, H04R1/08, H04R2420/07, H04R5/0335
Abstract: A voice-controlled device includes a microphone to receive a set of sound waves that includes speech uttered by a user and other sound, and to output a first audio signal that includes a contribution from the speech uttered by the user and a contribution from the other sound. The device also includes a receiver to receive an electromagnetic signal and to output a second audio signal obtained from the electromagnetic signal. An audio pre-processor of the device processes the first audio signal using the second audio signal to reduce the contribution from the other sound in a processed audio signal. The voice-controlled device then provides the processed audio signal to a speech recognition module to determine a voice command issued by the user.
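The pre-processing step in this abstract can be illustrated with a minimal sketch. It assumes the second audio signal is already time-aligned with the microphone signal and that the other sound reaches the microphone as a single scaled copy of it; a real device would more likely use an adaptive filter. The function names `estimate_gain` and `reduce_other_sound` are hypothetical and do not come from the patent.

```python
from typing import List


def estimate_gain(mic: List[float], reference: List[float]) -> float:
    """Least-squares estimate of how strongly the reference appears in the mic signal."""
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        return 0.0  # silent reference: nothing to remove
    return sum(m * r for m, r in zip(mic, reference)) / energy


def reduce_other_sound(mic: List[float], reference: List[float]) -> List[float]:
    """Subtract the scaled reference (second audio signal) from the mic (first audio signal)."""
    gain = estimate_gain(mic, reference)
    return [m - gain * r for m, r in zip(mic, reference)]


# Toy signals (assumed): the speech is uncorrelated with the reference.
speech = [0.5, 0.5, 0.5, 0.5]
reference = [1.0, -1.0, 1.0, -1.0]
mic = [s + 0.8 * r for s, r in zip(speech, reference)]

# The processed signal would then be passed to speech recognition.
processed = reduce_other_sound(mic, reference)
```

In this toy case the speech is exactly orthogonal to the reference, so the subtraction recovers it almost perfectly; with real audio the residual would be larger.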
-
Publication No.: US20230245649A1
Publication date: 2023-08-03
Application No.: US17649810
Filing date: 2022-02-03
Applicant: SoundHound, Inc.
Inventor: Pranav SINGH, Saraswati MISHRA, Eunjee NA
CPC classification number: G10L15/1815, G10L15/02, G10L15/26, G10L2015/025
Abstract: Methods and systems for correction of a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a word that has a low confidence score with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphical user interface for labeling and dictionary enhancement.
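As a rough illustration of the confidence-based substitution described in this abstract, the sketch below replaces any token whose confidence falls below a threshold. The confidence scores are hard-coded here (the abstract mentions neural network models as one way to produce them), and the `substitutes` lookup table, the threshold of 0.5, and the function name `correct_transcript` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def correct_transcript(
    tokens: List[Tuple[str, float]],
    substitutes: Dict[str, str],
    threshold: float = 0.5,
) -> Tuple[List[str], List[int]]:
    """Replace low-confidence tokens that have a known substitute.

    Returns the corrected words and the indices that were replaced,
    which a user interface could highlight for labeling.
    """
    corrected: List[str] = []
    replaced: List[int] = []
    for index, (word, confidence) in enumerate(tokens):
        if confidence < threshold and word in substitutes:
            corrected.append(substitutes[word])
            replaced.append(index)
        else:
            corrected.append(word)
    return corrected, replaced


# Assumed example data: (word, token confidence score) pairs.
tokens = [("play", 0.97), ("sum", 0.31), ("music", 0.95)]
substitutes = {"sum": "some"}

corrected, replaced = correct_transcript(tokens, substitutes)
```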
-
Publication No.: US20230082955A1
Publication date: 2023-03-16
Application No.: US17447823
Filing date: 2021-09-16
Applicant: SoundHound, Inc.
Inventor: Timothy P. STONEHOCKER, Zizu GOWAYYED, Matthias EICHSTAEDT, Seyed Majid EMAMI, Evelyn JIANG, Ryan BERRYHILL, Mathieu RAMONA, Neil VEIRA
Abstract: A system for performing automated speech recognition (ASR) on audio data includes a queue manager to receive a request to perform ASR on audio data, add the request to a queue of incoming requests, and determine a queue depth representing a number of requests in the queue at a given time. The system also includes a load supervisor to receive the request and the queue depth from the queue manager and assign a service level for the request based on the queue depth. In addition, the system includes a speech-to-text converter to receive the assigned service level for the request from the load supervisor, select an ASR model for the request based on the received service level, receive the audio data associated with the request, and perform ASR on the audio data using the selected ASR model.
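The flow in this abstract (queue manager, then load supervisor, then speech-to-text converter) might be sketched as follows. The abstract does not say how queue depth maps to service level or which models exist, so the depth threshold, the level names `"accurate"` and `"fast"`, and the model names are assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque, Dict


class QueueManager:
    """Holds incoming ASR requests and reports the current queue depth."""

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()

    def add(self, request_id: str) -> int:
        self._queue.append(request_id)
        return len(self._queue)  # queue depth after adding this request


def assign_service_level(queue_depth: int, high_load_depth: int = 10) -> str:
    """Load supervisor: assumed policy of degrading service under heavy load."""
    return "fast" if queue_depth >= high_load_depth else "accurate"


# Assumed mapping from service level to ASR model name.
MODELS: Dict[str, str] = {"accurate": "large-asr-model", "fast": "small-asr-model"}


def select_model(service_level: str) -> str:
    """Speech-to-text converter: pick the ASR model for the assigned level."""
    return MODELS[service_level]


manager = QueueManager()
depth = manager.add("request-1")
model_when_idle = select_model(assign_service_level(depth))

for n in range(2, 12):
    depth = manager.add(f"request-{n}")
model_when_busy = select_model(assign_service_level(depth))
```

Trading accuracy for speed as the queue grows is only one possible policy; the same structure would work with more than two service levels.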
-
Publication No.: US11589184B1
Publication date: 2023-02-21
Application No.: US17655650
Filing date: 2022-03-21
Applicant: SoundHound, Inc.
Inventor: Bernard Mont-Reynaud
IPC: H04S7/00
Abstract: Methods and systems for intuitive spatial audio rendering with improved intelligibility are disclosed. By establishing a virtual association between an audio source and a location in the listener's virtual audio space, a spatial audio rendering system can generate spatial audio signals that create a natural and immersive audio field for a listener. The system can receive the virtual location of the source as a parameter and map the source audio signal to a source-specific multi-channel audio signal. In addition, the spatial audio rendering system can be interactive and dynamically modify the rendering of the spatial audio in response to a user's active control or tracked movement.
-
Publication No.: US20220383869A1
Publication date: 2022-12-01
Application No.: US17332927
Filing date: 2021-05-27
Applicant: SoundHound, Inc.
Inventor: Utku YABAS, Philipp HUBERT, Karl STAHL
IPC: G10L15/22, G10L15/26, G06F40/211, G06F40/284, G10L15/183
Abstract: A user specifies a natural language command to a device. Software on the device generates contextual metadata about the user interface of the device, such as data about all visible elements of the user interface, and sends the contextual metadata along with the natural language command to a natural language understanding engine. The natural language understanding engine parses the natural language query using a stored grammar (e.g., a grammar provided by a maker of the device) and as a result of the parsing identifies information about the command (e.g., the user interface elements referenced by the command) and provides that information to the device. The device uses that provided information to respond to the command.
-
Publication No.: US11367448B2
Publication date: 2022-06-21
Application No.: US17237003
Filing date: 2021-04-21
Applicant: SOUNDHOUND, INC.
Inventor: Keyvan Mohajer, Mehul Patel
Abstract: A method of providing a platform for configuring device-specific speech recognition is provided. The method includes providing a user interface for developers to select a set of at least two acoustic models appropriate for a specific type of a device, receiving, from a developer, a selection of the set of the at least two acoustic models, and configuring a speech recognition system to perform device-specific speech recognition by using one acoustic model selected from the at least two acoustic models of the set.
-
Publication No.: US20220147510A1
Publication date: 2022-05-12
Application No.: US17581846
Filing date: 2022-01-21
Applicant: SoundHound, Inc.
Inventor: Pranav Singh, Olivia Bettaglio
IPC: G06F16/23, G06F16/2452, G06N7/00
Abstract: Systems and methods are provided for natural language processing using neural network models and natural language virtual assistants. The system and method include receiving a natural language phrase including a word sequence, computing corresponding error probabilities that the words are errors, and, for a word with a corresponding error probability above a threshold, computing a replacement phrase with a low error probability, so that the virtual assistant provides a response that depends on the replacement phrase.
-
Publication No.: US11328721B2
Publication date: 2022-05-10
Application No.: US16781214
Filing date: 2020-02-04
Applicant: SoundHound, Inc.
Inventor: Hsuan Yang, Qìndí Zhang, Warren S. Heit
Abstract: A system and method are disclosed for ignoring a wakeword received at a speech-enabled listening device when it is determined the wakeword is reproduced audio from an audio-playing device. Determination can be by detecting audio distortions, by an ignore flag sent locally between an audio-playing device and speech-enabled device, by an ignore flag sent from a server, by comparison of received audio played audio to a wakeword within an audio-playing device or a speech-enabled device, and other means.
-
Publication No.: US11308960B2
Publication date: 2022-04-19
Application No.: US16824308
Filing date: 2020-03-19
Applicant: SoundHound, Inc.
Inventor: Patricia Pozon Aguayo, Jennifer Hee Young Zhang, Jonah Probell
Abstract: A processing system detects a period of non-voice activity and compares its duration to a cutoff period. The system adapts the cutoff period based on parsing previously-recognized speech to determine, according to a model, such as a machine-learned model, the probability that the speech recognized so far is a prefix to a longer complete utterance. The cutoff period is longer when a parse of previously recognized speech has a high probability of being a prefix of a longer utterance.
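The adaptive cutoff in this abstract can be sketched as a function of the prefix probability. The abstract only states that the cutoff is longer when the probability is high; the linear interpolation, the bounds of 0.3 and 1.5 seconds, and the function names below are assumptions for illustration, and the prefix probability is taken as given rather than produced by a parser or machine-learned model.

```python
def cutoff_period(
    prefix_probability: float,
    min_cutoff: float = 0.3,
    max_cutoff: float = 1.5,
) -> float:
    """Return the silence cutoff in seconds.

    The more likely the speech so far is a prefix of a longer
    utterance, the longer the system waits before ending the turn.
    """
    if not 0.0 <= prefix_probability <= 1.0:
        raise ValueError("prefix_probability must be in [0, 1]")
    return min_cutoff + prefix_probability * (max_cutoff - min_cutoff)


def end_of_utterance(silence_seconds: float, prefix_probability: float) -> bool:
    """Compare the observed non-voice duration to the adapted cutoff."""
    return silence_seconds >= cutoff_period(prefix_probability)


# The same 0.8 s pause ends a likely-complete utterance
# but not one that looks like an unfinished prefix.
complete = end_of_utterance(0.8, prefix_probability=0.1)
unfinished = end_of_utterance(0.8, prefix_probability=0.9)
```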
-