Patent search ap:"SoundHound Inc." Page 2

11.

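Invention publication
METHOD AND APPARATUS FOR INTELLIGENT VOICE QUERY (under examination, published)

Publication (announcement) number: US20230237056A1

Publication (announcement) date: 2023-07-27

Application number: US17654635

Application date: 2022-03-14

Applicant: SoundHound, Inc.

Inventor: Chong WANG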

IPC: G06F16/2452 , G06F16/242 , G10L15/18 , G10L15/22

CPC classification number: G06F16/24522 , G06F16/2425 , G10L15/18 , G10L15/22 , G10L2015/223

Abstract: A method and an apparatus for processing an intelligent voice query are disclosed. A voice query input is received from a user. Automatic speech recognition and natural language understanding generate structured query data. The structured query data is modified based on an input adaptation rule to obtain modified structured query data appropriate for a content providing server, which provides a query result output corresponding to the modified structured query data. Input adaptation rules may comprise rule sets based on behavior patterns of the user and/or business recommendations. The query result output can be used for natural language generation, which may have similar adaptation rules for output.
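
Illustrative sketch (not taken from the patent): one way to picture an input adaptation rule is as a function that takes structured query data and returns a modified copy before it is sent to a content server. The dictionary layout, the field names `intent` and `genre`, and the helpers `prefer_usual_genre` and `apply_rules` are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# Assumed representation: structured query data as a flat string dictionary.
Query = Dict[str, str]
Rule = Callable[[Query], Query]


def prefer_usual_genre(usual_genre: str) -> Rule:
    """Hypothetical user-behavior rule: fill in the genre the user usually picks."""
    def rule(query: Query) -> Query:
        if query.get("intent") == "play_music" and "genre" not in query:
            # Return a modified copy; the original query is left untouched.
            return {**query, "genre": usual_genre}
        return query
    return rule


def apply_rules(query: Query, rules: List[Rule]) -> Query:
    """Apply each adaptation rule in order and return the modified query."""
    for rule in rules:
        query = rule(query)
    return query
```

A rule set based on business recommendations could be added to the same list; the order of the list then decides which rule wins when two rules touch the same field.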

12.

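Invention application
SPEECH-ENABLED AUGMENTED REALITY (in force)

Publication (announcement) number: US20230055477A1

Publication (announcement) date: 2023-02-23

Application number: US17445653

Application date: 2021-08-23

Applicant: SoundHound, Inc.

Inventor: Keyvan MOHAJER , Morris MICHAEL , Bernard MONT-REYNAUD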

IPC: G06K9/00 , G06T7/70 , G10L15/08 , G10L15/22 , G06T11/60 , G06F3/01 , G06K9/62

Abstract: Methods and systems for implementing an intuitive interaction between the user and the virtual content of augmented reality applications are disclosed. By implementing an augmented reality inquiry mode of a device, the system can enable a user to interact with relevant virtual objects via a speech-enabled interface. The speech-enabled augmented reality system can identify visual objects in images, recognize virtual objects corresponding to the visual objects, and determine one or more relevant objects from the virtual objects based on relevance factors. Once the interaction session is established, a user can further interact with the relevant virtual objects, notably through voice commands addressed to the object. Accordingly, the present subject matter can enable a natural and hands-free interaction between the user and any virtual object that the user is interested in.

13.

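Invention application
WAKEWORD SELECTION (in force)

Publication (announcement) number: US20220223155A1

Publication (announcement) date: 2022-07-14

Application number: US17709131

Application date: 2022-03-30

Applicant: SoundHound, Inc.

Inventor: Bernard Mont-Reynaud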

IPC: G10L15/22 , G06F3/16 , G10L15/06 , G10L15/08 , G10L17/04

Abstract: A system and method are disclosed capable of parsing a spoken utterance into a natural language request and a speech audio segment, where the natural language request directs the system to use the speech audio segment as a new wakeword. In response to this wakeword assignment directive, the system and method are further capable of immediately building a new wakeword spotter to activate the device upon matching the new wakeword in the input audio. Different approaches to promptly building a new wakeword spotter are described. Variations of wakeword assignment directives can make the new wakeword public or private. They can also add the new wakeword to earlier wakewords, or replace earlier wakewords.

14.

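Invention application
Meaning Inference from Speech Audio (in force)

Publication (announcement) number: US20220189464A1

Publication (announcement) date: 2022-06-16

Application number: US17653365

Application date: 2022-03-03

Applicant: SoundHound, Inc.

Inventor: Sudharsan KRISHNASWAMY , Maisy WIEMAN , Jonah PROBELL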

IPC: G10L15/06 , G10L15/16 , G10L15/18 , G10L13/02 , G10L15/197 , G10L15/22 , G10L15/187

Abstract: A system and method invoke a virtual assistant action, which may comprise an argument. From audio, a probability of an intent is inferred. A probability of a domain and a plurality of variable values may also be inferred. Invoking the action is in response to the intent probability exceeding a threshold. Invoking the action may also be in response to the domain probability exceeding a threshold, a variable value probability exceeding a threshold, detecting an end of utterance, and a specific amount of time having elapsed. The intent probability may increase when the audio includes speech of words with the same meaning in multiple natural languages. Invoking the action may also be conditional on the variable value exceeding its threshold within a certain period of time of the intent probability exceeding its threshold.
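
Illustrative sketch (not taken from the patent): the threshold logic in this abstract can be pictured as a scan over per-frame probabilities. The frame layout, the threshold values, the name `first_invocation_time`, and the choice to measure the time window from the most recent frame where the intent probability was above its threshold are all assumptions for this example. The domain probability and end-of-utterance conditions are left out for brevity.

```python
from typing import Iterable, Optional, Tuple

# Assumed frame layout: (time in seconds, intent probability, variable-value probability).
Frame = Tuple[float, float, float]


def first_invocation_time(
    frames: Iterable[Frame],
    intent_threshold: float = 0.8,    # hypothetical value
    variable_threshold: float = 0.7,  # hypothetical value
    max_gap_s: float = 1.0,           # hypothetical time window
) -> Optional[float]:
    """Return the time at which the action would be invoked, or None if never."""
    last_intent_time: Optional[float] = None
    for t, intent_p, variable_p in frames:
        if intent_p > intent_threshold:
            # Remember the latest moment the intent was above its threshold.
            last_intent_time = t
        if (
            variable_p > variable_threshold
            and last_intent_time is not None
            and t - last_intent_time <= max_gap_s
        ):
            return t
    return None
```

In this sketch, a variable value that arrives too long after the intent crossed its threshold does not trigger the action, which mirrors the conditional described in the last sentence of the abstract.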

15.

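Invention application
SYSTEM AND METHOD FOR COMPUTING REGION CENTERS BY POINT CLUSTERING (in force)

Publication (announcement) number: US20220188580A1

Publication (announcement) date: 2022-06-16

Application number: US17549796

Application date: 2021-12-13

Applicant: SoundHound, Inc.

Inventor: Christophe PIERRET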

IPC: G06K9/62

Abstract: A system and a method are disclosed that calculate the center of a geographic region. A set of topological/geographical points is received. A set of clusters is determined. A weight for each cluster is computed. The highest weighted cluster is selected. The geographic region center is calculated using the selected cluster. The geographical points can include a key for each point and be filtered by an indicated key before calculating the center of a geographic location.
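
Illustrative sketch (not taken from the patent): the steps listed in this abstract (filter by key, cluster, weight, select, compute center) can be tried out with a very simple stand-in for each step. Here clusters are fixed-size grid cells, a cluster's weight is its point count, and the center is the plain mean of the selected cluster. These choices, along with the names `Point`, `region_center`, and `cell_size`, are assumptions; the abstract does not say which clustering or weighting method is used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Point(NamedTuple):
    lat: float
    lon: float
    key: str  # hypothetical per-point label


def region_center(
    points: Iterable[Point],
    key: Optional[str] = None,
    cell_size: float = 1.0,  # grid cell size in degrees (assumption)
) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) of the densest grid cell's mean, or None if no points remain."""
    if key is not None:
        # Filter by the indicated key before clustering.
        points = [p for p in points if p.key == key]

    # Stand-in clustering: bucket points into fixed-size grid cells.
    clusters: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for p in points:
        cell = (int(p.lat // cell_size), int(p.lon // cell_size))
        clusters[cell].append(p)
    if not clusters:
        return None

    # Stand-in weight: number of points in the cluster.
    best = max(clusters.values(), key=len)

    # Plain arithmetic mean; this ignores longitude wraparound at +/-180 degrees.
    lat = sum(p.lat for p in best) / len(best)
    lon = sum(p.lon for p in best) / len(best)
    return (lat, lon)
```

Using only the highest weighted cluster keeps a few distant outliers from pulling the center away from where most points lie, which a mean over all points would not do.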

16.

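Invention application
RECOMMENDATION ENGINE FOR UPSELLING IN RESTAURANT ORDERS (in force)

Publication (announcement) number: US20220165272A1

Publication (announcement) date: 2022-05-26

Application number: US17667535

Application date: 2022-02-08

Applicant: SoundHound, Inc.

Inventor: Kamyar MOHAJER , Robert MACRAE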

IPC: G10L15/22 , G06F16/2457 , G10L17/00 , G10L15/18 , G10L15/30 , G06F16/242 , G06F16/22

Abstract: A computer-implemented method is provided to support a food ordering system for food items from a menu of a restaurant using natural language. Expressions made for ordering are used to recommend a food item that a user has a high probability of wanting to include in an order. The recommendation engine is trained using machine learning. Expressions are collected and parsed to identify words that might indicate food items offered by the restaurant. The words are provided to a restaurant owner to identify food items on a menu, with which the words are associated.

17.

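Invention application
DRIVER INTERFACE WITH VOICE AND GESTURE CONTROL (in force)

Publication (announcement) number: US20220139393A1

Publication (announcement) date: 2022-05-05

Application number: US17547917

Application date: 2021-12-10

Applicant: SoundHound, Inc.

Inventor: Zili Li , Cristina Vasconcelos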

IPC: G10L15/22 , G10L15/02 , G10L15/30 , G10L15/18 , G10L15/187 , G10L15/24 , G10L15/06 , G06K9/62 , G10L15/16 , G06V10/40 , G06V10/70 , G06V20/40

Abstract: A driver interface for use within an automobile provides responses to voice commands issued, for example, by a driver of the automobile. The interface includes a camera and microphone for capturing image data, such as gestures, and audio data from the automobile driver. The image data and audio data are processed to extract image and linguistic features, and these features are processed to interpret and infer a meaning of the voice command.

18.

Invention grant
Synthesizing speech recognition training data (in force)

Publication (announcement) number: US11308938B2

Publication (announcement) date: 2022-04-19

Application number: US16704216

Application date: 2019-12-05

Applicant: SoundHound, Inc.

Inventor: Maisy Wieman , Jonah Probell , Sudharsan Krishnaswamy

IPC: G10L15/22 , G10L15/06 , G10L15/16 , G10L15/18 , G10L13/02 , G10L15/197 , G10L15/187

Abstract: To train a speech recognizer, such as for recognizing variables in a neural speech-to-meaning system, compute, within an embedding space, a range of vectors of features of natural speech. Generate parameter sets for speech synthesis and synthesize speech according to the parameters. Analyze the synthesized speech to compute vectors in the embedding space. Use a cost function that favors an even spread (minimal clustering) to generate a multiplicity of speech synthesis parameter sets. Using the multiplicity of parameter sets, generate a multiplicity of speech of known words that can be used as training data for speech recognition.

19.

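Invention application
SYSTEM AND METHOD FOR VOICE MORPHING IN A DATA ANNOTATOR TOOL (in force)

Publication (announcement) number: US20220092273A1

Publication (announcement) date: 2022-03-24

Application number: US17539182

Application date: 2021-11-30

Applicant: SoundHound, Inc.

Inventor: Dylan H. Ross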

IPC: G06F40/56 , G10L19/26 , G10L19/125 , G10L15/18 , G06F40/58

Abstract: A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift. Labeling the morphed speech comprises one or more of transcribing the morphed speech, identifying a gender of the speaker, identifying an accent of the speaker, and identifying a noise type of the morphed speech.

20.

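Invention grant
Vision-assisted speech processing (in force)

Publication (announcement) number: US11257493B2

Publication (announcement) date: 2022-02-22

Application number: US16509029

Application date: 2019-07-11

Applicant: SoundHound, Inc.

Inventor: Cristina Vasconcelos , Zili Li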

IPC: G10L15/00 , G10L15/22 , G10L15/02 , G10L15/30 , G10L15/18 , G10L15/187 , G10L15/24 , G10L15/06 , G06K9/46 , G06K9/62 , G06K9/72 , G06K9/00 , G10L15/16 , G10L25/30

Abstract: Systems and methods for processing speech are described. In certain examples, image data is used to generate visual feature tensors and audio data is used to generate audio feature tensors. The visual feature tensors and the audio feature tensors are used by a linguistic model to determine linguistic features that are usable to parse an utterance of a user. The generation of the feature tensors may be jointly configured with the linguistic model. Systems may be provided in a client-server architecture.
