Patent search ap:("SoundHound Page Inc.") AND inv:"Cristina Vasconcelos"

1.

Invention patent application
VISION-ASSISTED SPEECH PROCESSING (status: in force)

Publication No.: US20210012769A1

Publication date: 2021-01-14

Application No.: US16509029

Filing date: 2019-07-11

Applicant: SoundHound, Inc.

Inventor: Cristina Vasconcelos, Zili Li

IPC: G10L15/22, G10L15/02, G10L15/30, G10L15/18, G10L15/187, G10L15/24, G10L15/16, G10L15/06, G06K9/46, G06K9/62, G06K9/72, G06K9/00

Abstract: Systems and methods for processing speech are described. In certain examples, image data is used to generate visual feature tensors and audio data is used to generate audio feature tensors. The visual feature tensors and the audio feature tensors are used by a linguistic model to determine linguistic features that are usable to parse an utterance of a user. The generation of the feature tensors may be jointly configured with the linguistic model. Systems may be provided in a client-server architecture.
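
The data flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the layer sizes, the random weights, the tanh and softmax choices, and every name below (`random_layer`, `linguistic_features`, `params`) are assumptions made for illustration only. It shows two feature extractors whose outputs are concatenated and passed to a single linguistic model.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Normalize scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Hypothetical stand-in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linguistic_features(image_pixels, audio_frame, params):
    """Two feature extractors feed one linguistic model (assumed layout)."""
    visual_tensor = [math.tanh(v) for v in linear(*params["visual"], image_pixels)]
    audio_tensor = [math.tanh(v) for v in linear(*params["audio"], audio_frame)]
    fused = visual_tensor + audio_tensor  # concatenate the two feature tensors
    return softmax(linear(*params["linguistic"], fused))


rng = random.Random(0)
params = {
    "visual": random_layer(rng, 4, 16),     # 16 pixel values -> 4 visual features
    "audio": random_layer(rng, 4, 8),       # 8 audio samples -> 4 audio features
    "linguistic": random_layer(rng, 5, 8),  # 8 fused features -> 5 linguistic scores
}
image_pixels = [rng.random() for _ in range(16)]
audio_frame = [rng.random() for _ in range(8)]
scores = linguistic_features(image_pixels, audio_frame, params)
print(len(scores), round(sum(scores), 6))
```

Because all three parameter sets sit in one `params` structure, a single training loop could update them together, which is one possible reading of "jointly configured".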

2.

Invention patent application
NEURAL ACOUSTIC MODEL (status: in force)

Publication No.: US20210256386A1

Publication date: 2021-08-19

Application No.: US16790643

Filing date: 2020-02-13

Applicant: SoundHound, Inc.

Inventor: Maisy Wieman, Andrew Carl Spencer, Zìlì Li, Cristina Vasconcelos

IPC: G06N3/08, G06N3/04, G10L15/22, G10L15/16

Abstract: An audio processing system is described. The audio processing system uses a convolutional neural network architecture to process audio data, a recurrent neural network architecture to process at least data derived from an output of the convolutional neural network architecture, and a feed-forward neural network architecture to process at least data derived from an output of the recurrent neural network architecture. The feed-forward neural network architecture is configured to output classification scores for a plurality of sound units associated with speech. The classification scores indicate a presence of one or more sound units in the audio data. The convolutional neural network architecture has a plurality of convolutional groups arranged in series, where a convolutional group includes a combination of two data mappings arranged in parallel.
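
The three-stage layout in this abstract can be sketched in plain Python. This is a toy illustration under stated assumptions, not the claimed architecture: the 1-D signal, the kernels, the scalar recurrent state, and the choice to combine the two parallel mappings by addition followed by a ReLU are all hypothetical, as are the names `conv_group`, `recurrent`, `feed_forward`, and `acoustic_model`.

```python
import math


def conv1d(signal, kernel):
    """Same-length 1-D sliding-window filter with zero padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def conv_group(signal, kernel_a, kernel_b):
    """Two data mappings in parallel; combining by addition is an assumption."""
    branch_a = conv1d(signal, kernel_a)
    branch_b = conv1d(signal, kernel_b)
    return [max(0.0, a + b) for a, b in zip(branch_a, branch_b)]


def recurrent(sequence, w_in=0.5, w_state=0.5):
    """Minimal recurrent layer carrying a single scalar state across frames."""
    state, states = 0.0, []
    for x in sequence:
        state = math.tanh(w_in * x + w_state * state)
        states.append(state)
    return states


def feed_forward(state, unit_weights):
    """Map one recurrent state to classification scores over sound units."""
    scores = [w * state for w in unit_weights]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def acoustic_model(audio, groups, unit_weights):
    features = audio
    for kernel_a, kernel_b in groups:  # convolutional groups arranged in series
        features = conv_group(features, kernel_a, kernel_b)
    states = recurrent(features)
    return [feed_forward(s, unit_weights) for s in states]


audio = [0.0, 0.2, 0.9, 0.4, -0.3, 0.1]
groups = [([0.25, 0.5, 0.25], [1.0]), ([-0.5, 1.0, -0.5], [1.0])]
unit_weights = [1.0, -1.0, 0.5]  # three hypothetical sound units
frame_scores = acoustic_model(audio, groups, unit_weights)
print(len(frame_scores), len(frame_scores[0]))
```

Each entry of `frame_scores` holds one score per sound unit for one audio frame, so a downstream decoder could pick or combine units frame by frame.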

3.

Invention patent application
DRIVER INTERFACE WITH VOICE AND GESTURE CONTROL (status: in force)

Publication No.: US20220139393A1

Publication date: 2022-05-05

Application No.: US17547917

Filing date: 2021-12-10

Applicant: SoundHound, Inc.

Inventor: Zili Li, Cristina Vasconcelos

IPC: G10L15/22, G10L15/02, G10L15/30, G10L15/18, G10L15/187, G10L15/24, G10L15/06, G06K9/62, G10L15/16, G06V10/40, G06V10/70, G06V20/40

Abstract: A driver interface for use within an automobile provides responses to voice commands issued for example by a driver of the automobile. The interface includes a camera and microphone for capturing image data such as gestures and audio data from the automobile driver. The image data and audio data are processed to extract image and linguistic features from the image and audio data, which image and linguistic features are processed to interpret and infer a meaning of the voice command.
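
One way to picture how a gesture could help interpret a voice command is the rule-based sketch below. It is only a stand-in: the abstract describes processing learned image and linguistic features, whereas this example uses a hand-written lookup. The gesture labels, the `GESTURE_TARGETS` table, and the `Observation` and `interpret` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    transcript: str         # text decoded from the microphone audio
    gesture: Optional[str]  # label extracted from camera frames, if any


# Hypothetical mapping from a pointing gesture to the object pointed at.
GESTURE_TARGETS = {"point_left_window": "left window", "point_radio": "radio"}


def interpret(obs: Observation) -> str:
    """Resolve an ambiguous spoken 'that' using the gesture, when one is present."""
    target = GESTURE_TARGETS.get(obs.gesture or "")
    words = obs.transcript.lower().split()
    if target:
        words = [target if w == "that" else w for w in words]
    return " ".join(words)


print(interpret(Observation("Open that", "point_left_window")))  # open left window
print(interpret(Observation("Open that", None)))                 # open that
```

Without a recognized gesture the command is returned unresolved, which is where a real system might ask the driver a follow-up question.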

4.

Granted invention patent
Vision-assisted speech processing (status: in force)

Publication No.: US11257493B2

Publication date: 2022-02-22

Application No.: US16509029

Filing date: 2019-07-11

Applicant: SoundHound, Inc.

Inventor: Cristina Vasconcelos, Zili Li

IPC: G10L15/00, G10L15/22, G10L15/02, G10L15/30, G10L15/18, G10L15/187, G10L15/24, G10L15/06, G06K9/46, G06K9/62, G06K9/72, G06K9/00, G10L15/16, G10L25/30

Abstract: Systems and methods for processing speech are described. In certain examples, image data is used to generate visual feature tensors and audio data is used to generate audio feature tensors. The visual feature tensors and the audio feature tensors are used by a linguistic model to determine linguistic features that are usable to parse an utterance of a user. The generation of the feature tensors may be jointly configured with the linguistic model. Systems may be provided in a client-server architecture.
