Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Spyridon Matsoukas"

1.

发明申请
PROCESSING COMPLEX UTTERANCES FOR NATURAL LANGUAGE UNDERSTANDING 有权

公开(公告)号：US20230032575A1

公开(公告)日：2023-02-02

申请号：US17882874

申请日：2022-08-08

Applicant: Amazon Technologies, Inc.

Inventor： Cengiz Erbas , Thomas Kollar , Avnish Sikka , Spyridon Matsoukas , Simon Peter Reavely

IPC: G10L15/22 , G10L15/06 , G10L15/02 , G10L15/18

Abstract: A system capable of performing natural language understanding (NLU) on utterances including complex command structures such as sequential commands (e.g., multiple commands in a single utterance), conditional commands (e.g., commands that are only executed if a condition is satisfied), and/or repetitive commands (e.g., commands that are executed until a condition is satisfied). Audio data may be processed using automatic speech recognition (ASR) techniques to obtain text. The text may then be processed using machine learning models that are trained to parse text of incoming utterances. The models may identify complex utterance structures and may identify what command portions of an utterance go with what conditional statements. Machine learning models may also identify what data is needed to determine when the conditionals are true so the system may cause the commands to be executed (and stopped) at the appropriate times.

2.

发明授权
Natural language speech processing application selection 有权

公开(公告)号：US11276403B2

公开(公告)日：2022-03-15

申请号：US16693826

申请日：2019-11-25

Applicant: Amazon Technologies, Inc.

Inventor： Ruhi Sarikaya , Rohit Prasad , Kerry Hammil , Spyridon Matsoukas , Nikko Strom , Frédéric Johan Georges Deramat , Stephen Frederick Potter , Young-Bum Kim

IPC: G06F40/295 , G10L15/22 , G10L15/08 , G10L15/26

Abstract: Techniques for limiting natural language processing performed on input data are described. A system receives input data from a device. The input data corresponds to a command to be executed by the system. The system determines applications likely configured to execute the command. The system performs named entity recognition and intent classification with respect to only the applications likely configured to execute the command.

3.

发明申请
USER PRESENCE DETECTION 有权

公开(公告)号：US20210027798A1

公开(公告)日：2021-01-28

申请号：US17022197

申请日：2020-09-16

Applicant: Amazon Technologies, Inc.

Inventor： Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal

IPC: G10L25/30 , G10L25/51 , G10L15/02 , G10L15/16 , G10L15/22 , G10L15/30 , G10L25/78

Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.

4.

发明授权
User presence detection 有权

公开(公告)号：US10796716B1

公开(公告)日：2020-10-06

申请号：US16157319

申请日：2018-10-11

Applicant: Amazon Technologies, Inc.

Inventor： Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal

IPC: G10L15/00 , G10L25/78 , G10L15/22 , G10L15/02 , G10L15/30 , G10L15/16 , G10L15/08

Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.

5.

发明授权
Keyword spotting using multi-task configuration 有权

公开(公告)号：US10304440B1

公开(公告)日：2019-05-28

申请号：US15198578

申请日：2016-06-30

Applicant: Amazon Technologies, Inc.

Inventor： Sankaran Panchapagesan , Bjorn Hoffmeister , Arindam Mandal , Aparna Khare , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Ming Sun

IPC: G10L15/06 , G10L15/08 , G10L15/14 , G10L15/16 , G10L15/28

Abstract: An approach to keyword spotting makes use of acoustic parameters that are trained on a keyword spotting task as well as on a second speech recognition task, for example, a large vocabulary continuous speech recognition task. The parameters may be optimized according to a weighted measure that weighs the keyword spotting task more highly than the other task, and that weighs utterances of a keyword more highly than utterances of other speech. In some applications, a keyword spotter configured with the acoustic parameters is used for trigger or wake word detection.

6.

发明授权
Class-based discriminative training of speech models 有权

公开(公告)号：US09892726B1

公开(公告)日：2018-02-13

申请号：US14574239

申请日：2014-12-17

Applicant: Amazon Technologies, Inc.

Inventor： Sri Venkata Surya Siva Rama Krishna Garimella , Spyridon Matsoukas , Ariya Rastrow , Bjorn Hoffmeister

IPC: G10L15/00 , G10L15/06 , G10L15/14 , G10L15/08 , G10L15/22

CPC classification number: G10L15/063 , G10L15/08 , G10L15/14 , G10L15/22 , G10L25/27 , G10L2015/0631 , G10L2015/088 , G10L2015/223

Abstract: Features are disclosed for modifying a statistical model to more accurately discriminate between classes of input data. A subspace of the total model parameter space can be learned such that individual points in the subspace, corresponding to the various classes, are discriminative with respect to the classes. The subspace can be learned using an iterative process whereby an initial subspace is used to generate data and maximize an objective function. The objective function can correspond to maximizing the posterior probability of the correct class for a given input. The initial subspace, data, and objective function can be used to generate a new subspace that better discriminates between classes. The process may be repeated as desired. A model modified using such a subspace can be used to classify input data.

7.

发明授权
Determining speaker direction using a spherical microphone array 有权
Title translation: 使用球形麦克风阵列确定扬声器方向

公开(公告)号：US09560441B1

公开(公告)日：2017-01-31

申请号：US14582305

申请日：2014-12-24

Applicant: Amazon Technologies, Inc.

Inventor： John Walter McDonough, Jr. , Volker Sebastian Leutnant , Sri Venkata Surya Siva Rama Krishna Garimell , Spyridon Matsoukas

IPC: H04R3/00 , H04R1/32 , H04R25/00 , H04R1/40 , H04R5/027 , H04R1/20

CPC classification number: H04R1/326 , G01S3/803 , G10L21/0216 , G10L2021/02166 , H04R1/20 , H04R1/32 , H04R1/406 , H04R3/005 , H04R5/027 , H04R25/407 , H04R29/005 , H04R2201/401

Abstract: A system that detects audio including speech using a spherical sensor array estimates a direction of arrival of the speech using a Kalman filter. To improve the estimates of the Kalman filter, the system estimates a noise covariance matrix, representing noise detected by the array. The structure of the noise covariance matrix is determined, using an assumption of spherically isotropic diffuse noise. The intensity of the noise covariance matrix is estimated based on the intensity of audio detected by the array.

Abstract translation: 使用球面传感器阵列检测包括语音的音频的系统使用卡尔曼滤波器来估计语音的到达方向。为了改进卡尔曼滤波器的估计，系统估计噪声协方差矩阵，表示阵列检测到的噪声。使用球面各向同性漫反射噪声的假设来确定噪声协方差矩阵的结构。噪声协方差矩阵的强度基于阵列检测到的音频强度来估计。

8.

发明授权
Speech processing optimizations based on microphone array 有权

公开(公告)号：US11935525B1

公开(公告)日：2024-03-19

申请号：US16895377

申请日：2020-06-08

Applicant: Amazon Technologies, Inc.

Inventor： Shiva Kumar Sundaram , Minhua Wu , Anirudh Raju , Spyridon Matsoukas , Arindam Mandal , Kenichi Kumatani

IPC: G10L15/22 , G06F40/40 , G10L15/187 , G10L15/26 , G10L15/30 , G10L21/0208 , H04R3/00 , G10L15/08 , G10L21/0216 , H04W4/02

CPC classification number: G10L15/22 , G06F40/40 , G10L15/187 , G10L15/26 , G10L15/30 , G10L21/0208 , H04R3/005 , G10L2015/088 , G10L2015/223 , G10L2021/02166 , H04W4/025

Abstract: Systems and methods for utilizing microphone array information for acoustic modeling are disclosed. Audio data may be received from a device having a microphone array configuration. Microphone configuration data may also be received that indicates the configuration of the microphone array. The microphone configuration data may be utilized as an input vector to an acoustic model, along with the audio data, to generate phoneme data. Additionally, the microphone configuration data may be utilized to train and/or generate acoustic models, select an acoustic model to perform speech recognition with, and/or to improve trigger sound detection.

9.

发明授权
User presence detection 有权

公开(公告)号：US11657832B2

公开(公告)日：2023-05-23

申请号：US17022197

申请日：2020-09-16

Applicant: Amazon Technologies, Inc.

Inventor： Shiva Kumar Sundaram , Chao Wang , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Arindam Mandal

IPC: G10L15/00 , G10L25/30 , G10L25/51 , G10L15/02 , G10L15/16 , G10L15/22 , G10L15/30 , G10L25/78 , G10L15/08

CPC classification number: G10L25/30 , G10L15/02 , G10L15/16 , G10L15/22 , G10L15/30 , G10L25/51 , G10L25/78 , G10L2015/088 , G10L2025/783

Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human originating sounds (e.g., coughing, sneezing), or other human related noises (e.g., footsteps, doors closing) can be used to detect audio. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.

10.

发明授权
Detecting system-directed speech 有权

公开(公告)号：US11361763B1

公开(公告)日：2022-06-14

申请号：US15694348

申请日：2017-09-01

Applicant: Amazon Technologies, Inc.

Inventor： Roland Maximilian Rolf Maas , Sri Harish Reddy Mallidi , Spyridon Matsoukas , Bjorn Hoffmeister

IPC: G10L15/22 , G10L15/02 , G10L15/16 , G10L15/18 , G10L15/34 , G10L17/22

Abstract: A speech-processing system capable of receiving and processing audio data to determine if the audio data includes speech that was intended for the system. Non-system directed speech may be filtered out while system-directed speech may be selected for further processing. A system-directed speech detector may use a trained machine learning model (such as a deep neural network or the like) to process a feature vector representing a variety of characteristics of the incoming audio data, including the results of automatic speech recognition and/or other data. Using the feature vector the model may output an indicator as to whether the speech is system-directed. The system may also incorporate other filters such as voice activity detection prior to speech recognition, or the like.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification