-
Publication number: US10304440B1
Publication date: 2019-05-28
Application number: US15198578
Filing date: 2016-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Sankaran Panchapagesan , Bjorn Hoffmeister , Arindam Mandal , Aparna Khare , Shiv Naga Prasad Vitaladevuni , Spyridon Matsoukas , Ming Sun
Abstract: An approach to keyword spotting makes use of acoustic parameters that are trained on a keyword spotting task as well as on a second speech recognition task, for example, a large vocabulary continuous speech recognition task. The parameters may be optimized according to a weighted measure that weighs the keyword spotting task more highly than the other task, and that weighs utterances of a keyword more highly than utterances of other speech. In some applications, a keyword spotter configured with the acoustic parameters is used for trigger or wake word detection.
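The weighting described in this abstract can be illustrated with a minimal sketch. It is not the patented training procedure: the cross-entropy loss, the function names, and the parameters `task_weight` and `keyword_weight` are all assumptions chosen for illustration.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability the model assigns to the target class."""
    return -math.log(probs[target])


def weighted_multitask_loss(kws_examples, asr_examples,
                            task_weight=0.8, keyword_weight=2.0):
    """Combine a keyword-spotting loss with a secondary recognition loss.

    kws_examples:   list of (probs, target, is_keyword) tuples
    asr_examples:   list of (probs, target) tuples
    task_weight:    hypothetical weight (> 0.5) favoring the keyword task
    keyword_weight: hypothetical factor (> 1) up-weighting keyword utterances
    """
    # Weighted average over keyword-spotting examples; keyword utterances
    # count keyword_weight times as much as other speech.
    num = 0.0
    den = 0.0
    for probs, target, is_keyword in kws_examples:
        w = keyword_weight if is_keyword else 1.0
        num += w * cross_entropy(probs, target)
        den += w
    kws_loss = num / den if den else 0.0

    # Plain average over the secondary-task examples.
    if asr_examples:
        asr_loss = sum(cross_entropy(p, t) for p, t in asr_examples) / len(asr_examples)
    else:
        asr_loss = 0.0

    # The keyword-spotting task dominates the combined objective.
    return task_weight * kws_loss + (1.0 - task_weight) * asr_loss
```

In a real system the two losses would be computed on shared acoustic parameters and minimized by gradient descent; the sketch only shows how the two levels of weighting combine.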
-
Publication number: US10152973B2
Publication date: 2018-12-11
Application number: US14942551
Filing date: 2015-11-16
Applicant: Amazon Technologies, Inc.
Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that they may be used to update the models and data as they become available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.
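As a rough illustration of the asynchronous retrieval and caching described in this abstract, the following sketch runs a first pass with a general model while a user-specific model is fetched, then re-processes the utterance. The names `ModelCache`, `recognize`, and the `fetch` loader are hypothetical and do not come from the patent.

```python
import asyncio


class ModelCache:
    """Caches user-specific models returned by a hypothetical async loader."""

    def __init__(self, fetch):
        self._fetch = fetch  # assumed signature: async fetch(user_id) -> model
        self._cache = {}

    async def get(self, user_id):
        # Retrieve the model only on a cache miss, then keep it.
        if user_id not in self._cache:
            self._cache[user_id] = await self._fetch(user_id)
        return self._cache[user_id]


async def recognize(audio, user_id, cache, general_model):
    # Schedule retrieval of the specialized model in the background.
    pending = asyncio.create_task(cache.get(user_id))
    # Run the first pass with the general model in a worker thread so the
    # retrieval can make progress at the same time.
    first_pass = await asyncio.to_thread(general_model, audio)
    # Re-process the utterance once the specialized model has arrived.
    specialized = await pending
    return first_pass, specialized(audio)
```

A later call for the same `user_id` would find the model already in the cache, which corresponds to the "saved for use in processing subsequently received utterances" case.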
-
Publication number: US10121471B2
Publication date: 2018-11-06
Application number: US14753811
Filing date: 2015-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Bjorn Hoffmeister , Ariya Rastrow , Baiyang Liu
Abstract: An automatic speech recognition (ASR) system detects an endpoint of an utterance using the active hypotheses under consideration by a decoder. The ASR system calculates the amount of non-speech detected by a plurality of hypotheses and weights the non-speech duration by the probability of each hypothesis. When the aggregate weighted non-speech exceeds a threshold, an endpoint may be declared.
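The endpointing rule in this abstract amounts to a probability-weighted average of non-speech durations compared against a threshold. A minimal sketch follows; the function names, the frame-based units, and the default threshold are assumptions, not values from the patent.

```python
def expected_nonspeech(hypotheses):
    """Probability-weighted trailing non-speech duration.

    hypotheses: list of (probability, trailing_nonspeech_frames) pairs,
    one per active decoder hypothesis.
    """
    total = sum(p for p, _ in hypotheses)
    if total <= 0:
        return 0.0
    # Normalize so the weights sum to one even if the decoder's scores do not.
    return sum(p * d for p, d in hypotheses) / total


def is_endpoint(hypotheses, threshold_frames=30.0):
    """Declare an endpoint when the aggregate weighted non-speech exceeds the threshold."""
    return expected_nonspeech(hypotheses) > threshold_frames
```

For example, hypotheses with probabilities 0.5, 0.25, and 0.25 and trailing non-speech of 40, 20, and 0 frames give an expected duration of 25 frames, which triggers an endpoint at a threshold of 20 but not at 30.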
-
Publication number: US09892726B1
Publication date: 2018-02-13
Application number: US14574239
Filing date: 2014-12-17
Applicant: Amazon Technologies, Inc.
Inventor: Sri Venkata Surya Siva Rama Krishna Garimella , Spyridon Matsoukas , Ariya Rastrow , Bjorn Hoffmeister
CPC classification number: G10L15/063 , G10L15/08 , G10L15/14 , G10L15/22 , G10L25/27 , G10L2015/0631 , G10L2015/088 , G10L2015/223
Abstract: Features are disclosed for modifying a statistical model to more accurately discriminate between classes of input data. A subspace of the total model parameter space can be learned such that individual points in the subspace, corresponding to the various classes, are discriminative with respect to the classes. The subspace can be learned using an iterative process whereby an initial subspace is used to generate data and maximize an objective function. The objective function can correspond to maximizing the posterior probability of the correct class for a given input. The initial subspace, data, and objective function can be used to generate a new subspace that better discriminates between classes. The process may be repeated as desired. A model modified using such a subspace can be used to classify input data.
-
Publication number: US20170270919A1
Publication date: 2017-09-21
Application number: US15196228
Filing date: 2016-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Sree Hari Krishnan Parthasarathi , Bjorn Hoffmeister , Brian King , Roland Maas
CPC classification number: G10L15/20 , G10L15/02 , G10L15/08 , G10L15/16 , G10L17/02 , G10L17/06 , G10L17/18 , G10L25/87 , G10L2015/088 , G10L2025/783
Abstract: A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech that is from a same speaker as reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame-by-frame) as to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component which may allow the ASR component to focus its processing on the desired speech.
-
Publication number: US09589560B1
Publication date: 2017-03-07
Application number: US14135309
Filing date: 2013-12-19
Applicant: Amazon Technologies, Inc.
Inventor: Shiv Naga Prasad Vitaladevuni , Bjorn Hoffmeister , Rohit Prasad
IPC: G10L15/01
CPC classification number: G10L15/01 , G06K9/6277
Abstract: Features are disclosed for estimating a false rejection rate in a detection system. The false rejection rate can be estimated by fitting a model to a distribution of detection confidence scores. An estimated false rejection rate can then be computed for confidence scores that fall below a threshold. The false rejection rate and model can be verified once the detection system has been deployed by obtaining additional data with confidence scores falling below the threshold. Adjustments to the model or other operational parameters can be implemented based on the verified false rejection rate, model, or additional data.
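One simple way to picture the estimate described in this abstract is to fit a parametric model to confidence scores and read off the probability mass below the detection threshold. The sketch below assumes a Gaussian model and assumes the scores come from true detection events; both are illustrative choices, and the patent does not state which model is used.

```python
from statistics import NormalDist


def estimate_frr(scores, threshold):
    """Estimate the false rejection rate from detection confidence scores.

    scores:    confidence scores for genuine events (at least two values)
    threshold: scores below this value would be rejected by the detector
    """
    # Fit a normal distribution to the observed scores (assumed model).
    model = NormalDist.from_samples(scores)
    # The fitted probability mass below the threshold is the estimated FRR.
    return model.cdf(threshold)
```

A deployed detector typically observes only scores above the threshold, so a practical fit would need to account for that truncation; this sketch ignores it for brevity.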
-
Publication number: US11514901B2
Publication date: 2022-11-29
Application number: US16437763
Filing date: 2019-06-11
Applicant: Amazon Technologies, Inc.
Inventor: Sree Hari Krishnan Parthasarathi , Bjorn Hoffmeister , Brian King , Roland Maas
IPC: G10L15/20 , G10L15/02 , G10L17/06 , G10L25/87 , G10L15/08 , G10L15/16 , G10L17/18 , G10L25/78 , G10L17/02
Abstract: A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech that is from a same speaker as reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame-by-frame) as to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component which may allow the ASR component to focus its processing on the desired speech.
-
Publication number: US11361763B1
Publication date: 2022-06-14
Application number: US15694348
Filing date: 2017-09-01
Applicant: Amazon Technologies, Inc.
Inventor: Roland Maximilian Rolf Maas , Sri Harish Reddy Mallidi , Spyridon Matsoukas , Bjorn Hoffmeister
Abstract: A speech-processing system is capable of receiving and processing audio data to determine if the audio data includes speech that was intended for the system. Non-system directed speech may be filtered out while system-directed speech may be selected for further processing. A system-directed speech detector may use a trained machine learning model (such as a deep neural network or the like) to process a feature vector representing a variety of characteristics of the incoming audio data, including the results of automatic speech recognition and/or other data. Using the feature vector the model may output an indicator as to whether the speech is system-directed. The system may also incorporate other filters such as voice activity detection prior to speech recognition, or the like.
-
Publication number: US20210295833A1
Publication date: 2021-09-23
Application number: US16822744
Filing date: 2020-03-18
Applicant: Amazon Technologies, Inc.
Inventor: Ariya Rastrow , Eli Joshua Fidler , Roland Maximilian Rolf Maas , Nikko Strom , Aaron Eakin , Diamond Bishop , Bjorn Hoffmeister , Sanjeev Mishra
Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.
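The two-stage behavior in this abstract (a fast interrupt detector that lowers the volume, followed by a slower classifier that accepts or rejects the interrupt) can be sketched as a small decision function. The `Player` class, the score inputs, and the threshold values are hypothetical; a real device would compute the second score only after the full utterance is available.

```python
from dataclasses import dataclass


@dataclass
class Player:
    volume: float = 1.0
    playing: bool = True


def handle_interrupt(player, interrupt_score, directed_score,
                     interrupt_threshold=0.5, directed_threshold=0.8,
                     ducked_volume=0.2):
    """Apply the two-stage interrupt logic to the output audio.

    interrupt_score: output of the low-latency interrupt detector (assumed in [0, 1])
    directed_score:  output of the device-directed classifier (assumed in [0, 1])
    """
    # Stage 1: no interrupt event, so output audio is left untouched.
    if interrupt_score < interrupt_threshold:
        return "ignored"

    # Potential voice command: lower the output volume right away.
    original_volume = player.volume
    player.volume = ducked_volume

    # Stage 2: the classifier confirms device-directed speech, so output ends
    # and the audio would be sent on for speech processing.
    if directed_score >= directed_threshold:
        player.playing = False
        return "accepted"

    # Stage 2 rejects the interrupt, so the original volume is restored.
    player.volume = original_volume
    return "rejected"
```

Splitting the decision this way lets the device react quickly to a possible command while deferring the final, more accurate judgment to the slower classifier.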
-
Publication number: US10923111B1
Publication date: 2021-02-16
Application number: US16368120
Filing date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Xing Fan , I-Fan Chen , Yuzong Liu , Bjorn Hoffmeister , Yiming Wang , Tongfei Chen
Abstract: A system configured to recognize text represented by speech may determine that a first portion of audio data corresponds to speech from a first speaker and that a second portion of audio data corresponds to speech from the first speaker and a second speaker. Features of the first portion are compared to features of the second portion to determine a similarity therebetween. Based on this similarity, speech from the first speaker is distinguished from speech from the second speaker and text corresponding to speech from the first speaker is determined.